From patchwork Fri May 19 12:01:27 2023
X-Patchwork-Submitter: Baoquan He
X-Patchwork-Id: 13248266
Date: Fri, 19 May 2023 20:01:27 +0800
From: Baoquan He
To: Thomas Gleixner
Cc: "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Uladzislau Rezki, Lorenzo Stoakes, Peter Zijlstra, John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: [RFC PATCH 1/3] mm/vmalloc.c: try to flush vmap_area one by one
In-Reply-To: <87edng6qu8.ffs@tglx>
In the current __purge_vmap_area_lazy(), when flushing the TLB of the
vmalloc area, the flush range is calculated as the [min:max] of the vas.
That calculated range can be large because of the gaps between the vas.
E.g. in the graph below, only 12 pages (4 from va_1, 8 from va_2) are
mapped, while the calculated flush range covers 58 pages:

      VA_1                          VA_2
 |....|-------------------------|............|
 10   12                        60           68

 . mapped; - not mapped.

Sometimes the calculated flush range can be surprisingly huge because
the vas may cross two kernel virtual address areas. E.g. the vmalloc
area and the kernel module area are very far away from each other on
some architectures. So for systems which lack a full TLB flush,
flushing such a long range is a big problem (it takes time). Flushing
the vas one by one becomes necessary in that case.

Hence, introduce flush_tlb_kernel_vas() to flush the vas one by one,
and add ARCH_HAS_FLUSH_TLB_KERNEL_VAS to indicate whether an
architecture provides a flush_tlb_kernel_vas() implementation.
Otherwise, keep the old way of calculating and flushing the whole
range.
Signed-off-by: Thomas Gleixner
Signed-off-by: Baoquan He # Fix error of 'undefined reference to `flush_tlb_kernel_vas''
---
 arch/Kconfig              |  4 ++++
 arch/arm/Kconfig          |  1 +
 arch/arm/kernel/smp_tlb.c | 23 +++++++++++++++++++++++
 arch/x86/Kconfig          |  1 +
 arch/x86/mm/tlb.c         | 22 ++++++++++++++++++++++
 include/linux/vmalloc.h   |  8 ++++++++
 mm/vmalloc.c              | 32 ++++++++++++++++++++++----------
 7 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 205fd23e0cad..ca5413f1e4e0 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -270,6 +270,10 @@ config ARCH_HAS_SET_MEMORY
 config ARCH_HAS_SET_DIRECT_MAP
 	bool
 
+# Select if architecture provides flush_tlb_kernel_vas()
+config ARCH_HAS_FLUSH_TLB_KERNEL_VAS
+	bool
+
 #
 # Select if the architecture provides the arch_dma_set_uncached symbol to
 # either provide an uncached segment alias for a DMA allocation, or
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0fb4b218f665..c4de7f38f9a7 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -10,6 +10,7 @@ config ARM
 	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
+	select ARCH_HAS_FLUSH_TLB_KERNEL_VAS
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index d4908b3736d8..22ec9b982cb1 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -69,6 +70,19 @@ static inline void ipi_flush_tlb_kernel_range(void *arg)
 	local_flush_tlb_kernel_range(ta->ta_start, ta->ta_end);
 }
 
+static inline void local_flush_tlb_kernel_vas(struct list_head *vmap_list)
+{
+	struct vmap_area *va;
+
+	list_for_each_entry(va, vmap_list, list)
+		local_flush_tlb_kernel_range(va->va_start, va->va_end);
+}
+
+static inline void ipi_flush_tlb_kernel_vas(void *arg)
+{
+	local_flush_tlb_kernel_vas(arg);
+}
+
 static inline void 
ipi_flush_bp_all(void *ignored)
{
	local_flush_bp_all();
@@ -244,6 +258,15 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	broadcast_tlb_a15_erratum();
 }
 
+void flush_tlb_kernel_vas(struct list_head *vmap_list, unsigned long num_entries)
+{
+	if (tlb_ops_need_broadcast())
+		on_each_cpu(ipi_flush_tlb_kernel_vas, vmap_list, 1);
+	else
+		local_flush_tlb_kernel_vas(vmap_list);
+	broadcast_tlb_a15_erratum();
+}
+
 void flush_bp_all(void)
 {
 	if (tlb_ops_need_broadcast())
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..7d7a44810a0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -77,6 +77,7 @@ config X86
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_EARLY_DEBUG if KGDB
 	select ARCH_HAS_ELF_RANDOMIZE
+	select ARCH_HAS_FLUSH_TLB_KERNEL_VAS
 	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 267acf27480a..c39d77eb37e4 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1081,6 +1082,27 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	}
 }
 
+static void do_flush_tlb_vas(void *arg)
+{
+	struct list_head *vmap_list = arg;
+	struct vmap_area *va;
+	unsigned long addr;
+
+	list_for_each_entry(va, vmap_list, list) {
+		/* flush the range one page at a time with 'invlpg' */
+		for (addr = va->va_start; addr < va->va_end; addr += PAGE_SIZE)
+			flush_tlb_one_kernel(addr);
+	}
+}
+
+void flush_tlb_kernel_vas(struct list_head *vmap_list, unsigned long num_entries)
+{
+	if (num_entries > tlb_single_page_flush_ceiling)
+		on_each_cpu(do_flush_tlb_all, NULL, 1);
+	else
+		on_each_cpu(do_flush_tlb_vas, vmap_list, 1);
+}
+
 /*
  * This can be used from process context to figure out what the value of
  * CR3 is without needing to do a (slow) __read_cr3().
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..a9a1e488261d 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -295,4 +295,12 @@ bool vmalloc_dump_obj(void *object);
 static inline bool vmalloc_dump_obj(void *object) { return false; }
 #endif
 
+#if defined(CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS)
+void flush_tlb_kernel_vas(struct list_head *list, unsigned long num_entries);
+#else
+static inline void flush_tlb_kernel_vas(struct list_head *list, unsigned long num_entries)
+{
+}
+#endif
+
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c0f80982eb06..31e8d9e93650 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1724,7 +1724,8 @@ static void purge_fragmented_blocks_allcpus(void);
  */
 static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 {
-	unsigned long resched_threshold;
+	unsigned long resched_threshold, num_entries = 0, num_alias_entries = 0;
+	struct vmap_area alias_va = { .va_start = start, .va_end = end };
 	unsigned int num_purged_areas = 0;
 	struct list_head local_purge_list;
 	struct vmap_area *va, *n_va;
@@ -1736,18 +1737,29 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	list_replace_init(&purge_vmap_area_list, &local_purge_list);
 	spin_unlock(&purge_vmap_area_lock);
 
-	if (unlikely(list_empty(&local_purge_list)))
-		goto out;
+	start = min(start, list_first_entry(&local_purge_list, struct vmap_area, list)->va_start);
+	end = max(end, list_last_entry(&local_purge_list, struct vmap_area, list)->va_end);
+
+	if (IS_ENABLED(CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS)) {
+		list_for_each_entry(va, &local_purge_list, list)
+			num_entries += (va->va_end - va->va_start) >> PAGE_SHIFT;
+
+		if (unlikely(!num_entries))
+			goto out;
+
+		if (alias_va.va_end > alias_va.va_start) {
+			num_alias_entries = (alias_va.va_end - alias_va.va_start) >> PAGE_SHIFT;
+			list_add(&alias_va.list, &local_purge_list);
+		}
 
-	start = min(start,
-		    list_first_entry(&local_purge_list,
-				     
				     struct vmap_area, list)->va_start);
+		flush_tlb_kernel_vas(&local_purge_list, num_entries + num_alias_entries);
 
-	end = max(end,
-		  list_last_entry(&local_purge_list,
-				  struct vmap_area, list)->va_end);
+		if (num_alias_entries)
+			list_del(&alias_va.list);
+	} else {
+		flush_tlb_kernel_range(start, end);
+	}
 
-	flush_tlb_kernel_range(start, end);
 	resched_threshold = lazy_max_pages() << 1;
 
 	spin_lock(&free_vmap_area_lock);

From patchwork Fri May 19 12:02:10 2023
X-Patchwork-Submitter: Baoquan He
X-Patchwork-Id: 13248267
Date: Fri, 19 May 2023 20:02:10 +0800
From: Baoquan He
To: Thomas Gleixner
Cc: "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Uladzislau Rezki, Lorenzo Stoakes, Peter Zijlstra, John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: [RFC PATCH 2/3] mm/vmalloc.c: Only flush VM_FLUSH_RESET_PERMS area immediately
References:
<87zg658fla.ffs@tglx> <87r0rg93z5.ffs@tglx> <87ilcs8zab.ffs@tglx> <87fs7w8z6y.ffs@tglx> <874joc8x7d.ffs@tglx> <87r0rg73wp.ffs@tglx> <87edng6qu8.ffs@tglx>
In-Reply-To: <87edng6qu8.ffs@tglx>

When freeing a vmalloc range mapping, only the page table unmapping is
done immediately; the TLB flush is lazily deferred until
lazy_max_pages() is met or vmalloc() can't find an available virtual
memory region. However, when freeing VM_FLUSH_RESET_PERMS vmalloc
memory, the TLB must be flushed immediately before the pages are freed,
and the direct map needs its permissions reset and its TLB flushed as
well. Please see commit 868b104d7379 ("mm/vmalloc: Add flag for freeing
of special permsissions") for more details.

In the current code, when freeing VM_FLUSH_RESET_PERMS memory, a lazy
purge is also done to try to save a TLB flush later. Doing that merges
the direct map range, the percpu vbq dirty ranges and all purge ranges
by calculating a [min:max] flush range, which pulls the huge gap
between the direct map range and the vmalloc range into the final TLB
flush range.

So only flush the VM_FLUSH_RESET_PERMS area immediately, and leave the
lazy flush to the normal points, e.g. when allocating a new vmap_area
or when lazy_max_pages() is met.
Signed-off-by: Baoquan He
---
 mm/vmalloc.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 31e8d9e93650..87134dd8abc3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2690,9 +2690,10 @@ static inline void set_area_direct_map(const struct vm_struct *area,
  */
 static void vm_reset_perms(struct vm_struct *area)
 {
-	unsigned long start = ULONG_MAX, end = 0;
+	unsigned long start = ULONG_MAX, end = 0, pages = 0;
 	unsigned int page_order = vm_area_page_order(area);
-	int flush_dmap = 0;
+	LIST_HEAD(local_flush_list);
+	struct vmap_area alias_va, va;
 	int i;
 
 	/*
@@ -2708,17 +2709,33 @@ static void vm_reset_perms(struct vm_struct *area)
 			page_size = PAGE_SIZE << page_order;
 			start = min(addr, start);
 			end = max(addr + page_size, end);
-			flush_dmap = 1;
 		}
 	}
 
+	va.va_start = (unsigned long)area->addr;
+	va.va_end = (unsigned long)(area->addr + area->size);
 	/*
 	 * Set direct map to something invalid so that it won't be cached if
 	 * there are any accesses after the TLB flush, then flush the TLB and
 	 * reset the direct map permissions to the default.
	 */
 	set_area_direct_map(area, set_direct_map_invalid_noflush);
-	_vm_unmap_aliases(start, end, flush_dmap);
+	if (IS_ENABLED(CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS)) {
+		if (end > start) {
+			pages = (end - start) >> PAGE_SHIFT;
+			alias_va.va_start = (unsigned long)start;
+			alias_va.va_end = (unsigned long)end;
+			list_add(&alias_va.list, &local_flush_list);
+		}
+
+		pages += area->size >> PAGE_SHIFT;
+		list_add(&va.list, &local_flush_list);
+
+		flush_tlb_kernel_vas(&local_flush_list, pages);
+	} else {
+		flush_tlb_kernel_range(start, end);
+		flush_tlb_kernel_range(va.va_start, va.va_end);
+	}
 	set_area_direct_map(area, set_direct_map_default_noflush);
 }

From patchwork Fri May 19 12:03:09 2023
X-Patchwork-Submitter: Baoquan He
X-Patchwork-Id: 13248268
Date: Fri, 19 May 2023 20:03:09 +0800
From: Baoquan He
To: Thomas Gleixner
Cc: "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Uladzislau Rezki, Lorenzo Stoakes, Peter
Zijlstra, John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: [RFC PATCH 3/3] mm/vmalloc.c: change _vm_unmap_aliases() to do purge firstly
In-Reply-To: <87edng6qu8.ffs@tglx>

After vb_free() is invoked, the va is purged and put onto the purge
tree/list if the entire vmap_block is dirty. If not entirely dirty, the
vmap_block stays on the percpu vmap_block_queue list, as in the two
graphs below:

(1)
 |-----|------------|-----------|-------|
 |dirty|still mapped|   dirty   | free  |

(2)
 |------------------------------|-------|
 |            dirty             | free  |

In the current _vm_unmap_aliases(), to reclaim those unmapped ranges
and flush them, it iterates the percpu vbq to calculate the flush range
from each vmap_block, covering both cases above. It then calls
purge_fragmented_blocks_allcpus() to purge the vmap_blocks of case (2),
since no mapping exists in them any more, and puts their vas onto the
purge tree/list. Later, __purge_vmap_area_lazy() again calculates the
flush range from the purge list. Obviously, this counts the vmap_block
vas of case (2) twice.
So move purge_fragmented_blocks_allcpus() up to purge the vmap_block
vas of case (2) first; then only the dirty ranges of case (1) need to
be iterated and counted. With this change, counting the dirty regions
of the vmap_blocks in case (1) now happens inside the vmap_purge_lock
protected region, which makes the flush range calculation more
reasonable and accurate by avoiding concurrent operations on other
CPUs.

Also rename _vm_unmap_aliases() to vm_unmap_aliases(), since it has no
caller other than the old vm_unmap_aliases().

Signed-off-by: Baoquan He
---
 mm/vmalloc.c | 45 ++++++++++++++++++++-------------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 87134dd8abc3..9f7cbd6182ad 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2236,8 +2236,23 @@ static void vb_free(unsigned long addr, unsigned long size)
 	spin_unlock(&vb->lock);
 }
 
-static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
+/**
+ * vm_unmap_aliases - unmap outstanding lazy aliases in the vmap layer
+ *
+ * The vmap/vmalloc layer lazily flushes kernel virtual mappings primarily
+ * to amortize TLB flushing overheads. What this means is that any page you
+ * have now, may, in a former life, have been mapped into kernel virtual
+ * address by the vmap layer and so there might be some CPUs with TLB entries
+ * still referencing that page (additional to the regular 1:1 kernel mapping).
+ *
+ * vm_unmap_aliases flushes all such lazy mappings. After it returns, we can
+ * be sure that none of the pages we have control over will have any aliases
+ * from the vmap layer.
+ */
+void vm_unmap_aliases(void)
 {
+	unsigned long start = ULONG_MAX, end = 0;
+	bool flush = false;
 	int cpu;
 
 	if (unlikely(!vmap_initialized))
@@ -2245,6 +2260,9 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
 
 	might_sleep();
 
+	mutex_lock(&vmap_purge_lock);
+	purge_fragmented_blocks_allcpus();
+
 	for_each_possible_cpu(cpu) {
 		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
 		struct vmap_block *vb;
@@ -2262,40 +2280,17 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
 				start = min(s, start);
 				end = max(e, end);
-				flush = 1;
+				flush = true;
 			}
 			spin_unlock(&vb->lock);
 		}
 		rcu_read_unlock();
 	}
 
-	mutex_lock(&vmap_purge_lock);
-	purge_fragmented_blocks_allcpus();
 	if (!__purge_vmap_area_lazy(start, end) && flush)
 		flush_tlb_kernel_range(start, end);
 	mutex_unlock(&vmap_purge_lock);
 }
-
-/**
- * vm_unmap_aliases - unmap outstanding lazy aliases in the vmap layer
- *
- * The vmap/vmalloc layer lazily flushes kernel virtual mappings primarily
- * to amortize TLB flushing overheads. What this means is that any page you
- * have now, may, in a former life, have been mapped into kernel virtual
- * address by the vmap layer and so there might be some CPUs with TLB entries
- * still referencing that page (additional to the regular 1:1 kernel mapping).
- *
- * vm_unmap_aliases flushes all such lazy mappings. After it returns, we can
- * be sure that none of the pages we have control over will have any aliases
- * from the vmap layer.
- */
-void vm_unmap_aliases(void)
-{
-	unsigned long start = ULONG_MAX, end = 0;
-	int flush = 0;
-
-	_vm_unmap_aliases(start, end, flush);
-}
 EXPORT_SYMBOL_GPL(vm_unmap_aliases);
 
 /**