From patchwork Sat Jul 11 20:25:23 2015
X-Patchwork-Submitter: David Daney
X-Patchwork-Id: 6771351
From: David Daney
To: linux-arm-kernel@lists.infradead.org, Catalin Marinas, Will Deacon
Subject: [PATCH 3/3] arm64, mm: Use IPIs for TLB invalidation.
Date: Sat, 11 Jul 2015 13:25:23 -0700
Message-Id: <1436646323-10527-4-git-send-email-ddaney.cavm@gmail.com>
In-Reply-To: <1436646323-10527-1-git-send-email-ddaney.cavm@gmail.com>
References: <1436646323-10527-1-git-send-email-ddaney.cavm@gmail.com>
Cc: Robert Richter, Andrew Morton, linux-kernel@vger.kernel.org, David Daney

From: David Daney

Most broadcast TLB invalidations are unnecessary.  So when invalidating
for a given mm/vma, target only the CPUs that need it via an IPI.  For
global TLB invalidations, also use IPIs.

Tested on Cavium ThunderX.

This change reduces 'time make -j48' on a kernel build from 139s to
116s (83% as long).

The patch is needed because of a ThunderX Pass1 erratum: exclusive
store operations are unreliable in the presence of broadcast TLB
invalidations.  The performance improvement shown makes the change
compelling even without the erratum workaround requirement.

Signed-off-by: David Daney
---
 arch/arm64/include/asm/tlbflush.h | 67 ++++++---------------------------------
 arch/arm64/mm/flush.c             | 46 +++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 42c09ec..2c132b0 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -63,46 +63,22 @@
  *	only require the D-TLB to be invalidated.
  *	- kaddr - Kernel virtual memory address
  */
-static inline void flush_tlb_all(void)
-{
-	dsb(ishst);
-	asm("tlbi vmalle1is");
-	dsb(ish);
-	isb();
-}
-
-static inline void flush_tlb_mm(struct mm_struct *mm)
-{
-	unsigned long asid = (unsigned long)ASID(mm) << 48;
+void flush_tlb_all(void);
 
-	dsb(ishst);
-	asm("tlbi aside1is, %0" : : "r" (asid));
-	dsb(ish);
-}
+void flush_tlb_mm(struct mm_struct *mm);
 
 static inline void flush_tlb_page(struct vm_area_struct *vma,
 				  unsigned long uaddr)
 {
-	unsigned long addr = uaddr >> 12 |
-			     ((unsigned long)ASID(vma->vm_mm) << 48);
-
-	dsb(ishst);
-	asm("tlbi vae1is, %0" : : "r" (addr));
-	dsb(ish);
+	/* Simplify to entire mm. */
+	flush_tlb_mm(vma->vm_mm);
 }
 
 static inline void __flush_tlb_range(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end)
 {
-	unsigned long asid = (unsigned long)ASID(vma->vm_mm) << 48;
-	unsigned long addr;
-	start = asid | (start >> 12);
-	end = asid | (end >> 12);
-
-	dsb(ishst);
-	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
-		asm("tlbi vae1is, %0" : : "r"(addr));
-	dsb(ish);
+	/* Simplify to entire mm. */
+	flush_tlb_mm(vma->vm_mm);
 }
 
 static inline void flush_tlb_all_local(void)
@@ -112,40 +88,17 @@ static inline void flush_tlb_all_local(void)
 	isb();
 }
 
-static inline void __flush_tlb_kernel_range(unsigned long start, unsigned long end)
-{
-	unsigned long addr;
-	start >>= 12;
-	end >>= 12;
-
-	dsb(ishst);
-	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
-		asm("tlbi vaae1is, %0" : : "r"(addr));
-	dsb(ish);
-	isb();
-}
-
-/*
- * This is meant to avoid soft lock-ups on large TLB flushing ranges and not
- * necessarily a performance improvement.
- */
-#define MAX_TLB_RANGE	(1024UL << PAGE_SHIFT)
-
 static inline void flush_tlb_range(struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end)
 {
-	if ((end - start) <= MAX_TLB_RANGE)
-		__flush_tlb_range(vma, start, end);
-	else
-		flush_tlb_mm(vma->vm_mm);
+	/* Simplify to entire mm. */
+	flush_tlb_mm(vma->vm_mm);
 }
 
 static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-	if ((end - start) <= MAX_TLB_RANGE)
-		__flush_tlb_kernel_range(start, end);
-	else
-		flush_tlb_all();
+	/* Simplify to all. */
+	flush_tlb_all();
 }
 
 /*
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 4dfa397..45f24d3 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -20,6 +20,7 @@
 #include <linux/export.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
+#include <linux/smp.h>
 
 #include <asm/cacheflush.h>
 #include <asm/cachetype.h>
@@ -27,6 +28,51 @@
 
 #include "mm.h"
 
+static void flush_tlb_local(void *info)
+{
+	asm volatile("\n"
+		"	tlbi	vmalle1\n"
+		"	isb	sy"
+		);
+}
+
+static void flush_tlb_mm_local(void *info)
+{
+	unsigned long asid = (unsigned long)info;
+
+	asm volatile("\n"
+		"	tlbi	aside1, %0\n"
+		"	isb	sy"
+		: : "r" (asid)
+		);
+}
+
+void flush_tlb_all(void)
+{
+	/* Make sure page table modifications are visible. */
+	dsb(ishst);
+	/* IPI to all CPUs to do local flush. */
+	on_each_cpu(flush_tlb_local, NULL, 1);
+
+}
+EXPORT_SYMBOL(flush_tlb_all);
+
+void flush_tlb_mm(struct mm_struct *mm)
+{
+	if (!mm) {
+		flush_tlb_all();
+	} else {
+		unsigned long asid = (unsigned long)ASID(mm) << 48;
+		/* Make sure page table modifications are visible. */
+		dsb(ishst);
+		/* IPI to all CPUs to do local flush. */
+		on_each_cpu_mask(mm_cpumask(mm),
+				 flush_tlb_mm_local, (void *)asid, 1);
+	}
+
+}
+EXPORT_SYMBOL(flush_tlb_mm);
+
 void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
 		       unsigned long end)
 {
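
The heart of the change is the on_each_cpu_mask() idiom: instead of a
broadcast "tlbi ...is" completed by dsb(ish), every CPU that has run
the mm executes a purely local "tlbi" in IPI context.  Below is a
condensed, self-contained sketch of that pattern.  The example_* names
are illustrative only and not part of the patch; on_each_cpu_mask(),
mm_cpumask(), ASID() and dsb() are the kernel interfaces the patch
actually relies on.

#include <linux/smp.h>		/* on_each_cpu_mask() */
#include <linux/mm_types.h>	/* struct mm_struct, mm_cpumask() */
#include <asm/mmu.h>		/* ASID() */
#include <asm/barrier.h>	/* dsb() */

/*
 * Runs on each targeted CPU, in IPI context.  "tlbi aside1" (no "is"
 * suffix) invalidates only this CPU's TLB entries for the ASID.
 */
static void example_flush_mm_local(void *info)
{
	unsigned long asid = (unsigned long)info;

	asm volatile("tlbi aside1, %0\n\tisb" : : "r" (asid));
}

static void example_flush_mm(struct mm_struct *mm)
{
	unsigned long asid = (unsigned long)ASID(mm) << 48;

	/* Publish the page-table updates before any CPU re-walks them. */
	dsb(ishst);
	/*
	 * IPI only the CPUs that have run this mm.  wait=1 makes the
	 * call synchronous, standing in for the dsb(ish) that used to
	 * wait for the broadcast invalidation to complete.
	 */
	on_each_cpu_mask(mm_cpumask(mm), example_flush_mm_local,
			 (void *)asid, 1);
}

The synchronous wait is what preserves the existing contract of
flush_tlb_mm(): when the call returns, no CPU in the mask can still hit
a stale translation for that ASID.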