From patchwork Sat Nov 3 04:00:40 2018
X-Patchwork-Submitter: Joel Fernandes
X-Patchwork-Id: 10666495
From: Joel Fernandes <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org
Cc: kernel-team@android.com, "Joel Fernandes (Google)", "Kirill A. Shutemov",
    akpm@linux-foundation.org, Andrey Ryabinin, Andy Lutomirski,
    anton.ivanov@kot-begemot.co.uk, Borislav Petkov, Catalin Marinas,
    Chris Zankel, dancol@google.com, Dave Hansen, "David S. Miller",
    elfring@users.sourceforge.net, Fenghua Yu, Geert Uytterhoeven,
    Guan Xuetao, Helge Deller, hughd@google.com, Ingo Molnar,
    "James E.J. Bottomley", Jeff Dike, Jonas Bonn, Julia Lawall,
    kasan-dev@googlegroups.com, kvmarm@lists.cs.columbia.edu, Ley Foon Tan,
    linux-alpha@vger.kernel.org, linux-hexagon@vger.kernel.org,
    linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org,
    linux-mips@linux-mips.org, linux-mm@kvack.org,
    linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
    linux-sh@vger.kernel.org, linux-snps-arc@lists.infradead.org,
    linux-um@lists.infradead.org, linux-xtensa@linux-xtensa.org,
    lokeshgidra@google.com, Max Filippov, Michal Hocko, minchan@kernel.org,
    nios2-dev@lists.rocketboards.org, pantin@google.com, Peter Zijlstra,
    Richard Weinberger, Rich Felker, Sam Creasey, sparclinux@vger.kernel.org,
    Stafford Horne, Stefan Kristiansson, Thomas Gleixner, Tony Luck,
    Will Deacon, x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND
    64-BIT)), Yoshinori Sato
Bottomley" , Jeff Dike , Jonas Bonn , Julia Lawall , kasan-dev@googlegroups.com, kvmarm@lists.cs.columbia.edu, Ley Foon Tan , linux-alpha@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@linux-mips.org, linux-mm@kvack.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-um@lists.infradead.org, linux-xtensa@linux-xtensa.org, lokeshgidra@google.com, Max Filippov , Michal Hocko , minchan@kernel.org, nios2-dev@lists.rocketboards.org, pantin@google.com, Peter Zijlstra , Richard Weinberger , Rich Felker , Sam Creasey , sparclinux@vger.kernel.org, Stafford Horne , Stefan Kristiansson , Thomas Gleixner , Tony Luck , Will Deacon , x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), Yoshinori Sato Subject: [PATCH -next 2/3] mm: speed up mremap by 20x on large regions (v4) Date: Fri, 2 Nov 2018 21:00:40 -0700 Message-Id: <20181103040041.7085-3-joelaf@google.com> X-Mailer: git-send-email 2.19.1.930.g4563a0d9d0-goog In-Reply-To: <20181103040041.7085-1-joelaf@google.com> References: <20181103040041.7085-1-joelaf@google.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: "Joel Fernandes (Google)" Android needs to mremap large regions of memory during memory management related operations. The mremap system call can be really slow if THP is not enabled. The bottleneck is move_page_tables, which is copying each pte at a time, and can be really slow across a large map. Turning on THP may not be a viable option, and is not for us. This patch speeds up the performance for non-THP system by copying at the PMD level when possible. The speed up is an order of magnitude on x86 (~20x). On a 1GB mremap, the mremap completion times drops from 3.4-3.6 milliseconds to 144-160 microseconds. Before: Total mremap time for 1GB data: 3521942 nanoseconds. Total mremap time for 1GB data: 3449229 nanoseconds. Total mremap time for 1GB data: 3488230 nanoseconds. After: Total mremap time for 1GB data: 150279 nanoseconds. Total mremap time for 1GB data: 144665 nanoseconds. Total mremap time for 1GB data: 158708 nanoseconds. Incase THP is enabled, the optimization is mostly skipped except in certain situations. Acked-by: Kirill A. Shutemov Signed-off-by: Joel Fernandes (Google) --- Note that since the bug fix in [1], we now have to flush the TLB every PMD move. The above numbers were obtained on x86 with a flush done every move. For arm64, I previously encountered performance issues doing a flush everytime we move, however Will Deacon says [2] the performance should be better now with recent release. Until we can evaluate arm64, I am dropping the HAVE_MOVE_PMD config enable patch for ARM64 for now. It can be added back once we finish the performance evaluation. Also of note is that the speed up on arm64 with this patch but without the TLB flush every PMD move is around 500x. 
 arch/Kconfig |  5 +++++
 mm/mremap.c  | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index e1e540ffa979..b70c952ac838 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -535,6 +535,11 @@ config HAVE_IRQ_TIME_ACCOUNTING
 	  Archs need to ensure they use a high enough resolution clock to
 	  support irq time accounting and then call
 	  enable_sched_clock_irqtime().
 
+config HAVE_MOVE_PMD
+	bool
+	help
+	  Archs that select this are able to move page tables at the PMD level.
+
 config HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	bool
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 7c9ab747f19d..7cf6b0943090 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -191,6 +191,50 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 	drop_rmap_locks(vma);
 }
 
+static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
+		  unsigned long new_addr, unsigned long old_end,
+		  pmd_t *old_pmd, pmd_t *new_pmd)
+{
+	spinlock_t *old_ptl, *new_ptl;
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t pmd;
+
+	if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
+	    || old_end - old_addr < PMD_SIZE)
+		return false;
+
+	/*
+	 * The destination pmd shouldn't be established, free_pgtables()
+	 * should have released it.
+	 */
+	if (WARN_ON(!pmd_none(*new_pmd)))
+		return false;
+
+	/*
+	 * We don't have to worry about the ordering of src and dst
+	 * ptlocks because exclusive mmap_sem prevents deadlock.
+	 */
+	old_ptl = pmd_lock(vma->vm_mm, old_pmd);
+	new_ptl = pmd_lockptr(mm, new_pmd);
+	if (new_ptl != old_ptl)
+		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+
+	/* Clear the pmd */
+	pmd = *old_pmd;
+	pmd_clear(old_pmd);
+
+	VM_BUG_ON(!pmd_none(*new_pmd));
+
+	/* Set the new pmd */
+	set_pmd_at(mm, new_addr, new_pmd, pmd);
+	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+	if (new_ptl != old_ptl)
+		spin_unlock(new_ptl);
+	spin_unlock(old_ptl);
+
+	return true;
+}
+
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
@@ -237,7 +281,23 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 			split_huge_pmd(vma, old_pmd, old_addr);
 			if (pmd_trans_unstable(old_pmd))
 				continue;
+		} else if (extent == PMD_SIZE && IS_ENABLED(CONFIG_HAVE_MOVE_PMD)) {
+			/*
+			 * If the extent is PMD-sized, try to speed the move by
+			 * moving at the PMD level if possible.
+			 */
+			bool moved;
+
+			if (need_rmap_locks)
+				take_rmap_locks(vma);
+			moved = move_normal_pmd(vma, old_addr, new_addr,
+					old_end, old_pmd, new_pmd);
+			if (need_rmap_locks)
+				drop_rmap_locks(vma);
+			if (moved)
+				continue;
 		}
+
 		if (pte_alloc(new_vma->vm_mm, new_pmd))
 			break;
 		next = (new_addr + PMD_SIZE) & PMD_MASK;
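As an aside on the guard at the top of move_normal_pmd(): the fast path
only applies when both addresses are PMD-aligned and at least one full
PMD's worth of address space remains to be moved. A small standalone
sketch of that check follows, using assumed x86_64 values (4K pages, so
a PMD maps 2MB); this is illustrative userspace code, not kernel code:

	#include <stdbool.h>
	#include <stdio.h>

	/* Illustrative x86_64 values: one PMD covers 2MB. */
	#define PMD_SHIFT	21
	#define PMD_SIZE	(1UL << PMD_SHIFT)
	#define PMD_MASK	(~(PMD_SIZE - 1))

	/*
	 * Mirrors the guard in move_normal_pmd(): both addresses must be
	 * PMD-aligned and a whole PMD must remain, else fall back to the
	 * PTE copy loop.
	 */
	static bool can_move_pmd(unsigned long old_addr, unsigned long new_addr,
				 unsigned long old_end)
	{
		if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
		    || old_end - old_addr < PMD_SIZE)
			return false;
		return true;
	}

	int main(void)
	{
		/* 2MB-aligned source and destination, 1GB span: eligible. */
		printf("%d\n", can_move_pmd(0x200000UL, 0x40000000UL,
					    0x200000UL + (1UL << 30)));
		/* 4K-misaligned source: not eligible, PTE loop is used. */
		printf("%d\n", can_move_pmd(0x201000UL, 0x40000000UL,
					    0x201000UL + (1UL << 30)));
		return 0;
	}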