From patchwork Mon Oct 5 15:40:06 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Kalesh Singh
X-Patchwork-Id: 11816887
Date: Mon, 5 Oct 2020 15:40:06 +0000
In-Reply-To: <20201005154017.474722-1-kaleshsingh@google.com>
Message-Id: <20201005154017.474722-4-kaleshsingh@google.com>
References: <20201005154017.474722-1-kaleshsingh@google.com>
X-Mailer: git-send-email 2.28.0.806.g8561365e88-goog
Subject: [PATCH v3 3/5] mm: Speedup mremap on 1GB or larger regions
From: Kalesh Singh
Cc: joelaf@google.com, Mark Rutland, Gavin Shan, Brian Geffon,
	Peter Zijlstra, Catalin Marinas, kaleshsingh@google.com, Ram Pai,
	Dave Hansen, Will Deacon, lokeshgidra@google.com,
	linux-kselftest@vger.kernel.org, "H. Peter Anvin", Christian Brauner,
	Shuah Khan, SeongJae Park, Jia He, kernel test robot,
	"Aneesh Kumar K.V", Masahiro Yamada, x86@kernel.org,
	Krzysztof Kozlowski, Ingo Molnar, Sami Tolvanen,
	kernel-team@android.com, Hassan Naveed, Masami Hiramatsu,
	Ralph Campbell, Kees Cook, minchan@google.com, Zhenyu Ye,
	John Hubbard, Frederic Weisbecker, Mark Brown, Borislav Petkov,
	Thomas Gleixner, surenb@google.com,
	linux-arm-kernel@lists.infradead.org, Chris von Recklinghausen,
	linux-mm@kvack.org, Stephen Boyd, linux-kernel@vger.kernel.org,
	Arnd Bergmann, "Kirill A. Shutemov", Andrew Morton, Mike Rapoport,
	Sandipan Das

Android needs to move large memory regions for garbage collection.
The GC requires moving physical pages of a multi-gigabyte heap
using mremap. During this move, the application threads have to
be paused for correctness. It is critical to keep this pause as
short as possible to avoid jitter during user interaction.

Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD
level if the source and destination addresses are PUD-aligned.
For CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves
PGD entries, since the PUD entry is “folded back” onto the PGD entry.
Add HAVE_MOVE_PUD so that architectures where moving at the PUD level
isn't supported/tested can turn this off by not selecting the config.

Fix build test error from v1 of this series reported by
kernel test robot in [1].
[1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4FH4NG7TGH2CVYX2UX76L25BTA3/

Signed-off-by: Kalesh Singh
Reported-by: kernel test robot
---
Changes in v2:
  - Update commit message with description of Android GC's use case.
  - Move set_pud_at() to a separate patch.
  - Use switch() instead of ifs in move_pgt_entry()
  - Fix build test error reported by kernel test robot on x86_64 in [1].
    Guard move_huge_pmd() with IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE),
    since this section doesn't get optimized out in the kernel test
    robot's build test when HAVE_MOVE_PUD is enabled.
  - Keep WARN_ON_ONCE(1) instead of BUILD_BUG() for the aforementioned
    reason.

Changes in v3:
  - Move get_old_pud() and alloc_new_pud() out of
    #ifdef CONFIG_HAVE_MOVE_PUD.
  - Have get_old_pmd() and alloc_new_pmd() use get_old_pud() and
    alloc_new_pud().
  - Use switch() in get_extent() instead of ifs.
  - Add BUILD_BUG() to default case of get_extent().
  - Replace #ifdef CONFIG_HAVE_MOVE_PMD/PUD in move_page_tables() with
    IS_ENABLED(CONFIG_HAVE_MOVE_PMD/PUD).
  - Make lines 80 cols or less, where they don’t need to be longer.
  - s/=  /= /g (Fixed double spaces after '=').

 arch/Kconfig |   7 ++
 mm/mremap.c  | 230 ++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 197 insertions(+), 40 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..5eabaa00bf9b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -602,6 +602,13 @@ config HAVE_IRQ_TIME_ACCOUNTING
 	  Archs need to ensure they use a high enough resolution clock to
 	  support irq time accounting and then call enable_sched_clock_irqtime().
 
+config HAVE_MOVE_PUD
+	bool
+	help
+	  Architectures that select this are able to move page tables at the
+	  PUD level. If there are only 3 page table levels, the move effectively
+	  happens at the PGD level.
+
 config HAVE_MOVE_PMD
 	bool
 	help
diff --git a/mm/mremap.c b/mm/mremap.c
index 138abbae4f75..078f731277b6 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -30,12 +30,11 @@
 
 #include "internal.h"
 
-static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
+static pud_t *get_old_pud(struct mm_struct *mm, unsigned long addr)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
-	pmd_t *pmd;
 
 	pgd = pgd_offset(mm, addr);
 	if (pgd_none_or_clear_bad(pgd))
@@ -49,6 +48,18 @@ static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
 	if (pud_none_or_clear_bad(pud))
 		return NULL;
 
+	return pud;
+}
+
+static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pud = get_old_pud(mm, addr);
+	if (!pud)
+		return NULL;
+
 	pmd = pmd_offset(pud, addr);
 	if (pmd_none(*pmd))
 		return NULL;
@@ -56,19 +67,27 @@ static pmd_t *get_old_pmd(struct mm_struct *mm, unsigned long addr)
 	return pmd;
 }
 
-static pmd_t *alloc_new_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
+static pud_t *alloc_new_pud(struct mm_struct *mm, struct vm_area_struct *vma,
 			    unsigned long addr)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
 
 	pgd = pgd_offset(mm, addr);
 	p4d = p4d_alloc(mm, pgd, addr);
 	if (!p4d)
 		return NULL;
-	pud = pud_alloc(mm, p4d, addr);
+
+	return pud_alloc(mm, p4d, addr);
+}
+
+static pmd_t *alloc_new_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
+			    unsigned long addr)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pud = alloc_new_pud(mm, vma, addr);
 	if (!pud)
 		return NULL;
 
@@ -249,14 +268,148 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 
 	return true;
 }
+#else
+static inline bool move_normal_pmd(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long new_addr, pmd_t *old_pmd,
+		pmd_t *new_pmd)
+{
+	return false;
+}
 #endif
 
+#ifdef CONFIG_HAVE_MOVE_PUD
+static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
+		  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
+{
+	spinlock_t *old_ptl, *new_ptl;
+	struct mm_struct *mm = vma->vm_mm;
+	pud_t pud;
+
+	/*
+	 * The destination pud shouldn't be established, free_pgtables()
+	 * should have released it.
+	 */
+	if (WARN_ON_ONCE(!pud_none(*new_pud)))
+		return false;
+
+	/*
+	 * We don't have to worry about the ordering of src and dst
+	 * ptlocks because exclusive mmap_lock prevents deadlock.
+	 */
+	old_ptl = pud_lock(vma->vm_mm, old_pud);
+	new_ptl = pud_lockptr(mm, new_pud);
+	if (new_ptl != old_ptl)
+		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
+
+	/* Clear the pud */
+	pud = *old_pud;
+	pud_clear(old_pud);
+
+	VM_BUG_ON(!pud_none(*new_pud));
+
+	/* Set the new pud */
+	set_pud_at(mm, new_addr, new_pud, pud);
+	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
+	if (new_ptl != old_ptl)
+		spin_unlock(new_ptl);
+	spin_unlock(old_ptl);
+
+	return true;
+}
+#else
+static inline bool move_normal_pud(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long new_addr, pud_t *old_pud,
+		pud_t *new_pud)
+{
+	return false;
+}
+#endif
+
+enum pgt_entry {
+	NORMAL_PMD,
+	HPAGE_PMD,
+	NORMAL_PUD,
+};
+
+/*
+ * Returns an extent of the corresponding size for the pgt_entry specified if
+ * valid. Else returns a smaller extent bounded by the end of the source and
+ * destination pgt_entry.
+ */
+static unsigned long get_extent(enum pgt_entry entry, unsigned long old_addr,
+			unsigned long old_end, unsigned long new_addr)
+{
+	unsigned long next, extent, mask, size;
+
+	switch (entry) {
+	case HPAGE_PMD:
+	case NORMAL_PMD:
+		mask = PMD_MASK;
+		size = PMD_SIZE;
+		break;
+	case NORMAL_PUD:
+		mask = PUD_MASK;
+		size = PUD_SIZE;
+		break;
+	default:
+		BUILD_BUG();
+		break;
+	}
+
+	next = (old_addr + size) & mask;
+	/* even if next overflowed, extent below will be ok */
+	extent = (next > old_end) ? old_end - old_addr : next - old_addr;
+	next = (new_addr + size) & mask;
+	if (extent > next - new_addr)
+		extent = next - new_addr;
+	return extent;
+}
+
+/*
+ * Attempts to speedup the move by moving entry at the level corresponding to
+ * pgt_entry. Returns true if the move was successful, else false.
+ */
+static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
+			unsigned long old_addr, unsigned long new_addr,
+			void *old_entry, void *new_entry, bool need_rmap_locks)
+{
+	bool moved = false;
+
+	/* See comment in move_ptes() */
+	if (need_rmap_locks)
+		take_rmap_locks(vma);
+
+	switch (entry) {
+	case NORMAL_PMD:
+		moved = move_normal_pmd(vma, old_addr, new_addr, old_entry,
+					new_entry);
+		break;
+	case NORMAL_PUD:
+		moved = move_normal_pud(vma, old_addr, new_addr, old_entry,
+					new_entry);
+		break;
+	case HPAGE_PMD:
+		moved = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+			move_huge_pmd(vma, old_addr, new_addr, old_entry,
+				      new_entry);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	if (need_rmap_locks)
+		drop_rmap_locks(vma);
+
+	return moved;
+}
+
 unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len,
 		bool need_rmap_locks)
 {
-	unsigned long extent, next, old_end;
+	unsigned long extent, old_end;
 	struct mmu_notifier_range range;
 	pmd_t *old_pmd, *new_pmd;
 
@@ -269,53 +422,50 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
-		next = (old_addr + PMD_SIZE) & PMD_MASK;
-		/* even if next overflowed, extent below will be ok */
-		extent = next - old_addr;
-		if (extent > old_end - old_addr)
-			extent = old_end - old_addr;
-		next = (new_addr + PMD_SIZE) & PMD_MASK;
-		if (extent > next - new_addr)
-			extent = next - new_addr;
+		/*
+		 * If extent is PUD-sized try to speed up the move by moving at the
+		 * PUD level if possible.
+		 */
+		extent = get_extent(NORMAL_PUD, old_addr, old_end, new_addr);
+		if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
+			pud_t *old_pud, *new_pud;
+
+			old_pud = get_old_pud(vma->vm_mm, old_addr);
+			if (!old_pud)
+				continue;
+			new_pud = alloc_new_pud(vma->vm_mm, vma, new_addr);
+			if (!new_pud)
+				break;
+			if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
+					   old_pud, new_pud, need_rmap_locks))
+				continue;
+		}
+
+		extent = get_extent(NORMAL_PMD, old_addr, old_end, new_addr);
 		old_pmd = get_old_pmd(vma->vm_mm, old_addr);
 		if (!old_pmd)
 			continue;
 		new_pmd = alloc_new_pmd(vma->vm_mm, vma, new_addr);
 		if (!new_pmd)
 			break;
-		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) || pmd_devmap(*old_pmd)) {
-			if (extent == HPAGE_PMD_SIZE) {
-				bool moved;
-				/* See comment in move_ptes() */
-				if (need_rmap_locks)
-					take_rmap_locks(vma);
-				moved = move_huge_pmd(vma, old_addr, new_addr,
-						      old_pmd, new_pmd);
-				if (need_rmap_locks)
-					drop_rmap_locks(vma);
-				if (moved)
-					continue;
-			}
+		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) ||
+		    pmd_devmap(*old_pmd)) {
+			if (extent == HPAGE_PMD_SIZE &&
+			    move_pgt_entry(HPAGE_PMD, vma, old_addr, new_addr,
+					   old_pmd, new_pmd, need_rmap_locks))
+				continue;
 			split_huge_pmd(vma, old_pmd, old_addr);
 			if (pmd_trans_unstable(old_pmd))
 				continue;
-		} else if (extent == PMD_SIZE) {
-#ifdef CONFIG_HAVE_MOVE_PMD
+		} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PMD) &&
+			   extent == PMD_SIZE) {
 			/*
 			 * If the extent is PMD-sized, try to speed the move by
 			 * moving at the PMD level if possible.
 			 */
-			bool moved;
-
-			if (need_rmap_locks)
-				take_rmap_locks(vma);
-			moved = move_normal_pmd(vma, old_addr, new_addr,
-						old_pmd, new_pmd);
-			if (need_rmap_locks)
-				drop_rmap_locks(vma);
-			if (moved)
+			if (move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr,
+					   old_pmd, new_pmd, need_rmap_locks))
 				continue;
-#endif
 		}
 
 		if (pte_alloc(new_vma->vm_mm, new_pmd))