From patchwork Tue Jul 24 20:46:39 2018
X-Patchwork-Submitter: Cannon Matthews
X-Patchwork-Id: 10543193
Date: Tue, 24 Jul 2018 13:46:39 -0700
Message-Id: <20180724204639.26934-1-cannonmatthews@google.com>
X-Mailer: git-send-email 2.18.0.233.g985f88cf7e-goog
Subject: [PATCH] RFC: clear 1G pages with streaming stores on x86
From: Cannon Matthews
To: Michal Hocko, Mike Kravetz, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Andres Lagar-Cavilla, Salman Qazi, Paul Turner, David Matlack,
 Peter Feiner, Alain Trinh, Cannon Matthews

Reimplement clear_gigantic_page() to clear gigantic pages using the
non-temporal streaming store instructions that bypass the cache
(movnti), since an entire 1GiB region will not fit in the cache anyway.

Doing an mlock() on a 512GiB 1G-hugetlb region previously would take on
average 134 seconds, about 260ms/GiB, which is quite slow. Using
`movnti` and optimizing the control flow over the constituent small
pages, this can be improved roughly by a factor of 3-4x, with the
512GiB mlock() taking only 34 seconds on average, or 67ms/GiB.

The assembly code for the __clear_page_nt routine is more or less taken
directly from the output of gcc with -O3 for this function, with some
tweaks to support arbitrary sizes and to move the memory barrier:

	void clear_page_nt_64i(void *page)
	{
		for (int i = 0; i < GiB / sizeof(long long int); ++i) {
			_mm_stream_si64(((long long int *)page) + i, 0);
		}
		sfence();
	}

In general I would love to hear any thoughts and feedback on this
approach and any ways it could be improved. Some specific questions:

- What is the appropriate method for defining an arch specific
  implementation like this? Is the #ifndef code sufficient, and did
  things land in the appropriate files?

- Are there any obvious pitfalls or caveats that have not been
  considered? In particular the iterator over mem_map_next() seemed
  like a no-op on x86, but looked like it could be important in certain
  configurations or architectures I am not familiar with.

- Are there any x86_64 implementations that do not support SSE2
  instructions like `movnti`? What is the appropriate way to detect and
  code around that if so?

- Is there anything that could be improved about the assembly code? I
  originally wrote it in C and don't have much experience hand writing
  x86 asm, which seems riddled with optimization pitfalls.

- Is the highmem codepath really necessary? Would 1GiB pages really be
  of much use on a highmem system? We recently removed some other parts
  of the code that support HIGHMEM for gigantic pages (see:
  http://lkml.kernel.org/r/20180711195913.1294-1-mike.kravetz@oracle.com)
  so this seems like a logical continuation.

- The calls to cond_resched() have been reduced from one between every
  4k page to one every 64, as a call between every one of the 256K
  constituent pages seemed overly frequent. Does this seem like an
  appropriate frequency? On an idle system with many spare CPUs it gets
  rescheduled typically once or twice out of the 4096 times it calls
  cond_resched(), which seems like it is maybe the right amount, but
  more insight from a scheduling/latency point of view would be
  helpful.

- Any other thoughts on the change overall and ways that this could be
  made more generally useful, and designed to be easily extensible to
  other platforms with non-temporal instructions and 1G pages, or any
  additional pitfalls I have not thought to consider.
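As an aside, for anyone who wants to experiment with the clearing loop
outside the kernel, a self-contained userspace version of the snippet
above might look like the sketch below. The main() harness, the GiB
constant, and the 2MiB alignment are illustrative assumptions of mine,
not part of the patch; it needs an x86_64 compiler with SSE2 enabled
(the default for x86_64 gcc):

	#include <emmintrin.h>	/* _mm_stream_si64 (SSE2) */
	#include <xmmintrin.h>	/* _mm_sfence */
	#include <stdlib.h>

	#define GiB (1ULL << 30)

	static void clear_page_nt_64i(void *page)
	{
		/* One 8-byte non-temporal store per iteration, 1GiB total. */
		for (unsigned long long i = 0; i < GiB / sizeof(long long); ++i)
			_mm_stream_si64(((long long *)page) + i, 0);
		/* Order the streaming stores before any later access. */
		_mm_sfence();
	}

	int main(void)
	{
		void *buf = aligned_alloc(1UL << 21, GiB);

		if (buf)
			clear_page_nt_64i(buf);
		free(buf);
		return 0;
	}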
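Similarly, the mlock() measurement under "Tested" below can be
reproduced with roughly this kind of harness (a hypothetical sketch,
not the harness actually used; it assumes 1G hugetlb pages have been
reserved, e.g. via hugepagesz=1G hugepages=512 on the kernel command
line, and headers recent enough for MAP_HUGE_1GB):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <sys/mman.h>
	#include <time.h>

	#ifndef MAP_HUGE_1GB
	#define MAP_HUGE_1GB (30 << 26)	/* 30 == log2(1GiB), 26 == MAP_HUGE_SHIFT */
	#endif

	int main(void)
	{
		size_t len = 512UL << 30;	/* 512GiB, as in the test below */
		struct timespec t0, t1;
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
			       MAP_HUGE_1GB, -1, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (mlock(p, len)) {	/* faults in, i.e. clears, every page */
			perror("mlock");
			return 1;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		printf("mlock took %.3f s\n", (t1.tv_sec - t0.tv_sec) +
		       (t1.tv_nsec - t0.tv_nsec) / 1e9);
		return 0;
	}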
Tested:
	Time to `mlock()` a 512GiB region on broadwell CPU

	                    AVG time (s)   % imp.   ms/page
	clear_page_erms     133.584        -        261
	clear_page_nt        34.154        74.43%    67

Signed-off-by: Cannon Matthews
---
 arch/x86/include/asm/page_64.h     |  3 +++
 arch/x86/lib/Makefile              |  2 +-
 arch/x86/lib/clear_gigantic_page.c | 30 ++++++++++++++++++++++++++++++
 arch/x86/lib/clear_page_64.S       | 20 ++++++++++++++++++++
 include/linux/mm.h                 |  3 +++
 mm/memory.c                        |  5 ++++-
 6 files changed, 61 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/lib/clear_gigantic_page.c

--
2.18.0.233.g985f88cf7e-goog

diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 939b1cff4a7b..6c1ae21b4d84 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -56,6 +56,9 @@ static inline void clear_page(void *page)

 void copy_page(void *to, void *from);

+#define __HAVE_ARCH_CLEAR_GIGANTIC_PAGE
+void __clear_page_nt(void *page, u64 page_size);
+
 #endif	/* !__ASSEMBLY__ */

 #ifdef CONFIG_X86_VSYSCALL_EMULATION
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 25a972c61b0a..4ba395234088 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -44,7 +44,7 @@ endif
 else
 	obj-y += iomap_copy_64.o
 	lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
-	lib-y += clear_page_64.o copy_page_64.o
+	lib-y += clear_page_64.o copy_page_64.o clear_gigantic_page.o
 	lib-y += memmove_64.o memset_64.o
 	lib-y += copy_user_64.o
 	lib-y += cmpxchg16b_emu.o
diff --git a/arch/x86/lib/clear_gigantic_page.c b/arch/x86/lib/clear_gigantic_page.c
new file mode 100644
index 000000000000..80e70f31ddbd
--- /dev/null
+++ b/arch/x86/lib/clear_gigantic_page.c
@@ -0,0 +1,30 @@
+#include
+
+#include
+#include
+#include
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
+#define PAGES_BETWEEN_RESCHED 64
+void clear_gigantic_page(struct page *page,
+			unsigned long addr,
+			unsigned int pages_per_huge_page)
+{
+	int i;
+	void *dest = page_to_virt(page);
+	int resched_count = 0;
+
+	BUG_ON(pages_per_huge_page % PAGES_BETWEEN_RESCHED != 0);
+	BUG_ON(!dest);
+
+	might_sleep();
+	for (i = 0; i < pages_per_huge_page; i += PAGES_BETWEEN_RESCHED) {
+		__clear_page_nt(dest + (i * PAGE_SIZE),
+				PAGES_BETWEEN_RESCHED * PAGE_SIZE);
+		resched_count += cond_resched();
+	}
+	/* __clear_page_nt requires an `sfence` barrier. */
+	wmb();
+	pr_debug("clear_gigantic_page: rescheduled %d times\n", resched_count);
+}
+#endif
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index 88acd349911b..81a39804ac72 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -49,3 +49,23 @@ ENTRY(clear_page_erms)
 	ret
 ENDPROC(clear_page_erms)
 EXPORT_SYMBOL_GPL(clear_page_erms)
+
+/*
+ * Zero memory using non temporal stores, bypassing the cache.
+ * Requires an `sfence` (wmb()) afterwards.
+ * %rdi - destination.
+ * %rsi - page size. Must be 64 bit aligned.
+*/
+ENTRY(__clear_page_nt)
+	leaq	(%rdi,%rsi), %rdx
+	xorl	%eax, %eax
+	.p2align 4,,10
+	.p2align 3
+.L2:
+	movnti	%rax, (%rdi)
+	addq	$8, %rdi
+	cmpq	%rdx, %rdi
+	jne	.L2
+	ret
+ENDPROC(__clear_page_nt)
+EXPORT_SYMBOL(__clear_page_nt)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0fbb9ffe380..d10ac4e7ef6a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2729,6 +2729,9 @@ enum mf_action_page_type {
 };

 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
+extern void clear_gigantic_page(struct page *page,
+				unsigned long addr,
+				unsigned int pages_per_huge_page);
 extern void clear_huge_page(struct page *page,
 			    unsigned long addr_hint,
 			    unsigned int pages_per_huge_page);
diff --git a/mm/memory.c b/mm/memory.c
index 7206a634270b..2515cae4af4e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -70,6 +70,7 @@
 #include
 #include

+#include
 #include
 #include
@@ -4568,7 +4569,8 @@ EXPORT_SYMBOL(__might_fault);
 #endif

 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
-static void clear_gigantic_page(struct page *page,
+#ifndef __HAVE_ARCH_CLEAR_GIGANTIC_PAGE
+void clear_gigantic_page(struct page *page,
 		unsigned long addr,
 		unsigned int pages_per_huge_page)
 {
@@ -4582,6 +4584,7 @@ static void clear_gigantic_page(struct page *page,
 		clear_user_highpage(p, addr + i * PAGE_SIZE);
 	}
 }
+#endif /* __HAVE_ARCH_CLEAR_GIGANTIC_PAGE */

 void clear_huge_page(struct page *page, unsigned long addr_hint,
 		     unsigned int pages_per_huge_page)
 {
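For what it's worth, under the #ifndef scheme above another
architecture would only need something along these lines to opt in to
its own implementation (the arch name and the body are hypothetical;
the hook names are the ones the patch introduces):

	/* hypothetical arch/foo/include/asm/page.h excerpt:
	 * the #define must be visible wherever include/linux/mm.h is
	 * included, so the generic loop in mm/memory.c compiles out.
	 */
	#define __HAVE_ARCH_CLEAR_GIGANTIC_PAGE

	/* hypothetical arch/foo/lib/clear_gigantic_page.c */
	void clear_gigantic_page(struct page *page,
				 unsigned long addr,
				 unsigned int pages_per_huge_page)
	{
		/* arch-specific fast clear of the whole gigantic page */
	}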