From patchwork Sat Mar 7 01:03:53 2020
X-Patchwork-Submitter: Cannon Matthews
X-Patchwork-Id: 11424915
Date: Fri, 6 Mar 2020 17:03:53 -0800
Message-Id: <20200307010353.172991-1-cannonmatthews@google.com>
Subject: [PATCH] mm: clear 1G pages with streaming stores on x86
From: Cannon Matthews
To: Mike Kravetz, Andrew Morton
Cc: Matthew Wilcox, Michal Hocko, David Rientjes, Greg Thelen,
    Salman Qazi, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Cannon Matthews

Reimplement clear_gigantic_page() to clear gigantic pages using the
non-temporal streaming store instructions that bypass the cache
(movnti), since an entire 1GiB region will not fit in the cache anyway.

Doing an mlock() on a 512GiB 1G-hugetlb region previously took
134 seconds on average, about 260ms/GiB, which is quite slow. Using
`movnti` and optimizing the control flow over the constituent small
pages improves this by roughly a factor of 4, with the 512GiB mlock()
taking only 34 seconds on average, or 67ms/GiB.
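For context, a measurement like the one above can be taken from
userspace along these lines. This is an illustrative sketch, not part
of the patch: `time_mlock` is an invented helper name, `MAP_HUGE_1GB`
requires a reserved 1G hugetlb pool (the sketch falls back to ordinary
pages when none is available), and a real run would pass a 512GiB
length rather than a small one.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <time.h>

/* MAP_HUGE_1GB may be missing from older libc headers; the kernel
 * encodes the huge page size as log2(size) << MAP_HUGE_SHIFT. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
#endif

/* Time mlock() on an anonymous mapping of `len` bytes, preferring
 * 1G hugetlb pages. Returns elapsed seconds, or -1.0 on failure. */
static double time_mlock(size_t len)
{
	struct timespec t0, t1;
	double secs;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
		       -1, 0);

	if (p == MAP_FAILED)	/* no 1G pool configured: fall back */
		p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (mlock(p, len) != 0) {
		munmap(p, len);
		return -1.0;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	munlock(p, len);
	munmap(p, len);
	return secs;
}
```

On a patched kernel the hugetlb path is what exercises the new
clear_gigantic_page() implementation.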
The assembly code for the __clear_page_nt routine is more or less
taken directly from the output of gcc with -O3 for this function,
with some tweaks to support arbitrary sizes and moving memory
barriers:

void clear_page_nt_64i (void *page)
{
        for (int i = 0; i < GiB / sizeof(long long int); ++i) {
                _mm_stream_si64(((long long int *)page) + i, 0);
        }
        sfence();
}

Tested:
        Time to `mlock()` a 512GiB region on broadwell CPU
                                AVG time (s)    % imp.  ms/page
        clear_page_erms         133.584         -       261
        clear_page_nt           34.154          74.43%  67

An earlier version of this code was sent as an RFC patch ~July 2018
https://patchwork.kernel.org/patch/10543193/ but never merged.

Signed-off-by: Cannon Matthews
Reported-by: kbuild test robot
---
 MAINTAINERS                        |  1 +
 arch/x86/Kconfig                   |  4 ++++
 arch/x86/include/asm/page_64.h     |  1 +
 arch/x86/lib/Makefile              |  2 +-
 arch/x86/lib/clear_gigantic_page.c | 28 ++++++++++++++++++++++++++++
 arch/x86/lib/clear_page_64.S       | 19 +++++++++++++++++++
 include/linux/mm.h                 |  2 ++
 mm/memory.c                        |  2 ++
 8 files changed, 58 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/lib/clear_gigantic_page.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 68eebf3650ac..efe84f085404 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7702,6 +7702,7 @@ S:      Maintained
 F:      fs/hugetlbfs/
 F:      mm/hugetlb.c
 F:      include/linux/hugetlb.h
+F:      arch/x86/lib/clear_gigantic_page.c
 F:      Documentation/admin-guide/mm/hugetlbpage.rst
 F:      Documentation/vm/hugetlbfs_reserv.rst
 F:      Documentation/ABI/testing/sysfs-kernel-mm-hugepages
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index beea77046f9b..f49e7b6f6851 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -70,6 +70,7 @@ config X86
        select ARCH_HAS_KCOV                    if X86_64
        select ARCH_HAS_MEM_ENCRYPT
        select ARCH_HAS_MEMBARRIER_SYNC_CORE
+       select ARCH_HAS_CLEAR_GIGANTIC_PAGE     if X86_64
        select ARCH_HAS_PMEM_API                if X86_64
        select ARCH_HAS_PTE_DEVMAP              if X86_64
        select ARCH_HAS_PTE_SPECIAL
@@ -290,6 +291,9 @@ config ARCH_MAY_HAVE_PC_FDC
 config GENERIC_CALIBRATE_DELAY
        def_bool y
 
+config ARCH_HAS_CLEAR_GIGANTIC_PAGE
+       bool
+
 config ARCH_HAS_CPU_RELAX
        def_bool y
 
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 939b1cff4a7b..6ea60883b6d6 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -55,6 +55,7 @@ static inline void clear_page(void *page)
 }
 
 void copy_page(void *to, void *from);
+void clear_page_nt(void *page, u64 page_size);
 
 #endif  /* !__ASSEMBLY__ */
 
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 5246db42de45..a620c6636210 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -56,7 +56,7 @@ endif
 else
         obj-y += iomap_copy_64.o
         lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
-        lib-y += clear_page_64.o copy_page_64.o
+        lib-y += clear_page_64.o copy_page_64.o clear_gigantic_page.o
         lib-y += memmove_64.o memset_64.o
         lib-y += copy_user_64.o
         lib-y += cmpxchg16b_emu.o
diff --git a/arch/x86/lib/clear_gigantic_page.c b/arch/x86/lib/clear_gigantic_page.c
new file mode 100644
index 000000000000..6fcb494ec9bc
--- /dev/null
+++ b/arch/x86/lib/clear_gigantic_page.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+
+#include
+#include
+#include
+
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
+
+void clear_gigantic_page(struct page *page, unsigned long addr,
+                        unsigned int pages)
+{
+       int i;
+       void *dest = page_to_virt(page);
+
+       /*
+        * cond_resched() every 2M. Hypothetical page sizes not divisible by
+        * this are not supported.
+        */
+       BUG_ON(pages % HPAGE_PMD_NR != 0);
+       for (i = 0; i < pages; i += HPAGE_PMD_NR) {
+               clear_page_nt(dest + (i * PAGE_SIZE), HPAGE_PMD_NR * PAGE_SIZE);
+               cond_resched();
+       }
+       /* clear_page_nt requires an `sfence` barrier. */
+       wmb();
+}
+#endif /* defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) */
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index c4c7dd115953..1224094fd863 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -50,3 +50,22 @@ SYM_FUNC_START(clear_page_erms)
        ret
 SYM_FUNC_END(clear_page_erms)
 EXPORT_SYMBOL_GPL(clear_page_erms)
+
+/*
+ * Zero memory using non temporal stores, bypassing the cache.
+ * Requires an `sfence` (wmb()) afterwards.
+ * %rdi - destination.
+ * %rsi - page size. Must be 64 bit aligned.
+*/
+SYM_FUNC_START(clear_page_nt)
+       leaq    (%rdi,%rsi), %rdx
+       xorl    %eax, %eax
+       .p2align 4,,10
+       .p2align 3
+.L2:
+       movnti  %rax, (%rdi)
+       addq    $8, %rdi
+       cmpq    %rdx, %rdi
+       jne     .L2
+       ret
+SYM_FUNC_END(clear_page_nt)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c54fb96cb1e6..a57f9007374b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2856,6 +2856,8 @@ enum mf_action_page_type {
 };
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
+extern void clear_gigantic_page(struct page *page, unsigned long addr,
+                               unsigned int pages);
 extern void clear_huge_page(struct page *page,
                            unsigned long addr_hint,
                            unsigned int pages_per_huge_page);
diff --git a/mm/memory.c b/mm/memory.c
index e8bfdf0d9d1d..2a13bf102890 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4706,6 +4706,7 @@ static inline void process_huge_page(
        }
 }
 
+#ifndef CONFIG_ARCH_HAS_CLEAR_GIGANTIC_PAGE
 static void clear_gigantic_page(struct page *page,
                                unsigned long addr,
                                unsigned int pages_per_huge_page)
@@ -4720,6 +4721,7 @@ static void clear_gigantic_page(struct page *page,
                clear_user_highpage(p, addr + i * PAGE_SIZE);
        }
 }
+#endif /* CONFIG_ARCH_HAS_CLEAR_GIGANTIC_PAGE */
 
 static void clear_subpage(unsigned long addr, int idx, void *arg)
 {
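
As an aside for readers who want to experiment outside the kernel tree:
the clear_page_nt_64i snippet in the commit message maps onto compiler
intrinsics roughly as follows. This is an illustrative userspace
sketch, not part of the patch; `clear_buf_nt` is an invented name, and
it assumes an x86-64 target, where `_mm_stream_si64` compiles to the
same movnti instruction used by clear_page_nt.

```c
#include <immintrin.h>
#include <stddef.h>

/* Zero `size` bytes (a multiple of 8) with 8-byte non-temporal
 * stores, then fence -- the userspace analogue of the patch's
 * clear_page_nt() followed by wmb(). */
static void clear_buf_nt(void *buf, size_t size)
{
	long long int *p = buf;
	size_t i;

	for (i = 0; i < size / sizeof(*p); ++i)
		_mm_stream_si64(p + i, 0);	/* movnti: bypasses the cache */
	_mm_sfence();	/* order the NT stores before later accesses */
}
```

Because the stores are weakly ordered, the trailing sfence is what
makes the zeroed contents safely visible to subsequent reads, mirroring
why the kernel function requires a wmb() from its caller.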