From patchwork Fri Apr 3 11:29:21 2020
From: "Kirill A. Shutemov"
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: Zi Yan, Yang Shi, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv2 1/8] khugepaged: Add self test
Date: Fri, 3 Apr 2020 14:29:21 +0300
Message-Id: <20200403112928.19742-2-kirill.shutemov@linux.intel.com>
In-Reply-To: <20200403112928.19742-1-kirill.shutemov@linux.intel.com>

The test checks whether khugepaged is able to recover a huge page where we expect it to do so. It only covers anon-THP for now.

Currently the test shows a few failures. They are going to be addressed by the following patches.
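For orientation: every testcase the patch adds follows the same four-step pattern. The sketch below is condensed from the testcases themselves and is not part of the patch; collapse_example() is a made-up name, but the helpers it calls (alloc_mapping(), fill_memory(), wait_for_scan(), check_huge(), validate_memory()) are all defined in the patch. The binary is built together with the rest of the vm selftests and is normally run as root, since it rewrites the knobs under /sys/kernel/mm/transparent_hugepage/.

	/* Sketch only: the shape of each testcase below; not part of the patch. */
	static void collapse_example(void)
	{
		/* 1. Map a PMD-sized anonymous VMA at BASE_ADDR. */
		void *p = alloc_mapping();

		/* 2. Populate PTEs with a recognizable pattern. */
		fill_memory(p, 0, hpage_pmd_size);

		/* 3. madvise(MADV_HUGEPAGE) and poll khugepaged/full_scans. */
		if (wait_for_scan("Collapse fully populated PTE table", p))
			fail("Timeout");
		/* 4. Look for AnonHugePages for this range in /proc/<pid>/smaps. */
		else if (check_huge(p))
			success("OK");
		else
			fail("Fail");

		validate_memory(p, 0, hpage_pmd_size);
		munmap(p, hpage_pmd_size);
	}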
Signed-off-by: Kirill A. Shutemov
Reviewed-by: Ralph Campbell
--- tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/vm/khugepaged.c | 899 ++++++++++++++++++++++++ 2 files changed, 900 insertions(+) create mode 100644 tools/testing/selftests/vm/khugepaged.c diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 7f9a8a8c31da..981d0dc21f9e 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -18,6 +18,7 @@ TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd +TEST_GEN_FILES += khugepaged ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 sparc64 x86_64)) TEST_GEN_FILES += va_128TBswitch diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c new file mode 100644 index 000000000000..3eeff4a0fbc9 --- /dev/null +++ b/tools/testing/selftests/vm/khugepaged.c @@ -0,0 +1,899 @@ +#define _GNU_SOURCE +#include <fcntl.h> +#include <limits.h> +#include <signal.h> +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <unistd.h> + +#include <sys/mman.h> +#include <sys/wait.h> + +#ifndef MADV_PAGEOUT +#define MADV_PAGEOUT 21 +#endif + +#define BASE_ADDR ((void *)(1UL << 30)) +static unsigned long hpage_pmd_size; +static unsigned long page_size; +static int hpage_pmd_nr; + +#define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/" + +enum thp_enabled { + THP_ALWAYS, + THP_MADVISE, + THP_NEVER, +}; + +static const char *thp_enabled_strings[] = { + "always", + "madvise", + "never", + NULL +}; + +enum thp_defrag { + THP_DEFRAG_ALWAYS, + THP_DEFRAG_DEFER, + THP_DEFRAG_DEFER_MADVISE, + THP_DEFRAG_MADVISE, + THP_DEFRAG_NEVER, +}; + +static const char *thp_defrag_strings[] = { + "always", + "defer", + "defer+madvise", + "madvise", + "never", + NULL +}; + +enum shmem_enabled { + SHMEM_ALWAYS, + SHMEM_WITHIN_SIZE, + SHMEM_ADVISE, + SHMEM_NEVER, + SHMEM_DENY, + SHMEM_FORCE, +}; + +static const char *shmem_enabled_strings[] = { + "always", + "within_size", + "advise", + "never", + "deny", + "force", + NULL +}; + +struct khugepaged_settings { + bool defrag; + unsigned int alloc_sleep_millisecs; + unsigned int scan_sleep_millisecs; + unsigned int max_ptes_none; + unsigned int max_ptes_swap; + unsigned long pages_to_scan; +}; + +struct settings { + enum thp_enabled thp_enabled; + enum thp_defrag thp_defrag; + enum shmem_enabled shmem_enabled; + bool debug_cow; + bool use_zero_page; + struct khugepaged_settings khugepaged; +}; + +static struct settings default_settings = { + .thp_enabled = THP_MADVISE, + .thp_defrag = THP_DEFRAG_ALWAYS, + .shmem_enabled = SHMEM_NEVER, + .debug_cow = 0, + .use_zero_page = 0, + .khugepaged = { + .defrag = 1, + .alloc_sleep_millisecs = 10, + .scan_sleep_millisecs = 10, + }, +}; + +static struct settings saved_settings; +static bool skip_settings_restore; + +static int exit_status; + +static void success(const char *msg) +{ + printf(" \e[32m%s\e[0m\n", msg); +} + +static void fail(const char *msg) +{ + printf(" \e[31m%s\e[0m\n", msg); + exit_status++; +} + +static int read_file(const char *path, char *buf, size_t buflen) +{ + int fd; + ssize_t numread; + + fd = open(path, O_RDONLY); + if (fd == -1) + return 0; + + numread = read(fd, buf, buflen - 1); + if (numread < 1) { + close(fd); + return 0; + } + + buf[numread] = '\0'; + close(fd); + + return (unsigned int) numread; +} + +static int write_file(const char *path, const char *buf, size_t buflen) +{ + int fd; + ssize_t numwritten; + + fd = open(path, O_WRONLY); + if (fd == -1) + return 0; + + numwritten =
write(fd, buf, buflen - 1); + close(fd); + if (numwritten < 1) + return 0; + + return (unsigned int) numwritten; +} + +static int read_string(const char *name, const char *strings[]) +{ + char path[PATH_MAX]; + char buf[256]; + char *c; + int ret; + + ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name); + if (ret >= PATH_MAX) { + printf("%s: Pathname is too long\n", __func__); + exit(EXIT_FAILURE); + } + + if (!read_file(path, buf, sizeof(buf))) { + perror(path); + exit(EXIT_FAILURE); + } + + c = strchr(buf, '['); + if (!c) { + printf("%s: Parse failure\n", __func__); + exit(EXIT_FAILURE); + } + + c++; + memmove(buf, c, sizeof(buf) - (c - buf)); + + c = strchr(buf, ']'); + if (!c) { + printf("%s: Parse failure\n", __func__); + exit(EXIT_FAILURE); + } + *c = '\0'; + + ret = 0; + while (strings[ret]) { + if (!strcmp(strings[ret], buf)) + return ret; + ret++; + } + + printf("Failed to parse %s\n", name); + exit(EXIT_FAILURE); +} + +static void write_string(const char *name, const char *val) +{ + char path[PATH_MAX]; + int ret; + + ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name); + if (ret >= PATH_MAX) { + printf("%s: Pathname is too long\n", __func__); + exit(EXIT_FAILURE); + } + + if (!write_file(path, val, strlen(val) + 1)) { + perror(path); + exit(EXIT_FAILURE); + } +} + +static const unsigned long read_num(const char *name) +{ + char path[PATH_MAX]; + char buf[21]; + int ret; + + ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name); + if (ret >= PATH_MAX) { + printf("%s: Pathname is too long\n", __func__); + exit(EXIT_FAILURE); + } + + ret = read_file(path, buf, sizeof(buf)); + if (ret < 0) { + perror("read_file(read_num)"); + exit(EXIT_FAILURE); + } + + return strtoul(buf, NULL, 10); +} + +static void write_num(const char *name, unsigned long num) +{ + char path[PATH_MAX]; + char buf[21]; + int ret; + + ret = snprintf(path, PATH_MAX, THP_SYSFS "%s", name); + if (ret >= PATH_MAX) { + printf("%s: Pathname is too long\n", __func__); + exit(EXIT_FAILURE); + } + + sprintf(buf, "%ld", num); + if (!write_file(path, buf, strlen(buf) + 1)) { + perror(path); + exit(EXIT_FAILURE); + } +} + +static void write_settings(struct settings *settings) +{ + struct khugepaged_settings *khugepaged = &settings->khugepaged; + + write_string("enabled", thp_enabled_strings[settings->thp_enabled]); + write_string("defrag", thp_defrag_strings[settings->thp_defrag]); + write_string("shmem_enabled", + shmem_enabled_strings[settings->shmem_enabled]); + write_num("debug_cow", settings->debug_cow); + write_num("use_zero_page", settings->use_zero_page); + + write_num("khugepaged/defrag", khugepaged->defrag); + write_num("khugepaged/alloc_sleep_millisecs", + khugepaged->alloc_sleep_millisecs); + write_num("khugepaged/scan_sleep_millisecs", + khugepaged->scan_sleep_millisecs); + write_num("khugepaged/max_ptes_none", khugepaged->max_ptes_none); + write_num("khugepaged/max_ptes_swap", khugepaged->max_ptes_swap); + write_num("khugepaged/pages_to_scan", khugepaged->pages_to_scan); +} + +static void restore_settings(int sig) +{ + if (skip_settings_restore) + goto out; + + printf("Restore THP and khugepaged settings..."); + write_settings(&saved_settings); + success("OK"); + if (sig) + exit(EXIT_FAILURE); +out: + exit(exit_status); +} + +static void save_settings(void) +{ + printf("Save THP and khugepaged settings..."); + saved_settings = (struct settings) { + .thp_enabled = read_string("enabled", thp_enabled_strings), + .thp_defrag = read_string("defrag", thp_defrag_strings), + .shmem_enabled = + 
read_string("shmem_enabled", shmem_enabled_strings), + .debug_cow = read_num("debug_cow"), + .use_zero_page = read_num("use_zero_page"), + }; + saved_settings.khugepaged = (struct khugepaged_settings) { + .defrag = read_num("khugepaged/defrag"), + .alloc_sleep_millisecs = + read_num("khugepaged/alloc_sleep_millisecs"), + .scan_sleep_millisecs = + read_num("khugepaged/scan_sleep_millisecs"), + .max_ptes_none = read_num("khugepaged/max_ptes_none"), + .max_ptes_swap = read_num("khugepaged/max_ptes_swap"), + .pages_to_scan = read_num("khugepaged/pages_to_scan"), + }; + success("OK"); + + signal(SIGTERM, restore_settings); + signal(SIGINT, restore_settings); + signal(SIGHUP, restore_settings); + signal(SIGQUIT, restore_settings); +} + +static void adjust_settings(void) +{ + + printf("Adjust settings..."); + write_settings(&default_settings); + success("OK"); +} + +#define CHECK_HUGE_FMT "sed -ne " \ + "'/^%lx/,/^AnonHugePages/{/^AnonHugePages:\\s*%ld kB/ q1}' " \ + "/proc/%d/smaps" + +static bool check_huge(void *p) +{ + char *cmd; + int ret; + + ret = asprintf(&cmd, CHECK_HUGE_FMT, + (unsigned long)p, hpage_pmd_size >> 10, getpid()); + if (ret < 0) { + perror("asprintf(CHECK_FMT)"); + exit(EXIT_FAILURE); + } + + ret = system(cmd); + free(cmd); + if (ret < 0 || !WIFEXITED(ret)) { + perror("system(check_huge)"); + exit(EXIT_FAILURE); + } + + return WEXITSTATUS(ret); +} + +#define CHECK_SWAP_FMT "sed -ne " \ + "'/^%lx/,/^Swap:/{/^Swap:\\s*%ld kB/ q1}' " \ + "/proc/%d/smaps" + +static bool check_swap(void *p, unsigned long size) +{ + char *cmd; + int ret; + + ret = asprintf(&cmd, CHECK_SWAP_FMT, + (unsigned long)p, size >> 10, getpid()); + if (ret < 0) { + perror("asprintf(CHECK_SWAP)"); + exit(EXIT_FAILURE); + } + + ret = system(cmd); + free(cmd); + if (ret < 0 || !WIFEXITED(ret)) { + perror("system(check_swap)"); + exit(EXIT_FAILURE); + } + + return WEXITSTATUS(ret); +} + +static void *alloc_mapping(void) +{ + void *p; + + p = mmap(BASE_ADDR, hpage_pmd_size, PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + if (p != BASE_ADDR) { + printf("Failed to allocate VMA at %p\n", BASE_ADDR); + exit(EXIT_FAILURE); + } + + return p; +} + +static void fill_memory(int *p, unsigned long start, unsigned long end) +{ + int i; + + for (i = start / page_size; i < end / page_size; i++) + p[i * page_size / sizeof(*p)] = i + 0xdead0000; +} + +static void validate_memory(int *p, unsigned long start, unsigned long end) +{ + int i; + + for (i = start / page_size; i < end / page_size; i++) { + if (p[i * page_size / sizeof(*p)] != i + 0xdead0000) { + printf("Page %d is corrupted: %#x\n", + i, p[i * page_size / sizeof(*p)]); + exit(EXIT_FAILURE); + } + } +} + +#define TICK 500000 +static bool wait_for_scan(const char *msg, char *p) +{ + int full_scans; + int timeout = 6; /* 3 seconds */ + + /* Sanity check */ + if (check_huge(p)) { + printf("Unexpected huge page\n"); + exit(EXIT_FAILURE); + } + + madvise(p, hpage_pmd_size, MADV_HUGEPAGE); + + /* Wait until the second full_scan completed */ + full_scans = read_num("khugepaged/full_scans") + 2; + + printf("%s...", msg); + while (timeout--) { + if (check_huge(p)) + break; + if (read_num("khugepaged/full_scans") >= full_scans) + break; + printf("."); + usleep(TICK); + } + + madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE); + + return !timeout; +} + +static void alloc_at_fault(void) +{ + struct settings settings = default_settings; + char *p; + + settings.thp_enabled = THP_ALWAYS; + write_settings(&settings); + + p = alloc_mapping(); + *p = 1; + printf("Allocate huge 
page on fault..."); + if (check_huge(p)) + success("OK"); + else + fail("Fail"); + + write_settings(&default_settings); + + madvise(p, page_size, MADV_DONTNEED); + printf("Split huge PMD on MADV_DONTNEED..."); + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + munmap(p, hpage_pmd_size); +} + +static void collapse_full(void) +{ + void *p; + + p = alloc_mapping(); + fill_memory(p, 0, hpage_pmd_size); + if (wait_for_scan("Collapse fully populated PTE table", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, hpage_pmd_size); + munmap(p, hpage_pmd_size); +} + +static void collapse_empty(void) +{ + void *p; + + p = alloc_mapping(); + if (wait_for_scan("Do not collapse empty PTE table", p)) + fail("Timeout"); + else if (check_huge(p)) + fail("Fail"); + else + success("OK"); + munmap(p, hpage_pmd_size); +} + +static void collapse_single_pte_entry(void) +{ + void *p; + + p = alloc_mapping(); + fill_memory(p, 0, page_size); + if (wait_for_scan("Collapse PTE table with single PTE entry present", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, page_size); + munmap(p, hpage_pmd_size); +} + +static void collapse_max_ptes_none(void) +{ + int max_ptes_none = hpage_pmd_nr / 2; + struct settings settings = default_settings; + void *p; + + settings.khugepaged.max_ptes_none = max_ptes_none; + write_settings(&settings); + + p = alloc_mapping(); + + fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); + if (wait_for_scan("Do not collapse with max_ptes_none exceeded", p)) + fail("Timeout"); + else if (check_huge(p)) + fail("Fail"); + else + success("OK"); + validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); + + fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); + if (wait_for_scan("Collapse with max_ptes_none PTEs empty", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); + + munmap(p, hpage_pmd_size); + write_settings(&default_settings); +} + +static void collapse_swapin_single_pte(void) +{ + void *p; + p = alloc_mapping(); + fill_memory(p, 0, hpage_pmd_size); + + printf("Swapout one page..."); + if (madvise(p, page_size, MADV_PAGEOUT)) { + perror("madvise(MADV_PAGEOUT)"); + exit(EXIT_FAILURE); + } + if (check_swap(p, page_size)) { + success("OK"); + } else { + fail("Fail"); + goto out; + } + + if (wait_for_scan("Collapse with swapping in single PTE entry", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, hpage_pmd_size); +out: + munmap(p, hpage_pmd_size); +} + +static void collapse_max_ptes_swap(void) +{ + int max_ptes_swap = read_num("khugepaged/max_ptes_swap"); + void *p; + + p = alloc_mapping(); + + fill_memory(p, 0, hpage_pmd_size); + printf("Swapout %d of %d pages...", max_ptes_swap + 1, hpage_pmd_nr); + if (madvise(p, (max_ptes_swap + 1) * page_size, MADV_PAGEOUT)) { + perror("madvise(MADV_PAGEOUT)"); + exit(EXIT_FAILURE); + } + if (check_swap(p, (max_ptes_swap + 1) * page_size)) { + success("OK"); + } else { + fail("Fail"); + goto out; + } + + if (wait_for_scan("Do not collapse with max_ptes_swap exceeded", p)) + fail("Timeout"); + else if (check_huge(p)) + fail("Fail"); + else + success("OK"); + validate_memory(p, 0, hpage_pmd_size); + + fill_memory(p, 0, hpage_pmd_size); + printf("Swapout %d of %d pages...", max_ptes_swap, hpage_pmd_nr); + if
(madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) { + perror("madvise(MADV_PAGEOUT)"); + exit(EXIT_FAILURE); + } + if (check_swap(p, max_ptes_swap * page_size)) { + success("OK"); + } else { + fail("Fail"); + goto out; + } + + if (wait_for_scan("Collapse with max_ptes_swap pages swapped out", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, hpage_pmd_size); +out: + munmap(p, hpage_pmd_size); +} + +static void collapse_single_pte_entry_compound(void) +{ + void *p; + + p = alloc_mapping(); + + printf("Allocate huge page..."); + madvise(p, hpage_pmd_size, MADV_HUGEPAGE); + fill_memory(p, 0, hpage_pmd_size); + if (check_huge(p)) + success("OK"); + else + fail("Fail"); + madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE); + + printf("Split huge page leaving single PTE mapping compound page..."); + madvise(p + page_size, hpage_pmd_size - page_size, MADV_DONTNEED); + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + + if (wait_for_scan("Collapse PTE table with single PTE mapping compound page", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, page_size); + munmap(p, hpage_pmd_size); +} + +static void collapse_full_of_compound(void) +{ + void *p; + + p = alloc_mapping(); + + printf("Allocate huge page..."); + madvise(p, hpage_pmd_size, MADV_HUGEPAGE); + fill_memory(p, 0, hpage_pmd_size); + if (check_huge(p)) + success("OK"); + else + fail("Fail"); + + printf("Split huge page leaving single PTE page table full of compound pages..."); + madvise(p, page_size, MADV_NOHUGEPAGE); + madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE); + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + + if (wait_for_scan("Collapse PTE table full of compound pages", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, hpage_pmd_size); + munmap(p, hpage_pmd_size); +} + +static void collapse_compound_extreme(void) +{ + void *p; + int i; + + p = alloc_mapping(); + for (i = 0; i < hpage_pmd_nr; i++) { + printf("\rConstruct PTE page table full of different PTE-mapped compound pages %3d/%d...", + i + 1, hpage_pmd_nr); + + madvise(BASE_ADDR, hpage_pmd_size, MADV_HUGEPAGE); + fill_memory(BASE_ADDR, 0, hpage_pmd_size); + if (!check_huge(BASE_ADDR)) { + printf("Failed to allocate huge page\n"); + exit(EXIT_FAILURE); + } + madvise(BASE_ADDR, hpage_pmd_size, MADV_NOHUGEPAGE); + + p = mremap(BASE_ADDR - i * page_size, + i * page_size + hpage_pmd_size, + (i + 1) * page_size, + MREMAP_MAYMOVE | MREMAP_FIXED, + BASE_ADDR + 2 * hpage_pmd_size); + if (p == MAP_FAILED) { + perror("mremap+unmap"); + exit(EXIT_FAILURE); + } + + p = mremap(BASE_ADDR + 2 * hpage_pmd_size, + (i + 1) * page_size, + (i + 1) * page_size + hpage_pmd_size, + MREMAP_MAYMOVE | MREMAP_FIXED, + BASE_ADDR - (i + 1) * page_size); + if (p == MAP_FAILED) { + perror("mremap+alloc"); + exit(EXIT_FAILURE); + } + } + + munmap(BASE_ADDR, hpage_pmd_size); + fill_memory(p, 0, hpage_pmd_size); + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + + if (wait_for_scan("Collapse PTE table full of different compound pages", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + + validate_memory(p, 0, hpage_pmd_size); + munmap(p, hpage_pmd_size); +} + +static void collapse_fork(void) +{ + int wstatus; + void *p; + + p = alloc_mapping(); + + printf("Allocate small page..."); + fill_memory(p, 0, page_size); + if (!check_huge(p)) + success("OK");
+ else + fail("Fail"); + + printf("Share small page over fork()..."); + if (!fork()) { + /* Do not touch settings on child exit */ + skip_settings_restore = true; + exit_status = 0; + + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + + fill_memory(p, page_size, 2 * page_size); + + if (wait_for_scan("Collapse PTE table with single page shared with parent process", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + + validate_memory(p, 0, page_size); + munmap(p, hpage_pmd_size); + exit(exit_status); + } + + wait(&wstatus); + exit_status += WEXITSTATUS(wstatus); + + printf("Check if parent still has small page..."); + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, page_size); + munmap(p, hpage_pmd_size); +} + +static void collapse_fork_compound(void) +{ + int wstatus; + void *p; + + p = alloc_mapping(); + + printf("Allocate huge page..."); + madvise(p, hpage_pmd_size, MADV_HUGEPAGE); + fill_memory(p, 0, hpage_pmd_size); + if (check_huge(p)) + success("OK"); + else + fail("Fail"); + + printf("Share huge page over fork()..."); + if (!fork()) { + /* Do not touch settings on child exit */ + skip_settings_restore = true; + exit_status = 0; + + if (check_huge(p)) + success("OK"); + else + fail("Fail"); + + printf("Split huge page PMD in child process..."); + madvise(p, page_size, MADV_NOHUGEPAGE); + madvise(p, hpage_pmd_size, MADV_NOHUGEPAGE); + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + fill_memory(p, 0, page_size); + + if (wait_for_scan("Collapse PTE table full of compound pages in child", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); + + validate_memory(p, 0, hpage_pmd_size); + munmap(p, hpage_pmd_size); + exit(exit_status); + } + + wait(&wstatus); + exit_status += WEXITSTATUS(wstatus); + + printf("Check if parent still has huge page..."); + if (check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, hpage_pmd_size); + munmap(p, hpage_pmd_size); +} + +int main(void) +{ + setbuf(stdout, NULL); + + page_size = getpagesize(); + hpage_pmd_size = read_num("hpage_pmd_size"); + hpage_pmd_nr = hpage_pmd_size / page_size; + + default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1; + default_settings.khugepaged.max_ptes_swap = hpage_pmd_nr / 8; + default_settings.khugepaged.pages_to_scan = hpage_pmd_nr * 8; + + save_settings(); + adjust_settings(); + + alloc_at_fault(); + collapse_full(); + collapse_empty(); + collapse_single_pte_entry(); + collapse_max_ptes_none(); + collapse_swapin_single_pte(); + collapse_max_ptes_swap(); + collapse_single_pte_entry_compound(); + collapse_full_of_compound(); + collapse_compound_extreme(); + collapse_fork(); + collapse_fork_compound(); + + restore_settings(0); +}

From patchwork Fri Apr 3 11:29:22 2020
Shutemov" X-Patchwork-Id: 11472563 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C9A511667 for ; Fri, 3 Apr 2020 11:29:45 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 95F0A20BED for ; Fri, 3 Apr 2020 11:29:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=shutemov-name.20150623.gappssmtp.com header.i=@shutemov-name.20150623.gappssmtp.com header.b="K/3jDpUj" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 95F0A20BED Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id EC5568E000B; Fri, 3 Apr 2020 07:29:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id E74B38E0007; Fri, 3 Apr 2020 07:29:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D42028E000B; Fri, 3 Apr 2020 07:29:37 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0050.hostedemail.com [216.40.44.50]) by kanga.kvack.org (Postfix) with ESMTP id AC6F38E0007 for ; Fri, 3 Apr 2020 07:29:37 -0400 (EDT) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 6234F40C3 for ; Fri, 3 Apr 2020 11:29:37 +0000 (UTC) X-FDA: 76666323594.12.tray00_36d17eda7a453 X-Spam-Summary: 2,0,0,7b62ce83062f3f5b,d41d8cd98f00b204,kirill@shutemov.name,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1541:1711:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3352:3608:3865:3866:3867:3870:3871:3872:3874:4250:4321:5007:6261:6653:10004:11026:11473:11658:11914:12043:12296:12297:12438:12517:12519:12555:12679:12895:12986:13069:13161:13229:13255:13311:13357:13894:14096:14181:14384:14394:14721:21080:21433:21444:21451:21627:21990,0,RBL:209.85.208.194:@shutemov.name:.lbl8.mailshell.net-62.8.0.100 66.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:105,LUA_SUMMARY:none X-HE-Tag: tray00_36d17eda7a453 X-Filterd-Recvd-Size: 4440 Received: from mail-lj1-f194.google.com (mail-lj1-f194.google.com [209.85.208.194]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Fri, 3 Apr 2020 11:29:36 +0000 (UTC) Received: by mail-lj1-f194.google.com with SMTP id k21so6611242ljh.2 for ; Fri, 03 Apr 2020 04:29:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=5yEswaSUA2DH71ZsH03UY5693O0nZ/nvg6oBTomxklk=; b=K/3jDpUjVVBeQnEhKWwM3gy5Gru8BoX47Ll9axoy1+xsPYklIrCBMX+fynu8VQkirs c7IdaWFPcUAldS81Lgl8ts8UDyFmKK3wuwf482yQ5bwzcLTPPeeEmvb5bjZbaEtS5n1J 980tEnkeEIMw580fqyrxQAJGlI8ZBc5u0zbNRGkxjEdap7yl1DDi31Sve1b1iRjJtidt K8hYew8M8h7Y4uEk48knMl1An7JC3fl1k9m7z60tTErq8ZJXAOHVNOwOCMj5aGJsNKBT INczse/l8Bw4CLc8ed/WJp0e/iooX+2Y+2EaKM3rsCunrr9NUn1zbLDKNy1WRvbv8m7S GWow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
__collapse_huge_page_swapin() checks the number of referenced PTEs to decide whether the memory range is hot enough to justify swapin. The problem is that it stops the collapse altogether if there are not enough referenced pages, instead of merely skipping the swapin.

Signed-off-by: Kirill A. Shutemov
Fixes: 0db501f7a34c ("mm, thp: convert from optimistic swapin collapsing to conservative")
Reviewed-by: Zi Yan
Acked-by: Yang Shi
--- mm/khugepaged.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 99bab7e4d05b..14d7afc90786 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -905,7 +905,8 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, /* we only decide to swapin, if there is enough young ptes */ if (referenced < HPAGE_PMD_NR/2) { trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0); - return false; + /* Do not block collapse, only skip swapping in */ + return true; } vmf.pte = pte_offset_map(pmd, address); for (; vmf.address < address + HPAGE_PMD_NR*PAGE_SIZE;
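To see why the one-line change is enough, a stripped-down sketch of the control flow (simplified, not the kernel source; swapin_sketch() is a made-up name): collapse_huge_page() gives up on the whole range whenever __collapse_huge_page_swapin() returns false, so the referenced-PTE heuristic must only gate the optional swapin work.

	/* Simplified sketch of the caller's contract. */
	static bool swapin_sketch(int referenced)
	{
		if (referenced < HPAGE_PMD_NR / 2) {
			/* Range not hot enough to justify swap I/O: skip the
			 * optional swapin, but do NOT veto the collapse. */
			return true;	/* before the fix: return false */
		}
		/* ...swap the missing pages back in, then report success... */
		return true;
	}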
Shutemov" X-Patchwork-Id: 11472557 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4292B14DD for ; Fri, 3 Apr 2020 11:29:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0DC202082F for ; Fri, 3 Apr 2020 11:29:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=shutemov-name.20150623.gappssmtp.com header.i=@shutemov-name.20150623.gappssmtp.com header.b="sCxEbPsB" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0DC202082F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E267B8E0009; Fri, 3 Apr 2020 07:29:35 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id DD6068E0007; Fri, 3 Apr 2020 07:29:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CC6248E0009; Fri, 3 Apr 2020 07:29:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0041.hostedemail.com [216.40.44.41]) by kanga.kvack.org (Postfix) with ESMTP id 93F768E0007 for ; Fri, 3 Apr 2020 07:29:35 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 53CF0181AEF1A for ; Fri, 3 Apr 2020 11:29:35 +0000 (UTC) X-FDA: 76666323510.19.coach28_369484cc46f47 X-Spam-Summary: 2,0,0,20f49ac07d78c40d,d41d8cd98f00b204,kirill@shutemov.name,,RULES_HIT:41:173:355:379:541:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1539:1711:1714:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3351:3865:3867:3868:3870:3872:4250:5007:6261:6653:10004:11026:11658:11914:12043:12297:12438:12517:12519:12555:12895:13069:13311:13357:13894:14096:14181:14384:14394:14721:21080:21324:21444:21451:21627:30054,0,RBL:209.85.208.176:@shutemov.name:.lbl8.mailshell.net-62.8.0.100 66.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: coach28_369484cc46f47 X-Filterd-Recvd-Size: 4081 Received: from mail-lj1-f176.google.com (mail-lj1-f176.google.com [209.85.208.176]) by imf43.hostedemail.com (Postfix) with ESMTP for ; Fri, 3 Apr 2020 11:29:34 +0000 (UTC) Received: by mail-lj1-f176.google.com with SMTP id g27so6512268ljn.10 for ; Fri, 03 Apr 2020 04:29:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=UDV67uaM0olVe42g3gcRUTFOTLKxm8Yf483YPZ3Dud8=; b=sCxEbPsBTakoE1xknw2RtO2rz8oxNAb6cCS8SzCviLU+wXJ5m+DFbQMKB/W7GcAtYQ D/0LK1slWaUCNicIxwMSPPcXyy0SuopgawhAqBDv8LGru/h2Rz//tPVcOCiAX0SeMfie Rtmac2n0Qy8TbXeXOE4KNNFMQCPwok9lSTX2O0YBd41GsuzvfaBW+oKM/+RQaey1e1CJ F7HBir5eC9AV6iJ3nDVZiaM9M6YOksUJ7CwPnmOYtDbgDLtLnz60hL49JSVAixZuYBQv zN9NlOYVLhwtvCwlZiirwMvkOanRxXL2qLV3wkgtzXR8972SVxScB7hqZL3eWlOC48Ye rPZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; 
Having a page in the LRU add cache inflates the page's refcount and gives a false negative on PageLRU(), which reduces the collapse success rate. Drain all LRU add caches before scanning. This happens relatively rarely and should not disturb the system too much.

Signed-off-by: Kirill A. Shutemov
Acked-by: Yang Shi
--- mm/khugepaged.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 14d7afc90786..fdc10ffde1ca 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2065,6 +2065,8 @@ static void khugepaged_do_scan(void) barrier(); /* write khugepaged_pages_to_scan to local stack */ + lru_add_drain_all(); + while (progress < pages) { if (!khugepaged_prealloc_page(&hpage, &wait)) break;
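The failure mode being avoided, sketched from the existing checks in mm/khugepaged.c (a simplified excerpt, not a complete function): a page still sitting in a per-CPU LRU-add pagevec is not yet on the LRU and is pinned by the pagevec itself, so both of khugepaged's sanity checks can reject it spuriously.

	/* While the page sits in the LRU-add pagevec: */
	if (!PageLRU(page)) {
		/* false negative: the page is still on its way to the LRU */
		result = SCAN_PAGE_LRU;
		goto out_unmap;
	}
	if (page_count(page) != 1 + PageSwapCache(page)) {
		/* the pagevec reference looks like an unexpected extra pin */
		result = SCAN_PAGE_COUNT;
		goto out_unmap;
	}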
Shutemov" X-Patchwork-Id: 11472559 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD17914DD for ; Fri, 3 Apr 2020 11:29:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A99602073B for ; Fri, 3 Apr 2020 11:29:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=shutemov-name.20150623.gappssmtp.com header.i=@shutemov-name.20150623.gappssmtp.com header.b="x0yHw6L/" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A99602073B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8D1E78E000C; Fri, 3 Apr 2020 07:29:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 833208E000B; Fri, 3 Apr 2020 07:29:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6AFE98E000A; Fri, 3 Apr 2020 07:29:37 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0038.hostedemail.com [216.40.44.38]) by kanga.kvack.org (Postfix) with ESMTP id 2FF158E0007 for ; Fri, 3 Apr 2020 07:29:37 -0400 (EDT) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E5047180AD804 for ; Fri, 3 Apr 2020 11:29:36 +0000 (UTC) X-FDA: 76666323552.23.coach44_36c74ccc96c07 X-Spam-Summary: 2,0,0,8d3fa67fb637167d,d41d8cd98f00b204,kirill@shutemov.name,,RULES_HIT:41:355:379:541:960:968:973:988:989:1260:1311:1314:1345:1359:1437:1515:1534:1540:1711:1730:1747:1777:1792:2393:2553:2559:2562:3138:3139:3140:3141:3142:3352:3608:3865:3867:3870:3872:4250:5007:6261:6653:10004:11026:11658:11914:12043:12296:12297:12438:12517:12519:12555:12895:13069:13311:13357:13894:14096:14181:14384:14394:14721:21080:21444:21451:21627:21990:30054:30090,0,RBL:209.85.167.49:@shutemov.name:.lbl8.mailshell.net-62.8.0.100 66.201.201.201,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: coach44_36c74ccc96c07 X-Filterd-Recvd-Size: 4134 Received: from mail-lf1-f49.google.com (mail-lf1-f49.google.com [209.85.167.49]) by imf39.hostedemail.com (Postfix) with ESMTP for ; Fri, 3 Apr 2020 11:29:36 +0000 (UTC) Received: by mail-lf1-f49.google.com with SMTP id e7so5511936lfq.1 for ; Fri, 03 Apr 2020 04:29:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=mAGZx/EGSI/RDjikjycvglo1xEb2tWim3LCqQGVnzJI=; b=x0yHw6L/NgJHQg6clxAQ3NicETBimmrGkxc5bhLtUTHIZZ6gdjrI89zcBBRayGoUir 7ufdkT4qvJVe+kOZvQbza2ILOMRGvZLR8jyJ1EKVJVN5VePrJVVG9hKMEjkhfUj+6p30 cm6cVLDe1QxBjGcUYoMnxWjf3bfLuxj/jPNPguI1Akb5WzqU3t+fnrXihgwP/ppUph4A CT3L4rtI5GFKkmcmzxY7uLbnZDFQfWFqto6O/7V/qNfsjFmHmj+ow7Eikvk0jTBposlE HYt7MTTbT/vjnMgOVp6OtXYFrpQClrjVosE22wjvOGPykQyxB03RFBVO3UNwtMK4WO/z GT6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; 
__collapse_huge_page_isolate() may fail due to an extra pin held by the LRU add pagevec. This is pretty common for the swapin case: we swap pages in just to fail because of the extra pin. Drain the LRU add pagevec after a successful swapin.

Signed-off-by: Kirill A. Shutemov
--- mm/khugepaged.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index fdc10ffde1ca..57ff287caf6b 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -940,6 +940,11 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, } vmf.pte--; pte_unmap(vmf.pte); + + /* Drain LRU add pagevec to remove extra pin on the swapped in pages */ + if (swapped_in) + lru_add_drain(); + trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 1); return true; }
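The two drain primitives used by this patch and the previous one differ in cost, which is presumably why each call site picks a different one (a sketch of the intent; both functions are long-standing kernel APIs):

	/* Patch 3: once per scan round, flush every CPU's pagevecs.
	 * Expensive (schedules work on all CPUs), hence done rarely. */
	lru_add_drain_all();

	/* This patch: right after do_swap_page() queued the freshly
	 * swapped-in pages on *this* CPU's pagevec, a local drain is
	 * sufficient and cheap. */
	if (swapped_in)
		lru_add_drain();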
Shutemov" X-Patchwork-Id: 11472567 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 607941805 for ; Fri, 3 Apr 2020 11:29:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2D40920857 for ; Fri, 3 Apr 2020 11:29:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=shutemov-name.20150623.gappssmtp.com header.i=@shutemov-name.20150623.gappssmtp.com header.b="e7D7Emgb" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2D40920857 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3EEF68E000D; Fri, 3 Apr 2020 07:29:39 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 39FD58E0007; Fri, 3 Apr 2020 07:29:39 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0DEB68E000D; Fri, 3 Apr 2020 07:29:38 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0097.hostedemail.com [216.40.44.97]) by kanga.kvack.org (Postfix) with ESMTP id CD4F48E000D for ; Fri, 3 Apr 2020 07:29:38 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 600358245571 for ; Fri, 3 Apr 2020 11:29:38 +0000 (UTC) X-FDA: 76666323636.21.skirt96_3704d82ab9250 X-Spam-Summary: 2,0,0,f07e3b49ce5735b5,d41d8cd98f00b204,kirill@shutemov.name,,RULES_HIT:41:69:355:379:541:960:973:988:989:1260:1311:1314:1345:1359:1437:1515:1535:1542:1711:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3354:3865:3867:3868:3871:3872:3874:4250:5007:6119:6261:6653:7903:8957:10004:11026:11232:11473:11658:11914:12043:12296:12297:12438:12517:12519:12555:12895:13161:13229:13894:14096:14181:14394:14721:21080:21212:21444:21451:21627:21987:30003:30054:30070,0,RBL:209.85.208.175:@shutemov.name:.lbl8.mailshell.net-66.201.201.201 62.8.0.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fn,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: skirt96_3704d82ab9250 X-Filterd-Recvd-Size: 5510 Received: from mail-lj1-f175.google.com (mail-lj1-f175.google.com [209.85.208.175]) by imf47.hostedemail.com (Postfix) with ESMTP for ; Fri, 3 Apr 2020 11:29:37 +0000 (UTC) Received: by mail-lj1-f175.google.com with SMTP id b1so6587979ljp.3 for ; Fri, 03 Apr 2020 04:29:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=JyG4ZK4aUipdi7l1+T9raGPiyVfjwGerp10mgDADK/g=; b=e7D7EmgbeOg4wV7XYkB6LstaOwCRw67bTZ64C4M23mg6OujN+tMKmAkpzQ9jKTpyTJ KxuyKX9jKpE+54QURhvJKVl6D91OVqshcTmGntYjrpOrRN9Xsa79rJBPFl++qnp484tX yBPZQRoEesmxEQBb0ohdDqb3FYEhmSTuA+D7bmCVanFSAg/leJEfQp0/rkFHjc90Rt8c Gqx37GTb6CIFM2nz/wuMUUNIOqtHWTIyWKf341A4xRTJUCW7nBhwDmF8W8hdgaK756YS DvqxrRwGJHAQxWysJkOlgVwFrOiRaA+HpZNlWBVQsoPfZUOoK0G3KQYxrNB8ePQT6Hyy BtRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; 
The page can be included in the collapse as long as it doesn't have extra pins (from GUP or otherwise).

Signed-off-by: Kirill A. Shutemov
--- mm/khugepaged.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 57ff287caf6b..1e7e6543ebca 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -581,11 +581,18 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, } /* - * cannot use mapcount: can't collapse if there's a gup pin. - * The page must only be referenced by the scanned process - * and page swap cache. + * Check if the page has any GUP (or other external) pins. + * + * The page table that maps the page has already been unlinked + * from the page table tree and this process cannot get an + * additional pin on the page. + * + * New pins can come later if the page is shared across fork, + * but not for this process. That is fine. The other process + * cannot write to the page, only trigger CoW.
*/ - if (page_count(page) != 1 + PageSwapCache(page)) { + if (total_mapcount(page) + PageSwapCache(page) != + page_count(page)) { unlock_page(page); result = SCAN_PAGE_COUNT; goto out; @@ -672,7 +679,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, } else { src_page = pte_page(pteval); copy_user_highpage(page, src_page, address, vma); - VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page); release_pte_page(src_page); /* * ptl mostly unnecessary, but preempt has to @@ -1206,12 +1212,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, goto out_unmap; } - /* - * cannot use mapcount: can't collapse if there's a gup pin. - * The page must only be referenced by the scanned process - * and page swap cache. - */ - if (page_count(page) != 1 + PageSwapCache(page)) { + /* Check if the page has any GUP (or other external) pins */ + if (total_mapcount(page) + PageSwapCache(page) != + page_count(page)) { result = SCAN_PAGE_COUNT; goto out_unmap; }
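A worked example of the new accounting, with illustrative numbers (commentary only, not code from the patch): each PTE mapping contributes one reference to page_count(), and the swap cache holds one more.

	/* Anonymous page mapped by the scanned process and one fork() child,
	 * also present in the swap cache:
	 *
	 *	total_mapcount(page)	== 2	(two PTE mappings)
	 *	PageSwapCache(page)	== 1	(swap-cache reference)
	 *	page_count(page)	== 3	(2 + 1, nothing else)
	 *
	 * 2 + 1 == 3: no external pin, so the page may be collapsed.  A
	 * concurrent GUP would make page_count() == 4 and the scan bails
	 * out with SCAN_PAGE_COUNT.  The old check, page_count() == 1 +
	 * PageSwapCache(), rejected any page shared across fork() even
	 * without a GUP pin.
	 */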
From patchwork Fri Apr 3 11:29:26 2020
From: "Kirill A. Shutemov"
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: Zi Yan, Yang Shi, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv2 6/8] khugepaged: Allow to collapse PTE-mapped compound pages
Date: Fri, 3 Apr 2020 14:29:26 +0300
Message-Id: <20200403112928.19742-7-kirill.shutemov@linux.intel.com>
In-Reply-To: <20200403112928.19742-1-kirill.shutemov@linux.intel.com>

We can collapse PTE-mapped compound pages. We only need to avoid handling them more than once: lock/unlock the page only once even if it is present in the PMD range multiple times, since it is handled at the compound level. The same goes for LRU isolation and putback.
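A condensed sketch of the bookkeeping the patch adds (simplified from the hunks below, with declarations filled in): each PTE-mapped compound page is isolated once, remembered on a local list, and released once after the copy, no matter how many PTEs in the range point into it.

	LIST_HEAD(compound_pagelist);
	struct page *p, *tmp;

	/* For each PTE in the PMD range: */
	page = compound_head(page);
	list_for_each_entry(p, &compound_pagelist, lru)
		if (p == page)
			goto next;	/* this compound page was already handled */
	/* ...lock and isolate the page once, then remember it: */
	if (PageCompound(page))
		list_add_tail(&page->lru, &compound_pagelist);

	/* After copying, drop each compound page exactly once: */
	list_for_each_entry_safe(page, tmp, &compound_pagelist, lru) {
		list_del(&page->lru);
		release_pte_page(page);
	}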
Shutemov" Subject: [PATCHv2 6/8] khugepaged: Allow to collapse PTE-mapped compound pages Date: Fri, 3 Apr 2020 14:29:26 +0300 Message-Id: <20200403112928.19742-7-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200403112928.19742-1-kirill.shutemov@linux.intel.com> References: <20200403112928.19742-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We can collapse PTE-mapped compound pages. We only need to avoid handling them more than once: lock/unlock page only once if it's present in the PMD range multiple times as it handled on compound level. The same goes for LRU isolation and putback. Signed-off-by: Kirill A. Shutemov --- mm/khugepaged.c | 103 ++++++++++++++++++++++++++++++++---------------- 1 file changed, 68 insertions(+), 35 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 1e7e6543ebca..49e56e4e30d1 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -515,23 +515,37 @@ void __khugepaged_exit(struct mm_struct *mm) static void release_pte_page(struct page *page) { - dec_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + page_is_file_cache(page), + -compound_nr(page)); unlock_page(page); putback_lru_page(page); } -static void release_pte_pages(pte_t *pte, pte_t *_pte) +static void release_pte_pages(pte_t *pte, pte_t *_pte, + struct list_head *compound_pagelist) { + struct page *page, *tmp; + while (--_pte >= pte) { pte_t pteval = *_pte; - if (!pte_none(pteval) && !is_zero_pfn(pte_pfn(pteval))) - release_pte_page(pte_page(pteval)); + + page = pte_page(pteval); + if (!pte_none(pteval) && !is_zero_pfn(pte_pfn(pteval)) && + !PageCompound(page)) + release_pte_page(page); + } + + list_for_each_entry_safe(page, tmp, compound_pagelist, lru) { + list_del(&page->lru); + release_pte_page(page); } } static int __collapse_huge_page_isolate(struct vm_area_struct *vma, unsigned long address, - pte_t *pte) + pte_t *pte, + struct list_head *compound_pagelist) { struct page *page = NULL; pte_t *_pte; @@ -561,13 +575,21 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, goto out; } - /* TODO: teach khugepaged to collapse THP mapped with pte */ + VM_BUG_ON_PAGE(!PageAnon(page), page); + if (PageCompound(page)) { - result = SCAN_PAGE_COMPOUND; - goto out; - } + struct page *p; + page = compound_head(page); - VM_BUG_ON_PAGE(!PageAnon(page), page); + /* + * Check if we have dealt with the compound page + * already + */ + list_for_each_entry(p, compound_pagelist, lru) { + if (page == p) + goto next; + } + } /* * We can do it before isolate_lru_page because the @@ -597,19 +619,15 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_PAGE_COUNT; goto out; } - if (pte_write(pteval)) { - writable = true; - } else { - if (PageSwapCache(page) && - !reuse_swap_page(page, NULL)) { - unlock_page(page); - result = SCAN_SWAP_CACHE_PAGE; - goto out; - } + if (!pte_write(pteval) && PageSwapCache(page) && + !reuse_swap_page(page, NULL)) { /* - * Page is not in the swap cache. It can be collapsed - * into a THP. + * Page is in the swap cache and cannot be re-used. + * It cannot be collapsed into a THP. 
*/ + unlock_page(page); + result = SCAN_SWAP_CACHE_PAGE; + goto out; } /* @@ -621,16 +639,23 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_DEL_PAGE_LRU; goto out; } - inc_node_page_state(page, - NR_ISOLATED_ANON + page_is_file_cache(page)); + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + page_is_file_cache(page), + compound_nr(page)); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageLRU(page), page); + if (PageCompound(page)) + list_add_tail(&page->lru, compound_pagelist); +next: /* There should be enough young pte to collapse the page */ if (pte_young(pteval) || page_is_young(page) || PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, address)) referenced++; + + if (pte_write(pteval)) + writable = true; } if (likely(writable)) { if (likely(referenced)) { @@ -644,7 +669,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, } out: - release_pte_pages(pte, _pte); + release_pte_pages(pte, _pte, compound_pagelist); trace_mm_collapse_huge_page_isolate(page, none_or_zero, referenced, writable, result); return 0; @@ -653,13 +678,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, static void __collapse_huge_page_copy(pte_t *pte, struct page *page, struct vm_area_struct *vma, unsigned long address, - spinlock_t *ptl) + spinlock_t *ptl, + struct list_head *compound_pagelist) { + struct page *src_page, *tmp; pte_t *_pte; for (_pte = pte; _pte < pte + HPAGE_PMD_NR; _pte++, page++, address += PAGE_SIZE) { pte_t pteval = *_pte; - struct page *src_page; if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { clear_user_highpage(page, address); @@ -679,7 +705,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, } else { src_page = pte_page(pteval); copy_user_highpage(page, src_page, address, vma); - release_pte_page(src_page); /* * ptl mostly unnecessary, but preempt has to * be disabled to update the per-cpu stats @@ -693,9 +718,18 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, pte_clear(vma->vm_mm, address, _pte); page_remove_rmap(src_page, false); spin_unlock(ptl); - free_page_and_swap_cache(src_page); + if (!PageCompound(src_page)) { + release_pte_page(src_page); + free_page_and_swap_cache(src_page); + } } } + + list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) { + list_del(&src_page->lru); + release_pte_page(src_page); + free_page_and_swap_cache(src_page); + } } static void khugepaged_alloc_sleep(void) @@ -960,6 +994,7 @@ static void collapse_huge_page(struct mm_struct *mm, struct page **hpage, int node, int referenced) { + LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; pte_t *pte; pgtable_t pgtable; @@ -1059,7 +1094,8 @@ static void collapse_huge_page(struct mm_struct *mm, mmu_notifier_invalidate_range_end(&range); spin_lock(pte_ptl); - isolated = __collapse_huge_page_isolate(vma, address, pte); + isolated = __collapse_huge_page_isolate(vma, address, pte, + &compound_pagelist); spin_unlock(pte_ptl); if (unlikely(!isolated)) { @@ -1084,7 +1120,8 @@ static void collapse_huge_page(struct mm_struct *mm, */ anon_vma_unlock_write(vma->anon_vma); - __collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl); + __collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl, + &compound_pagelist); pte_unmap(pte); __SetPageUptodate(new_page); pgtable = pmd_pgtable(_pmd); @@ -1181,11 +1218,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, goto out_unmap; } - /* TODO: teach khugepaged to collapse THP mapped with pte */ - if 
(PageCompound(page)) { - result = SCAN_PAGE_COMPOUND; - goto out_unmap; - } + page = compound_head(page); /* * Record which node the original page is from and save this
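The bookkeeping pivot of the patch above is compound_pagelist: a compound page that shows up under several PTEs in the range is locked, isolated and has its NR_ISOLATED counters adjusted once for the whole compound, then released once at the end. A self-contained sketch of that "handle each head exactly once" pattern, with a toy page type in place of struct page (illustrative only):

    #include <stddef.h>

    #define NPTES 512                       /* HPAGE_PMD_NR on x86-64 */

    struct toy_page {
            struct toy_page *head;          /* compound head; NULL for a base page */
    };

    /* Invoke process() once per distinct (compound) page in the range,
     * the way the compound_pagelist walk deduplicates pages. */
    static size_t process_once(struct toy_page *pte_pages[NPTES],
                               void (*process)(struct toy_page *))
    {
            struct toy_page *seen[NPTES];
            size_t nseen = 0, i, j;

            for (i = 0; i < NPTES; i++) {
                    struct toy_page *p = pte_pages[i];
                    struct toy_page *h = p->head ? p->head : p;

                    for (j = 0; j < nseen; j++)
                            if (seen[j] == h)
                                    break;  /* already handled: the "goto next" case */
                    if (j < nseen)
                            continue;
                    seen[nseen++] = h;
                    process(h);             /* lock/isolate/count once per compound */
            }
            return nseen;                   /* number of distinct pages */
    }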
From patchwork Fri Apr 3 11:29:27 2020
From: "Kirill A. Shutemov"
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: Zi Yan, Yang Shi, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv2 7/8] thp: Change CoW semantics for anon-THP
Date: Fri, 3 Apr 2020 14:29:27 +0300
Message-Id: <20200403112928.19742-8-kirill.shutemov@linux.intel.com>
In-Reply-To: <20200403112928.19742-1-kirill.shutemov@linux.intel.com>
References: <20200403112928.19742-1-kirill.shutemov@linux.intel.com>

Currently we have different copy-on-write semantics for anon- and file-THP. For anon-THP we try to allocate a huge page on the write fault, but for file-THP we split the PMD and allocate a 4k page. Arguably, the file-THP semantics are more desirable: we don't necessarily want to unshare the full PMD range from the parent on the first access. This is the primary reason THP is unusable for some workloads, like Redis.

The original THP refcounting didn't allow PTE-mapped compound pages, so we had no option but to allocate a huge page on CoW (with fallback to 512 4k pages). The current refcounting doesn't have such limitations and we can cut a lot of complex code out of the fault path.

khugepaged is now able to recover THP from such ranges if the configuration allows.

Signed-off-by: Kirill A.
Shutemov --- mm/huge_memory.c | 249 +++++------------------------------------------ 1 file changed, 25 insertions(+), 224 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 24ad53b4dfc0..25b84cc0f17d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1206,262 +1206,63 @@ void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd) spin_unlock(vmf->ptl); } -static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, - pmd_t orig_pmd, struct page *page) -{ - struct vm_area_struct *vma = vmf->vma; - unsigned long haddr = vmf->address & HPAGE_PMD_MASK; - struct mem_cgroup *memcg; - pgtable_t pgtable; - pmd_t _pmd; - int i; - vm_fault_t ret = 0; - struct page **pages; - struct mmu_notifier_range range; - - pages = kmalloc_array(HPAGE_PMD_NR, sizeof(struct page *), - GFP_KERNEL); - if (unlikely(!pages)) { - ret |= VM_FAULT_OOM; - goto out; - } - - for (i = 0; i < HPAGE_PMD_NR; i++) { - pages[i] = alloc_page_vma_node(GFP_HIGHUSER_MOVABLE, vma, - vmf->address, page_to_nid(page)); - if (unlikely(!pages[i] || - mem_cgroup_try_charge_delay(pages[i], vma->vm_mm, - GFP_KERNEL, &memcg, false))) { - if (pages[i]) - put_page(pages[i]); - while (--i >= 0) { - memcg = (void *)page_private(pages[i]); - set_page_private(pages[i], 0); - mem_cgroup_cancel_charge(pages[i], memcg, - false); - put_page(pages[i]); - } - kfree(pages); - ret |= VM_FAULT_OOM; - goto out; - } - set_page_private(pages[i], (unsigned long)memcg); - } - - for (i = 0; i < HPAGE_PMD_NR; i++) { - copy_user_highpage(pages[i], page + i, - haddr + PAGE_SIZE * i, vma); - __SetPageUptodate(pages[i]); - cond_resched(); - } - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, - haddr, haddr + HPAGE_PMD_SIZE); - mmu_notifier_invalidate_range_start(&range); - - vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) - goto out_free_pages; - VM_BUG_ON_PAGE(!PageHead(page), page); - - /* - * Leave pmd empty until pte is filled note we must notify here as - * concurrent CPU thread might write to new page before the call to - * mmu_notifier_invalidate_range_end() happens which can lead to a - * device seeing memory write in different order than CPU. - * - * See Documentation/vm/mmu_notifier.rst - */ - pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd); - - pgtable = pgtable_trans_huge_withdraw(vma->vm_mm, vmf->pmd); - pmd_populate(vma->vm_mm, &_pmd, pgtable); - - for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) { - pte_t entry; - entry = mk_pte(pages[i], vma->vm_page_prot); - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - memcg = (void *)page_private(pages[i]); - set_page_private(pages[i], 0); - page_add_new_anon_rmap(pages[i], vmf->vma, haddr, false); - mem_cgroup_commit_charge(pages[i], memcg, false, false); - lru_cache_add_active_or_unevictable(pages[i], vma); - vmf->pte = pte_offset_map(&_pmd, haddr); - VM_BUG_ON(!pte_none(*vmf->pte)); - set_pte_at(vma->vm_mm, haddr, vmf->pte, entry); - pte_unmap(vmf->pte); - } - kfree(pages); - - smp_wmb(); /* make pte visible before pmd */ - pmd_populate(vma->vm_mm, vmf->pmd, pgtable); - page_remove_rmap(page, true); - spin_unlock(vmf->ptl); - - /* - * No need to double call mmu_notifier->invalidate_range() callback as - * the above pmdp_huge_clear_flush_notify() did already call it. 
- */ - mmu_notifier_invalidate_range_only_end(&range); - - ret |= VM_FAULT_WRITE; - put_page(page); - -out: - return ret; - -out_free_pages: - spin_unlock(vmf->ptl); - mmu_notifier_invalidate_range_end(&range); - for (i = 0; i < HPAGE_PMD_NR; i++) { - memcg = (void *)page_private(pages[i]); - set_page_private(pages[i], 0); - mem_cgroup_cancel_charge(pages[i], memcg, false); - put_page(pages[i]); - } - kfree(pages); - goto out; -} - vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd) { struct vm_area_struct *vma = vmf->vma; - struct page *page = NULL, *new_page; - struct mem_cgroup *memcg; + struct page *page; unsigned long haddr = vmf->address & HPAGE_PMD_MASK; - struct mmu_notifier_range range; - gfp_t huge_gfp; /* for allocation and charge */ - vm_fault_t ret = 0; vmf->ptl = pmd_lockptr(vma->vm_mm, vmf->pmd); VM_BUG_ON_VMA(!vma->anon_vma, vma); + if (is_huge_zero_pmd(orig_pmd)) - goto alloc; + goto fallback; + spin_lock(vmf->ptl); - if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) - goto out_unlock; + + if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); + return 0; + } page = pmd_page(orig_pmd); VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page); - /* - * We can only reuse the page if nobody else maps the huge page or it's - * part. - */ + + /* Lock page for reuse_swap_page() */ if (!trylock_page(page)) { get_page(page); spin_unlock(vmf->ptl); lock_page(page); spin_lock(vmf->ptl); if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { + spin_unlock(vmf->ptl); unlock_page(page); put_page(page); - goto out_unlock; + return 0; } put_page(page); } + + /* + * We can only reuse the page if nobody else maps the huge page or it's + * part. + */ if (reuse_swap_page(page, NULL)) { pmd_t entry; entry = pmd_mkyoung(orig_pmd); entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); - if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1)) + if (pmdp_set_access_flags(vma, haddr, vmf->pmd, entry, 1)) update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); - ret |= VM_FAULT_WRITE; unlock_page(page); - goto out_unlock; - } - unlock_page(page); - get_page(page); - spin_unlock(vmf->ptl); -alloc: - if (__transparent_hugepage_enabled(vma) && - !transparent_hugepage_debug_cow()) { - huge_gfp = alloc_hugepage_direct_gfpmask(vma); - new_page = alloc_hugepage_vma(huge_gfp, vma, haddr, HPAGE_PMD_ORDER); - } else - new_page = NULL; - - if (likely(new_page)) { - prep_transhuge_page(new_page); - } else { - if (!page) { - split_huge_pmd(vma, vmf->pmd, vmf->address); - ret |= VM_FAULT_FALLBACK; - } else { - ret = do_huge_pmd_wp_page_fallback(vmf, orig_pmd, page); - if (ret & VM_FAULT_OOM) { - split_huge_pmd(vma, vmf->pmd, vmf->address); - ret |= VM_FAULT_FALLBACK; - } - put_page(page); - } - count_vm_event(THP_FAULT_FALLBACK); - goto out; - } - - if (unlikely(mem_cgroup_try_charge_delay(new_page, vma->vm_mm, - huge_gfp, &memcg, true))) { - put_page(new_page); - split_huge_pmd(vma, vmf->pmd, vmf->address); - if (page) - put_page(page); - ret |= VM_FAULT_FALLBACK; - count_vm_event(THP_FAULT_FALLBACK); - goto out; - } - - count_vm_event(THP_FAULT_ALLOC); - count_memcg_events(memcg, THP_FAULT_ALLOC, 1); - - if (!page) - clear_huge_page(new_page, vmf->address, HPAGE_PMD_NR); - else - copy_user_huge_page(new_page, page, vmf->address, - vma, HPAGE_PMD_NR); - __SetPageUptodate(new_page); - - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, - haddr, haddr + HPAGE_PMD_SIZE); - mmu_notifier_invalidate_range_start(&range); - - spin_lock(vmf->ptl); - if (page) - put_page(page); 
- if (unlikely(!pmd_same(*vmf->pmd, orig_pmd))) { spin_unlock(vmf->ptl); - mem_cgroup_cancel_charge(new_page, memcg, true); - put_page(new_page); - goto out_mn; - } else { - pmd_t entry; - entry = mk_huge_pmd(new_page, vma->vm_page_prot); - entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); - pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd); - page_add_new_anon_rmap(new_page, vma, haddr, true); - mem_cgroup_commit_charge(new_page, memcg, false, true); - lru_cache_add_active_or_unevictable(new_page, vma); - set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry); - update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); - if (!page) { - add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); - } else { - VM_BUG_ON_PAGE(!PageHead(page), page); - page_remove_rmap(page, true); - put_page(page); - } - ret |= VM_FAULT_WRITE; + return VM_FAULT_WRITE; } + + unlock_page(page); spin_unlock(vmf->ptl); -out_mn: - /* - * No need to double call mmu_notifier->invalidate_range() callback as - * the above pmdp_huge_clear_flush_notify() did already call it. - */ - mmu_notifier_invalidate_range_only_end(&range); -out: - return ret; -out_unlock: - spin_unlock(vmf->ptl); - return ret; +fallback: + __split_huge_pmd(vma, vmf->pmd, vmf->address, false, NULL); + return VM_FAULT_FALLBACK; } /*
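What remains of do_huge_pmd_wp_page() after this patch is a short decision tree: retry if the PMD changed under us, reuse the huge page in place when reuse_swap_page() says we are the sole owner, and otherwise split the PMD so the regular 4k CoW fault copies only the touched subpage. A condensed, self-contained sketch of that flow (all names here are hypothetical stand-ins, not the kernel API):

    /* Illustrative decision tree of the reworked huge-PMD write fault. */
    enum wp_outcome { WP_RETRY, WP_REUSE, WP_FALLBACK };

    struct wp_state {
            int huge_zero;          /* faulting on the huge zero page?   */
            int pmd_changed;        /* *pmd changed while taking locks?  */
            int exclusive;          /* reuse_swap_page(): sole owner?    */
    };

    static enum wp_outcome huge_pmd_wp(const struct wp_state *s)
    {
            if (s->huge_zero)
                    return WP_FALLBACK;     /* split PMD; 4k faults copy on demand */
            if (s->pmd_changed)
                    return WP_RETRY;        /* return 0; the fault is replayed */
            if (s->exclusive)
                    return WP_REUSE;        /* mark PMD writable: VM_FAULT_WRITE */
            /* Shared THP: no 2M copy anymore; match file-THP behaviour. */
            return WP_FALLBACK;             /* __split_huge_pmd() + VM_FAULT_FALLBACK */
    }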
From patchwork Fri Apr 3 11:29:28 2020
From: "Kirill A. Shutemov"
To: akpm@linux-foundation.org, Andrea Arcangeli
Cc: Zi Yan, Yang Shi, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Shutemov" Subject: [PATCHv2 8/8] khugepaged: Introduce 'max_ptes_shared' tunable Date: Fri, 3 Apr 2020 14:29:28 +0300 Message-Id: <20200403112928.19742-9-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.26.0 In-Reply-To: <20200403112928.19742-1-kirill.shutemov@linux.intel.com> References: <20200403112928.19742-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: ``max_ptes_shared`` speicies how many pages can be shared across multiple processes. Exeeding the number woul block the collapse:: /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared A higher value may increase memory footprint for some workloads. By default, at least half of pages has to be not shared. Signed-off-by: Kirill A. Shutemov --- Documentation/admin-guide/mm/transhuge.rst | 7 ++ include/trace/events/huge_memory.h | 3 +- mm/khugepaged.c | 52 ++++++++++++-- tools/testing/selftests/vm/khugepaged.c | 83 ++++++++++++++++++++++ 4 files changed, 140 insertions(+), 5 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index bd5714547cee..d16e4f2bb70f 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -220,6 +220,13 @@ memory. A lower value can prevent THPs from being collapsed, resulting fewer pages being collapsed into THPs, and lower memory access performance. +``max_ptes_shared`` speicies how many pages can be shared across multiple +processes. Exeeding the number woul block the collapse:: + + /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared + +A higher value may increase memory footprint for some workloads. 
+ Boot parameter ============== diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index d82a0f4e824d..53532f5925c3 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -12,6 +12,8 @@ EM( SCAN_SUCCEED, "succeeded") \ EM( SCAN_PMD_NULL, "pmd_null") \ EM( SCAN_EXCEED_NONE_PTE, "exceed_none_pte") \ + EM( SCAN_EXCEED_SWAP_PTE, "exceed_swap_pte") \ + EM( SCAN_EXCEED_SHARED_PTE, "exceed_shared_pte") \ EM( SCAN_PTE_NON_PRESENT, "pte_non_present") \ EM( SCAN_PAGE_RO, "no_writable_page") \ EM( SCAN_LACK_REFERENCED_PAGE, "lack_referenced_page") \ @@ -30,7 +32,6 @@ EM( SCAN_DEL_PAGE_LRU, "could_not_delete_page_from_lru")\ EM( SCAN_ALLOC_HUGE_PAGE_FAIL, "alloc_huge_page_failed") \ EM( SCAN_CGROUP_CHARGE_FAIL, "ccgroup_charge_failed") \ - EM( SCAN_EXCEED_SWAP_PTE, "exceed_swap_pte") \ EM( SCAN_TRUNCATED, "truncated") \ EMe(SCAN_PAGE_HAS_PRIVATE, "page_has_private") \ diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 49e56e4e30d1..bfb6155f1d69 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -28,6 +28,8 @@ enum scan_result { SCAN_SUCCEED, SCAN_PMD_NULL, SCAN_EXCEED_NONE_PTE, + SCAN_EXCEED_SWAP_PTE, + SCAN_EXCEED_SHARED_PTE, SCAN_PTE_NON_PRESENT, SCAN_PAGE_RO, SCAN_LACK_REFERENCED_PAGE, @@ -46,7 +48,6 @@ enum scan_result { SCAN_DEL_PAGE_LRU, SCAN_ALLOC_HUGE_PAGE_FAIL, SCAN_CGROUP_CHARGE_FAIL, - SCAN_EXCEED_SWAP_PTE, SCAN_TRUNCATED, SCAN_PAGE_HAS_PRIVATE, }; @@ -71,6 +72,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait); */ static unsigned int khugepaged_max_ptes_none __read_mostly; static unsigned int khugepaged_max_ptes_swap __read_mostly; +static unsigned int khugepaged_max_ptes_shared __read_mostly; #define MM_SLOTS_HASH_BITS 10 static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS); @@ -290,15 +292,43 @@ static struct kobj_attribute khugepaged_max_ptes_swap_attr = __ATTR(max_ptes_swap, 0644, khugepaged_max_ptes_swap_show, khugepaged_max_ptes_swap_store); +static ssize_t khugepaged_max_ptes_shared_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf) +{ + return sprintf(buf, "%u\n", khugepaged_max_ptes_shared); +} + +static ssize_t khugepaged_max_ptes_shared_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + int err; + unsigned long max_ptes_shared; + + err = kstrtoul(buf, 10, &max_ptes_shared); + if (err || max_ptes_shared > HPAGE_PMD_NR-1) + return -EINVAL; + + khugepaged_max_ptes_shared = max_ptes_shared; + + return count; +} + +static struct kobj_attribute khugepaged_max_ptes_shared_attr = + __ATTR(max_ptes_shared, 0644, khugepaged_max_ptes_shared_show, + khugepaged_max_ptes_shared_store); + static struct attribute *khugepaged_attr[] = { &khugepaged_defrag_attr.attr, &khugepaged_max_ptes_none_attr.attr, + &khugepaged_max_ptes_swap_attr.attr, + &khugepaged_max_ptes_shared_attr.attr, &pages_to_scan_attr.attr, &pages_collapsed_attr.attr, &full_scans_attr.attr, &scan_sleep_millisecs_attr.attr, &alloc_sleep_millisecs_attr.attr, - &khugepaged_max_ptes_swap_attr.attr, NULL, }; @@ -360,6 +390,7 @@ int __init khugepaged_init(void) khugepaged_pages_to_scan = HPAGE_PMD_NR * 8; khugepaged_max_ptes_none = HPAGE_PMD_NR - 1; khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8; + khugepaged_max_ptes_shared = HPAGE_PMD_NR / 2; return 0; } @@ -549,7 +580,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, { struct page *page = NULL; pte_t *_pte; - int none_or_zero = 0, result = 0, referenced = 0; + int none_or_zero = 0, shared = 0, result = 
0, referenced = 0; bool writable = false; for (_pte = pte; _pte < pte+HPAGE_PMD_NR; @@ -577,6 +608,12 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, VM_BUG_ON_PAGE(!PageAnon(page), page); + if (page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { + result = SCAN_EXCEED_SHARED_PTE; + goto out; + } + if (PageCompound(page)) { struct page *p; page = compound_head(page); @@ -1168,7 +1205,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, { pmd_t *pmd; pte_t *pte, *_pte; - int ret = 0, none_or_zero = 0, result = 0, referenced = 0; + int ret = 0, result = 0, referenced = 0; + int none_or_zero = 0, shared = 0; struct page *page = NULL; unsigned long _address; spinlock_t *ptl; @@ -1218,6 +1256,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, goto out_unmap; } + if (page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { + result = SCAN_EXCEED_SHARED_PTE; + goto out_unmap; + } + page = compound_head(page); /* diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index 3eeff4a0fbc9..9ae119234a39 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -77,6 +77,7 @@ struct khugepaged_settings { unsigned int scan_sleep_millisecs; unsigned int max_ptes_none; unsigned int max_ptes_swap; + unsigned int max_ptes_shared; unsigned long pages_to_scan; }; @@ -276,6 +277,7 @@ static void write_settings(struct settings *settings) khugepaged->scan_sleep_millisecs); write_num("khugepaged/max_ptes_none", khugepaged->max_ptes_none); write_num("khugepaged/max_ptes_swap", khugepaged->max_ptes_swap); + write_num("khugepaged/max_ptes_shared", khugepaged->max_ptes_shared); write_num("khugepaged/pages_to_scan", khugepaged->pages_to_scan); } @@ -312,6 +314,7 @@ static void save_settings(void) read_num("khugepaged/scan_sleep_millisecs"), .max_ptes_none = read_num("khugepaged/max_ptes_none"), .max_ptes_swap = read_num("khugepaged/max_ptes_swap"), + .max_ptes_shared = read_num("khugepaged/max_ptes_shared"), .pages_to_scan = read_num("khugepaged/pages_to_scan"), }; success("OK"); @@ -843,12 +846,90 @@ static void collapse_fork_compound(void) fail("Fail"); fill_memory(p, 0, page_size); + write_num("khugepaged/max_ptes_shared", hpage_pmd_nr - 1); if (wait_for_scan("Collapse PTE table full of compound pages in child", p)) fail("Timeout"); else if (check_huge(p)) success("OK"); else fail("Fail"); + write_num("khugepaged/max_ptes_shared", + default_settings.khugepaged.max_ptes_shared); + + validate_memory(p, 0, hpage_pmd_size); + munmap(p, hpage_pmd_size); + exit(exit_status); + } + + wait(&wstatus); + exit_status += WEXITSTATUS(wstatus); + + printf("Check if parent still has huge page..."); + if (check_huge(p)) + success("OK"); + else + fail("Fail"); + validate_memory(p, 0, hpage_pmd_size); + munmap(p, hpage_pmd_size); +} + +static void collapse_max_ptes_shared() +{ + int max_ptes_shared = read_num("khugepaged/max_ptes_shared"); + int wstatus; + void *p; + + p = alloc_mapping(); + + printf("Allocate huge page..."); + madvise(p, hpage_pmd_size, MADV_HUGEPAGE); + fill_memory(p, 0, hpage_pmd_size); + if (check_huge(p)) + success("OK"); + else + fail("Fail"); + + printf("Share huge page over fork()..."); + if (!fork()) { + /* Do not touch settings on child exit */ + skip_settings_restore = true; + exit_status = 0; + + if (check_huge(p)) + success("OK"); + else + fail("Fail"); + + printf("Trigger CoW in %d of %d...", + hpage_pmd_nr - max_ptes_shared - 1, hpage_pmd_nr); + 
fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared - 1) * page_size); + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + + if (wait_for_scan("Do not collapse with max_ptes_shared exceeded", p)) + fail("Timeout"); + else if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + + printf("Trigger CoW in %d of %d...", + hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr); + fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) * page_size); + if (!check_huge(p)) + success("OK"); + else + fail("Fail"); + + + if (wait_for_scan("Collapse with max_ptes_shared PTEs shared", p)) + fail("Timeout"); + else if (check_huge(p)) + success("OK"); + else + fail("Fail"); validate_memory(p, 0, hpage_pmd_size); munmap(p, hpage_pmd_size); @@ -877,6 +958,7 @@ int main(void) default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1; default_settings.khugepaged.max_ptes_swap = hpage_pmd_nr / 8; + default_settings.khugepaged.max_ptes_shared = hpage_pmd_nr / 2; default_settings.khugepaged.pages_to_scan = hpage_pmd_nr * 8; save_settings(); @@ -894,6 +976,7 @@ int main(void) collapse_compound_extreme(); collapse_fork(); collapse_fork_compound(); + collapse_max_ptes_shared(); restore_settings(0); }
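Outside the selftest harness, the new tunable is an ordinary sysfs file, so its effect on collapse behaviour can be probed directly. A small userspace sketch (needs root; error handling trimmed) that reads the knob and then forbids any shared PTEs in a collapse candidate by writing 0, which the store handler accepts since it only rejects values above HPAGE_PMD_NR-1:

    #include <stdio.h>

    #define KNOB \
        "/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared"

    int main(void)
    {
            unsigned int val;
            FILE *f = fopen(KNOB, "r");

            if (!f || fscanf(f, "%u", &val) != 1)
                    return 1;
            fclose(f);
            /* Default is HPAGE_PMD_NR / 2, i.e. 256 on x86-64. */
            printf("max_ptes_shared = %u\n", val);

            f = fopen(KNOB, "w");
            if (!f || fprintf(f, "0\n") < 0)
                    return 1;
            return fclose(f) ? 1 : 0;
    }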