From patchwork Fri Feb 15 22:08:26 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815947
From: Zi Yan <zi.yan@sent.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A. Shutemov", Andrew Morton,
    Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove,
    Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 01/31] mm: migrate: Add exchange_pages to exchange two
 lists of pages.
Date: Fri, 15 Feb 2019 14:08:26 -0800
Message-Id: <20190215220856.29749-2-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Instead of calling migrate_pages() twice, a single exchange_pages() call
is sufficient and avoids allocating any new pages.

Signed-off-by: Zi Yan
---
 include/linux/ksm.h |   5 +
 mm/Makefile         |   1 +
 mm/exchange.c       | 846 ++++++++++++++++++++++++++++++++++++++++++++
 mm/internal.h       |   6 +
 mm/ksm.c            |  35 ++
 mm/migrate.c        |   4 +-
 6 files changed, 895 insertions(+), 2 deletions(-)
 create mode 100644 mm/exchange.c
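Before the diff: the core mechanism is easiest to see in isolation. Below is
a minimal, runnable userspace model of the word-wise in-place swap that
exchange_page() performs on two pages; the PAGE_SIZE constant and main()
harness are illustrative assumptions, not part of the patch.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    /* Swap two page-sized buffers one 64-bit word at a time; only one
     * u64 of scratch space is needed, no third page is allocated.
     * (memcpy is used here instead of the kernel's u64 casts to stay
     * strict-aliasing clean in portable C.) */
    static void exchange_page(char *to, char *from)
    {
    	uint64_t tmp;
    	size_t i;

    	for (i = 0; i < PAGE_SIZE; i += sizeof(tmp)) {
    		memcpy(&tmp, from + i, sizeof(tmp));
    		memcpy(from + i, to + i, sizeof(tmp));
    		memcpy(to + i, &tmp, sizeof(tmp));
    	}
    }

    int main(void)
    {
    	static char a[PAGE_SIZE], b[PAGE_SIZE];

    	memset(a, 'A', PAGE_SIZE);
    	memset(b, 'B', PAGE_SIZE);
    	exchange_page(a, b);
    	printf("a[0]=%c b[0]=%c\n", a[0], b[0]);	/* a[0]=B b[0]=A */
    	return 0;
    }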
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 161e8164abcf..87c5b943a73c 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -53,6 +53,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
 void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc);
 void ksm_migrate_page(struct page *newpage, struct page *oldpage);
+void ksm_exchange_page(struct page *to_page, struct page *from_page);
 
 #else  /* !CONFIG_KSM */
 
@@ -86,6 +87,10 @@ static inline void rmap_walk_ksm(struct page *page,
 static inline void ksm_migrate_page(struct page *newpage, struct page *oldpage)
 {
 }
+static inline void ksm_exchange_page(struct page *to_page,
+				struct page *from_page)
+{
+}
 #endif /* CONFIG_MMU */
 #endif /* !CONFIG_KSM */
diff --git a/mm/Makefile b/mm/Makefile
index d210cc9d6f80..1574ea5743e4 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -43,6 +43,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
 
 obj-y += init-mm.o
 obj-y += memblock.o
+obj-y += exchange.o
 
 ifdef CONFIG_MMU
 obj-$(CONFIG_ADVISE_SYSCALLS) += madvise.o
diff --git a/mm/exchange.c b/mm/exchange.c
new file mode 100644
index 000000000000..a607348cc6f4
--- /dev/null
+++ b/mm/exchange.c
@@ -0,0 +1,846 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2016 NVIDIA, Zi Yan
+ *
+ * Exchange two in-use pages. Page flags and page->mapping are exchanged
+ * as well. Only anonymous pages are supported.
+ */
+
+#include <linux/mm.h>
+#include <linux/swap.h>
+#include <linux/sched.h>
+#include <linux/migrate.h>
+#include <linux/highmem.h>
+#include <linux/hugetlb.h>
+#include <linux/mm_inline.h>
+#include <linux/page_idle.h>
+#include <linux/page-flags.h>
+#include <linux/ksm.h>
+#include <linux/memcontrol.h>
+#include <linux/radix-tree.h>
+#include <linux/fs.h> /* buffer_migrate_page */
+#include <linux/buffer_head.h>
+
+
+#include "internal.h"
+
+struct exchange_page_info {
+	struct page *from_page;
+	struct page *to_page;
+
+	struct anon_vma *from_anon_vma;
+	struct anon_vma *to_anon_vma;
+
+	struct list_head list;
+};
+
+struct page_flags {
+	unsigned int page_error :1;
+	unsigned int page_referenced:1;
+	unsigned int page_uptodate:1;
+	unsigned int page_active:1;
+	unsigned int page_unevictable:1;
+	unsigned int page_checked:1;
+	unsigned int page_mappedtodisk:1;
+	unsigned int page_dirty:1;
+	unsigned int page_is_young:1;
+	unsigned int page_is_idle:1;
+	unsigned int page_swapcache:1;
+	unsigned int page_writeback:1;
+	unsigned int page_private:1;
+	unsigned int __pad:3;
+};
+
+
+static void exchange_page(char *to, char *from)
+{
+	u64 tmp;
+	int i;
+
+	for (i = 0; i < PAGE_SIZE; i += sizeof(tmp)) {
+		tmp = *((u64 *)(from + i));
+		*((u64 *)(from + i)) = *((u64 *)(to + i));
+		*((u64 *)(to + i)) = tmp;
+	}
+}
+
+static inline void exchange_highpage(struct page *to, struct page *from)
+{
+	char *vfrom, *vto;
+
+	vfrom = kmap_atomic(from);
+	vto = kmap_atomic(to);
+	exchange_page(vto, vfrom);
+	kunmap_atomic(vto);
+	kunmap_atomic(vfrom);
+}
+
+static void __exchange_gigantic_page(struct page *dst, struct page *src,
+				int nr_pages)
+{
+	int i;
+	struct page *dst_base = dst;
+	struct page *src_base = src;
+
+	for (i = 0; i < nr_pages; ) {
+		cond_resched();
+		exchange_highpage(dst, src);
+
+		i++;
+		dst = mem_map_next(dst, dst_base, i);
+		src = mem_map_next(src, src_base, i);
+	}
+}
+
+static void exchange_huge_page(struct page *dst, struct page *src)
+{
+	int i;
+	int nr_pages;
+
+	if (PageHuge(src)) {
+		/* hugetlbfs page */
+		struct hstate *h = page_hstate(src);
+
+		nr_pages = pages_per_huge_page(h);
+
+		if (unlikely(nr_pages > MAX_ORDER_NR_PAGES)) {
+			__exchange_gigantic_page(dst, src, nr_pages);
+			return;
+		}
+	} else {
+		/* thp page */
+		VM_BUG_ON(!PageTransHuge(src));
+		nr_pages = hpage_nr_pages(src);
+	}
+
+	for (i = 0; i < nr_pages; i++) {
+		cond_resched();
+		exchange_highpage(dst + i, src + i);
+	}
+}
+
+/*
+ * Exchange the flags and metadata of two pages.
+ */
+static void exchange_page_flags(struct page *to_page, struct page *from_page)
+{
+	int from_cpupid, to_cpupid;
+	struct page_flags from_page_flags, to_page_flags;
+	struct mem_cgroup *to_memcg = page_memcg(to_page),
+			*from_memcg = page_memcg(from_page);
+
+	from_cpupid = page_cpupid_xchg_last(from_page, -1);
+
+	from_page_flags.page_error = TestClearPageError(from_page);
+	from_page_flags.page_referenced = TestClearPageReferenced(from_page);
+	from_page_flags.page_uptodate = PageUptodate(from_page);
+	ClearPageUptodate(from_page);
+	from_page_flags.page_active = TestClearPageActive(from_page);
+	from_page_flags.page_unevictable = TestClearPageUnevictable(from_page);
+	from_page_flags.page_checked = PageChecked(from_page);
+	ClearPageChecked(from_page);
+	from_page_flags.page_mappedtodisk = PageMappedToDisk(from_page);
+	ClearPageMappedToDisk(from_page);
+	from_page_flags.page_dirty = PageDirty(from_page);
+	ClearPageDirty(from_page);
+	from_page_flags.page_is_young = test_and_clear_page_young(from_page);
+	from_page_flags.page_is_idle = page_is_idle(from_page);
+	clear_page_idle(from_page);
+	from_page_flags.page_swapcache = PageSwapCache(from_page);
+	from_page_flags.page_writeback = test_clear_page_writeback(from_page);
+
+	to_cpupid = page_cpupid_xchg_last(to_page, -1);
+
+	to_page_flags.page_error = TestClearPageError(to_page);
+	to_page_flags.page_referenced = TestClearPageReferenced(to_page);
+	to_page_flags.page_uptodate = PageUptodate(to_page);
+	ClearPageUptodate(to_page);
+	to_page_flags.page_active = TestClearPageActive(to_page);
+	to_page_flags.page_unevictable = TestClearPageUnevictable(to_page);
+	to_page_flags.page_checked = PageChecked(to_page);
+	ClearPageChecked(to_page);
+	to_page_flags.page_mappedtodisk = PageMappedToDisk(to_page);
+	ClearPageMappedToDisk(to_page);
+	to_page_flags.page_dirty = PageDirty(to_page);
+	ClearPageDirty(to_page);
+	to_page_flags.page_is_young = test_and_clear_page_young(to_page);
+	to_page_flags.page_is_idle = page_is_idle(to_page);
+	clear_page_idle(to_page);
+	to_page_flags.page_swapcache = PageSwapCache(to_page);
+	to_page_flags.page_writeback = test_clear_page_writeback(to_page);
+
+	/* set to_page */
+	if (from_page_flags.page_error)
+		SetPageError(to_page);
+	if (from_page_flags.page_referenced)
+		SetPageReferenced(to_page);
+	if (from_page_flags.page_uptodate)
+		SetPageUptodate(to_page);
+	if (from_page_flags.page_active) {
+		VM_BUG_ON_PAGE(from_page_flags.page_unevictable, from_page);
+		SetPageActive(to_page);
+	} else if (from_page_flags.page_unevictable)
+		SetPageUnevictable(to_page);
+	if (from_page_flags.page_checked)
+		SetPageChecked(to_page);
+	if (from_page_flags.page_mappedtodisk)
+		SetPageMappedToDisk(to_page);
+
+	/* Move dirty on pages not done by migrate_page_move_mapping() */
+	if (from_page_flags.page_dirty)
+		SetPageDirty(to_page);
+
+	if (from_page_flags.page_is_young)
+		set_page_young(to_page);
+	if (from_page_flags.page_is_idle)
+		set_page_idle(to_page);
+
+	/* set from_page */
+	if (to_page_flags.page_error)
+		SetPageError(from_page);
+	if (to_page_flags.page_referenced)
+		SetPageReferenced(from_page);
+	if (to_page_flags.page_uptodate)
+		SetPageUptodate(from_page);
+	if (to_page_flags.page_active) {
+		VM_BUG_ON_PAGE(to_page_flags.page_unevictable, to_page);
+		SetPageActive(from_page);
+	} else if (to_page_flags.page_unevictable)
+		SetPageUnevictable(from_page);
+	if (to_page_flags.page_checked)
+		SetPageChecked(from_page);
+	if (to_page_flags.page_mappedtodisk)
+		SetPageMappedToDisk(from_page);
+
+	/* Move dirty on pages not done by migrate_page_move_mapping() */
+	if (to_page_flags.page_dirty)
+		SetPageDirty(from_page);
+
+	if (to_page_flags.page_is_young)
+		set_page_young(from_page);
+	if (to_page_flags.page_is_idle)
+		set_page_idle(from_page);
+
+	/*
+	 * Copy NUMA information to the new page, to prevent over-eager
+	 * future migrations of this same page.
+	 */
+	page_cpupid_xchg_last(to_page, from_cpupid);
+	page_cpupid_xchg_last(from_page, to_cpupid);
+
+	ksm_exchange_page(to_page, from_page);
+	/*
+	 * Please do not reorder this without considering how mm/ksm.c's
+	 * get_ksm_page() depends upon ksm_migrate_page() and PageSwapCache().
+	 */
+	ClearPageSwapCache(to_page);
+	ClearPageSwapCache(from_page);
+	if (from_page_flags.page_swapcache)
+		SetPageSwapCache(to_page);
+	if (to_page_flags.page_swapcache)
+		SetPageSwapCache(from_page);
+
+#ifdef CONFIG_PAGE_OWNER
+	/* exchange page owner: not implemented yet in this RFC */
+	BUILD_BUG();
+#endif
+	/* exchange mem cgroup */
+	to_page->mem_cgroup = from_memcg;
+	from_page->mem_cgroup = to_memcg;
+}
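
exchange_page_flags() above follows a capture-clear-apply pattern: snapshot
each page's flag bits into a local struct page_flags, clear them on their
owner, then apply each snapshot to the other page. A self-contained userspace
model of that pattern, with a two-flag toy page standing in for struct page
(all names here are illustrative, not from the patch):

    #include <stdbool.h>
    #include <stdio.h>

    struct toy_page { bool dirty, referenced; };
    struct toy_flags { unsigned int dirty:1; unsigned int referenced:1; };

    /* Capture one page's flags and clear them, like the TestClear* calls. */
    static struct toy_flags capture(struct toy_page *p)
    {
    	struct toy_flags f = { .dirty = p->dirty, .referenced = p->referenced };

    	p->dirty = p->referenced = false;
    	return f;
    }

    /* Apply a captured snapshot to the other page, like the Set* calls. */
    static void apply(struct toy_page *p, struct toy_flags f)
    {
    	p->dirty = f.dirty;
    	p->referenced = f.referenced;
    }

    int main(void)
    {
    	struct toy_page a = { .dirty = true }, b = { .referenced = true };
    	struct toy_flags fa = capture(&a), fb = capture(&b);

    	apply(&a, fb);	/* a inherits b's old flags */
    	apply(&b, fa);	/* and vice versa */
    	printf("a: dirty=%d ref=%d, b: dirty=%d ref=%d\n",
    	       a.dirty, a.referenced, b.dirty, b.referenced);
    	return 0;
    }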
+
+/*
+ * Replace the page in the mapping.
+ *
+ * The number of remaining references must be:
+ *  1 for anonymous pages without a mapping
+ *  2 for pages with a mapping
+ *  3 for pages with a mapping and PagePrivate/PagePrivate2 set.
+ */
+static int exchange_page_move_mapping(struct address_space *to_mapping,
+			struct address_space *from_mapping,
+			struct page *to_page, struct page *from_page,
+			struct buffer_head *to_head,
+			struct buffer_head *from_head,
+			enum migrate_mode mode,
+			int to_extra_count, int from_extra_count)
+{
+	int to_expected_count = 1 + to_extra_count,
+	    from_expected_count = 1 + from_extra_count;
+	unsigned long from_page_index = from_page->index;
+	unsigned long to_page_index = to_page->index;
+	int to_swapbacked = PageSwapBacked(to_page),
+	    from_swapbacked = PageSwapBacked(from_page);
+	struct address_space *to_mapping_value = to_page->mapping;
+	struct address_space *from_mapping_value = from_page->mapping;
+
+	VM_BUG_ON_PAGE(to_mapping != page_mapping(to_page), to_page);
+	VM_BUG_ON_PAGE(from_mapping != page_mapping(from_page), from_page);
+
+	if (!to_mapping) {
+		/* Anonymous page without mapping */
+		if (page_count(to_page) != to_expected_count)
+			return -EAGAIN;
+	}
+
+	if (!from_mapping) {
+		/* Anonymous page without mapping */
+		if (page_count(from_page) != from_expected_count)
+			return -EAGAIN;
+	}
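+	/*
+	 * Three cases follow: both pages anonymous (fields are swapped
+	 * directly), an anonymous from_page paired with a file-backed
+	 * to_page (to_page's slot in the i_pages tree is repointed under
+	 * the mapping lock), and file-backed<->file-backed, which is not
+	 * supported and hits VM_BUG_ON(1) below.
+	 */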
+	/* both are anonymous pages */
+	if (!from_mapping && !to_mapping) {
+		/* from_page */
+		from_page->index = to_page_index;
+		from_page->mapping = to_mapping_value;
+
+		ClearPageSwapBacked(from_page);
+		if (to_swapbacked)
+			SetPageSwapBacked(from_page);
+
+		/* to_page */
+		to_page->index = from_page_index;
+		to_page->mapping = from_mapping_value;
+
+		ClearPageSwapBacked(to_page);
+		if (from_swapbacked)
+			SetPageSwapBacked(to_page);
+	} else if (!from_mapping && to_mapping) {
+		/* from is anonymous, to is file-backed */
+		struct zone *from_zone, *to_zone;
+		void **to_pslot;
+		int dirty;
+
+		from_zone = page_zone(from_page);
+		to_zone = page_zone(to_page);
+
+		xa_lock_irq(&to_mapping->i_pages);
+
+		to_pslot = radix_tree_lookup_slot(&to_mapping->i_pages,
+			page_index(to_page));
+
+		to_expected_count += 1 + page_has_private(to_page);
+		if (page_count(to_page) != to_expected_count ||
+			radix_tree_deref_slot_protected(to_pslot,
+				&to_mapping->i_pages.xa_lock) != to_page) {
+			xa_unlock_irq(&to_mapping->i_pages);
+			return -EAGAIN;
+		}
+
+		if (!page_ref_freeze(to_page, to_expected_count)) {
+			xa_unlock_irq(&to_mapping->i_pages);
+			pr_debug("cannot freeze page count\n");
+			return -EAGAIN;
+		}
+
+		if (mode == MIGRATE_ASYNC && to_head &&
+			!buffer_migrate_lock_buffers(to_head, mode)) {
+			page_ref_unfreeze(to_page, to_expected_count);
+			xa_unlock_irq(&to_mapping->i_pages);
+
+			pr_debug("cannot lock buffer head\n");
+			return -EAGAIN;
+		}
+
+		if (!page_ref_freeze(from_page, from_expected_count)) {
+			page_ref_unfreeze(to_page, to_expected_count);
+			xa_unlock_irq(&to_mapping->i_pages);
+
+			return -EAGAIN;
+		}
+		/*
+		 * Now we know that no one else is looking at the page:
+		 * no turning back from here.
+		 */
+		ClearPageSwapBacked(from_page);
+		ClearPageSwapBacked(to_page);
+
+		/* from_page */
+		from_page->index = to_page_index;
+		from_page->mapping = to_mapping_value;
+		/* to_page */
+		to_page->index = from_page_index;
+		to_page->mapping = from_mapping_value;
+
+		if (to_swapbacked)
+			__SetPageSwapBacked(from_page);
+		else
+			VM_BUG_ON_PAGE(PageSwapCache(to_page), to_page);
+
+		if (from_swapbacked)
+			__SetPageSwapBacked(to_page);
+		else
+			VM_BUG_ON_PAGE(PageSwapCache(from_page), from_page);
+
+		dirty = PageDirty(to_page);
+
+		radix_tree_replace_slot(&to_mapping->i_pages,
+				to_pslot, from_page);
+
+		/* move cache reference */
+		page_ref_unfreeze(to_page, to_expected_count - 1);
+		page_ref_unfreeze(from_page, from_expected_count + 1);
+
+		xa_unlock(&to_mapping->i_pages);
+
+		/*
+		 * If moved to a different zone then also account
+		 * the page for that zone. Other VM counters will be
+		 * taken care of when we establish references to the
+		 * new page and drop references to the old page.
+		 *
+		 * Note that anonymous pages are accounted for
+		 * via NR_FILE_PAGES and NR_ANON_MAPPED if they
+		 * are mapped to swap space.
+		 */
+		if (to_zone != from_zone) {
+			__dec_node_state(to_zone->zone_pgdat, NR_FILE_PAGES);
+			__inc_node_state(from_zone->zone_pgdat, NR_FILE_PAGES);
+			if (PageSwapBacked(to_page) && !PageSwapCache(to_page)) {
+				__dec_node_state(to_zone->zone_pgdat, NR_SHMEM);
+				__inc_node_state(from_zone->zone_pgdat, NR_SHMEM);
+			}
+			if (dirty && mapping_cap_account_dirty(to_mapping)) {
+				__dec_node_state(to_zone->zone_pgdat, NR_FILE_DIRTY);
+				__dec_zone_state(to_zone, NR_ZONE_WRITE_PENDING);
+				__inc_node_state(from_zone->zone_pgdat, NR_FILE_DIRTY);
+				__inc_zone_state(from_zone, NR_ZONE_WRITE_PENDING);
+			}
+		}
+		local_irq_enable();
+	} else {
+		/* from is file-backed, to is anonymous: fold this into the case above */
+		/* both are file-backed */
+		VM_BUG_ON(1);
+	}
+
+	return MIGRATEPAGE_SUCCESS;
+}
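
The page_ref_freeze()/page_ref_unfreeze() pairs above are the point of no
return: a reference count is atomically swapped to zero only if it still
equals the expected value, so any racing reference holder forces the exchange
to back off with -EAGAIN. A userspace model of that gate using C11 atomics
(names and the refcount value are illustrative assumptions):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Freeze a refcount: succeeds only if it still equals 'expected'. */
    static bool ref_freeze(atomic_int *refs, int expected)
    {
    	int old = expected;

    	/* CAS expected -> 0; fails if someone grabbed a reference. */
    	return atomic_compare_exchange_strong(refs, &old, 0);
    }

    static void ref_unfreeze(atomic_int *refs, int count)
    {
    	atomic_store(refs, count);
    }

    int main(void)
    {
    	atomic_int refs = 2;	/* e.g. one cache ref + one isolation ref */

    	if (!ref_freeze(&refs, 2)) {
    		puts("raced with a reference holder, back off (-EAGAIN)");
    		return 1;
    	}
    	/* ... no one else can see the page now; swap the mappings ... */
    	ref_unfreeze(&refs, 2);
    	puts("exchange done");
    	return 0;
    }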
+
+static int exchange_from_to_pages(struct page *to_page, struct page *from_page,
+				enum migrate_mode mode)
+{
+	int rc = -EBUSY;
+	struct address_space *to_page_mapping, *from_page_mapping;
+	struct buffer_head *to_head = NULL, *to_bh = NULL;
+
+	VM_BUG_ON_PAGE(!PageLocked(from_page), from_page);
+	VM_BUG_ON_PAGE(!PageLocked(to_page), to_page);
+
+	/* get each page's mapping via page_mapping(), not raw page->mapping */
+	to_page_mapping = page_mapping(to_page);
+	from_page_mapping = page_mapping(from_page);
+
+	/* from_page has to be an anonymous page */
+	VM_BUG_ON(from_page_mapping);
+	VM_BUG_ON(PageWriteback(from_page));
+	/* writeback has to finish */
+	BUG_ON(PageWriteback(to_page));
+
+	/* to_page is anonymous */
+	if (!to_page_mapping) {
+exchange_mappings:
+		/* actual page mapping exchange */
+		rc = exchange_page_move_mapping(to_page_mapping, from_page_mapping,
+					to_page, from_page, NULL, NULL, mode, 0, 0);
+	} else {
+		if (to_page_mapping->a_ops->migratepage == buffer_migrate_page) {
+
+			if (!page_has_buffers(to_page))
+				goto exchange_mappings;
+
+			to_head = page_buffers(to_page);
+
+			rc = exchange_page_move_mapping(to_page_mapping,
+					from_page_mapping, to_page, from_page,
+					to_head, NULL, mode, 0, 0);
+
+			if (rc != MIGRATEPAGE_SUCCESS)
+				return rc;
+
+			/*
+			 * In the async case, migrate_page_move_mapping locked
+			 * the buffers with an IRQ-safe spinlock held. In the
+			 * sync case, the buffers need to be locked now.
+			 */
+			if (mode != MIGRATE_ASYNC)
+				VM_BUG_ON(!buffer_migrate_lock_buffers(to_head, mode));
+
+			ClearPagePrivate(to_page);
+			set_page_private(from_page, page_private(to_page));
+			set_page_private(to_page, 0);
+			/* transfer private page count */
+			put_page(to_page);
+			get_page(from_page);
+
+			to_bh = to_head;
+			do {
+				set_bh_page(to_bh, from_page, bh_offset(to_bh));
+				to_bh = to_bh->b_this_page;
+			} while (to_bh != to_head);
+
+			SetPagePrivate(from_page);
+
+			to_bh = to_head;
+		} else if (!to_page_mapping->a_ops->migratepage) {
+			/* fallback_migrate_page */
+			if (PageDirty(to_page)) {
+				if (mode != MIGRATE_SYNC)
+					return -EBUSY;
+				return writeout(to_page_mapping, to_page);
+			}
+			if (page_has_private(to_page) &&
+				!try_to_release_page(to_page, GFP_KERNEL))
+				return -EAGAIN;
+
+			goto exchange_mappings;
+		}
+	}
+	/* actual page data exchange */
+	if (rc != MIGRATEPAGE_SUCCESS)
+		return rc;
+
+	if (PageHuge(from_page) || PageTransHuge(from_page))
+		exchange_huge_page(to_page, from_page);
+	else
+		exchange_highpage(to_page, from_page);
+	rc = 0;
+
+	/*
+	 * 1. buffer_migrate_page:
+	 *    the private flag should be transferred from to_page to from_page
+	 *
+	 * 2. anon<->anon, fallback_migrate_page:
+	 *    both have no private flags, or to_page's is cleared.
+	 */
+	VM_BUG_ON(!((page_has_private(from_page) && !page_has_private(to_page)) ||
+		(!page_has_private(from_page) && !page_has_private(to_page))));
+
+	exchange_page_flags(to_page, from_page);
+
+	if (to_bh) {
+		VM_BUG_ON(to_bh != to_head);
+		do {
+			unlock_buffer(to_bh);
+			put_bh(to_bh);
+			to_bh = to_bh->b_this_page;
+		} while (to_bh != to_head);
+	}
+
+	return rc;
+}
+
+static int unmap_and_exchange(struct page *from_page,
+		struct page *to_page, enum migrate_mode mode)
+{
+	int rc = -EAGAIN;
+	struct anon_vma *from_anon_vma = NULL;
+	struct anon_vma *to_anon_vma = NULL;
+	int from_page_was_mapped = 0;
+	int to_page_was_mapped = 0;
+	int from_page_count = 0, to_page_count = 0;
+	int from_map_count = 0, to_map_count = 0;
+	unsigned long from_flags, to_flags;
+	pgoff_t from_index, to_index;
+	struct address_space *from_mapping, *to_mapping;
+
+	if (!trylock_page(from_page)) {
+		if (mode == MIGRATE_ASYNC)
+			goto out;
+		lock_page(from_page);
+	}
+
+	if (!trylock_page(to_page)) {
+		if (mode == MIGRATE_ASYNC)
+			goto out_unlock;
+		lock_page(to_page);
+	}
+
+	/* from_page is supposed to be an anonymous page */
+	VM_BUG_ON_PAGE(PageWriteback(from_page), from_page);
+
+	if (PageWriteback(to_page)) {
+		/*
+		 * Only in the case of a full synchronous migration is it
+		 * necessary to wait for PageWriteback. In the async case,
+		 * the retry loop is too short and in the sync-light case,
+		 * the overhead of stalling is too much.
+		 */
+		if (mode != MIGRATE_SYNC) {
+			rc = -EBUSY;
+			goto out_unlock_both;
+		}
+		wait_on_page_writeback(to_page);
+	}
+
+	if (PageAnon(from_page) && !PageKsm(from_page))
+		from_anon_vma = page_get_anon_vma(from_page);
+
+	if (PageAnon(to_page) && !PageKsm(to_page))
+		to_anon_vma = page_get_anon_vma(to_page);
+
+	from_page_count = page_count(from_page);
+	from_map_count = page_mapcount(from_page);
+	to_page_count = page_count(to_page);
+	to_map_count = page_mapcount(to_page);
+	from_flags = from_page->flags;
+	to_flags = to_page->flags;
+	from_mapping = from_page->mapping;
+	to_mapping = to_page->mapping;
+	from_index = from_page->index;
+	to_index = to_page->index;
+
+	/*
+	 * Corner case handling:
+	 * 1. When a new swap-cache page is read in, it is added to the LRU
+	 *    and treated as swapcache, but it has no rmap yet.
+	 *    Calling try_to_unmap() against a page->mapping==NULL page will
+	 *    trigger a BUG. So handle it here.
+	 * 2. An orphaned page (see truncate_complete_page) might have
+	 *    fs-private metadata. The page can be picked up due to memory
+	 *    offlining. Everywhere else except page reclaim, the page is
+	 *    invisible to the vm, so the page cannot be migrated. So try to
+	 *    free the metadata, so the page can be freed.
+	 */
+	if (!from_page->mapping) {
+		VM_BUG_ON_PAGE(PageAnon(from_page), from_page);
+		if (page_has_private(from_page)) {
+			try_to_free_buffers(from_page);
+			goto out_unlock_both;
+		}
+	} else if (page_mapped(from_page)) {
+		/* Establish migration ptes */
+		VM_BUG_ON_PAGE(PageAnon(from_page) && !PageKsm(from_page) &&
+				!from_anon_vma, from_page);
+		try_to_unmap(from_page,
+			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+		from_page_was_mapped = 1;
+	}
+
+	if (!to_page->mapping) {
+		VM_BUG_ON_PAGE(PageAnon(to_page), to_page);
+		if (page_has_private(to_page)) {
+			try_to_free_buffers(to_page);
+			goto out_unlock_both_remove_from_migration_pte;
+		}
+	} else if (page_mapped(to_page)) {
+		/* Establish migration ptes */
+		VM_BUG_ON_PAGE(PageAnon(to_page) && !PageKsm(to_page) &&
+				!to_anon_vma, to_page);
+		try_to_unmap(to_page,
+			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+		to_page_was_mapped = 1;
+	}
+
+	if (!page_mapped(from_page) && !page_mapped(to_page))
+		rc = exchange_from_to_pages(to_page, from_page, mode);
+
+	if (to_page_was_mapped) {
+		/*
+		 * Swap back to_page->index to be compatible with
+		 * remove_migration_ptes(), which assumes both from_page
+		 * and to_page below have the same index.
+		 */
+		if (rc == MIGRATEPAGE_SUCCESS)
+			swap(to_page->index, to_index);
+
+		remove_migration_ptes(to_page,
+			rc == MIGRATEPAGE_SUCCESS ? from_page : to_page, false);
+
+		if (rc == MIGRATEPAGE_SUCCESS)
+			swap(to_page->index, to_index);
+	}
+
+out_unlock_both_remove_from_migration_pte:
+	if (from_page_was_mapped) {
+		/*
+		 * Swap back from_page->index to be compatible with
+		 * remove_migration_ptes(), which assumes both from_page
+		 * and to_page below have the same index.
+		 */
+		if (rc == MIGRATEPAGE_SUCCESS)
+			swap(from_page->index, from_index);
+
+		remove_migration_ptes(from_page,
+			rc == MIGRATEPAGE_SUCCESS ? to_page : from_page, false);
+
+		if (rc == MIGRATEPAGE_SUCCESS)
+			swap(from_page->index, from_index);
+	}
+
+out_unlock_both:
+	if (to_anon_vma)
+		put_anon_vma(to_anon_vma);
+	unlock_page(to_page);
+out_unlock:
+	/* Drop an anon_vma reference if we took one */
+	if (from_anon_vma)
+		put_anon_vma(from_anon_vma);
+	unlock_page(from_page);
+out:
+	return rc;
+}
+
+/*
+ * Exchange pages in the exchange_list
+ *
+ * Caller should release the exchange_list resource.
+ */
+static int exchange_pages(struct list_head *exchange_list,
+			enum migrate_mode mode,
+			int reason)
+{
+	struct exchange_page_info *one_pair, *one_pair2;
+	int failed = 0;
+
+	list_for_each_entry_safe(one_pair, one_pair2, exchange_list, list) {
+		struct page *from_page = one_pair->from_page;
+		struct page *to_page = one_pair->to_page;
+		int rc;
+		int retry = 0;
+
+again:
+		if (page_count(from_page) == 1) {
+			/* page was freed from under us. So we are done */
+			ClearPageActive(from_page);
+			ClearPageUnevictable(from_page);
+
+			mod_node_page_state(page_pgdat(from_page), NR_ISOLATED_ANON +
+					page_is_file_cache(from_page),
+					-hpage_nr_pages(from_page));
+			put_page(from_page);
+
+			if (page_count(to_page) == 1) {
+				ClearPageActive(to_page);
+				ClearPageUnevictable(to_page);
+				put_page(to_page);
+				mod_node_page_state(page_pgdat(to_page), NR_ISOLATED_ANON +
+						page_is_file_cache(to_page),
+						-hpage_nr_pages(to_page));
+			} else
+				goto putback_to_page;
+
+			continue;
+		}
+
+		if (page_count(to_page) == 1) {
+			/* page was freed from under us. So we are done */
+			ClearPageActive(to_page);
+			ClearPageUnevictable(to_page);
+
+			mod_node_page_state(page_pgdat(to_page), NR_ISOLATED_ANON +
+					page_is_file_cache(to_page),
+					-hpage_nr_pages(to_page));
+			put_page(to_page);
+
+			mod_node_page_state(page_pgdat(from_page), NR_ISOLATED_ANON +
+					page_is_file_cache(from_page),
+					-hpage_nr_pages(from_page));
+			putback_lru_page(from_page);
+			continue;
+		}
+
+		/* TODO: compound page not supported */
+		/* to_page can be file-backed page */
+		if (PageCompound(from_page) ||
+			page_mapping(from_page)
+			) {
+			++failed;
+			goto putback;
+		}
+
+		rc = unmap_and_exchange(from_page, to_page, mode);
+
+		if (rc == -EAGAIN && retry < 3) {
+			++retry;
+			goto again;
+		}
+
+		if (rc != MIGRATEPAGE_SUCCESS)
+			++failed;
+
+putback:
+		mod_node_page_state(page_pgdat(from_page), NR_ISOLATED_ANON +
+				page_is_file_cache(from_page),
+				-hpage_nr_pages(from_page));
+
+		putback_lru_page(from_page);
+putback_to_page:
+		mod_node_page_state(page_pgdat(to_page), NR_ISOLATED_ANON +
+				page_is_file_cache(to_page),
+				-hpage_nr_pages(to_page));
+
+		putback_lru_page(to_page);
+	}
+	return failed;
+}
+
+int exchange_two_pages(struct page *page1, struct page *page2)
+{
+	struct exchange_page_info page_info;
+	LIST_HEAD(exchange_list);
+	int err = -EFAULT;
+	int pagevec_flushed = 0;
+
+	VM_BUG_ON_PAGE(PageTail(page1), page1);
+	VM_BUG_ON_PAGE(PageTail(page2), page2);
+
+	if (!(PageLRU(page1) && PageLRU(page2)))
+		return -EBUSY;
+
+retry_isolate1:
+	if (!get_page_unless_zero(page1))
+		return -EBUSY;
+	err = isolate_lru_page(page1);
+	put_page(page1);
+	if (err) {
+		if (!pagevec_flushed) {
+			migrate_prep();
+			pagevec_flushed = 1;
+			goto retry_isolate1;
+		}
+		return err;
+	}
+	mod_node_page_state(page_pgdat(page1),
+			NR_ISOLATED_ANON + page_is_file_cache(page1),
+			hpage_nr_pages(page1));
+
+retry_isolate2:
+	if (!get_page_unless_zero(page2)) {
+		putback_lru_page(page1);
+		return -EBUSY;
+	}
+	err = isolate_lru_page(page2);
+	put_page(page2);
+	if (err) {
+		if (!pagevec_flushed) {
+			migrate_prep();
+			pagevec_flushed = 1;
+			goto retry_isolate2;
+		}
+		return err;
+	}
+	mod_node_page_state(page_pgdat(page2),
+			NR_ISOLATED_ANON + page_is_file_cache(page2),
+			hpage_nr_pages(page2));
+
+	page_info.from_page = page1;
+	page_info.to_page = page2;
+	INIT_LIST_HEAD(&page_info.list);
+	list_add(&page_info.list, &exchange_list);
+
+	return exchange_pages(&exchange_list, MIGRATE_SYNC, 0);
+}
diff --git a/mm/internal.h b/mm/internal.h
index f4a7bb02decf..77e205c423ce 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -543,4 +543,10 @@ static inline bool is_migrate_highatomic_page(struct page *page)
 void setup_zone_pageset(struct zone *zone);
 
 extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
+
+bool buffer_migrate_lock_buffers(struct buffer_head *head,
+				enum migrate_mode mode);
+int writeout(struct address_space *mapping, struct page *page);
+extern int exchange_two_pages(struct page *page1, struct page *page2);
+
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/ksm.c b/mm/ksm.c
index 6c48ad13b4c9..dc1ec06b71a0 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2665,6 +2665,41 @@ void ksm_migrate_page(struct page *newpage, struct page *oldpage)
 		set_page_stable_node(oldpage, NULL);
 	}
 }
+
+void ksm_exchange_page(struct page *to_page, struct page *from_page)
+{
+	struct stable_node *to_stable_node, *from_stable_node;
+
+	VM_BUG_ON_PAGE(!PageLocked(to_page), to_page);
+	VM_BUG_ON_PAGE(!PageLocked(from_page), from_page);
+
+	to_stable_node = page_stable_node(to_page);
+	from_stable_node = page_stable_node(from_page);
+	if (to_stable_node) {
+		VM_BUG_ON_PAGE(to_stable_node->kpfn != page_to_pfn(from_page),
+				from_page);
+		to_stable_node->kpfn = page_to_pfn(to_page);
+		/*
+		 * newpage->mapping was set in advance; now we need smp_wmb()
+		 * to make sure that the new stable_node->kpfn is visible
+		 * to get_ksm_page() before it can see that oldpage->mapping
+		 * has gone stale (or that PageSwapCache has been cleared).
+		 */
+		smp_wmb();
+	}
+	if (from_stable_node) {
+		VM_BUG_ON_PAGE(from_stable_node->kpfn != page_to_pfn(to_page),
+				to_page);
+		from_stable_node->kpfn = page_to_pfn(from_page);
+		/*
+		 * newpage->mapping was set in advance; now we need smp_wmb()
+		 * to make sure that the new stable_node->kpfn is visible
+		 * to get_ksm_page() before it can see that oldpage->mapping
+		 * has gone stale (or that PageSwapCache has been cleared).
+		 */
+		smp_wmb();
+	}
+}
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/mm/migrate.c b/mm/migrate.c
index d4fd680be3b0..b8c79aa62134 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -701,7 +701,7 @@ EXPORT_SYMBOL(migrate_page);
 
 #ifdef CONFIG_BLOCK
 /* Returns true if all buffers are successfully locked */
-static bool buffer_migrate_lock_buffers(struct buffer_head *head,
+bool buffer_migrate_lock_buffers(struct buffer_head *head,
 			enum migrate_mode mode)
 {
 	struct buffer_head *bh = head;
@@ -849,7 +849,7 @@ int buffer_migrate_page_norefs(struct address_space *mapping,
 /*
  * Writeback a page to clean the dirty state
  */
-static int writeout(struct address_space *mapping, struct page *page)
+int writeout(struct address_space *mapping, struct page *page)
 {
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
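
With that, patch 01 exports exchange_two_pages() to the rest of mm via
mm/internal.h. A hypothetical caller sketch, not part of this patch (the
surrounding function is an assumption for illustration), would only need
the two struct page pointers:

    /* Hypothetical sketch: swap one hot page with one cold page in place. */
    static int swap_hot_cold(struct page *hot, struct page *cold)
    {
    	int err;

    	/* Both pages must be on the LRU; exchange_two_pages() isolates
    	 * them, exchanges data, flags and mappings, and puts them back. */
    	err = exchange_two_pages(hot, cold);
    	if (err)
    		return err;	/* -EBUSY/-EAGAIN: fall back to migrate_pages() */
    	return 0;
    }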
From patchwork Fri Feb 15 22:08:27 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815949
From: Zi Yan <zi.yan@sent.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A. Shutemov", Andrew Morton,
    Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove,
    Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 02/31] mm: migrate: Add THP exchange support.
Date: Fri, 15 Feb 2019 14:08:27 -0800
Message-Id: <20190215220856.29749-3-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Add support for exchanging two THPs.

Signed-off-by: Zi Yan
---
 mm/exchange.c | 47 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 38 insertions(+), 9 deletions(-)
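
The core of this patch is a compatibility gate: two pages are exchanged only
if they have the same compound shape, and hugetlbfs pages are excluded. A
runnable userspace model of the can_be_exchanged() logic, condensed over a
toy page descriptor (the struct and harness are illustrative assumptions):

    #include <stdbool.h>
    #include <stdio.h>

    struct toy_page {
    	bool compound;		/* PageCompound() */
    	bool huge;		/* PageHuge(): hugetlbfs, never exchanged */
    	unsigned int order;	/* compound_order() */
    };

    /* Same shape, no hugetlbfs: a condensed mirror of can_be_exchanged(). */
    static bool can_be_exchanged(const struct toy_page *from,
    			     const struct toy_page *to)
    {
    	if (from->compound != to->compound)
    		return false;
    	if (from->huge || to->huge)
    		return false;
    	return from->order == to->order;
    }

    int main(void)
    {
    	struct toy_page base = { false, false, 0 };
    	struct toy_page thp = { true, false, 9 };	/* 2 MiB THP on x86-64 */

    	printf("base<->thp: %d\n", can_be_exchanged(&base, &thp));	/* 0 */
    	printf("thp<->thp:  %d\n", can_be_exchanged(&thp, &thp));	/* 1 */
    	return 0;
    }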
diff --git a/mm/exchange.c b/mm/exchange.c
index a607348cc6f4..8cf286fc0f10 100644
--- a/mm/exchange.c
+++ b/mm/exchange.c
@@ -48,7 +48,8 @@ struct page_flags {
 	unsigned int page_swapcache:1;
 	unsigned int page_writeback:1;
 	unsigned int page_private:1;
-	unsigned int __pad:3;
+	unsigned int page_doublemap:1;
+	unsigned int __pad:2;
 };
 
@@ -125,20 +126,23 @@ static void exchange_huge_page(struct page *dst, struct page *src)
 static void exchange_page_flags(struct page *to_page, struct page *from_page)
 {
 	int from_cpupid, to_cpupid;
-	struct page_flags from_page_flags, to_page_flags;
+	struct page_flags from_page_flags = {0}, to_page_flags = {0};
 	struct mem_cgroup *to_memcg = page_memcg(to_page),
 			*from_memcg = page_memcg(from_page);
 
 	from_cpupid = page_cpupid_xchg_last(from_page, -1);
 
-	from_page_flags.page_error = TestClearPageError(from_page);
+	from_page_flags.page_error = PageError(from_page);
+	if (from_page_flags.page_error)
+		ClearPageError(from_page);
 	from_page_flags.page_referenced = TestClearPageReferenced(from_page);
 	from_page_flags.page_uptodate = PageUptodate(from_page);
 	ClearPageUptodate(from_page);
 	from_page_flags.page_active = TestClearPageActive(from_page);
 	from_page_flags.page_unevictable = TestClearPageUnevictable(from_page);
 	from_page_flags.page_checked = PageChecked(from_page);
-	ClearPageChecked(from_page);
+	if (from_page_flags.page_checked)
+		ClearPageChecked(from_page);
 	from_page_flags.page_mappedtodisk = PageMappedToDisk(from_page);
 	ClearPageMappedToDisk(from_page);
 	from_page_flags.page_dirty = PageDirty(from_page);
@@ -148,18 +152,22 @@ static void exchange_page_flags(struct page *to_page, struct page *from_page)
 	clear_page_idle(from_page);
 	from_page_flags.page_swapcache = PageSwapCache(from_page);
 	from_page_flags.page_writeback = test_clear_page_writeback(from_page);
+	from_page_flags.page_doublemap = PageDoubleMap(from_page);
 
 	to_cpupid = page_cpupid_xchg_last(to_page, -1);
 
-	to_page_flags.page_error = TestClearPageError(to_page);
+	to_page_flags.page_error = PageError(to_page);
+	if (to_page_flags.page_error)
+		ClearPageError(to_page);
 	to_page_flags.page_referenced = TestClearPageReferenced(to_page);
 	to_page_flags.page_uptodate = PageUptodate(to_page);
 	ClearPageUptodate(to_page);
 	to_page_flags.page_active = TestClearPageActive(to_page);
 	to_page_flags.page_unevictable = TestClearPageUnevictable(to_page);
 	to_page_flags.page_checked = PageChecked(to_page);
-	ClearPageChecked(to_page);
+	if (to_page_flags.page_checked)
+		ClearPageChecked(to_page);
 	to_page_flags.page_mappedtodisk = PageMappedToDisk(to_page);
 	ClearPageMappedToDisk(to_page);
 	to_page_flags.page_dirty = PageDirty(to_page);
@@ -169,6 +177,7 @@ static void exchange_page_flags(struct page *to_page, struct page *from_page)
 	clear_page_idle(to_page);
 	to_page_flags.page_swapcache = PageSwapCache(to_page);
 	to_page_flags.page_writeback = test_clear_page_writeback(to_page);
+	to_page_flags.page_doublemap = PageDoubleMap(to_page);
 
 	/* set to_page */
 	if (from_page_flags.page_error)
@@ -195,6 +204,8 @@ static void exchange_page_flags(struct page *to_page, struct page *from_page)
 		set_page_young(to_page);
 	if (from_page_flags.page_is_idle)
 		set_page_idle(to_page);
+	if (from_page_flags.page_doublemap)
+		SetPageDoubleMap(to_page);
 
 	/* set from_page */
 	if (to_page_flags.page_error)
@@ -221,6 +232,8 @@ static void exchange_page_flags(struct page *to_page, struct page *from_page)
 		set_page_young(from_page);
 	if (to_page_flags.page_is_idle)
 		set_page_idle(from_page);
+	if (to_page_flags.page_doublemap)
+		SetPageDoubleMap(from_page);
 
 	/*
 	 * Copy NUMA information to the new page, to prevent over-eager
@@ -280,6 +293,7 @@ static int exchange_page_move_mapping(struct address_space *to_mapping,
 
 	VM_BUG_ON_PAGE(to_mapping != page_mapping(to_page), to_page);
 	VM_BUG_ON_PAGE(from_mapping != page_mapping(from_page), from_page);
+	VM_BUG_ON(PageCompound(from_page) != PageCompound(to_page));
 
 	if (!to_mapping) {
 		/* Anonymous page without mapping */
@@ -600,7 +614,6 @@ static int unmap_and_exchange(struct page *from_page,
 	to_mapping = to_page->mapping;
 	from_index = from_page->index;
 	to_index = to_page->index;
-
 	/*
 	 * Corner case handling:
 	 * 1. When a new swap-cache page is read in, it is added to the LRU
@@ -691,6 +704,23 @@ static int unmap_and_exchange(struct page *from_page,
 	return rc;
 }
 
+static bool can_be_exchanged(struct page *from, struct page *to)
+{
+	if (PageCompound(from) != PageCompound(to))
+		return false;
+
+	if (PageHuge(from) != PageHuge(to))
+		return false;
+
+	if (PageHuge(from) || PageHuge(to))
+		return false;
+
+	if (compound_order(from) != compound_order(to))
+		return false;
+
+	return true;
+}
+
 /*
  * Exchange pages in the exchange_list
  *
@@ -751,9 +781,8 @@ static int exchange_pages(struct list_head *exchange_list,
 			continue;
 		}
 
-		/* TODO: compound page not supported */
 		/* to_page can be file-backed page */
-		if (PageCompound(from_page) ||
+		if (!can_be_exchanged(from_page, to_page) ||
 			page_mapping(from_page)
 			) {
 			++failed;
From patchwork Fri Feb 15 22:08:28 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815951
From: Zi Yan <zi.yan@sent.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A. Shutemov", Andrew Morton,
    Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove,
    Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 03/31] mm: migrate: Add tmpfs exchange support.
Date: Fri, 15 Feb 2019 14:08:28 -0800
Message-Id: <20190215220856.29749-4-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

tmpfs uses the same migration routine as anonymous pages, so enabling
page exchange for it is straightforward.

Signed-off-by: Zi Yan
---
 mm/exchange.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/exchange.c b/mm/exchange.c
index 8cf286fc0f10..851f1a99b48b 100644
--- a/mm/exchange.c
+++ b/mm/exchange.c
@@ -466,7 +466,10 @@ static int exchange_from_to_pages(struct page *to_page, struct page *from_page,
 		rc = exchange_page_move_mapping(to_page_mapping, from_page_mapping,
 					to_page, from_page, NULL, NULL, mode, 0, 0);
 	} else {
-		if (to_page_mapping->a_ops->migratepage == buffer_migrate_page) {
+		/* shmem */
+		if (to_page_mapping->a_ops->migratepage == migrate_page)
+			goto exchange_mappings;
+		else if (to_page_mapping->a_ops->migratepage == buffer_migrate_page) {
 			if (!page_has_buffers(to_page))
 				goto exchange_mappings;
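For context: the mappings this patch covers are ordinary shmem/tmpfs files, whose address_space ops point migratepage at the generic migrate_page() and carry no buffer heads, which is why the exchange path above can jump straight to exchange_mappings. A minimal userspace sketch of creating such a mapping (my illustration, not part of the patch; memfd_create() is tmpfs-backed on Linux):

	#define _GNU_SOURCE
	#include <string.h>
	#include <unistd.h>
	#include <sys/mman.h>

	int main(void)
	{
		/* memfd is backed by tmpfs, so its pages go through shmem's
		 * address_space ops (migratepage == migrate_page) */
		int fd = memfd_create("demo", 0);
		size_t len = 2 * 1024 * 1024;

		if (fd < 0 || ftruncate(fd, len))
			return 1;

		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		memset(p, 0xab, len);	/* fault the pages in */
		pause();		/* keep them resident while the kernel exchanges them */
		return 0;
	}

Pages of such a mapping take the new migrate_page branch when exchanged.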
From patchwork Fri Feb 15 22:08:29 2019

From: Zi Yan <zi.yan@sent.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 04/31] mm: add mem_defrag functionality.
Date: Fri, 15 Feb 2019 14:08:29 -0800
Message-Id: <20190215220856.29749-5-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Create contiguous physical memory regions by migrating/exchanging pages.

1. It scans all VMAs in one process and determines an anchor pair
   (VPN, PFN) for each VMA.

2. It then migrates/exchanges the pages in each VMA so that they are
   aligned with the anchor pair. In the end, a physically contiguous
   region should back each VMA in the physical address range
   [PFN, PFN + VMA size).

Signed-off-by: Zi Yan
---
 arch/x86/entry/syscalls/syscall_64.tbl |    1 +
 fs/exec.c                              |    4 +
 include/linux/mem_defrag.h             |   60 +
 include/linux/mm.h                     |   12 +
 include/linux/mm_types.h               |    4 +
 include/linux/sched/coredump.h         |    3 +
 include/linux/syscalls.h               |    3 +
 include/linux/vm_event_item.h          |   23 +
 include/uapi/asm-generic/mman-common.h |    3 +
 kernel/fork.c                          |    9 +
 kernel/sysctl.c                        |   79 +-
 mm/Makefile                            |    1 +
 mm/compaction.c                        |   17 +-
 mm/huge_memory.c                       |    4 +
 mm/internal.h                          |   28 +
 mm/khugepaged.c                        |    1 +
 mm/madvise.c                           |   15 +
 mm/mem_defrag.c                        | 1782 ++++++++++++++++++++++++
 mm/memory.c                            |    7 +
 mm/mmap.c                              |   29 +
 mm/page_alloc.c                        |    4 +-
 mm/vmstat.c                            |   21 +
 22 files changed, 2096 insertions(+), 14 deletions(-)
 create mode 100644 include/linux/mem_defrag.h
 create mode 100644 mm/mem_defrag.c

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f0b1709a5ffb..374c11e3cf80 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -343,6 +343,7 @@
 332	common	statx			__x64_sys_statx
 333	common	io_pgetevents		__x64_sys_io_pgetevents
 334	common	rseq			__x64_sys_rseq
+335	common	scan_process_memory	__x64_sys_scan_process_memory
 #
 # x32-specific system call numbers start at 512 to avoid cache impact

diff --git a/fs/exec.c b/fs/exec.c
index fb72d36f7823..b71b9d305d7d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1010,7 +1010,11 @@ static int exec_mmap(struct mm_struct *mm)
 {
 	struct task_struct *tsk;
 	struct mm_struct *old_mm, *active_mm;
+	int move_mem_defrag = current->mm ?
+		test_bit(MMF_VM_MEM_DEFRAG_ALL, &current->mm->flags) : 0;
 
+	if (move_mem_defrag && mm)
+		set_bit(MMF_VM_MEM_DEFRAG_ALL, &mm->flags);
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
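The new syscall wired up above can be exercised from userspace roughly as follows. This is a hedged sketch: the syscall number 335 and the MEM_DEFRAG_* action values come from this patch, while the buffer size and the return-value convention (bytes written) are my assumptions:

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	#define SYS_scan_process_memory	335	/* from syscall_64.tbl above */
	#define MEM_DEFRAG_SCAN		0	/* from include/linux/mem_defrag.h below */
	#define MEM_DEFRAG_DEFRAG	3

	int main(int argc, char **argv)
	{
		pid_t pid = argc > 1 ? atoi(argv[1]) : getpid();
		int buf_len = 1 << 20;		/* assumed: 1 MiB stats buffer */
		char *buf = malloc(buf_len);
		long ret;

		if (!buf)
			return 1;

		/* collect per-VMA contiguity stats, then request a defrag pass */
		ret = syscall(SYS_scan_process_memory, pid, buf, buf_len,
			      MEM_DEFRAG_SCAN);
		if (ret >= 0)
			fwrite(buf, 1, (size_t)ret, stdout);	/* assumed semantics */

		syscall(SYS_scan_process_memory, pid, NULL, 0, MEM_DEFRAG_DEFRAG);
		free(buf);
		return 0;
	}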
diff --git a/include/linux/mem_defrag.h b/include/linux/mem_defrag.h
new file mode 100644
index 000000000000..43954a316752
--- /dev/null
+++ b/include/linux/mem_defrag.h
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * mem_defrag.h Memory defragmentation function.
+ *
+ * Copyright (C) 2019 Zi Yan
+ *
+ */
+#ifndef _LINUX_KMEM_DEFRAGD_H
+#define _LINUX_KMEM_DEFRAGD_H
+
+#include <linux/sched/coredump.h> /* MMF_VM_MEM_DEFRAG */
+
+#define MEM_DEFRAG_SCAN			0
+#define MEM_DEFRAG_MARK_SCAN_ALL	1
+#define MEM_DEFRAG_CLEAR_SCAN_ALL	2
+#define MEM_DEFRAG_DEFRAG		3
+#define MEM_DEFRAG_CONTIG_SCAN		5
+
+enum mem_defrag_action {
+	MEM_DEFRAG_FULL_STATS = 0,
+	MEM_DEFRAG_DO_DEFRAG,
+	MEM_DEFRAG_CONTIG_STATS,
+};
+
+extern int kmem_defragd_always;
+
+extern int __kmem_defragd_enter(struct mm_struct *mm);
+extern void __kmem_defragd_exit(struct mm_struct *mm);
+extern int memdefrag_madvise(struct vm_area_struct *vma,
+			     unsigned long *vm_flags, int advice);
+
+static inline int kmem_defragd_fork(struct mm_struct *mm,
+				    struct mm_struct *oldmm)
+{
+	if (test_bit(MMF_VM_MEM_DEFRAG, &oldmm->flags))
+		return __kmem_defragd_enter(mm);
+	return 0;
+}
+
+static inline void kmem_defragd_exit(struct mm_struct *mm)
+{
+	if (test_bit(MMF_VM_MEM_DEFRAG, &mm->flags))
+		__kmem_defragd_exit(mm);
+}
+
+static inline int kmem_defragd_enter(struct vm_area_struct *vma,
+				     unsigned long vm_flags)
+{
+	if (!test_bit(MMF_VM_MEM_DEFRAG, &vma->vm_mm->flags))
+		if (((kmem_defragd_always ||
+		      ((vm_flags & VM_MEMDEFRAG))) &&
+		     !(vm_flags & VM_NOMEMDEFRAG)) ||
+		    test_bit(MMF_VM_MEM_DEFRAG_ALL, &vma->vm_mm->flags))
+			if (__kmem_defragd_enter(vma->vm_mm))
+				return -ENOMEM;
+	return 0;
+}
+
+#endif /* _LINUX_KMEM_DEFRAGD_H */

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb6408fe73..5bcc1b03372a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -251,13 +251,20 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_6	38	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
+#define VM_HIGH_ARCH_6	BIT(VM_HIGH_ARCH_BIT_6)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
+#define VM_MEMDEFRAG	VM_HIGH_ARCH_5	/* memory defrag */
+#define VM_NOMEMDEFRAG	VM_HIGH_ARCH_6	/* no memory defrag */
+
 #ifdef CONFIG_ARCH_HAS_PKEYS
 # define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
 # define VM_PKEY_BIT0	VM_HIGH_ARCH_0	/* A protection key is a 4-bit value */
@@ -487,6 +494,9 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
+	vma->anchor_page_rb = RB_ROOT_CACHED;
+	vma->vma_create_jiffies = jiffies;
+	vma->vma_defrag_jiffies = 0;
 }
 
 static inline void vma_set_anonymous(struct vm_area_struct *vma)
@@ -2837,6 +2847,8 @@ static inline bool debug_guardpage_enabled(void) { return false; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
+void free_anchor_pages(struct vm_area_struct *vma);
+
 #if MAX_NUMNODES > 1
 void __init setup_nr_node_ids(void);
 #else
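The nested-if condition in kmem_defragd_enter() above is easier to audit as a flat predicate. A standalone restatement (the flag values below are stand-ins; only the boolean structure is taken from the patch):

	#include <stdbool.h>
	#include <stdio.h>

	#define VM_MEMDEFRAG		(1UL << 0)
	#define VM_NOMEMDEFRAG		(1UL << 1)
	#define MMF_VM_MEM_DEFRAG_ALL	(1UL << 2)

	static bool wants_mem_defrag(unsigned long vm_flags,
				     unsigned long mm_flags, int defrag_always)
	{
		if (mm_flags & MMF_VM_MEM_DEFRAG_ALL)
			return true;	/* per-mm override wins */
		if (vm_flags & VM_NOMEMDEFRAG)
			return false;	/* madvise(MADV_NOMEMDEFRAG) opt-out */
		return defrag_always || (vm_flags & VM_MEMDEFRAG);
	}

	int main(void)
	{
		printf("%d %d %d\n",
		       wants_mem_defrag(VM_MEMDEFRAG, 0, 0),			/* 1 */
		       wants_mem_defrag(VM_MEMDEFRAG | VM_NOMEMDEFRAG, 0, 1),	/* 0 */
		       wants_mem_defrag(0, MMF_VM_MEM_DEFRAG_ALL, 0));		/* 1 */
		return 0;
	}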
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2c471a2c43fa..32549b255d25 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -328,6 +328,10 @@ struct vm_area_struct {
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+	struct rb_root_cached anchor_page_rb;
+	unsigned long vma_create_jiffies;	/* creation time of the vma */
+	unsigned long vma_modify_jiffies;	/* last modification time of the vma */
+	unsigned long vma_defrag_jiffies;	/* last defrag time of the vma */
 } __randomize_layout;
 
 struct core_thread {

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index ecdc6542070f..52ad71db6687 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -76,5 +76,8 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
 				 MMF_DISABLE_THP_MASK)
 
+#define MMF_VM_MEM_DEFRAG	26	/* set when the mm is added for mem defrag */
+#define MMF_VM_MEM_DEFRAG_ALL	27	/* all vmas in the mm will be mem defragged */
+
 #endif /* _LINUX_SCHED_COREDUMP_H */

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 257cccba3062..caa6f043b29a 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -926,6 +926,8 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 			 int flags, uint32_t sig);
+asmlinkage long sys_scan_process_memory(pid_t pid, char __user *out_buf,
+					int buf_len, int action);
 
 /*
  * Architecture-specific system calls
@@ -1315,4 +1317,5 @@ static inline unsigned int ksys_personality(unsigned int personality)
 	return old;
 }
 
+
 #endif

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 47a3441cf4c4..6b32c8243616 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -109,6 +109,29 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_SWAP
 		SWAP_RA, SWAP_RA_HIT,
+#endif
+		/* MEM_DEFRAG STATS */
+		MEM_DEFRAG_DEFRAG_NUM,
+		MEM_DEFRAG_SCAN_NUM,
+		MEM_DEFRAG_DST_FREE_PAGES,
+		MEM_DEFRAG_DST_ANON_PAGES,
+		MEM_DEFRAG_DST_FILE_PAGES,
+		MEM_DEFRAG_DST_NONLRU_PAGES,
+		MEM_DEFRAG_DST_FREE_PAGES_FAILED,
+		MEM_DEFRAG_DST_FREE_PAGES_OVERFLOW_FAILED,
+		MEM_DEFRAG_DST_ANON_PAGES_FAILED,
+		MEM_DEFRAG_DST_FILE_PAGES_FAILED,
+		MEM_DEFRAG_DST_NONLRU_PAGES_FAILED,
+		MEM_DEFRAG_SRC_ANON_PAGES_FAILED,
+		MEM_DEFRAG_SRC_COMP_PAGES_FAILED,
+		MEM_DEFRAG_DST_SPLIT_HUGEPAGES,
+#ifdef CONFIG_COMPACTION
+		/* memory compaction */
+		COMPACT_MIGRATE_PAGES,
+#endif
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		/* thp collapse */
+		THP_COLLAPSE_MIGRATE_PAGES,
 #endif
 		NR_VM_EVENT_ITEMS
 };

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index e7ee32861d51..d1ec94a1970d 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -66,6 +66,9 @@
 #define MADV_WIPEONFORK 18	/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19	/* Undo MADV_WIPEONFORK */
 
+#define MADV_MEMDEFRAG	20	/* Worth backing with hugepages */
+#define MADV_NOMEMDEFRAG 21	/* Not worth backing with hugepages */
+
 /* compatibility flags */
 #define MAP_FILE	0
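With MADV_MEMDEFRAG defined above, a process opts a range in like any other madvise hint. An illustrative sketch (the value 20 comes from the mman-common.h hunk above; note madvise operates at VMA granularity, so the whole containing VMA is marked):

	#include <stdlib.h>
	#include <sys/mman.h>

	#define MADV_MEMDEFRAG 20	/* from the hunk above */

	int main(void)
	{
		size_t len = 64UL << 20;			/* 64 MiB */
		void *buf = aligned_alloc(2UL << 20, len);	/* 2 MiB aligned */

		if (!buf)
			return 1;
		/* opt this range in, so kmem_defragd considers its VMA */
		return madvise(buf, len, MADV_MEMDEFRAG);
	}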
diff --git a/kernel/fork.c b/kernel/fork.c
index b69248e6f0e0..dcefa978c232 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -92,6 +92,7 @@
 #include
 #include
 #include
+#include <linux/mem_defrag.h>
 #include
 #include
@@ -343,12 +344,16 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	if (new) {
 		*new = *orig;
 		INIT_LIST_HEAD(&new->anon_vma_chain);
+		new->anchor_page_rb = RB_ROOT_CACHED;
+		new->vma_create_jiffies = jiffies;
+		new->vma_defrag_jiffies = 0;
 	}
 	return new;
 }
 
 void vm_area_free(struct vm_area_struct *vma)
 {
+	free_anchor_pages(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
@@ -496,6 +501,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	if (retval)
 		goto out;
 	retval = khugepaged_fork(mm, oldmm);
+	if (retval)
+		goto out;
+	retval = kmem_defragd_fork(mm, oldmm);
 	if (retval)
 		goto out;
@@ -1044,6 +1052,7 @@ static inline void __mmput(struct mm_struct *mm)
 	exit_aio(mm);
 	ksm_exit(mm);
 	khugepaged_exit(mm); /* must run before exit_mmap */
+	kmem_defragd_exit(mm);
 	exit_mmap(mm);
 	mm_put_huge_zero_page(mm);
 	set_mm_exe_file(mm, NULL);

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ba4d9e85feb8..6bf0be1af7e0 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -115,6 +115,13 @@ extern unsigned int sysctl_nr_open_min, sysctl_nr_open_max;
 extern int sysctl_nr_trim_pages;
 #endif
 
+extern int kmem_defragd_always;
+extern int vma_scan_percentile;
+extern int vma_scan_threshold_type;
+extern int vma_no_repeat_defrag;
+extern int num_breakout_chunks;
+extern int defrag_size_threshold;
+
 /* Constants used for minimum and maximum */
 #ifdef CONFIG_LOCKUP_DETECTOR
 static int sixty = 60;
@@ -1303,7 +1310,7 @@ static struct ctl_table vm_table[] = {
 		.proc_handler	= overcommit_kbytes_handler,
 	},
 	{
-		.procname	= "page-cluster",
+		.procname	= "page-cluster",
 		.data		= &page_cluster,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
@@ -1691,6 +1698,58 @@ static struct ctl_table vm_table[] = {
 		.extra2		= (void *)&mmap_rnd_compat_bits_max,
 	},
 #endif
+	{
+		.procname	= "kmem_defragd_always",
+		.data		= &kmem_defragd_always,
+		.maxlen		= sizeof(kmem_defragd_always),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
+		.procname	= "vma_scan_percentile",
+		.data		= &vma_scan_percentile,
+		.maxlen		= sizeof(vma_scan_percentile),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one_hundred,
+	},
+	{
+		.procname	= "vma_scan_threshold_type",
+		.data		= &vma_scan_threshold_type,
+		.maxlen		= sizeof(vma_scan_threshold_type),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
+		.procname	= "vma_no_repeat_defrag",
+		.data		= &vma_no_repeat_defrag,
+		.maxlen		= sizeof(vma_no_repeat_defrag),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
+		.procname	= "num_breakout_chunks",
+		.data		= &num_breakout_chunks,
+		.maxlen		= sizeof(num_breakout_chunks),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+	},
+	{
+		.procname	= "defrag_size_threshold",
+		.data		= &defrag_size_threshold,
+		.maxlen		= sizeof(defrag_size_threshold),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+	},
 	{ }
 };
@@ -1807,7 +1866,7 @@ static struct ctl_table fs_table[] = {
 		.mode		= 0555,
 		.child		= inotify_table,
 	},
-#endif
+#endif
 #ifdef CONFIG_EPOLL
 	{
 		.procname	= "epoll",
@@ -2252,12 +2311,12 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 	int *i, vleft, first = 1, err = 0;
 	size_t left;
 	char *kbuf = NULL, *p;
-
+
 	if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) {
 		*lenp = 0;
 		return 0;
 	}
-
+
 	i = (int *) tbl_data;
 	vleft = table->maxlen / sizeof(*i);
 	left = *lenp;
@@ -2483,7 +2542,7 @@ static int do_proc_douintvec(struct ctl_table *table, int write,
  * @ppos: file position
  *
  * Reads/writes up to table->maxlen/sizeof(unsigned int) integer
- * values from/to the user buffer, treated as an ASCII string.
+ * values from/to the user buffer, treated as an ASCII string.
  *
  * Returns 0 on success.
  */
@@ -2974,7 +3033,7 @@ static int do_proc_dointvec_ms_jiffies_conv(bool *negp, unsigned long *lvalp,
  * @ppos: file position
  *
  * Reads/writes up to table->maxlen/sizeof(unsigned int) integer
- * values from/to the user buffer, treated as an ASCII string.
+ * values from/to the user buffer, treated as an ASCII string.
  * The values read are assumed to be in seconds, and are converted into
  * jiffies.
 *
@@ -2996,8 +3055,8 @@ int proc_dointvec_jiffies(struct ctl_table *table, int write,
 * @ppos: pointer to the file position
 *
 * Reads/writes up to table->maxlen/sizeof(unsigned int) integer
- * values from/to the user buffer, treated as an ASCII string.
- * The values read are assumed to be in 1/USER_HZ seconds, and
+ * values from/to the user buffer, treated as an ASCII string.
+ * The values read are assumed to be in 1/USER_HZ seconds, and
 * are converted into jiffies.
 *
 * Returns 0 on success.
@@ -3019,8 +3078,8 @@ int proc_dointvec_userhz_jiffies(struct ctl_table *table, int write,
 * @ppos: the current position in the file
 *
 * Reads/writes up to table->maxlen/sizeof(unsigned int) integer
- * values from/to the user buffer, treated as an ASCII string.
- * The values read are assumed to be in 1/1000 seconds, and
+ * values from/to the user buffer, treated as an ASCII string.
+ * The values read are assumed to be in 1/1000 seconds, and
 * are converted into jiffies.
 *
 * Returns 0 on success.
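The six knobs above are registered in vm_table, so they surface as /proc/sys/vm/<procname>. A small illustration of flipping the global switch from C (a shell "sysctl vm.kmem_defragd_always=1" would be equivalent; the path follows from the table entry, everything else is ordinary file I/O):

	#include <fcntl.h>
	#include <unistd.h>

	int main(void)
	{
		/* enable defrag for all eligible VMAs, mirroring
		 * khugepaged's "always" mode */
		int fd = open("/proc/sys/vm/kmem_defragd_always", O_WRONLY);

		if (fd < 0)
			return 1;
		if (write(fd, "1", 1) != 1)
			return 1;
		return close(fd);
	}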
diff --git a/mm/Makefile b/mm/Makefile
index 1574ea5743e4..925f21c717db 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -44,6 +44,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
 obj-y += init-mm.o
 obj-y += memblock.o
 obj-y += exchange.o
+obj-y += mem_defrag.o
 
 ifdef CONFIG_MMU
 obj-$(CONFIG_ADVISE_SYSCALLS)	+= madvise.o

diff --git a/mm/compaction.c b/mm/compaction.c
index ef29490b0f46..54c4bfdbdbc3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -50,7 +50,7 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
 #define pageblock_start_pfn(pfn)	block_start_pfn(pfn, pageblock_order)
 #define pageblock_end_pfn(pfn)		block_end_pfn(pfn, pageblock_order)
 
-static unsigned long release_freepages(struct list_head *freelist)
+unsigned long release_freepages(struct list_head *freelist)
 {
 	struct page *page, *next;
 	unsigned long high_pfn = 0;
@@ -58,7 +58,10 @@ static unsigned long release_freepages(struct list_head *freelist)
 	list_for_each_entry_safe(page, next, freelist, lru) {
 		unsigned long pfn = page_to_pfn(page);
 		list_del(&page->lru);
-		__free_page(page);
+		if (PageCompound(page))
+			__free_pages(page, compound_order(page));
+		else
+			__free_page(page);
 		if (pfn > high_pfn)
 			high_pfn = pfn;
 	}
@@ -1593,6 +1596,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
 		int err;
+		int num_migrated_pages = 0;
+		struct page *iter;
 
 		switch (isolate_migratepages(zone, cc)) {
 		case ISOLATE_ABORT:
@@ -1611,6 +1616,9 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 			;
 		}
 
+		list_for_each_entry(iter, &cc->migratepages, lru)
+			num_migrated_pages++;
+
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
 				compaction_free, (unsigned long)cc, cc->mode,
 				MR_COMPACTION);
@@ -1618,6 +1626,11 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 		trace_mm_compaction_migratepages(cc->nr_migratepages, err,
 						 &cc->migratepages);
 
+		list_for_each_entry(iter, &cc->migratepages, lru)
+			num_migrated_pages--;
+
+		count_vm_events(COMPACT_MIGRATE_PAGES, num_migrated_pages);
+
 		/* All pages were either migrated or will be released */
 		cc->nr_migratepages = 0;
 		if (err) {
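Because migrate_pages() leaves pages that failed to migrate on the list, walking the list before and after yields the number actually moved, which is what feeds COMPACT_MIGRATE_PAGES above. The new counters should then surface in /proc/vmstat; a reader sketch (the lowercase names are my assumption, derived from the enum and exported by the mm/vmstat.c part of this patch):

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char line[128];
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, "compact_migrate_pages", 21) ||
			    !strncmp(line, "mem_defrag_", 11))
				fputs(line, stdout);	/* print the new counters */
		fclose(f);
		return 0;
	}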
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index faf357eaf0ce..ffcae07a87d3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include <linux/mem_defrag.h>
 
 #include
 #include
@@ -695,6 +696,9 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 		return VM_FAULT_OOM;
 	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
 		return VM_FAULT_OOM;
+	/* Make it defrag */
+	if (unlikely(kmem_defragd_enter(vma, vma->vm_flags)))
+		return VM_FAULT_OOM;
 	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
 			!mm_forbids_zeropage(vma->vm_mm) &&
 			transparent_hugepage_use_zero_page()) {

diff --git a/mm/internal.h b/mm/internal.h
index 77e205c423ce..4fe8d1a4d7bb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include <linux/interval_tree.h>
 
 /*
  * The set of flags that only affect watermark checking and reclaim
@@ -549,4 +550,31 @@ bool buffer_migrate_lock_buffers(struct buffer_head *head,
 int writeout(struct address_space *mapping, struct page *page);
 extern int exchange_two_pages(struct page *page1, struct page *page2);
 
+struct anchor_page_info {
+	struct list_head list;
+	struct page *anchor_page;
+	unsigned long vaddr;
+	unsigned long start;
+	unsigned long end;
+};
+
+struct anchor_page_node {
+	struct interval_tree_node node;
+	unsigned long anchor_pfn; /* struct page can be calculated from pfn_to_page() */
+	unsigned long anchor_vpn;
+};
+
+unsigned long release_freepages(struct list_head *freelist);
+
+void free_anchor_pages(struct vm_area_struct *vma);
+
+extern int exchange_two_pages(struct page *page1, struct page *page2);
+
+void expand(struct zone *zone, struct page *page,
+	    int low, int high, struct free_area *area,
+	    int migratetype);
+
+void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
+		   unsigned int alloc_flags);
+
 #endif /* __MM_INTERNAL_H */

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4f017339ddb2..aedaa9f75806 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -660,6 +660,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 		} else {
 			src_page = pte_page(pteval);
 			copy_user_highpage(page, src_page, address, vma);
+			count_vm_event(THP_COLLAPSE_MIGRATE_PAGES);
 			VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page);
 			release_pte_page(src_page);
 			/*

diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..9cef96d633e8 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include <linux/mem_defrag.h>
 
 #include
 
@@ -616,6 +617,13 @@ static long madvise_remove(struct vm_area_struct *vma,
 	return error;
 }
 
+static long madvise_memdefrag(struct vm_area_struct *vma,
+		struct vm_area_struct **prev,
+		unsigned long start, unsigned long end, int behavior)
+{
+	*prev = vma;
+	return memdefrag_madvise(vma, &vma->vm_flags, behavior);
+}
 #ifdef CONFIG_MEMORY_FAILURE
 /*
  * Error injection support for memory error handling.
@@ -697,6 +705,9 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
+	case MADV_MEMDEFRAG:
+	case MADV_NOMEMDEFRAG:
+		return madvise_memdefrag(vma, prev, start, end, behavior);
 	default:
 		return madvise_behavior(vma, prev, start, end, behavior);
 	}
@@ -731,6 +742,8 @@ madvise_behavior_valid(int behavior)
 	case MADV_SOFT_OFFLINE:
 	case MADV_HWPOISON:
 #endif
+	case MADV_MEMDEFRAG:
+	case MADV_NOMEMDEFRAG:
 		return true;
 
 	default:
@@ -785,6 +798,8 @@ madvise_behavior_valid(int behavior)
 *  MADV_DONTDUMP - the application wants to prevent pages in the given range
 *		from being included in its core dump.
 *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
+ *  MADV_MEMDEFRAG - allow mem defrag running on this region.
+ *  MADV_NOMEMDEFRAG - no mem defrag here.
 *
 * return values:
 *  zero    - success

diff --git a/mm/mem_defrag.c b/mm/mem_defrag.c
new file mode 100644
index 000000000000..414909e1c19c
--- /dev/null
+++ b/mm/mem_defrag.c
@@ -0,0 +1,1782 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory defragmentation.
+ *
+ * Copyright (C) 2019 Zi Yan
+ *
+ * Two lists:
+ * 1) a mm list, representing virtual address spaces
+ * 2) an anon_vma list, representing the physical address space.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include "internal.h"
+
+
+struct contig_stats {
+	int err;
+	unsigned long contig_pages;
+	unsigned long first_vaddr_in_chunk;
+	unsigned long first_paddr_in_chunk;
+};
+
+struct defrag_result_stats {
+	unsigned long aligned;
+	unsigned long migrated;
+	unsigned long src_pte_thp_failed;
+	unsigned long src_thp_dst_not_failed;
+	unsigned long src_not_present;
+	unsigned long dst_out_of_bound_failed;
+	unsigned long dst_pte_thp_failed;
+	unsigned long dst_thp_src_not_failed;
+	unsigned long dst_isolate_free_failed;
+	unsigned long dst_migrate_free_failed;
+	unsigned long dst_anon_failed;
+	unsigned long dst_file_failed;
+	unsigned long dst_non_lru_failed;
+	unsigned long dst_non_moveable_failed;
+	unsigned long not_defrag_vpn;
+};
+
+enum {
+	VMA_THRESHOLD_TYPE_TIME = 0,
+	VMA_THRESHOLD_TYPE_SIZE,
+};
+
+int num_breakout_chunks;
+int vma_scan_percentile = 100;
+int vma_scan_threshold_type = VMA_THRESHOLD_TYPE_TIME;
+int vma_no_repeat_defrag;
+int kmem_defragd_always;
+int defrag_size_threshold = 5;
+static DEFINE_SPINLOCK(kmem_defragd_mm_lock);
+
+#define MM_SLOTS_HASH_BITS 10
+static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
+
+static struct kmem_cache *mm_slot_cache __read_mostly;
+
+struct defrag_scan_control {
+	struct mm_struct *mm;
+	unsigned long scan_address;
+	char __user *out_buf;
+	int buf_len;
+	int used_len;
+	enum mem_defrag_action action;
+	bool scan_in_vma;
+	unsigned long vma_scan_threshold;
+};
+
+/**
+ * struct mm_slot - hash lookup from mm to mm_slot
+ * @hash: hash collision list
+ * @mm_node: kmem_defragd scan list headed in kmem_defragd_scan.mm_head
+ * @mm: the mm that this information is valid for
+ */
+struct mm_slot {
+	struct hlist_node hash;
+	struct list_head mm_node;
+	struct mm_struct *mm;
+};
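The contiguity bookkeeping that struct contig_stats supports compares each page's virtual offset from the chunk start with its physical offset; a run continues only while the two stay equal. A userspace analogue of that check with made-up addresses (my illustration):

	#include <stdio.h>

	struct contig_stats {
		unsigned long contig_pages;
		unsigned long first_vaddr;
		unsigned long first_paddr;
	};

	int main(void)
	{
		/* hypothetical (vaddr, paddr) samples, 4 KiB pages */
		unsigned long pairs[][2] = {
			{ 0x700000000000, 0x10000000 },
			{ 0x700000001000, 0x10001000 },	/* contiguous */
			{ 0x700000002000, 0x2a000000 },	/* breaks the run */
		};
		struct contig_stats s = { 0, pairs[0][0], pairs[0][1] };

		for (int i = 0; i < 3; i++) {
			unsigned long va = pairs[i][0], pa = pairs[i][1];

			/* same VA->PA offset as the chunk start? */
			if ((long)(va - s.first_vaddr) == (long)(pa - s.first_paddr))
				s.contig_pages++;
			else {
				printf("run of %lu contiguous pages\n", s.contig_pages);
				s = (struct contig_stats){ 1, va, pa };
			}
		}
		printf("run of %lu contiguous pages\n", s.contig_pages);
		return 0;
	}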
+/**
+ * struct kmem_defragd_scan - cursor for scanning
+ * @mm_head: the head of the mm list to scan
+ * @mm_slot: the current mm_slot we are scanning
+ * @address: the next address inside that to be scanned
+ *
+ * There is only the one kmem_defragd_scan instance of this cursor structure.
+ */
+struct kmem_defragd_scan {
+	struct list_head mm_head;
+	struct mm_slot *mm_slot;
+	unsigned long address;
+};
+
+static struct kmem_defragd_scan kmem_defragd_scan = {
+	.mm_head = LIST_HEAD_INIT(kmem_defragd_scan.mm_head),
+};
+
+
+static inline struct mm_slot *alloc_mm_slot(void)
+{
+	if (!mm_slot_cache)	/* initialization failed */
+		return NULL;
+	return kmem_cache_zalloc(mm_slot_cache, GFP_KERNEL);
+}
+
+static inline void free_mm_slot(struct mm_slot *mm_slot)
+{
+	kmem_cache_free(mm_slot_cache, mm_slot);
+}
+
+static struct mm_slot *get_mm_slot(struct mm_struct *mm)
+{
+	struct mm_slot *mm_slot;
+
+	hash_for_each_possible(mm_slots_hash, mm_slot, hash, (unsigned long)mm)
+		if (mm == mm_slot->mm)
+			return mm_slot;
+
+	return NULL;
+}
+
+static void insert_to_mm_slots_hash(struct mm_struct *mm,
+				    struct mm_slot *mm_slot)
+{
+	mm_slot->mm = mm;
+	hash_add(mm_slots_hash, &mm_slot->hash, (long)mm);
+}
+
+static inline int kmem_defragd_test_exit(struct mm_struct *mm)
+{
+	return atomic_read(&mm->mm_users) == 0;
+}
+
+int __kmem_defragd_enter(struct mm_struct *mm)
+{
+	struct mm_slot *mm_slot;
+
+	mm_slot = alloc_mm_slot();
+	if (!mm_slot)
+		return -ENOMEM;
+
+	/* __kmem_defragd_exit() must not run from under us */
+	VM_BUG_ON_MM(kmem_defragd_test_exit(mm), mm);
+	if (unlikely(test_and_set_bit(MMF_VM_MEM_DEFRAG, &mm->flags))) {
+		free_mm_slot(mm_slot);
+		return 0;
+	}
+
+	spin_lock(&kmem_defragd_mm_lock);
+	insert_to_mm_slots_hash(mm, mm_slot);
+	/*
+	 * Insert just behind the scanning cursor, to let the area settle
+	 * down a little.
+	 */
+	list_add_tail(&mm_slot->mm_node, &kmem_defragd_scan.mm_head);
+	spin_unlock(&kmem_defragd_mm_lock);
+
+	atomic_inc(&mm->mm_count);
+
+	return 0;
+}
+
+void __kmem_defragd_exit(struct mm_struct *mm)
+{
+	struct mm_slot *mm_slot;
+	int free = 0;
+
+	spin_lock(&kmem_defragd_mm_lock);
+	mm_slot = get_mm_slot(mm);
+	if (mm_slot && kmem_defragd_scan.mm_slot != mm_slot) {
+		hash_del(&mm_slot->hash);
+		list_del(&mm_slot->mm_node);
+		free = 1;
+	}
+	spin_unlock(&kmem_defragd_mm_lock);
+
+	if (free) {
+		clear_bit(MMF_VM_MEM_DEFRAG, &mm->flags);
+		free_mm_slot(mm_slot);
+		mmdrop(mm);
+	} else if (mm_slot) {
+		/*
+		 * This is required to serialize against
+		 * kmem_defragd_test_exit() (which is guaranteed to run
+		 * under mmap sem read mode). Stop here (after we
+		 * return all pagetables will be destroyed) until
+		 * kmem_defragd has finished working on the pagetables
+		 * under the mmap_sem.
+		 */
+		down_write(&mm->mmap_sem);
+		up_write(&mm->mmap_sem);
+	}
+}
+
+static void collect_mm_slot(struct mm_slot *mm_slot)
+{
+	struct mm_struct *mm = mm_slot->mm;
+
+	VM_BUG_ON(NR_CPUS != 1 && !spin_is_locked(&kmem_defragd_mm_lock));
+
+	if (kmem_defragd_test_exit(mm)) {
+		/* free mm_slot */
+		hash_del(&mm_slot->hash);
+		list_del(&mm_slot->mm_node);
+
+		/*
+		 * Not strictly needed because the mm exited already.
+		 *
+		 * clear_bit(MMF_VM_HUGEPAGE, &mm->flags);
+		 */
+
+		/* kmem_defragd_mm_lock actually not necessary for the below */
+		free_mm_slot(mm_slot);
+		mmdrop(mm);
+	}
+}
+
+static bool mem_defrag_vma_check(struct vm_area_struct *vma)
+{
+	if ((!test_bit(MMF_VM_MEM_DEFRAG_ALL, &vma->vm_mm->flags) &&
+	     !(vma->vm_flags & VM_MEMDEFRAG) && !kmem_defragd_always) ||
+	    (vma->vm_flags & VM_NOMEMDEFRAG))
+		return false;
+	if (shmem_file(vma->vm_file)) {
+		if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE))
+			return false;
+		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
+				HPAGE_PMD_NR);
+	}
+	if (is_vm_hugetlb_page(vma))
+		return true;
+	if (!vma->anon_vma || vma->vm_ops)
+		return false;
+	if (is_vma_temporary_stack(vma))
+		return false;
+	return true;
+}
+
+static int do_vma_stat(struct mm_struct *mm, struct vm_area_struct *vma,
+		char *kernel_buf, int pos, int *remain_buf_len)
+{
+	int used_len;
+	int init_remain_len = *remain_buf_len;
+
+	if (!*remain_buf_len || !kernel_buf)
+		return -1;
+
+	used_len = scnprintf(kernel_buf + pos, *remain_buf_len, "%p, 0x%lx, 0x%lx, "
+			"0x%lx, -1\n",
+			mm, (unsigned long)vma+vma->vma_create_jiffies,
+			vma->vm_start, vma->vm_end);
+
+	*remain_buf_len -= used_len;
+
+	if (*remain_buf_len == 1) {
+		*remain_buf_len = init_remain_len;
+		kernel_buf[pos] = '\0';
+		return -1;
+	}
+
+	return 0;
+}
+
+static inline int get_contig_page_size(struct page *page)
+{
+	int page_size = PAGE_SIZE;
+
+	if (PageCompound(page)) {
+		struct page *head_page = compound_head(page);
+		int compound_size = PAGE_SIZE<first_vaddr_in_chunk) {
+		contig_stats->first_vaddr_in_chunk = vaddr;
+		contig_stats->first_paddr_in_chunk = paddr;
+		contig_stats->contig_pages = 0;
+	}
+
+	/* scan_in_vma is set to true if buffer runs out while scanning a
+	 * vma. A corner case happens, when buffer runs out, then vma changes,
+	 * scan_address is reset to vm_start. Then, vma info is printed twice.
+	 */
+	if (vaddr == vma->vm_start && !scan_in_vma) {
+		used_len = scnprintf(kernel_buf + pos, *remain_buf_len,
+				"%p, 0x%lx, 0x%lx, 0x%lx, ",
+				mm, (unsigned long)vma+vma->vma_create_jiffies,
+				vma->vm_start, vma->vm_end);
+
+		*remain_buf_len -= used_len;
+
+		if (*remain_buf_len == 1) {
+			contig_stats->err = 1;
+			goto out_of_buf;
+		}
+		pos += used_len;
+	}
+
+	if (page) {
+		if (contig_stats->first_paddr_in_chunk) {
+			if (((long long)vaddr - contig_stats->first_vaddr_in_chunk) ==
+			    ((long long)paddr - contig_stats->first_paddr_in_chunk))
+				contig_stats->contig_pages += num_pages;
+			else {
+				/* output present contig chunk */
+				contig_pages = contig_stats->contig_pages;
+				goto output_contig_info;
+			}
+		} else { /* the previous chunk is not present pages */
+			/* output non-present contig chunk */
+			contig_pages = -(long long)contig_stats->contig_pages;
+			goto output_contig_info;
+		}
+	} else {
+		/* the previous chunk is not present pages */
+		if (!contig_stats->first_paddr_in_chunk) {
+			VM_BUG_ON(contig_stats->first_vaddr_in_chunk +
+				  contig_stats->contig_pages * PAGE_SIZE !=
+				  vaddr);
+			++contig_stats->contig_pages;
+		} else {
+			/* output present contig chunk */
+			contig_pages = contig_stats->contig_pages;
+
+			goto output_contig_info;
+		}
+	}
+
+check_last_entry:
+	/* if vaddr is the last page, we need to dump stats as well */
+	if ((vaddr + num_pages * PAGE_SIZE) >= vma->vm_end) {
+		if (contig_stats->first_paddr_in_chunk)
+			contig_pages = contig_stats->contig_pages;
+		else
+			contig_pages = -(long long)contig_stats->contig_pages;
+		last_entry = true;
+	} else
+		return 0;
+output_contig_info:
+	if (last_entry)
+		used_len = scnprintf(kernel_buf + pos, *remain_buf_len,
+				"%lld, -1\n", contig_pages);
+	else
+		used_len = scnprintf(kernel_buf + pos, *remain_buf_len,
+				"%lld, ", contig_pages);
+
+	*remain_buf_len -= used_len;
+	if (*remain_buf_len == 1) {
+		contig_stats->err = 1;
+		goto out_of_buf;
+	} else {
+		pos += used_len;
+		if (last_entry) {
+			/* clear contig_stats */
+			contig_stats->first_vaddr_in_chunk = 0;
+			contig_stats->first_paddr_in_chunk = 0;
+			contig_stats->contig_pages = 0;
+			return 0;
+		} else {
+			/* set new contig_stats */
+			contig_stats->first_vaddr_in_chunk = vaddr;
+			contig_stats->first_paddr_in_chunk = paddr;
+			contig_stats->contig_pages = num_pages;
+			goto check_last_entry;
+		}
+	}
+	return 0;
+	}
+
+	used_len = scnprintf(kernel_buf + pos, *remain_buf_len,
+			"%p, %p, 0x%lx, 0x%lx, 0x%lx, 0x%llx",
+			mm, vma, vma->vm_start, vma->vm_end,
+			vaddr, page ? PFN_PHYS(page_to_pfn(page)) : 0);
+
+	*remain_buf_len -= used_len;
+	if (*remain_buf_len == 1)
+		goto out_of_buf;
+	pos += used_len;
+
+	if (page && PageAnon(page)) {
+		/* check page order */
+		used_len = scnprintf(kernel_buf + pos, *remain_buf_len, ", %d",
+				compound_order(page));
+		*remain_buf_len -= used_len;
+		if (*remain_buf_len == 1)
+			goto out_of_buf;
+		pos += used_len;
+
+		anon_vma = page_anon_vma(page);
+		if (!anon_vma)
+			goto end_of_stat;
+		anon_vma_lock_read(anon_vma);
+
+		do {
+			used_len = scnprintf(kernel_buf + pos, *remain_buf_len, ", %p",
+					anon_vma);
+			*remain_buf_len -= used_len;
+			if (*remain_buf_len == 1) {
+				anon_vma_unlock_read(anon_vma);
+				goto out_of_buf;
+			}
+			pos += used_len;
+
+			anon_vma = anon_vma->parent;
+		} while (anon_vma != anon_vma->parent);
+
+		anon_vma_unlock_read(anon_vma);
+	}
+end_of_stat:
+	/* end of one page stat */
+	used_len = scnprintf(kernel_buf + pos, *remain_buf_len, ", %d\n", end_note);
+	*remain_buf_len -= used_len;
+	if (*remain_buf_len == 1)
+		goto out_of_buf;
+
+	return 0;
+out_of_buf: /* revert incomplete data */
+	*remain_buf_len = init_remain_len;
+	kernel_buf[pos] = '\0';
+	return -1;
+
+}
+
+static int isolate_free_page_no_wmark(struct page *page, unsigned int order)
+{
+	struct zone *zone;
+	int mt;
+
+	VM_BUG_ON(!PageBuddy(page));
+
+	zone = page_zone(page);
+	mt = get_pageblock_migratetype(page);
+
+
+	/* Remove page from free list */
+	list_del(&page->lru);
+	zone->free_area[order].nr_free--;
+	__ClearPageBuddy(page);
+	set_page_private(page, 0);
+
+	/*
+	 * Set the pageblock if the isolated page is at least half of a
+	 * pageblock
+	 */
+	if (order >= pageblock_order - 1) {
+		struct page *endpage = page + (1 << order) - 1;
+
+		for (; page < endpage; page += pageblock_nr_pages) {
+			int mt = get_pageblock_migratetype(page);
+
+			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
+			    && mt != MIGRATE_HIGHATOMIC)
+				set_pageblock_migratetype(page,
+							  MIGRATE_MOVABLE);
+		}
+	}
+
+	return 1UL << order;
+}
+
+struct exchange_alloc_info {
+	struct list_head list;
+	struct page *src_page;
+	struct page *dst_page;
+};
+
+struct exchange_alloc_head {
+	struct list_head exchange_list;
+	struct list_head freelist;
+	struct list_head migratepage_list;
+	unsigned long num_freepages;
+};
+
+static int create_exchange_alloc_info(struct vm_area_struct *vma,
+		unsigned long scan_address, struct page *first_in_use_page,
+		int total_free_pages,
+		struct list_head *freelist,
+		struct list_head *exchange_list,
+		struct list_head *migratepage_list)
+{
+	struct page *in_use_page;
+	struct page *freepage;
+	struct exchange_alloc_info *one_pair;
+	int err;
+	int pagevec_flushed = 0;
+
+	down_read(&vma->vm_mm->mmap_sem);
+	in_use_page = follow_page(vma, scan_address,
+				FOLL_GET|FOLL_MIGRATION | FOLL_REMOTE);
+	up_read(&vma->vm_mm->mmap_sem);
+
+	freepage = list_first_entry_or_null(freelist, struct page, lru);
+
+	if (first_in_use_page != in_use_page ||
+	    !freepage ||
+	    PageCompound(in_use_page) != PageCompound(freepage) ||
+	    compound_order(in_use_page) != compound_order(freepage)) {
+		if (in_use_page)
+			put_page(in_use_page);
+		return -EBUSY;
+	}
+	one_pair = kmalloc(sizeof(struct exchange_alloc_info),
+			GFP_KERNEL | __GFP_ZERO);
+
+	if (!one_pair) {
+		put_page(in_use_page);
+		return -ENOMEM;
+	}
+
+retry_isolate:
+	/* isolate in_use_page */
+	err = isolate_lru_page(in_use_page);
+	if (err) {
+		if (!pagevec_flushed) {
+			migrate_prep();
+			pagevec_flushed = 1;
+			goto retry_isolate;
+		}
+		put_page(in_use_page);
+		in_use_page = NULL;
+	}
+
+	if (in_use_page) {
+		put_page(in_use_page);
+		mod_node_page_state(page_pgdat(in_use_page),
+				NR_ISOLATED_ANON + page_is_file_cache(in_use_page),
+				hpage_nr_pages(in_use_page));
+		list_add_tail(&in_use_page->lru, migratepage_list);
+	}
+	/* fill info */
+	one_pair->src_page = in_use_page;
+	one_pair->dst_page = freepage;
+	INIT_LIST_HEAD(&one_pair->list);
+
+	list_add_tail(&one_pair->list, exchange_list);
+
+	return 0;
+}
+
+static void free_alloc_info(struct list_head *alloc_info_list)
+{
+	struct exchange_alloc_info *item, *item2;
+
+	list_for_each_entry_safe(item, item2, alloc_info_list, list) {
+		list_del(&item->list);
+		kfree(item);
+	}
+}
+
+/*
+ * migrate callback: give a specific free page when it is called to guarantee
+ * contiguity after migration.
+ */
+static struct page *exchange_alloc(struct page *migratepage,
+				   unsigned long data)
+{
+	struct exchange_alloc_head *head = (struct exchange_alloc_head *)data;
+	struct page *freepage = NULL;
+	struct exchange_alloc_info *info;
+
+	list_for_each_entry(info, &head->exchange_list, list) {
+		if (migratepage == info->src_page) {
+			freepage = info->dst_page;
+			/* remove it from freelist */
+			list_del(&freepage->lru);
+			if (PageTransHuge(freepage))
+				head->num_freepages -= HPAGE_PMD_NR;
+			else
+				head->num_freepages--;
+			break;
+		}
+	}
+
+	return freepage;
+}
+
+static void exchange_free(struct page *freepage, unsigned long data)
+{
+	struct exchange_alloc_head *head = (struct exchange_alloc_head *)data;
+
+	list_add_tail(&freepage->lru, &head->freelist);
+	if (PageTransHuge(freepage))
+		head->num_freepages += HPAGE_PMD_NR;
+	else
+		head->num_freepages++;
+}
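exchange_alloc()/exchange_free() implement the allocator contract of migrate_pages(): the former hands back the pre-selected destination page for each source page, the latter collects destinations that went unused. The actual call site is garbled in this archive copy; it presumably looks like the following kernel-context fragment (migrate mode and reason are my guesses; the migrate_pages() signature is the v5.0-era one):

	struct exchange_alloc_head head;
	int err;

	INIT_LIST_HEAD(&head.exchange_list);
	INIT_LIST_HEAD(&head.freelist);
	INIT_LIST_HEAD(&head.migratepage_list);
	/* ...pairs populated via create_exchange_alloc_info()... */

	err = migrate_pages(&head.migratepage_list, exchange_alloc,
			    exchange_free, (unsigned long)&head,
			    MIGRATE_SYNC, MR_COMPACTION);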
+
+int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma,
+		unsigned long start_addr, unsigned long end_addr,
+		struct page *anchor_page, unsigned long page_vaddr,
+		struct defrag_result_stats *defrag_stats)
+{
+	/*unsigned long va_pa_page_offset = (unsigned long)-1;*/
+	unsigned long scan_address;
+	unsigned long page_size = PAGE_SIZE;
+	int failed = 0;
+	int not_present = 0;
+	bool src_thp = false;
+
+	for (scan_address = start_addr; scan_address < end_addr;
+	     scan_address += page_size) {
+		struct page *scan_page;
+		unsigned long scan_phys_addr;
+		long long page_dist;
+
+		cond_resched();
+
+		down_read(&vma->vm_mm->mmap_sem);
+		scan_page = follow_page(vma, scan_address, FOLL_MIGRATION | FOLL_REMOTE);
+		up_read(&vma->vm_mm->mmap_sem);
+		scan_phys_addr = PFN_PHYS(scan_page ? page_to_pfn(scan_page) : 0);
+
+		page_size = PAGE_SIZE;
+
+		if (!scan_phys_addr) {
+			not_present++;
+			failed += 1;
+			defrag_stats->src_not_present += 1;
+			continue;
+		}
+
+		page_size = get_contig_page_size(scan_page);
+
+		/* PTE-mapped THP not allowed */
+		if ((scan_page == compound_head(scan_page)) &&
+		    PageTransHuge(scan_page) && !PageHuge(scan_page))
+			src_thp = true;
+
+		/* Allow THPs */
+		if (PageCompound(scan_page) && !src_thp) {
+			count_vm_events(MEM_DEFRAG_SRC_COMP_PAGES_FAILED, page_size/PAGE_SIZE);
+			failed += (page_size/PAGE_SIZE);
+			defrag_stats->src_pte_thp_failed += (page_size/PAGE_SIZE);
+
+			defrag_stats->not_defrag_vpn = scan_address + page_size;
+			goto quit_defrag;
+			continue;
+		}
+
+		VM_BUG_ON(!anchor_page);
+
+		page_dist = ((long long)scan_address - page_vaddr) / PAGE_SIZE;
+
+		/* already in the contiguous pos */
+		if (page_dist == (long long)(scan_page - anchor_page)) {
+			defrag_stats->aligned += (page_size/PAGE_SIZE);
+			continue;
+		} else { /* migrate pages according to the anchor pages in the vma. */
+			struct page *dest_page = anchor_page + page_dist;
+			int page_drained = 0;
+			bool dst_thp = false;
+			int scan_page_order = src_thp?compound_order(scan_page):0;
+
+			if (!zone_spans_pfn(page_zone(anchor_page),
+			    (page_to_pfn(anchor_page) + page_dist))) {
+				failed += 1;
+				defrag_stats->dst_out_of_bound_failed += 1;
+
+				defrag_stats->not_defrag_vpn = scan_address + page_size;
+				goto quit_defrag;
+				continue;
+			}
+
+retry_defrag:
+			/* migrate */
+			if (PageBuddy(dest_page)) {
+				struct zone *zone = page_zone(dest_page);
+				spinlock_t *zone_lock = &zone->lock;
+				unsigned long zone_lock_flags;
+				unsigned long free_page_order = 0;
+				int err = 0;
+				struct exchange_alloc_head exchange_alloc_head = {0};
+				int migratetype = get_pageblock_migratetype(dest_page);
+
+				INIT_LIST_HEAD(&exchange_alloc_head.exchange_list);
+				INIT_LIST_HEAD(&exchange_alloc_head.freelist);
+				INIT_LIST_HEAD(&exchange_alloc_head.migratepage_list);
+
+				count_vm_events(MEM_DEFRAG_DST_FREE_PAGES, 1<lock */
+				spin_lock_irqsave(zone_lock, zone_lock_flags);
+
+				if (!PageBuddy(dest_page)) {
+					err = -EINVAL;
+					goto freepage_isolate_fail;
+				}
+
+				free_page_order = page_order(dest_page);
+
+				/* fail early if not enough free pages */
+				if (free_page_order < scan_page_order) {
+					err = -ENOMEM;
+					goto freepage_isolate_fail;
+				}
+
+				/* __isolate_free_page() */
+				err = isolate_free_page_no_wmark(dest_page, free_page_order);
+				if (!err)
+					goto freepage_isolate_fail;
+
+				expand(zone, dest_page, scan_page_order, free_page_order,
+					&(zone->free_area[free_page_order]),
+					migratetype);
+
+				if (!is_migrate_isolate(migratetype))
+					__mod_zone_freepage_state(zone, -(1UL << scan_page_order),
+							migratetype);
+
+				prep_new_page(dest_page, scan_page_order,
+					__GFP_MOVABLE|(scan_page_order?__GFP_COMP:0), 0);
+
+				if (scan_page_order) {
+					VM_BUG_ON(!src_thp);
+					VM_BUG_ON(scan_page_order != HPAGE_PMD_ORDER);
+					prep_transhuge_page(dest_page);
+				}
+
+				list_add(&dest_page->lru, &exchange_alloc_head.freelist);
+
+freepage_isolate_fail:
+				spin_unlock_irqrestore(zone_lock, zone_lock_flags);
+
+				if (err < 0) {
+					failed += (page_size/PAGE_SIZE);
+					defrag_stats->dst_isolate_free_failed += (page_size/PAGE_SIZE);
+
+					defrag_stats->not_defrag_vpn = scan_address + page_size;
+					goto quit_defrag;
+					continue;
+				}
+
+				/* gather in-use pages
+				 * create a exchange_alloc_info structure, a list of
+				 * tuples, each like:
+				 *	(in_use_page, free_page)
+				 *
+				 * so in exchange_alloc, the code needs to traverse the list
+				 * and find the tuple from in_use_page. Then return the
+				 * corresponding free page.
+				 *
+				 * This can guarantee the contiguity after migration.
+				 */
+
+				err = create_exchange_alloc_info(vma, scan_address, scan_page,
+						1<dst_migrate_free_failed +=
+						exchange_alloc_head.num_freepages;
+				}
+				defrag_stats->migrated += ((1UL<dst_pte_thp_failed += page_size/PAGE_SIZE;
+
+				defrag_stats->not_defrag_vpn = scan_address + page_size;
+				goto quit_defrag;
+			}
+
+				if (src_thp != dst_thp) {
+					failed += get_contig_page_size(scan_page);
+					if (src_thp && !dst_thp)
+						defrag_stats->src_thp_dst_not_failed +=
+							page_size/PAGE_SIZE;
+					else /* !src_thp && dst_thp */
+						defrag_stats->dst_thp_src_not_failed +=
+							page_size/PAGE_SIZE;
+
+					defrag_stats->not_defrag_vpn = scan_address + page_size;
+					goto quit_defrag;
+					/*continue;*/
+				}
+
+				/* free page on pcplist */
+				if (page_count(dest_page) == 0) {
+					/* not managed pages */
+					if (!dest_page->flags) {
+						failed += 1;
+						defrag_stats->dst_out_of_bound_failed += 1;
+
+						defrag_stats->not_defrag_vpn = scan_address + page_size;
+						goto quit_defrag;
+					}
+					/* spill order-0 pages to buddy allocator from pcplist */
+					if (!page_drained) {
+						drain_all_pages(NULL);
+						page_drained = 1;
+						goto retry_defrag;
+					}
+				}
+
+				if (PageAnon(dest_page)) {
+					count_vm_events(MEM_DEFRAG_DST_ANON_PAGES,
+							1<dst_anon_failed += 1<dst_file_failed += 1<dst_non_lru_failed += 1<dst_non_moveable_failed += 1<migrated += 1<not_defrag_vpn = scan_address + page_size;
+					goto quit_defrag;
+				}
+
+			}
+		}
+	}
+quit_defrag:
+	return failed;
+}
+
+struct anchor_page_node *get_anchor_page_node_from_vma(struct vm_area_struct *vma,
+		unsigned long address)
+{
+	struct interval_tree_node *prev_vma_node;
+
+	prev_vma_node = interval_tree_iter_first(&vma->anchor_page_rb,
+			address, address);
+
+	if (!prev_vma_node)
+		return NULL;
+
+	return container_of(prev_vma_node, struct anchor_page_node, node);
+}
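The invariant defrag_address_range() enforces is that every page's distance from the anchor virtual address equals its distance from the anchor PFN (dest_page = anchor_page + page_dist). The arithmetic, with concrete made-up numbers (my illustration):

	#include <stdio.h>

	#define PAGE_SHIFT 12

	int main(void)
	{
		/* anchor pair (VPN, PFN) chosen for the VMA, as in this patch */
		unsigned long anchor_vpn = 0x700000000000UL >> PAGE_SHIFT;
		unsigned long anchor_pfn = 0x10000UL;

		/* any page in the VMA should sit at the same relative offset */
		unsigned long vaddr = 0x700000003000UL;
		long page_dist = (long)((vaddr >> PAGE_SHIFT) - anchor_vpn);
		unsigned long dest_pfn = anchor_pfn + page_dist;

		printf("page at %#lx should live at pfn %#lx\n", vaddr, dest_pfn);
		return 0;
	}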
next_vma->vm_next); + + if (!next_vma) + return end_pfn; + else { + next_node = interval_tree_iter_first(&next_vma->anchor_page_rb, + next_vma->vm_start, next_vma->vm_end - 1); + VM_BUG_ON(!next_node); + next_anchor_node = container_of(next_node, + struct anchor_page_node, node); + if (end_pfn + vma_size < next_anchor_node->anchor_pfn) + return end_pfn; + } + scan_vma = next_vma; + } else + scan_vma = scan_vma->vm_next; + } + + return zone->zone_start_pfn; +} + +/* + * Anchor pages decide the VA->PA offset in each VMA. + */ +static int find_anchor_pages_in_vma(struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long start_addr) +{ + struct anchor_page_node *anchor_node; + unsigned long end_addr = vma->vm_end - PAGE_SIZE; + struct interval_tree_node *existing_anchor = NULL; + unsigned long scan_address = start_addr; + struct page *present_page = NULL; + struct zone *present_zone = NULL; + unsigned long alignment_size = PAGE_SIZE; + + /* Out of range query */ + if (start_addr >= vma->vm_end || start_addr < vma->vm_start) + return -1; + + /* + * Clean up unrelated anchor info. + * + * The VMA range can change and leave some anchor info out of range, + * so clean it up here. Ideally this would happen wherever the VMA is + * changed, but the code there is too convoluted. + */ + if (!RB_EMPTY_ROOT(&vma->anchor_page_rb.rb_root) && + !interval_tree_iter_first(&vma->anchor_page_rb, + vma->vm_start, vma->vm_end - 1)) { + struct interval_tree_node *node = NULL; + + for (node = interval_tree_iter_first(&vma->anchor_page_rb, + 0, (unsigned long)-1); + node;) { + struct anchor_page_node *anchor_node = container_of(node, + struct anchor_page_node, node); + interval_tree_remove(node, &vma->anchor_page_rb); + node = interval_tree_iter_next(node, 0, (unsigned long)-1); + kfree(anchor_node); + } + } + + /* no range at all */ + if (RB_EMPTY_ROOT(&vma->anchor_page_rb.rb_root)) + goto insert_new_range; + + /* look for the first range containing start_addr or starting after it */ + existing_anchor = interval_tree_iter_first(&vma->anchor_page_rb, + start_addr, end_addr); + + /* such a range exists */ + if (existing_anchor) { + /* redundant range, do nothing */ + if (existing_anchor->start == start_addr) + return 0; + else if (existing_anchor->start < start_addr && + existing_anchor->last >= start_addr) { + return 0; + } else { /* a range after start_addr */ + struct anchor_page_node *existing_node = container_of(existing_anchor, + struct anchor_page_node, node); + VM_BUG_ON(!(existing_anchor->start > start_addr)); + /* remove the existing range and insert a new one, since expanding + * it backward could make the range go beyond the zone limit + */ + interval_tree_remove(existing_anchor, &vma->anchor_page_rb); + kfree(existing_node); + VM_BUG_ON(!RB_EMPTY_ROOT(&vma->anchor_page_rb.rb_root)); + goto insert_new_range; + } + } else { + struct interval_tree_node *prev_anchor = NULL, *cur_anchor; + /* only ranges before start_addr are left */ + + /* find the range just before start_addr */ + for (cur_anchor = interval_tree_iter_first(&vma->anchor_page_rb, + vma->vm_start, start_addr - PAGE_SIZE); + cur_anchor; + prev_anchor = cur_anchor, + cur_anchor = interval_tree_iter_next(cur_anchor, + vma->vm_start, start_addr - PAGE_SIZE)); + + /* extend it to cover the new range */ + interval_tree_remove(prev_anchor, &vma->anchor_page_rb); + prev_anchor->last = vma->vm_end; + interval_tree_insert(prev_anchor, &vma->anchor_page_rb); + + goto out; + } + +insert_new_range: /* start_addr to end_addr */ + down_read(&vma->vm_mm->mmap_sem); + /* find the first present page and use it as the anchor page */ + while (!present_page && scan_address < end_addr) { + present_page = follow_page(vma, scan_address, + FOLL_MIGRATION | FOLL_REMOTE); + scan_address += present_page ? get_contig_page_size(present_page) : PAGE_SIZE; + } + up_read(&vma->vm_mm->mmap_sem); + + if (!present_page) + goto out; + + anchor_node = kzalloc(sizeof(*anchor_node), GFP_KERNEL); + if (!anchor_node) + return -ENOMEM; + + present_zone = page_zone(present_page); + + anchor_node->node.start = start_addr; + anchor_node->node.last = end_addr; + + anchor_node->anchor_vpn = start_addr >> PAGE_SHIFT; + anchor_node->anchor_pfn = get_undefragged_area(present_zone, + vma, start_addr, end_addr); + + /* adjust VPN and PFN alignment according to VMA size */ + if (vma->vm_end - vma->vm_start >= HPAGE_PUD_SIZE) { + if ((anchor_node->anchor_vpn & ((HPAGE_PUD_SIZE>>PAGE_SHIFT) - 1)) < + (anchor_node->anchor_pfn & ((HPAGE_PUD_SIZE>>PAGE_SHIFT) - 1))) + anchor_node->anchor_pfn += (HPAGE_PUD_SIZE>>PAGE_SHIFT); + + anchor_node->anchor_pfn = (anchor_node->anchor_pfn & (PUD_MASK>>PAGE_SHIFT)) | + (anchor_node->anchor_vpn & ((HPAGE_PUD_SIZE>>PAGE_SHIFT) - 1)); + + alignment_size = HPAGE_PUD_SIZE; + } else if (vma->vm_end - vma->vm_start >= HPAGE_PMD_SIZE) { + if ((anchor_node->anchor_vpn & ((HPAGE_PMD_SIZE>>PAGE_SHIFT) - 1)) < + (anchor_node->anchor_pfn & ((HPAGE_PMD_SIZE>>PAGE_SHIFT) - 1))) + anchor_node->anchor_pfn += (HPAGE_PMD_SIZE>>PAGE_SHIFT); + + anchor_node->anchor_pfn = (anchor_node->anchor_pfn & (PMD_MASK>>PAGE_SHIFT)) | + (anchor_node->anchor_vpn & ((HPAGE_PMD_SIZE>>PAGE_SHIFT) - 1)); + + alignment_size = HPAGE_PMD_SIZE; + } else + alignment_size = PAGE_SIZE; + + /* move the range into the zone limit, stepping in whole alignment units */ + if (!(zone_spans_pfn(present_zone, anchor_node->anchor_pfn))) { + while (anchor_node->anchor_pfn >= zone_end_pfn(present_zone)) + anchor_node->anchor_pfn -= alignment_size >> PAGE_SHIFT; + while (anchor_node->anchor_pfn < present_zone->zone_start_pfn) + anchor_node->anchor_pfn += alignment_size >> PAGE_SHIFT; + } + + interval_tree_insert(&anchor_node->node, &vma->anchor_page_rb); + +out: + return 0; +}
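/*
 * Worked example of the VPN/PFN alignment above (illustrative note, not
 * part of the original submission; 4KB pages and 2MB PMD huge pages
 * assumed, all numbers invented):
 *
 *   anchor_vpn = 0x7f0001a3 -> in-PMD offset = anchor_vpn & 511 = 0x1a3
 *   anchor_pfn = 0x0004b2c7 -> in-PMD offset = anchor_pfn & 511 = 0x0c7
 *
 * Since 0x1a3 >= 0x0c7, no bump by HPAGE_PMD_SIZE>>PAGE_SHIFT (512 pages)
 * is needed; otherwise rounding down would place the anchor before the
 * free area found by get_undefragged_area(). Copying the VPN's in-PMD
 * offset onto the PMD-aligned part of the PFN gives:
 *
 *   anchor_pfn = (0x0004b2c7 & ~511UL) | 0x1a3 = 0x0004b3a3
 *
 * Now anchor_vpn and anchor_pfn agree in their low 9 bits, so every
 * 2MB-aligned virtual chunk in the VMA maps onto a 2MB-aligned physical
 * chunk, which is what defrag_address_range() relies on.
 */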
+ +static inline bool is_stats_collection(enum mem_defrag_action action) +{ + switch (action) { + case MEM_DEFRAG_FULL_STATS: + case MEM_DEFRAG_CONTIG_STATS: + return true; + default: + return false; + } +} + +/* comparator for sorting VMA lifetimes */ +static int unsigned_long_cmp(const void *a, const void *b) +{ + const unsigned long *l = a, *r = b; + + if (*l < *r) + return -1; + if (*l > *r) + return 1; + return 0; +} + +/* + * Scan all to-be-defragged VMA lifetimes and derive the VMA + * defragmentation threshold. + */ +static void scan_all_vma_lifetime(struct defrag_scan_control *sc) +{ + struct mm_struct *mm = sc->mm; + struct vm_area_struct *vma = NULL; + unsigned long current_jiffies = jiffies; /* fix one jiffies value for the whole scan */ + unsigned int num_vma = 0, index = 0; + unsigned long *vma_scan_list = NULL; + + down_read(&mm->mmap_sem); + for (vma = find_vma(mm, 0); vma; vma = vma->vm_next) + /* only care about to-be-defragged vmas */ + if (mem_defrag_vma_check(vma)) + ++num_vma; + + vma_scan_list = kcalloc(num_vma, sizeof(unsigned long), GFP_KERNEL); + + if (ZERO_OR_NULL_PTR(vma_scan_list)) { + sc->vma_scan_threshold = 1; + goto out; + } + + for (vma = find_vma(mm, 0); vma; vma = vma->vm_next) + /* only care about to-be-defragged vmas */ + if (mem_defrag_vma_check(vma)) { + if (vma_scan_threshold_type == VMA_THRESHOLD_TYPE_TIME) + vma_scan_list[index] = current_jiffies - vma->vma_create_jiffies; + else if (vma_scan_threshold_type == VMA_THRESHOLD_TYPE_SIZE) + vma_scan_list[index] = vma->vm_end - vma->vm_start; + ++index; + if (index >= num_vma) + break; + } + + sort(vma_scan_list, num_vma, sizeof(unsigned long), + unsigned_long_cmp, NULL); + + index = (100 - vma_scan_percentile) * num_vma / 100; + /* clamp: with vma_scan_percentile == 0 the index would be one past the end */ + if (index >= num_vma) + index = num_vma - 1; + + sc->vma_scan_threshold = vma_scan_list[index]; + + kfree(vma_scan_list); +out: + up_read(&mm->mmap_sem); +}
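/*
 * Worked example of the percentile threshold above (illustrative note,
 * not part of the original submission; ages invented):
 *
 *   eight to-be-defragged VMAs with ages in jiffies:
 *       10 30 60 70 90 250 500 1200        (after sort())
 *   vma_scan_percentile = 25, i.e. defrag only the oldest 25%:
 *       index = (100 - 25) * 8 / 100 = 6
 *       vma_scan_threshold = vma_scan_list[6] = 500
 *
 * Only VMAs aged >= 500 jiffies (here 500 and 1200, exactly the oldest
 * quarter) pass the threshold check in kmem_defragd_scan_mm(). Note the
 * clamp added above: with vma_scan_percentile = 0 the index would
 * otherwise run one past the end of the list.
 */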
+ +/* + * Scan a single mm_struct. + * The function down_reads mmap_sem. + */ +static int kmem_defragd_scan_mm(struct defrag_scan_control *sc) +{ + struct mm_struct *mm = sc->mm; + struct vm_area_struct *vma = NULL; + unsigned long *scan_address = &sc->scan_address; + char *stats_buf = NULL; + int remain_buf_len = sc->buf_len; + int err = 0; + struct contig_stats contig_stats; + + if (sc->out_buf && + sc->buf_len) { + stats_buf = vzalloc(sc->buf_len); + if (!stats_buf) + goto breakouterloop; + } + + /*down_read(&mm->mmap_sem);*/ + if (unlikely(kmem_defragd_test_exit(mm))) + vma = NULL; + else { + /* get vma_scan_threshold */ + if (!sc->vma_scan_threshold) + scan_all_vma_lifetime(sc); + + vma = find_vma(mm, *scan_address); + } + + for (; vma; vma = vma->vm_next) { + unsigned long vstart, vend; + struct anchor_page_node *anchor_node = NULL; + int scanned_chunks = 0; + + if (unlikely(kmem_defragd_test_exit(mm))) + break; + if (!mem_defrag_vma_check(vma)) { + /* collect contiguity stats for this VMA */ + if (is_stats_collection(sc->action)) + if (do_vma_stat(mm, vma, stats_buf, sc->buf_len - remain_buf_len, + &remain_buf_len)) + goto breakouterloop; + *scan_address = vma->vm_end; + goto done_one_vma; + } + + vstart = vma->vm_start; + vend = vma->vm_end; + if (vstart >= vend) + goto done_one_vma; + if (*scan_address > vend) + goto done_one_vma; + if (*scan_address < vstart) + *scan_address = vstart; + + if (sc->action == MEM_DEFRAG_DO_DEFRAG) { + /* Check VMA size, skip if below the size threshold */ + if (vma->vm_end - vma->vm_start < + defrag_size_threshold * HPAGE_PMD_SIZE) + goto done_one_vma; + + /* + * Check VMA lifetime or size, skip if below the lifetime/size + * threshold derived from a percentile + */ + if (vma_scan_threshold_type == VMA_THRESHOLD_TYPE_TIME) { + if ((jiffies - vma->vma_create_jiffies) < sc->vma_scan_threshold) + goto done_one_vma; + } else if (vma_scan_threshold_type == VMA_THRESHOLD_TYPE_SIZE) { + if ((vma->vm_end - vma->vm_start) < sc->vma_scan_threshold) + goto done_one_vma; + } + + /* Avoid repeated defrag if the vma has not been changed */ + if (vma_no_repeat_defrag && + vma->vma_defrag_jiffies > vma->vma_modify_jiffies) + goto done_one_vma; + + /* vma contiguity stats collection */ + if (remain_buf_len && stats_buf) { + int used_len; + int pos = sc->buf_len - remain_buf_len; + + used_len = scnprintf(stats_buf + pos, remain_buf_len, + "vma: 0x%lx, 0x%lx, 0x%lx, -1\n", + (unsigned long)vma + vma->vma_create_jiffies, + vma->vm_start, vma->vm_end); + + remain_buf_len -= used_len; + + if (remain_buf_len == 1) { + stats_buf[pos] = '\0'; + remain_buf_len = 0; + } + } + + anchor_node = get_anchor_page_node_from_vma(vma, vma->vm_start); + + if (!anchor_node) { + find_anchor_pages_in_vma(mm, vma, vma->vm_start); + anchor_node = get_anchor_page_node_from_vma(vma, vma->vm_start); + + if (!anchor_node) + goto done_one_vma; + } + } + + contig_stats = (struct contig_stats) {0}; + + while (*scan_address < vend) { + struct page *page; + + cond_resched(); + + if (unlikely(kmem_defragd_test_exit(mm))) + goto breakouterloop; + + if (is_stats_collection(sc->action)) { + down_read(&vma->vm_mm->mmap_sem); + page = follow_page(vma, *scan_address, + FOLL_MIGRATION | FOLL_REMOTE); + + if (do_page_stat(mm, vma, page, *scan_address, + stats_buf, sc->buf_len - remain_buf_len, + &remain_buf_len, sc->action, &contig_stats, + sc->scan_in_vma)) { + /* reset scan_address to the beginning of the contiguous + * chunk, so the next scan picks up the whole chunk. + */ + if (contig_stats.err) { + *scan_address = contig_stats.first_vaddr_in_chunk; + sc->scan_in_vma = true; + } + up_read(&vma->vm_mm->mmap_sem); + goto breakouterloop; + } + /* move to next address */ + if (page) + *scan_address += get_contig_page_size(page); + else + *scan_address += PAGE_SIZE; + up_read(&vma->vm_mm->mmap_sem); + } else if (sc->action == MEM_DEFRAG_DO_DEFRAG) { + /* stop at the next 1GB-aligned address */ + unsigned long defrag_end = min_t(unsigned long, + (*scan_address + HPAGE_PUD_SIZE) & HPAGE_PUD_MASK, + vend); + int defrag_result; + + anchor_node = get_anchor_page_node_from_vma(vma, *scan_address); + + /* in case VMA size changes */ + if (!anchor_node) { + find_anchor_pages_in_vma(mm, vma, *scan_address); + anchor_node = get_anchor_page_node_from_vma(vma, *scan_address); + } + + if (!anchor_node) + goto done_one_vma; + + /* + * Loop through the 1GB region and defrag a 2MB chunk in each + * iteration.
+ */ + while (*scan_address < defrag_end) { + unsigned long defrag_sub_chunk_end = min_t(unsigned long, + (*scan_address + HPAGE_PMD_SIZE) & HPAGE_PMD_MASK, + defrag_end); + struct defrag_result_stats defrag_stats = {0}; +continue_defrag: + if (!anchor_node) { + anchor_node = get_anchor_page_node_from_vma(vma, + *scan_address); + if (!anchor_node) { + find_anchor_pages_in_vma(mm, vma, *scan_address); + anchor_node = get_anchor_page_node_from_vma(vma, + *scan_address); + } + if (!anchor_node) + goto done_one_vma; + } + + defrag_result = defrag_address_range(mm, vma, *scan_address, + defrag_sub_chunk_end, + pfn_to_page(anchor_node->anchor_pfn), + anchor_node->anchor_vpn << PAGE_SHIFT, + &defrag_stats); + + /* collect defrag statistics for this chunk */ + if (remain_buf_len && stats_buf) { + int used_len; + int pos = sc->buf_len - remain_buf_len; + + used_len = scnprintf(stats_buf + pos, remain_buf_len, + "[0x%lx, 0x%lx):%lu [alig:%lu, migrated:%lu, " + "src: not:%lu, src_thp_dst_not:%lu, src_pte_thp:%lu " + "dst: out_bound:%lu, dst_thp_src_not:%lu, " + "dst_pte_thp:%lu, isolate_free:%lu, " + "migrate_free:%lu, anon:%lu, file:%lu, " + "non-lru:%lu, non-moveable:%lu], " + "anchor: (%lx, %lx), range: [%lx, %lx], " + "vma: 0x%lx, not_defrag_vpn: %lx\n", + *scan_address, defrag_sub_chunk_end, + (defrag_sub_chunk_end - *scan_address)/PAGE_SIZE, + defrag_stats.aligned, + defrag_stats.migrated, + defrag_stats.src_not_present, + defrag_stats.src_thp_dst_not_failed, + defrag_stats.src_pte_thp_failed, + defrag_stats.dst_out_of_bound_failed, + defrag_stats.dst_thp_src_not_failed, + defrag_stats.dst_pte_thp_failed, + defrag_stats.dst_isolate_free_failed, + defrag_stats.dst_migrate_free_failed, + defrag_stats.dst_anon_failed, + defrag_stats.dst_file_failed, + defrag_stats.dst_non_lru_failed, + defrag_stats.dst_non_moveable_failed, + anchor_node->anchor_vpn, + anchor_node->anchor_pfn, + anchor_node->node.start, + anchor_node->node.last, + (unsigned long)vma + vma->vma_create_jiffies, + defrag_stats.not_defrag_vpn + ); + + remain_buf_len -= used_len; + + if (remain_buf_len == 1) { + stats_buf[pos] = '\0'; + remain_buf_len = 0; + } + } + + /* + * skip the page which cannot be defragged and restart + * from the next page + */ + if (defrag_stats.not_defrag_vpn && + defrag_stats.not_defrag_vpn < defrag_sub_chunk_end) { + VM_BUG_ON(defrag_sub_chunk_end != defrag_end && + defrag_stats.not_defrag_vpn > defrag_sub_chunk_end); + + *scan_address = defrag_stats.not_defrag_vpn; + defrag_stats.not_defrag_vpn = 0; + goto continue_defrag; + } + + /* Done with current 2MB chunk */ + *scan_address = defrag_sub_chunk_end; + scanned_chunks++; + /* + * if the knob is set, break out of the defrag loop after + * a preset number of 2MB chunks are defragged + */ + if (num_breakout_chunks && scanned_chunks >= num_breakout_chunks) { + scanned_chunks = 0; + goto breakouterloop; + } + } + + } + } +done_one_vma: + sc->scan_in_vma = false; + if (sc->action == MEM_DEFRAG_DO_DEFRAG) + vma->vma_defrag_jiffies = jiffies; + } +breakouterloop: + + /* copy stats to user space */ + if (sc->out_buf && + sc->buf_len) { + err = copy_to_user(sc->out_buf, stats_buf, + sc->buf_len - remain_buf_len); + sc->used_len = sc->buf_len - remain_buf_len; + } + + if (stats_buf) + vfree(stats_buf); + + /* 0: scan of this mm is complete, 1: scan is incomplete */ + return vma == NULL ? 
0 : 1; +} + +SYSCALL_DEFINE4(scan_process_memory, pid_t, pid, char __user *, out_buf, + int, buf_len, int, action) +{ + const struct cred *cred = current_cred(), *tcred; + struct task_struct *task; + struct mm_struct *mm; + int err = 0; + static struct defrag_scan_control defrag_scan_control = {0}; + + /* Find the mm_struct */ + rcu_read_lock(); + task = pid ? find_task_by_vpid(pid) : current; + if (!task) { + rcu_read_unlock(); + return -ESRCH; + } + get_task_struct(task); + + /* + * Check if this process has the right to modify the specified + * process. The right exists if the process has administrative + * capabilities, superuser privileges or the same + * userid as the target process. + */ + tcred = __task_cred(task); + if (!uid_eq(cred->euid, tcred->suid) && !uid_eq(cred->euid, tcred->uid) && + !uid_eq(cred->uid, tcred->suid) && !uid_eq(cred->uid, tcred->uid) && + !capable(CAP_SYS_NICE)) { + rcu_read_unlock(); + err = -EPERM; + goto out; + } + rcu_read_unlock(); + + err = security_task_movememory(task); + if (err) + goto out; + + mm = get_task_mm(task); + put_task_struct(task); + + if (!mm) + return -EINVAL; + + switch (action) { + case MEM_DEFRAG_SCAN: + case MEM_DEFRAG_CONTIG_SCAN: + count_vm_event(MEM_DEFRAG_SCAN_NUM); + /* + * We allow scanning one process's address space for multiple + * iterations. When we change the scanned process, reset + * defrag_scan_control's mm_struct + */ + if (!defrag_scan_control.mm || + defrag_scan_control.mm != mm) { + defrag_scan_control = (struct defrag_scan_control){0}; + defrag_scan_control.mm = mm; + } + defrag_scan_control.out_buf = out_buf; + defrag_scan_control.buf_len = buf_len; + if (action == MEM_DEFRAG_SCAN) + defrag_scan_control.action = MEM_DEFRAG_FULL_STATS; + else if (action == MEM_DEFRAG_CONTIG_SCAN) + defrag_scan_control.action = MEM_DEFRAG_CONTIG_STATS; + else { + err = -EINVAL; + break; + } + + defrag_scan_control.used_len = 0; + + if (unlikely(!access_ok(out_buf, buf_len))) { + err = -EFAULT; + break; + } + + /* clear mm once it is fully scanned */ + if (!kmem_defragd_scan_mm(&defrag_scan_control) && + !defrag_scan_control.used_len) + defrag_scan_control.mm = NULL; + + err = defrag_scan_control.used_len; + break; + case MEM_DEFRAG_MARK_SCAN_ALL: + set_bit(MMF_VM_MEM_DEFRAG_ALL, &mm->flags); + __kmem_defragd_enter(mm); + break; + case MEM_DEFRAG_CLEAR_SCAN_ALL: + clear_bit(MMF_VM_MEM_DEFRAG_ALL, &mm->flags); + break; + case MEM_DEFRAG_DEFRAG: + count_vm_event(MEM_DEFRAG_DEFRAG_NUM); + + if (!defrag_scan_control.mm || + defrag_scan_control.mm != mm) { + defrag_scan_control = (struct defrag_scan_control){0}; + defrag_scan_control.mm = mm; + } + defrag_scan_control.action = MEM_DEFRAG_DO_DEFRAG; + + defrag_scan_control.out_buf = out_buf; + defrag_scan_control.buf_len = buf_len; + + /* clear mm once it is fully defragged */ + if (buf_len) { + if (!kmem_defragd_scan_mm(&defrag_scan_control) && + !defrag_scan_control.used_len) { + defrag_scan_control.mm = NULL; + } + err = defrag_scan_control.used_len; + } else { + err = kmem_defragd_scan_mm(&defrag_scan_control); + if (err == 0) + defrag_scan_control.mm = NULL; + } + break; + default: + err = -EINVAL; + break; + } + + mmput(mm); + return err; + +out: + put_task_struct(task); + return err; +} + +static unsigned int kmem_defragd_scan_mm_slot(void) +{ + struct mm_slot *mm_slot; + int scan_status = 0; + struct defrag_scan_control defrag_scan_control = {0}; + + spin_lock(&kmem_defragd_mm_lock); + if (kmem_defragd_scan.mm_slot) + mm_slot = kmem_defragd_scan.mm_slot; + else { + mm_slot = 
list_entry(kmem_defragd_scan.mm_head.next, + struct mm_slot, mm_node); + kmem_defragd_scan.address = 0; + kmem_defragd_scan.mm_slot = mm_slot; + } + spin_unlock(&kmem_defragd_mm_lock); + + defrag_scan_control.mm = mm_slot->mm; + defrag_scan_control.scan_address = kmem_defragd_scan.address; + defrag_scan_control.action = MEM_DEFRAG_DO_DEFRAG; + + scan_status = kmem_defragd_scan_mm(&defrag_scan_control); + + kmem_defragd_scan.address = defrag_scan_control.scan_address; + + spin_lock(&kmem_defragd_mm_lock); + VM_BUG_ON(kmem_defragd_scan.mm_slot != mm_slot); + /* + * Release the current mm_slot if this mm is about to die, or + * if we scanned all vmas of this mm. + */ + if (kmem_defragd_test_exit(mm_slot->mm) || !scan_status) { + /* + * Make sure that if mm_users is reaching zero while + * kmem_defragd runs here, kmem_defragd_exit will find + * mm_slot not pointing to the exiting mm. + */ + if (mm_slot->mm_node.next != &kmem_defragd_scan.mm_head) { + kmem_defragd_scan.mm_slot = list_first_entry( + &mm_slot->mm_node, + struct mm_slot, mm_node); + kmem_defragd_scan.address = 0; + } else + kmem_defragd_scan.mm_slot = NULL; + + if (kmem_defragd_test_exit(mm_slot->mm)) + collect_mm_slot(mm_slot); + else if (!scan_status) { + list_del(&mm_slot->mm_node); + list_add_tail(&mm_slot->mm_node, &kmem_defragd_scan.mm_head); + } + } + spin_unlock(&kmem_defragd_mm_lock); + + return 0; +} + +int memdefrag_madvise(struct vm_area_struct *vma, + unsigned long *vm_flags, int advice) +{ + switch (advice) { + case MADV_MEMDEFRAG: + *vm_flags &= ~VM_NOMEMDEFRAG; + *vm_flags |= VM_MEMDEFRAG; + /* + * If the vma becomes eligible for kmem_defragd to scan, + * register it here without waiting for a page fault that + * may not happen any time soon. + */ + if (kmem_defragd_enter(vma, *vm_flags)) + return -ENOMEM; + break; + case MADV_NOMEMDEFRAG: + *vm_flags &= ~VM_MEMDEFRAG; + *vm_flags |= VM_NOMEMDEFRAG; + /* + * Setting VM_NOMEMDEFRAG will prevent kmem_defragd from scanning + * this vma even if the mm is left registered in kmem_defragd from + * before VM_NOMEMDEFRAG was set. + */ + break; + } + + return 0; +} + + +void __init kmem_defragd_destroy(void) +{ + kmem_cache_destroy(mm_slot_cache); +} + +int __init kmem_defragd_init(void) +{ + mm_slot_cache = kmem_cache_create("kmem_defragd_mm_slot", + sizeof(struct mm_slot), + __alignof__(struct mm_slot), 0, NULL); + if (!mm_slot_cache) + return -ENOMEM; + + return 0; +} + +subsys_initcall(kmem_defragd_init); \ No newline at end of file
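To make the madvise interface above concrete, here is a minimal user-space sketch of opting a mapping into kmem_defragd. It is illustrative only: the MADV_MEMDEFRAG value below is a placeholder, since the real constant comes from this series' uapi headers.

/* madv_memdefrag.c: opt an anonymous mapping into kmem_defragd */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_MEMDEFRAG
#define MADV_MEMDEFRAG 20	/* placeholder, not the official number */
#endif

int main(void)
{
	size_t len = 64UL << 20;	/* 64MB, comfortably above the size threshold */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 1, len);	/* fault pages in so there is something to defrag */

	/* On a kernel with this series, this marks the VMA VM_MEMDEFRAG and
	 * registers the mm with kmem_defragd; on other kernels it fails. */
	if (madvise(buf, len, MADV_MEMDEFRAG))
		perror("madvise(MADV_MEMDEFRAG)");

	munmap(buf, len);
	return 0;
}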
diff --git a/mm/memory.c b/mm/memory.c index e11ca9dd823f..019036e87088 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -69,6 +69,7 @@ #include #include #include +#include <linux/mem_defrag.h> #include #include @@ -2926,6 +2927,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) /* Allocate our own private page. */ if (unlikely(anon_vma_prepare(vma))) goto oom; + /* register the mm with kmem_defragd so the new page can be defragged */ + if (unlikely(kmem_defragd_enter(vma, vma->vm_flags))) + goto oom; page = alloc_zeroed_user_highpage_movable(vma, vmf->address); if (!page) goto oom; @@ -3844,6 +3848,9 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, p4d_t *p4d; vm_fault_t ret; + /* page faults modify the vma; record the time for the defrag heuristics */ + vma->vma_modify_jiffies = jiffies; + pgd = pgd_offset(mm, address); p4d = p4d_alloc(mm, pgd, address); if (!p4d) diff --git a/mm/mmap.c b/mm/mmap.c index f901065c4c64..653dd99d5145 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -169,6 +169,28 @@ void unlink_file_vma(struct vm_area_struct *vma) } } +void free_anchor_pages(struct vm_area_struct *vma) +{ + struct interval_tree_node *node; + + if (!vma) + return; + + if (RB_EMPTY_ROOT(&vma->anchor_page_rb.rb_root)) + return; + + for (node = interval_tree_iter_first(&vma->anchor_page_rb, + 0, (unsigned long)-1); + node;) { + struct anchor_page_node *anchor_node = container_of(node, + struct anchor_page_node, node); + interval_tree_remove(node, &vma->anchor_page_rb); + node = interval_tree_iter_next(node, 0, (unsigned long)-1); + kfree(anchor_node); + } +} + /* * Close a vm structure and free it, returning the next. */ @@ -181,6 +203,7 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma) vma->vm_ops->close(vma); if (vma->vm_file) fput(vma->vm_file); + free_anchor_pages(vma); mpol_put(vma_policy(vma)); vm_area_free(vma); return next; @@ -725,10 +748,15 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, long adjust_next = 0; int remove_next = 0; + /*free_anchor_pages(vma);*/ + + vma->vma_modify_jiffies = jiffies; + if (next && !insert) { struct vm_area_struct *exporter = NULL, *importer = NULL; if (end >= next->vm_end) { + /*free_anchor_pages(next);*/ /* * vma expands, overlapping all the next, and * perhaps the one after too (mprotect case 6). @@ -775,6 +803,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start, exporter = next->vm_next; } else if (end > next->vm_start) { + /*free_anchor_pages(next);*/ /* * vma expands, overlapping part of the next: * mprotect case 5 shifting the boundary up. 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 35fdde041f5c..a35605e0924a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1828,7 +1828,7 @@ void __init init_cma_reserved_pageblock(struct page *page) * * -- nyc */ -static inline void expand(struct zone *zone, struct page *page, +inline void expand(struct zone *zone, struct page *page, int low, int high, struct free_area *area, int migratetype) { @@ -1950,7 +1950,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, set_page_owner(page, order, gfp_flags); } -static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, +void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, unsigned int alloc_flags) { int i; diff --git a/mm/vmstat.c b/mm/vmstat.c index 83b30edc2f7f..c18a42250a5c 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1293,6 +1293,27 @@ const char * const vmstat_text[] = { "swap_ra", "swap_ra_hit", #endif + "memdefrag_defrag", + "memdefrag_scan", + "memdefrag_dest_free_pages", + "memdefrag_dest_anon_pages", + "memdefrag_dest_file_pages", + "memdefrag_dest_non_lru_pages", + "memdefrag_dest_free_pages_failed", + "memdefrag_dest_free_pages_overflow_failed", + "memdefrag_dest_anon_pages_failed", + "memdefrag_dest_file_pages_failed", + "memdefrag_dest_nonlru_pages_failed", + "memdefrag_src_anon_pages_failed", + "memdefrag_src_compound_pages_failed", + "memdefrag_dst_split_hugepage", +#ifdef CONFIG_COMPACTION + "compact_migrate_pages", +#endif +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + "thp_collapse_migrate_pages" +#endif + #endif /* CONFIG_VM_EVENTS_COUNTERS */ }; #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA */
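For completeness, here is a user-space sketch of driving the scan_process_memory() syscall added above. The syscall number and the numeric action values below are placeholders standing in for this series' unistd/uapi additions; on a kernel with the series, the scan call returns the number of stats bytes written, and the defrag call returns 0 once the whole mm has been processed.

/* scan_proc_mem.c: drive one stats scan and one defrag pass */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_scan_process_memory
#define __NR_scan_process_memory 428	/* placeholder syscall number */
#endif

/* placeholder values mirroring enum mem_defrag_action */
#define MEM_DEFRAG_SCAN		0
#define MEM_DEFRAG_DEFRAG	4

int main(int argc, char **argv)
{
	static char buf[1 << 16];
	pid_t pid = argc > 1 ? (pid_t)atoi(argv[1]) : 0;	/* 0 = current */
	long used;

	/* One call returns contiguity stats for (part of) one pass over
	 * the target mm; repeat until it returns 0 to finish the pass. */
	used = syscall(__NR_scan_process_memory, pid, buf, (int)sizeof(buf),
		       MEM_DEFRAG_SCAN);
	if (used < 0)
		perror("scan_process_memory(scan)");	/* ENOSYS on stock kernels */
	else
		fwrite(buf, 1, (size_t)used, stdout);

	/* Kick one defrag pass over the same process, with no stats buffer */
	if (syscall(__NR_scan_process_memory, pid, NULL, 0, MEM_DEFRAG_DEFRAG) < 0)
		perror("scan_process_memory(defrag)");
	return 0;
}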
From patchwork Fri Feb 15 22:08:30 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815953
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen , Michal Hocko , "Kirill A . Shutemov" , Andrew Morton , Vlastimil Babka , Mel Gorman , John Hubbard , Mark Hairgrove , Nitin Gupta , David Nellans , Zi Yan
Subject: [RFC PATCH 05/31] mem_defrag: split a THP if either src or dst is THP only.
Date: Fri, 15 Feb 2019 14:08:30 -0800
Message-Id: <20190215220856.29749-6-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

While generating physically contiguous memory, we may want to move a THP to a location currently occupied by 512 base pages. The exchange-pages code does not yet implement exchanging a THP with 512 base pages, so instead we can split the THP and exchange the 512 base pages. This increases the chance of creating a large contiguous region. A split THP can be promoted back once all 512 pages have been moved to the destination, or if none of its subpages was moved. In-place THP promotion will be introduced later in this patch series.

Signed-off-by: Zi Yan --- mm/internal.h | 4 ++ mm/mem_defrag.c | 155 +++++++++++++++++++++++++++++++++++++----------- mm/page_alloc.c | 45 ++++++++++++++ 3 files changed, 168 insertions(+), 36 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 4fe8d1a4d7bb..70a6ef603e5b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -574,6 +574,10 @@ void expand(struct zone *zone, struct page *page, int low, int high, struct free_area *area, int migratetype); +int expand_free_page(struct zone *zone, struct page *buddy_head, + struct page *page, int buddy_order, int page_order, + struct free_area *area, int migratetype); + void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, unsigned int alloc_flags); diff --git a/mm/mem_defrag.c b/mm/mem_defrag.c index 414909e1c19c..4d458b125c95 100644 --- a/mm/mem_defrag.c +++ b/mm/mem_defrag.c @@ -643,6 +643,15 @@ static void exchange_free(struct page *freepage, unsigned long data) head->num_freepages++; } +static bool page_can_migrate(struct page *page) +{ + if (PageAnon(page)) + return true; + if (page_mapping(page)) + return true; + return false; +} + int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long start_addr, unsigned long end_addr, struct page *anchor_page, unsigned long page_vaddr, @@ -655,6 +664,7 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, int not_present = 0; bool src_thp = false; +restart: for (scan_address = start_addr; scan_address < end_addr; scan_address += page_size) { struct page *scan_page; @@ -683,6 +693,8 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, if ((scan_page == compound_head(scan_page)) && PageTransHuge(scan_page) && !PageHuge(scan_page)) src_thp = true; + else + src_thp = false; /* Allow THPs */ if (PageCompound(scan_page) && !src_thp) { @@ -720,13 +732,17 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, } retry_defrag: - /* migrate */ - if (PageBuddy(dest_page)) { + /* free pages */ + if (page_count(dest_page) == 0 && dest_page->mapping == NULL) { + int buddy_page_order = 0; + unsigned long pfn = page_to_pfn(dest_page); + unsigned long buddy_pfn; + struct page *buddy = dest_page; struct zone *zone = page_zone(dest_page); spinlock_t *zone_lock = &zone->lock; unsigned long zone_lock_flags; unsigned long free_page_order = 0; - int err = 0; + int err = 0, expand_err = 0; struct exchange_alloc_head exchange_alloc_head = {0}; int migratetype = get_pageblock_migratetype(dest_page); @@ 
-734,32 +750,77 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, INIT_LIST_HEAD(&exchange_alloc_head.freelist); INIT_LIST_HEAD(&exchange_alloc_head.migratepage_list); - count_vm_events(MEM_DEFRAG_DST_FREE_PAGES, 1<<scan_page_order); + + /* not managed pages */ + if (!dest_page->flags) { + failed += 1; + defrag_stats->dst_out_of_bound_failed += 1; + defrag_stats->not_defrag_vpn = scan_address + page_size; + goto quit_defrag; + } + /* spill order-0 pages to buddy allocator from pcplist */ + if (!PageBuddy(dest_page) && !page_drained) { + drain_all_pages(zone); + page_drained = 1; + goto retry_defrag; + } /* lock page_zone(dest_page)->lock */ spin_lock_irqsave(zone_lock, zone_lock_flags); - if (!PageBuddy(dest_page)) { + while (!PageBuddy(buddy) && buddy_page_order < MAX_ORDER) { + buddy_pfn = pfn & ~((1<<buddy_page_order) - 1); + buddy = dest_page - (pfn - buddy_pfn); + buddy_page_order++; + } + if (!PageBuddy(buddy)) { + err = -EINVAL; + goto freepage_isolate_fail; + } + free_page_order = page_order(buddy); + + expand_err = expand_free_page(zone, buddy, dest_page, + free_page_order, scan_page_order, + &(zone->free_area[free_page_order]), migratetype); + if (expand_err) + goto freepage_isolate_fail; if (!is_migrate_isolate(migratetype)) __mod_zone_freepage_state(zone, -(1UL << scan_page_order), @@ -778,7 +839,7 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, freepage_isolate_fail: spin_unlock_irqrestore(zone_lock, zone_lock_flags); - +freepage_isolate_fail_unlocked: if (err < 0) { failed += (page_size/PAGE_SIZE); defrag_stats->dst_isolate_free_failed += (page_size/PAGE_SIZE); @@ -844,6 +905,8 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, if ((dest_page == compound_head(dest_page)) && PageTransHuge(dest_page) && !PageHuge(dest_page)) dst_thp = true; + else + dst_thp = false; if (PageCompound(dest_page) && !dst_thp) { failed += get_contig_page_size(dest_page); @@ -854,37 +917,56 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma, } if (src_thp != dst_thp) { - failed += get_contig_page_size(scan_page); - if (src_thp && !dst_thp) - defrag_stats->src_thp_dst_not_failed += - page_size/PAGE_SIZE; - else /* !src_thp && dst_thp */ - defrag_stats->dst_thp_src_not_failed += - page_size/PAGE_SIZE; + if (src_thp && !dst_thp) { + int ret; + + if (!page_can_migrate(dest_page)) { + failed += get_contig_page_size(scan_page); + defrag_stats->not_defrag_vpn = scan_address + page_size; + goto quit_defrag; + } + get_page(scan_page); + lock_page(scan_page); + if (!PageCompound(scan_page) || is_huge_zero_page(scan_page)) { + ret = 0; + src_thp = false; + goto split_src_done; + } + ret = split_huge_page(scan_page); +split_src_done: + unlock_page(scan_page); + put_page(scan_page); + if (ret) + defrag_stats->src_thp_dst_not_failed += page_size/PAGE_SIZE; + else + goto restart; + } else { /* !src_thp && dst_thp */ + int ret; + + get_page(dest_page); + lock_page(dest_page); + if (!PageCompound(dest_page) || is_huge_zero_page(dest_page)) { + ret = 0; + dst_thp = false; + goto split_dst_done; + } + ret = split_huge_page(dest_page); +split_dst_done: + unlock_page(dest_page); + put_page(dest_page); + if (ret) + defrag_stats->dst_thp_src_not_failed += page_size/PAGE_SIZE; + else + goto retry_defrag; + } + + failed += get_contig_page_size(scan_page); defrag_stats->not_defrag_vpn = scan_address + page_size; goto quit_defrag; /*continue;*/ } - /* free page on pcplist */ - if (page_count(dest_page) == 0) { - /* not managed pages */ - if (!dest_page->flags) { - failed += 1; - defrag_stats->dst_out_of_bound_failed += 1; - - defrag_stats->not_defrag_vpn = scan_address + page_size; - goto quit_defrag; - } - /* spill order-0 pages to buddy allocator from pcplist */ - if (!page_drained) { - drain_all_pages(NULL); - page_drained = 1; - goto retry_defrag; - } - } - if (PageAnon(dest_page)) { 
count_vm_events(MEM_DEFRAG_DST_ANON_PAGES, 1<<scan_page_order); [...] defrag_stats->dst_anon_failed += 1<<scan_page_order; [...] diff --git a/mm/page_alloc.c b/mm/page_alloc.c [...] +int expand_free_page(struct zone *zone, struct page *buddy_head, + struct page *page, int buddy_order, int page_order, + struct free_area *area, int migratetype) +{ + unsigned long size = 1 << buddy_order; + + /* the carved-out page must lie inside the buddy block */ + if (!(page >= buddy_head && page < (buddy_head + (1<<buddy_order)))) + return -EINVAL; + + while (buddy_order > page_order) { + struct page *page_to_free; + + area--; + buddy_order--; + size >>= 1; + + if (page < (buddy_head + size)) + page_to_free = buddy_head + size; + else { + page_to_free = buddy_head; + buddy_head = buddy_head + size; + } + + /* + * Mark as guard pages (or page), that will allow to + * merge back to allocator when buddy will be freed. + * Corresponding page table entries will not be touched, + * pages will stay not present in virtual address space + */ + if (set_page_guard(zone, page_to_free, buddy_order, migratetype)) + continue; + + list_add(&page_to_free->lru, &area->free_list[migratetype]); + area->nr_free++; + set_page_order(page_to_free, buddy_order); + } + return 0; +} + static void check_new_page_bad(struct page *page) { const char *bad_reason = NULL;
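To see what the expand_free_page() path above does, here is a stand-alone walk-through of its splitting loop with invented pfns; plain pfn arithmetic replaces the kernel's struct page handling, and freeing is just a printout.

/* expand_free_sim.c: carve an order-8 chunk out of an order-10 buddy block */
#include <stdio.h>

int main(void)
{
	unsigned long buddy_head = 0x40000;	/* pfn of the buddy block head */
	int buddy_order = 10;			/* 1024 pages = 4MB with 4KB pages */
	unsigned long page = 0x40200;		/* target pfn inside the block */
	int page_order = 8;			/* want an order-8 chunk for it */
	unsigned long size = 1UL << buddy_order;

	while (buddy_order > page_order) {
		unsigned long to_free;

		buddy_order--;
		size >>= 1;

		/* free the half that does not contain the target page,
		 * keep splitting the half that does */
		if (page < buddy_head + size) {
			to_free = buddy_head + size;
		} else {
			to_free = buddy_head;
			buddy_head += size;
		}
		printf("free pfn %#lx at order %d\n", to_free, buddy_order);
	}
	printf("kept pfn %#lx at order %d for the target\n",
	       buddy_head, buddy_order);
	return 0;
}

Running this prints "free pfn 0x40000 at order 9", then "free pfn 0x40300 at order 8", leaving the order-8 block at pfn 0x40200 that contains the target, which mirrors how the kernel function isolates exactly the sub-block it needs and returns the rest to the free lists.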
From patchwork Fri Feb 15 22:08:31 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815955
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen , Michal Hocko , "Kirill A . Shutemov" , Andrew Morton , Vlastimil Babka , Mel Gorman , John Hubbard , Mark Hairgrove , Nitin Gupta , David Nellans , Zi Yan
Subject: [RFC PATCH 06/31] mm: Make MAX_ORDER configurable in Kconfig for buddy allocator.
Date: Fri, 15 Feb 2019 14:08:31 -0800
Message-Id: <20190215220856.29749-7-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

To test the 1GB THP support implemented in the following patches, this patch makes MAX_ORDER of the buddy allocator configurable. It should be dropped later, once we rely solely on mem_defrag to generate 1GB THPs.

Signed-off-by: Zi Yan --- arch/x86/Kconfig | 15 +++++++++++++++ arch/x86/include/asm/sparsemem.h | 4 ++-- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 68261430fe6e..f766ff5651d5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1665,6 +1665,21 @@ config X86_PMEM_LEGACY Say Y if unsure. +config FORCE_MAX_ZONEORDER + int "Maximum zone order" + range 11 20 + default "11" + help + The kernel memory allocator divides physically contiguous memory + blocks into "zones", where each zone is a power of two number of + pages. This option selects the largest power of two that the kernel + keeps in the memory allocator. If you need to allocate very large + blocks of physically contiguous memory, then you may need to + increase this value. + + This config option is actually maximum order plus one. For example, + a value of 11 means that the largest free memory block is 2^10 pages. + config HIGHPTE bool "Allocate 3rd-level pagetables from highmem" depends on HIGHMEM diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h index 199218719a86..2df61d5ccc2d 100644 --- a/arch/x86/include/asm/sparsemem.h +++ b/arch/x86/include/asm/sparsemem.h @@ -21,12 +21,12 @@ # define MAX_PHYSADDR_BITS 36 # define MAX_PHYSMEM_BITS 36 # else -# define SECTION_SIZE_BITS 26 +# define SECTION_SIZE_BITS 31 # define MAX_PHYSADDR_BITS 32 # define MAX_PHYSMEM_BITS 32 # endif #else /* CONFIG_X86_32 */ -# define SECTION_SIZE_BITS 27 /* matt - 128 is convenient right now */ +# define SECTION_SIZE_BITS 31 /* matt - 128 is convenient right now */ # define MAX_PHYSADDR_BITS (pgtable_l5_enabled() ? 52 : 44) # define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 
52 : 46) #endif
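The arithmetic behind "range 11 20" above, as a tiny self-contained sketch (4KB pages assumed): a 1GB block is 2^18 pages, so the largest order must be at least 18 and FORCE_MAX_ZONEORDER (maximum order plus one) must be at least 19.

/* max_order_1g.c: find the buddy order needed for a 1GB allocation */
#include <stdio.h>

int main(void)
{
	unsigned int page_shift = 12;	/* 4KB pages */
	unsigned long want = 1UL << 30;	/* 1GB */
	unsigned int order = 0;

	while ((1UL << (order + page_shift)) < want)
		order++;

	/* order 18: 2^18 pages * 4KB = 1GB; the default of 11 only
	 * covers 2^10 pages = 4MB blocks. */
	printf("1GB needs order %u => FORCE_MAX_ZONEORDER >= %u\n",
	       order, order + 1);
	return 0;
}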
From patchwork Fri Feb 15 22:08:32 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815959
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen , Michal Hocko , "Kirill A . Shutemov" , Andrew Morton , Vlastimil Babka , Mel Gorman , John Hubbard , Mark Hairgrove , Nitin Gupta , David Nellans , Zi Yan
Subject: [RFC PATCH 07/31] mm: deallocate pages with order > MAX_ORDER.
Date: Fri, 15 Feb 2019 14:08:32 -0800
Message-Id: <20190215220856.29749-8-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

When MAX_ORDER is not large enough to allocate 1GB pages and 1GB THPs are created by in-place promotion, we need this to free 1GB THPs properly.

Signed-off-by: Zi Yan --- mm/page_alloc.c | 36 ++++++++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9ba2cdc320f2..cfa99bb54bd6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1287,6 +1287,24 @@ void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end) } } +static void destroy_compound_gigantic_page(struct page *page, + unsigned int order) +{ + int i; + int nr_pages = 1 << order; + struct page *p = page + 1; + + atomic_set(compound_mapcount_ptr(page), 0); + for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) { + clear_compound_head(p); + set_page_refcounted(p); + } + + set_compound_order(page, 0); + __ClearPageHead(page); + set_page_refcounted(page); +} + static void __free_pages_ok(struct page *page, unsigned int order) { unsigned long flags; @@ -1296,11 +1314,16 @@ static void __free_pages_ok(struct page *page, unsigned int order) if (!free_pages_prepare(page, order, true)) return; - migratetype = get_pfnblock_migratetype(page, pfn); - local_irq_save(flags); - __count_vm_events(PGFREE, 1 << order); - free_one_page(page_zone(page), page, pfn, order, migratetype); - local_irq_restore(flags); + if (order > MAX_ORDER) { + destroy_compound_gigantic_page(page, order); + free_contig_range(page_to_pfn(page), 1 << order); + } else { + migratetype = get_pfnblock_migratetype(page, pfn); + local_irq_save(flags); + __count_vm_events(PGFREE, 1 << order); + free_one_page(page_zone(page), page, pfn, order, migratetype); + local_irq_restore(flags); + } } static void __init __free_pages_boot_core(struct page *page, unsigned int order) @@ -8281,6 +8304,8 @@ int alloc_contig_range(unsigned long start, unsigned long end, return ret; } +#endif + void free_contig_range(unsigned long pfn, unsigned nr_pages) { unsigned int count = 0; @@ -8293,7 +8318,6 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages) } WARN(count != 0, "%d pages are still in use!\n", count); } -#endif #ifdef CONFIG_MEMORY_HOTPLUG /*
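A user-space model of the teardown performed by destroy_compound_gigantic_page() above; struct fake_page is a simplified stand-in for struct page, and the order is shrunk so the example runs instantly.

/* gigantic_free_model.c: detach tail pages before freeing the range */
#include <stdio.h>

struct fake_page {
	struct fake_page *compound_head;	/* NULL once torn down */
	int refcount;
};

static void destroy_compound(struct fake_page *page, unsigned int order)
{
	unsigned long nr = 1UL << order;
	unsigned long i;

	for (i = 1; i < nr; i++) {
		page[i].compound_head = NULL;	/* models clear_compound_head() */
		page[i].refcount = 1;		/* models set_page_refcounted() */
	}
	page[0].refcount = 1;			/* the head page, too */
}

int main(void)
{
	enum { ORDER = 4 };			/* tiny stand-in for order 18 */
	struct fake_page pages[1 << ORDER] = {0};
	unsigned long i;

	for (i = 1; i < (1UL << ORDER); i++)
		pages[i].compound_head = &pages[0];

	destroy_compound(pages, ORDER);

	for (i = 0; i < (1UL << ORDER); i++)
		if (pages[i].compound_head || pages[i].refcount != 1)
			return 1;
	printf("all %u pages detached and refcounted\n", 1U << ORDER);
	return 0;
}

Only after every subpage is an independent, refcounted base page can free_contig_range() hand the pfn range back one page at a time.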
From patchwork Fri Feb 15 22:08:33 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815961
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton,
    Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove,
    Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 08/31] mm: add pagechain container for storing multiple pages.
Date: Fri, 15 Feb 2019 14:08:33 -0800
Message-Id: <20190215220856.29749-9-zi.yan@sent.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

When depositing page table pages for 1GB THPs, we need 512 PTE pages plus
1 PMD page. Instead of counting and depositing 513 individual pages, we
can use the PMD page as a leader page and chain the remaining 512 PTE
pages through ->lru. That, however, would conflict with depositing the
PMD pages themselves via ->lru, which is how PTE pages are currently
deposited for 2MB THPs. So add a new pagechain container for PMD pages.

Signed-off-by: Zi Yan
---
 include/linux/pagechain.h | 73 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 include/linux/pagechain.h

diff --git a/include/linux/pagechain.h b/include/linux/pagechain.h
new file mode 100644
index 000000000000..be536142b413
--- /dev/null
+++ b/include/linux/pagechain.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * include/linux/pagechain.h
+ *
+ * In many places it is efficient to batch an operation up against multiple
+ * pages. A pagechain is a multipage container which is used for that.
+ */
+
+#ifndef _LINUX_PAGECHAIN_H
+#define _LINUX_PAGECHAIN_H
+
+#include <linux/list.h>
+
+/* 14 pointers + two long's align the pagechain structure to a power of two */
+#define PAGECHAIN_SIZE	13
+
+struct page;
+
+struct pagechain {
+	struct list_head list;
+	unsigned int nr;
+	struct page *pages[PAGECHAIN_SIZE];
+};
+
+static inline void pagechain_init(struct pagechain *pchain)
+{
+	pchain->nr = 0;
+	INIT_LIST_HEAD(&pchain->list);
+}
+
+static inline void pagechain_reinit(struct pagechain *pchain)
+{
+	pchain->nr = 0;
+}
+
+static inline unsigned int pagechain_count(struct pagechain *pchain)
+{
+	return pchain->nr;
+}
+
+static inline unsigned int pagechain_space(struct pagechain *pchain)
+{
+	return PAGECHAIN_SIZE - pchain->nr;
+}
+
+static inline bool pagechain_empty(struct pagechain *pchain)
+{
+	return pchain->nr == 0;
+}
+
+/*
+ * Add a page to a pagechain. Returns the number of slots still available.
+ */
+static inline unsigned int pagechain_deposit(struct pagechain *pchain, struct page *page)
+{
+	VM_BUG_ON(!pagechain_space(pchain));
+	pchain->pages[pchain->nr++] = page;
+	return pagechain_space(pchain);
+}
+
+static inline struct page *pagechain_withdraw(struct pagechain *pchain)
+{
+	if (!pagechain_count(pchain))
+		return NULL;
+	return pchain->pages[--pchain->nr];
+}
+
+void __init pagechain_cache_init(void);
+struct pagechain *pagechain_alloc(void);
+void pagechain_free(struct pagechain *pchain);
+
+#endif /* _LINUX_PAGECHAIN_H */
+
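To see the fill/refill behaviour this container is meant for, here is a
small userspace mock of the same fixed-capacity chain. The 13-slot
capacity mirrors the header above; the plain void pointers and the
counting in main() are illustrative only. Each 1GB mapping deposits a
single PMD leader page, so one pagechain covers up to 13 mappings before
the deposit path in the next patch links in a fresh chain:

/* Userspace mock of the pagechain fill/refill pattern; nothing here is
 * kernel code, the names merely shadow the header above. */
#include <assert.h>
#include <stdio.h>

#define PAGECHAIN_SIZE 13

struct pagechain_mock {
	unsigned int nr;
	void *pages[PAGECHAIN_SIZE];
};

static unsigned int space(struct pagechain_mock *pc)
{
	return PAGECHAIN_SIZE - pc->nr;
}

static unsigned int deposit(struct pagechain_mock *pc, void *page)
{
	assert(space(pc));		/* caller must check, as in the patch */
	pc->pages[pc->nr++] = page;
	return space(pc);
}

int main(void)
{
	struct pagechain_mock pc = { .nr = 0 };
	int chains = 1;

	/* 100 leader-page deposits consume ceil(100 / 13) = 8 chains. */
	for (int i = 0; i < 100; i++) {
		if (!space(&pc)) {	/* chain full: start a new one */
			chains++;
			pc.nr = 0;
		}
		deposit(&pc, NULL);
	}
	printf("100 deposits spread across %d pagechains\n", chains);
	return 0;
}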
From patchwork Fri Feb 15 22:08:34 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815963
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton,
    Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove,
    Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 09/31] mm: thp: 1GB anonymous page implementation.
Date: Fri, 15 Feb 2019 14:08:34 -0800
Message-Id: <20190215220856.29749-10-zi.yan@sent.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

This adds 1GB THP support for anonymous pages. Applications can get 1GB
pages at page-fault time when their VMAs are larger than 1GB. For
read-only faults, a single shared 1GB zero THP is mapped for all readers.

Signed-off-by: Zi Yan
---
 arch/x86/include/asm/pgalloc.h |  58 +++++++
 arch/x86/include/asm/pgtable.h |   2 +
 arch/x86/mm/pgtable.c          |  25 +++
 drivers/base/node.c            |   4 +-
 fs/proc/meminfo.c              |   3 +-
 include/asm-generic/pgtable.h  |   3 +
 include/linux/huge_mm.h        |  17 ++-
 include/linux/mm.h             |   4 +
 include/linux/mm_types.h       |   1 +
 include/linux/mmzone.h         |   1 +
 include/linux/sched/coredump.h |   1 +
 include/linux/vm_event_item.h  |   2 +
 kernel/fork.c                  |   5 +
 mm/huge_memory.c               | 267 ++++++++++++++++++++++++++++++++-
 mm/memory.c                    |  28 +++-
 mm/page_alloc.c                |   3 +-
 mm/pgtable-generic.c           |  47 +++++-
 mm/rmap.c                      |  28 +++-
 mm/vmstat.c                    |   3 +
 19 files changed, 484 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index a281e61ec60c..6e29ad9b9d7f 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -49,6 +49,7 @@ extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *);
 extern pgtable_t pte_alloc_one(struct mm_struct *);
+extern pgtable_t pte_alloc_order(struct mm_struct *, unsigned long, int);
 
 /* Should really implement gc for free page table pages. This could be
    done with a reference count in struct page.
  */
@@ -65,6 +66,17 @@ static inline void pte_free(struct mm_struct *mm, struct page *pte)
 	__free_page(pte);
 }
 
+static inline void pte_free_order(struct mm_struct *mm, struct page *pte,
+		int order)
+{
+	int i;
+
+	for (i = 0; i < (1<<order); i++) {
+		pgtable_page_dtor(&pte[i]);
+		__free_page(&pte[i]);
+	}
+}
+
[... the remaining pgalloc.h helpers, the arch/x86/include/asm/pgtable.h
hunk, and the head of pte_alloc_order() in arch/x86/mm/pgtable.c were
garbled in the archive; pte_alloc_order() resumes below ...]
+			while (--i >= 0) {
+				pgtable_page_dtor(&pte[i]);
+				__free_page(&pte[i]);
+			}
+			return NULL;
+		}
+	}
+	return pte;
+}
+
 static int __init setup_userpte(char *arg)
 {
 	if (!arg)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 86d6cd92ce3d..f21d2235bf97 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -150,7 +150,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		       ,
 		       nid, K(node_page_state(pgdat, NR_ANON_THPS) *
-				       HPAGE_PMD_NR),
+				       HPAGE_PMD_NR) +
+			    K(node_page_state(pgdat, NR_ANON_THPS_PUD) *
+				       HPAGE_PUD_NR),
 		       nid, K(node_page_state(pgdat, NR_SHMEM_THPS) *
 				       HPAGE_PMD_NR),
 		       nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) *
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 568d90e17c17..9d127e440e4c 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -131,7 +131,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	show_val_kb(m, "AnonHugePages:  ",
-		    global_node_page_state(NR_ANON_THPS) * HPAGE_PMD_NR);
+		    global_node_page_state(NR_ANON_THPS) * HPAGE_PMD_NR +
+		    global_node_page_state(NR_ANON_THPS_PUD) * HPAGE_PUD_NR);
 	show_val_kb(m, "ShmemHugePages: ",
 		    global_node_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR);
 	show_val_kb(m, "ShmemPmdMapped: ",
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 05e61e6c843f..0f626d6177c3 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -303,10 +303,13 @@ static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
 extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 				       pgtable_t pgtable);
+extern void pgtable_trans_huge_pud_deposit(struct mm_struct *mm, pud_t *pudp,
+					   pgtable_t pgtable);
 #endif
 
 #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
 extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
+extern pgtable_t pgtable_trans_huge_pud_withdraw(struct mm_struct *mm, pud_t *pudp);
 #endif
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 381e872bfde0..c6272e6ffc35 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -18,10 +18,15 @@ extern int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 extern void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud);
+extern int do_huge_pud_anonymous_page(struct vm_fault *vmf);
 #else
 static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
 {
 }
+static inline int do_huge_pud_anonymous_page(struct vm_fault *vmf)
+{
+	return VM_FAULT_FALLBACK;
+}
 #endif
 
 extern vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd);
@@ -80,6 +85,9 @@ extern struct kobj_attribute shmem_enabled_attr;
 
 #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
 #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
+#define HPAGE_PUD_ORDER (HPAGE_PUD_SHIFT-PAGE_SHIFT)
+#define HPAGE_PUD_NR (1<<HPAGE_PUD_ORDER)
[... remainder of the huge_mm.h hunks garbled in the archive ...]
diff --git a/include/linux/mm.h b/include/linux/mm.h
@@ ... @@
 #include <...>
 #include <...>
+#include <linux/pagechain.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -1985,6 +1986,7 @@ static inline void pgtable_init(void)
 {
 	ptlock_cache_init();
 	pgtable_cache_init();
+	pagechain_cache_init();
 }
 
 static inline bool pgtable_page_ctor(struct page *page)
@@ -2101,6 +2103,8 @@ static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
 	return ptl;
 }
 
+#define pud_huge_pte(mm, pud) ((mm)->pud_huge_pte)
+
 extern void __init pagecache_init(void);
 extern void free_area_init(unsigned long * zones_size);
 extern void __init free_area_init_node(int nid, unsigned long * zones_size,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 32549b255d25..a5ac5946a375 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -466,6 +466,7 @@ struct mm_struct {
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 		pgtable_t pmd_huge_pte; /* protected by page_table_lock */
 #endif
+		struct list_head pud_huge_pte; /* protected by page_table_lock */
 #ifdef CONFIG_NUMA_BALANCING
 		/*
 		 * numa_next_scan is the next time that the PTEs will be marked
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 842f9189537b..ea84d6a1802d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -177,6 +177,7 @@ enum node_stat_item {
 	NR_SHMEM_THPS,
 	NR_SHMEM_PMDMAPPED,
 	NR_ANON_THPS,
+	NR_ANON_THPS_PUD,
 	NR_UNSTABLE_NFS,	/* NFS unstable pages */
 	NR_VMSCAN_WRITE,
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 52ad71db6687..4893849d11eb 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -73,6 +73,7 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_OOM_VICTIM		25	/* mm is the oom victim */
 #define MMF_OOM_REAP_QUEUED	26	/* mm was queued for oom_reaper */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
+#define MMF_HUGE_PUD_ZERO_PAGE	27	/* mm has ever used the global huge pud zero page */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
 				 MMF_DISABLE_THP_MASK)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 6b32c8243616..4550667b2274 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -82,6 +82,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		THP_DEFERRED_SPLIT_PAGE,
 		THP_SPLIT_PMD,
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+		THP_FAULT_ALLOC_PUD,
+		THP_FAULT_FALLBACK_PUD,
 		THP_SPLIT_PUD,
 #endif
 		THP_ZERO_PAGE_ALLOC,
diff --git a/kernel/fork.c b/kernel/fork.c
index dcefa978c232..fc5a925e0496 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -662,6 +662,10 @@ static void check_mm(struct mm_struct *mm)
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
 #endif
+	VM_BUG_ON_MM(!list_empty(&mm->pud_huge_pte) &&
+		     !pagechain_empty(list_first_entry(&mm->pud_huge_pte,
+						       struct pagechain, list)),
+		     mm);
 }
 
 #define allocate_mm()	(kmem_cache_alloc(mm_cachep, GFP_KERNEL))
@@ -1003,6 +1007,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
 #endif
+	INIT_LIST_HEAD(&mm->pud_huge_pte);
 	mm_init_uprobes_state(mm);
 
 	if (current->mm) {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ffcae07a87d3..cad4ef01f607 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -62,6 +62,8 @@ static struct shrinker deferred_split_shrinker;
 
 static atomic_t huge_zero_refcount;
 struct page *huge_zero_page __read_mostly;
+static atomic_t huge_pud_zero_refcount;
+struct page *huge_pud_zero_page __read_mostly;
 
 bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
@@ -109,6 +111,42 @@ static void put_huge_zero_page(void)
 	BUG_ON(atomic_dec_and_test(&huge_zero_refcount));
 }
 
+static struct page *get_huge_pud_zero_page(void)
+{
+	struct page *zero_page;
+retry:
+	if (likely(atomic_inc_not_zero(&huge_pud_zero_refcount)))
+		return READ_ONCE(huge_pud_zero_page);
+
+	zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE,
+			HPAGE_PUD_ORDER);
+	if (!zero_page) {
+		count_vm_event(THP_ZERO_PAGE_ALLOC_FAILED);
+		return NULL;
+	}
+	count_vm_event(THP_ZERO_PAGE_ALLOC);
+	preempt_disable();
+	if (cmpxchg(&huge_pud_zero_page, NULL, zero_page)) {
+		preempt_enable();
+		__free_pages(zero_page, compound_order(zero_page));
+		goto retry;
+	}
+
+	/* We take additional reference here. It will be put back by shrinker */
+	atomic_set(&huge_pud_zero_refcount, 2);
+	preempt_enable();
+	return READ_ONCE(huge_pud_zero_page);
+}
+
+static void put_huge_pud_zero_page(void)
+{
+	/*
+	 * Counter should never go to zero here. Only shrinker can put
+	 * last reference.
+	 */
+	BUG_ON(atomic_dec_and_test(&huge_pud_zero_refcount));
+}
+
 struct page *mm_get_huge_zero_page(struct mm_struct *mm)
 {
 	if (test_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
@@ -123,9 +161,23 @@ struct page *mm_get_huge_zero_page(struct mm_struct *mm)
 	return READ_ONCE(huge_zero_page);
 }
 
+struct page *mm_get_huge_pud_zero_page(struct mm_struct *mm)
+{
+	if (test_bit(MMF_HUGE_PUD_ZERO_PAGE, &mm->flags))
+		return READ_ONCE(huge_pud_zero_page);
+
+	if (!get_huge_pud_zero_page())
+		return NULL;
+
+	if (test_and_set_bit(MMF_HUGE_PUD_ZERO_PAGE, &mm->flags))
+		put_huge_pud_zero_page();
+
+	return READ_ONCE(huge_pud_zero_page);
+}
+
 void mm_put_huge_zero_page(struct mm_struct *mm)
 {
-	if (test_bit(MMF_HUGE_ZERO_PAGE, &mm->flags))
+	if (test_bit(MMF_HUGE_PUD_ZERO_PAGE, &mm->flags))
 		put_huge_zero_page();
 }
 
@@ -859,6 +911,175 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
 	return VM_FAULT_NOPAGE;
 }
 EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud);
+
+static int __do_huge_pud_anonymous_page(struct vm_fault *vmf, struct page *page,
+		gfp_t gfp)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct mem_cgroup *memcg;
+	pmd_t *pmd_pgtable;
+	unsigned long haddr = vmf->address & HPAGE_PUD_MASK;
+	int ret = 0;
+
+	VM_BUG_ON_PAGE(!PageCompound(page), page);
+
+	if (mem_cgroup_try_charge(page, vma->vm_mm, gfp, &memcg, true)) {
+		put_page(page);
+		count_vm_event(THP_FAULT_FALLBACK_PUD);
+		return VM_FAULT_FALLBACK;
+	}
+
+	pmd_pgtable = pmd_alloc_one_page_with_ptes(vma->vm_mm, haddr);
+	if (unlikely(!pmd_pgtable)) {
+		ret = VM_FAULT_OOM;
+		goto release;
+	}
+
+	clear_huge_page(page, vmf->address, HPAGE_PUD_NR);
+	/*
+	 * The memory barrier inside __SetPageUptodate makes sure that
+	 * clear_huge_page writes become visible before the set_pud_at()
+	 * write.
+	 */
+	__SetPageUptodate(page);
+
+	vmf->ptl = pud_lock(vma->vm_mm, vmf->pud);
+	if (unlikely(!pud_none(*vmf->pud))) {
+		goto unlock_release;
+	} else {
+		pud_t entry;
+		int i;
+
+		ret = check_stable_address_space(vma->vm_mm);
+		if (ret)
+			goto unlock_release;
+
+		/* Deliver the page fault to userland */
+		if (userfaultfd_missing(vma)) {
+			int ret;
+
+			spin_unlock(vmf->ptl);
+			mem_cgroup_cancel_charge(page, memcg, true);
+			put_page(page);
+			pmd_free_page_with_ptes(vma->vm_mm, pmd_pgtable);
+			ret = handle_userfault(vmf, VM_UFFD_MISSING);
+			VM_BUG_ON(ret & VM_FAULT_FALLBACK);
+			return ret;
+		}
+
+		entry = mk_huge_pud(page, vma->vm_page_prot);
+		entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma);
+		page_add_new_anon_rmap(page, vma, haddr, true);
+		mem_cgroup_commit_charge(page, memcg, false, true);
+		lru_cache_add_active_or_unevictable(page, vma);
+		pgtable_trans_huge_pud_deposit(vma->vm_mm, vmf->pud,
+				virt_to_page(pmd_pgtable));
+		set_pud_at(vma->vm_mm, haddr, vmf->pud, entry);
+		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PUD_NR);
+		mm_inc_nr_pmds(vma->vm_mm);
+		for (i = 0; i < (1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER)); i++)
+			mm_inc_nr_ptes(vma->vm_mm);
+		spin_unlock(vmf->ptl);
+		count_vm_event(THP_FAULT_ALLOC_PUD);
+	}
+
+	return 0;
+unlock_release:
+	spin_unlock(vmf->ptl);
+release:
+	if (pmd_pgtable)
+		pmd_free_page_with_ptes(vma->vm_mm, pmd_pgtable);
+	mem_cgroup_cancel_charge(page, memcg, true);
+	put_page(page);
+	return ret;
+
+}
+
+/* Caller must hold page table lock. */
+static bool set_huge_pud_zero_page(pgtable_t pmd_pgtable,
+		struct mm_struct *mm,
+		struct vm_area_struct *vma, unsigned long haddr, pud_t *pud,
+		struct page *zero_page)
+{
+	pud_t entry;
+	int i;
+
+	if (!pud_none(*pud))
+		return false;
+	entry = mk_pud(zero_page, vma->vm_page_prot);
+	entry = pud_mkhuge(entry);
+	if (pmd_pgtable)
+		pgtable_trans_huge_pud_deposit(mm, pud, pmd_pgtable);
+	set_pud_at(mm, haddr, pud, entry);
+	mm_inc_nr_pmds(mm);
+	for (i = 0; i < (1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER)); i++)
+		mm_inc_nr_ptes(mm);
+	return true;
+}
+
+int do_huge_pud_anonymous_page(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	gfp_t gfp;
+	struct page *page;
+	unsigned long haddr = vmf->address & HPAGE_PUD_MASK;
+
+	if (haddr < vma->vm_start || haddr + HPAGE_PUD_SIZE > vma->vm_end)
+		return VM_FAULT_FALLBACK;
+	if (unlikely(anon_vma_prepare(vma)))
+		return VM_FAULT_OOM;
+	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
+		return VM_FAULT_OOM;
+	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
+			!mm_forbids_zeropage(vma->vm_mm) &&
+			transparent_hugepage_use_zero_page()) {
+		pmd_t *pmd_pgtable;
+		struct page *zero_page;
+		bool set;
+		int ret;
+
+		pmd_pgtable = pmd_alloc_one_page_with_ptes(vma->vm_mm, haddr);
+		if (unlikely(!pmd_pgtable))
+			return VM_FAULT_OOM;
+		zero_page = mm_get_huge_pud_zero_page(vma->vm_mm);
+		if (unlikely(!zero_page)) {
+			pmd_free_page_with_ptes(vma->vm_mm, pmd_pgtable);
+			count_vm_event(THP_FAULT_FALLBACK_PUD);
+			return VM_FAULT_FALLBACK;
+		}
+		vmf->ptl = pud_lock(vma->vm_mm, vmf->pud);
+		ret = 0;
+		set = false;
+		if (pud_none(*vmf->pud)) {
+			ret = check_stable_address_space(vma->vm_mm);
+			if (ret) {
+				spin_unlock(vmf->ptl);
+			} else if (userfaultfd_missing(vma)) {
+				spin_unlock(vmf->ptl);
+				ret = handle_userfault(vmf, VM_UFFD_MISSING);
+				VM_BUG_ON(ret & VM_FAULT_FALLBACK);
+			} else {
+				set_huge_pud_zero_page(virt_to_page(pmd_pgtable),
+					vma->vm_mm, vma, haddr, vmf->pud, zero_page);
+				spin_unlock(vmf->ptl);
+				set = true;
+			}
+		} else
+			spin_unlock(vmf->ptl);
+		if (!set)
+			pmd_free_page_with_ptes(vma->vm_mm, pmd_pgtable);
+		return ret;
+	}
+	gfp = alloc_hugepage_direct_gfpmask(vma);
+	page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PUD_ORDER);
+	if (unlikely(!page)) {
+		count_vm_event(THP_FAULT_FALLBACK_PUD);
+		return VM_FAULT_FALLBACK;
+	}
+	prep_transhuge_page(page);
+	return __do_huge_pud_anonymous_page(vmf, page, gfp);
+}
+
 #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 
 static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
@@ -1980,12 +2201,27 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma)
 }
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static inline void zap_pud_deposited_table(struct mm_struct *mm, pud_t *pud)
+{
+	pgtable_t pgtable;
+	int i;
+
+	pgtable = pgtable_trans_huge_pud_withdraw(mm, pud);
+	pmd_free_page_with_ptes(mm, (pmd_t *)page_address(pgtable));
+
+	mm_dec_nr_pmds(mm);
+	for (i = 0; i < (1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER)); i++)
+		mm_dec_nr_ptes(mm);
+}
+
 int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pud_t *pud, unsigned long addr)
 {
 	pud_t orig_pud;
 	spinlock_t *ptl;
 
+	tlb_remove_check_page_size_change(tlb, HPAGE_PUD_SIZE);
+
 	ptl = __pud_trans_huge_lock(pud, vma);
 	if (!ptl)
 		return 0;
@@ -2001,9 +2237,34 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	if (vma_is_dax(vma)) {
 		spin_unlock(ptl);
 		/* No zero page support yet */
+	} else if (is_huge_zero_pud(orig_pud)) {
+		zap_pud_deposited_table(tlb->mm, pud);
+		spin_unlock(ptl);
+		tlb_remove_page_size(tlb, pud_page(orig_pud), HPAGE_PUD_SIZE);
 	} else {
-		/* No support for anonymous PUD pages yet */
-		BUG();
+		struct page *page = NULL;
+		int flush_needed = 1;
+
+		if (pud_present(orig_pud)) {
+			page = pud_page(orig_pud);
+			page_remove_rmap(page, true);
+			VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
+			VM_BUG_ON_PAGE(!PageHead(page), page);
+		} else
+			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+
+		if (PageAnon(page)) {
+			zap_pud_deposited_table(tlb->mm, pud);
+			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PUD_NR);
+		} else {
+			if (arch_needs_pgtable_deposit())
+				zap_pud_deposited_table(tlb->mm, pud);
+			add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PUD_NR);
+		}
+
+		spin_unlock(ptl);
+		if (flush_needed)
+			tlb_remove_page_size(tlb, page, HPAGE_PUD_SIZE);
 	}
 	return 1;
 }
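An aside on get_huge_pud_zero_page() above: it reuses the existing
huge_zero_page trick of optimistically taking a reference and, when the
page does not yet exist, allocating one and publishing it with cmpxchg(),
freeing the copy that lost the race. A minimal userspace rendering of
that publish-once pattern, with C11 atomics standing in for the kernel
helpers; the shrinker reference counting is deliberately elided and all
names are illustrative:

/* Userspace sketch of the allocate-then-cmpxchg singleton pattern. */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

static _Atomic(void *) zero_page;	/* the shared singleton */

static void *get_zero_page(void)
{
	void *expected = NULL;
	void *fresh = calloc(1, 4096);	/* stand-in for the 1GB allocation */

	if (!fresh)
		return NULL;
	/* Install our page only if nobody beat us to it... */
	if (!atomic_compare_exchange_strong(&zero_page, &expected, fresh))
		free(fresh);		/* ...otherwise drop the losing copy */
	return atomic_load(&zero_page);
}

int main(void)
{
	void *a = get_zero_page();
	void *b = get_zero_page();

	printf("same page: %s\n", a == b ? "yes" : "no");	/* yes */
	return 0;
}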
diff --git a/mm/memory.c b/mm/memory.c
index 019036e87088..177478d5ee47 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3710,7 +3710,7 @@ static vm_fault_t create_huge_pud(struct vm_fault *vmf)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	/* No support for anonymous transparent PUD pages yet */
 	if (vma_is_anonymous(vmf->vma))
-		return VM_FAULT_FALLBACK;
+		return do_huge_pud_anonymous_page(vmf);
 	if (vmf->vma->vm_ops->huge_fault)
 		return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
@@ -4593,3 +4593,29 @@ void ptlock_free(struct page *page)
 	kmem_cache_free(page_ptl_cachep, page->ptl);
 }
 #endif
+
+static struct kmem_cache *pagechain_cachep;
+
+void __init pagechain_cache_init(void)
+{
+	pagechain_cachep = kmem_cache_create("pagechain",
+			sizeof(struct pagechain), 0, SLAB_PANIC, NULL);
+}
+
+struct pagechain *pagechain_alloc(void)
+{
+	struct pagechain *chain;
+
+	chain = kmem_cache_alloc(pagechain_cachep, GFP_ATOMIC);
+
+	if (!chain)
+		return NULL;
+
+	pagechain_init(chain);
+	return chain;
+}
+
+void pagechain_free(struct pagechain *pchain)
+{
+	kmem_cache_free(pagechain_cachep, pchain);
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cfa99bb54bd6..a3b295ea7348 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5157,7 +5157,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			K(node_page_state(pgdat, NR_SHMEM_THPS) * HPAGE_PMD_NR),
 			K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED)
 					* HPAGE_PMD_NR),
-			K(node_page_state(pgdat, NR_ANON_THPS) * HPAGE_PMD_NR),
+			K(node_page_state(pgdat, NR_ANON_THPS) * HPAGE_PMD_NR +
+			  node_page_state(pgdat, NR_ANON_THPS_PUD) * HPAGE_PUD_NR),
 #endif
 			K(node_page_state(pgdat, NR_WRITEBACK_TEMP)),
 			K(node_page_state(pgdat, NR_UNSTABLE_NFS)),
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 532c29276fce..0b79568fba1c 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -9,6 +9,7 @@
 
 #include <...>
 #include <...>
+#include <linux/pagechain.h>
 #include <...>
 #include <...>
@@ -44,7 +45,7 @@ void pmd_clear_bad(pmd_t *pmd)
 
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 /*
- * Only sets the access flags (dirty, accessed), as well as write 
+ * Only sets the access flags (dirty, accessed), as well as write
  * permission. Furthermore, we know it always gets set to a "more
  * permissive" setting, which allows most architectures to optimize
  * this. We return whether the PTE actually changed, which in turn
@@ -161,6 +162,23 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
 		list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru);
 	pmd_huge_pte(mm, pmdp) = pgtable;
 }
+
+void pgtable_trans_huge_pud_deposit(struct mm_struct *mm, pud_t *pudp,
+		pgtable_t pgtable)
+{
+	struct pagechain *chain = NULL;
+
+	assert_spin_locked(pud_lockptr(mm, pudp));
+	/* FIFO */
+	chain = list_first_entry_or_null(&pud_huge_pte(mm, pudp),
+			struct pagechain, list);
+
+	if (!chain || !pagechain_space(chain)) {
+		chain = pagechain_alloc();
+		list_add(&chain->list, &pud_huge_pte(mm, pudp));
+	}
+	pagechain_deposit(chain, pgtable);
+}
 #endif
 
 #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW
@@ -179,6 +197,33 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
 	list_del(&pgtable->lru);
 	return pgtable;
 }
+
+pgtable_t pgtable_trans_huge_pud_withdraw(struct mm_struct *mm, pud_t *pudp)
+{
+	pgtable_t pgtable;
+	struct pagechain *chain = NULL;
+
+	assert_spin_locked(pud_lockptr(mm, pudp));
+
+	/* FIFO */
+retry:
+	chain = list_first_entry_or_null(&pud_huge_pte(mm, pudp),
+			struct pagechain, list);
+
+	if (!chain)
+		return NULL;
+
+	if (pagechain_empty(chain)) {
+		if (list_is_singular(&chain->list))
+			return NULL;
+		list_del(&chain->list);
+		pagechain_free(chain);
+		goto retry;
+	}
+
+	pgtable = pagechain_withdraw(chain);
+	return pgtable;
+}
 #endif
 
 #ifndef __HAVE_ARCH_PMDP_INVALIDATE
diff --git a/mm/rmap.c b/mm/rmap.c
index 0454ecc29537..dae66a4329ea 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -712,6 +712,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
+	pud_t pude;
 	pmd_t *pmd = NULL;
 	pmd_t pmde;
 
@@ -724,7 +725,10 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 		goto out;
 
 	pud = pud_offset(p4d, address);
-	if (!pud_present(*pud))
+
+	pude = *pud;
+	barrier();
+	if (!pud_present(pude) || pud_trans_huge(pude))
 		goto out;
 
 	pmd = pmd_offset(pud, address);
@@ -1121,8 +1125,12 @@ void do_page_add_anon_rmap(struct page *page,
 		 * pte lock(a spinlock) is held, which implies preemption
 		 * disabled.
		 */
-		if (compound)
-			__inc_node_page_state(page, NR_ANON_THPS);
+		if (compound) {
+			if (nr == HPAGE_PMD_NR)
+				__inc_node_page_state(page, NR_ANON_THPS);
+			else
+				__inc_node_page_state(page, NR_ANON_THPS_PUD);
+		}
 		__mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, nr);
 	}
 	if (unlikely(PageKsm(page)))
@@ -1160,7 +1168,10 @@ void page_add_new_anon_rmap(struct page *page,
 		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
 		/* increment count (starts at -1) */
 		atomic_set(compound_mapcount_ptr(page), 0);
-		__inc_node_page_state(page, NR_ANON_THPS);
+		if (nr == HPAGE_PMD_NR)
+			__inc_node_page_state(page, NR_ANON_THPS);
+		else
+			__inc_node_page_state(page, NR_ANON_THPS_PUD);
 	} else {
 		/* Anon THP always mapped first with PMD */
 		VM_BUG_ON_PAGE(PageTransCompound(page), page);
@@ -1265,19 +1276,22 @@ static void page_remove_anon_compound_rmap(struct page *page)
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		return;
 
-	__dec_node_page_state(page, NR_ANON_THPS);
+	if (hpage_nr_pages(page) == HPAGE_PMD_NR)
+		__dec_node_page_state(page, NR_ANON_THPS);
+	else
+		__dec_node_page_state(page, NR_ANON_THPS_PUD);
 
 	if (TestClearPageDoubleMap(page)) {
 		/*
 		 * Subpages can be mapped with PTEs too. Check how many of
		 * them are still mapped.
 		 */
-		for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
+		for (i = 0, nr = 0; i < hpage_nr_pages(page); i++) {
 			if (atomic_add_negative(-1, &page[i]._mapcount))
 				nr++;
 		}
 	} else {
-		nr = HPAGE_PMD_NR;
+		nr = hpage_nr_pages(page);
 	}
 
 	if (unlikely(PageMlocked(page)))
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c18a42250a5c..25a88693e417 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1158,6 +1158,7 @@ const char * const vmstat_text[] = {
 	"nr_shmem_hugepages",
 	"nr_shmem_pmdmapped",
 	"nr_anon_transparent_hugepages",
+	"nr_anon_transparent_pud_hugepages",
 	"nr_unstable",
 	"nr_vmscan_write",
 	"nr_vmscan_immediate_reclaim",
@@ -1259,6 +1260,8 @@ const char * const vmstat_text[] = {
 	"thp_deferred_split_page",
 	"thp_split_pmd",
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+	"thp_fault_alloc_pud",
+	"thp_fault_fallback_pud",
 	"thp_split_pud",
 #endif
 	"thp_zero_page_alloc",
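A quick sanity check on the page-table bookkeeping in this patch (the
1 << (HPAGE_PUD_ORDER - HPAGE_PMD_ORDER) loops around mm_inc_nr_ptes()
and mm_dec_nr_ptes(), and the 513-page figure from the pagechain patch).
The sketch below assumes the standard x86-64 shifts (4KB/2MB/1GB); it is
arithmetic only, not kernel code:

/* Sketch: page tables backing one fully-populated 1GB mapping. */
#include <stdio.h>

int main(void)
{
	const int PAGE_SHIFT = 12;	/* 4KB */
	const int PMD_SHIFT = 21;	/* 2MB */
	const int PUD_SHIFT = 30;	/* 1GB */

	int hpage_pmd_order = PMD_SHIFT - PAGE_SHIFT;	/* 9  */
	int hpage_pud_order = PUD_SHIFT - PAGE_SHIFT;	/* 18 */

	/* One PMD page maps the 1GB region; each of its 512 entries needs
	 * a PTE page if the THP is ever split all the way down. This is
	 * the 1 << (HPAGE_PUD_ORDER - HPAGE_PMD_ORDER) loop bound. */
	int pte_pages = 1 << (hpage_pud_order - hpage_pmd_order);

	printf("deposit %d PTE pages + 1 PMD leader = %d pages per 1GB THP\n",
	       pte_pages, pte_pages + 1);	/* 512 + 1 = 513 */
	return 0;
}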
From patchwork Fri Feb 15 22:08:35 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815969
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton,
    Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove,
    Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 10/31] mm: proc: add 1GB THP kpageflag.
Date: Fri, 15 Feb 2019 14:08:35 -0800
Message-Id: <20190215220856.29749-11-zi.yan@sent.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Bit 27 is used to identify 1GB THP.

Signed-off-by: Zi Yan
---
 fs/proc/page.c                         | 2 ++
 include/uapi/linux/kernel-page-flags.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 40b05e0d4274..5d1471a6082a 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -138,6 +138,8 @@ u64 stable_page_flags(struct page *page)
 			u |= 1 << KPF_ZERO_PAGE;
 			u |= 1 << KPF_THP;
 		}
+		if (compound_order(head) == HPAGE_PUD_ORDER)
+			u |= 1 << KPF_PUD_THP;
 	} else if (is_zero_pfn(page_to_pfn(page)))
 		u |= 1 << KPF_ZERO_PAGE;
 
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index 21b9113c69da..743bd730917d 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -36,5 +36,7 @@
 #define KPF_ZERO_PAGE		24
 #define KPF_IDLE		25
 #define KPF_PGTABLE		26
+#define KPF_PUD_THP		27
+
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
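A userspace sketch of how the new flag can be consumed: /proc/kpageflags
exposes one 64-bit flag word per PFN, so testing bit 27 at offset
pfn * 8 identifies a 1GB THP. The PFN below is a hypothetical stand-in,
KPF_THP = 22 is quoted from the existing kernel-page-flags.h, and
reading kpageflags requires root:

/* Sketch: test KPF_PUD_THP (bit 27, added by this patch) for one PFN. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define KPF_THP		22
#define KPF_PUD_THP	27	/* new in this patch */

int main(void)
{
	uint64_t pfn = 0x100000;	/* hypothetical PFN to inspect */
	uint64_t flags;
	int fd = open("/proc/kpageflags", O_RDONLY);

	if (fd < 0 || pread(fd, &flags, sizeof(flags),
			    pfn * sizeof(flags)) != sizeof(flags)) {
		perror("kpageflags");
		return 1;
	}
	printf("pfn %#lx: THP=%d PUD_THP=%d\n", (unsigned long)pfn,
	       (int)(flags >> KPF_THP) & 1, (int)(flags >> KPF_PUD_THP) & 1);
	close(fd);
	return 0;
}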
From patchwork Fri Feb 15 22:08:36 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815965
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton,
    Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove,
    Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 11/31] mm: debug: print compound page order in dump_page().
Date: Fri, 15 Feb 2019 14:08:36 -0800
Message-Id: <20190215220856.29749-12-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Since we have more than just PMD-level THPs, printing the compound page
order is helpful for checking the actual compound page sizes.

Signed-off-by: Zi Yan
---
 mm/debug.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/debug.c b/mm/debug.c
index 0abb987dad9b..21d211d7776c 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -68,8 +68,12 @@ void __dump_page(struct page *page, const char *reason)
 	pr_warn("page:%px count:%d mapcount:%d mapping:%px index:%#lx",
 		page, page_ref_count(page), mapcount,
 		page->mapping, page_to_pgoff(page));
-	if (PageCompound(page))
-		pr_cont(" compound_mapcount: %d", compound_mapcount(page));
+	if (PageCompound(page)) {
+		struct page *head = compound_head(page);
+
+		pr_cont(" compound_mapcount: %d, order: %d", compound_mapcount(page),
+			compound_order(head));
+	}
 	pr_cont("\n");
 	if (PageAnon(page))
 		pr_warn("anon ");
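For illustration only (not from the patch): dumping the head page of a 1GB
anonymous THP after this change would print a line of roughly the shape
below. All values are made up; the order of 18 assumes x86_64 with 4KB base
pages, where HPAGE_PUD_ORDER = PUD_SHIFT - PAGE_SHIFT = 30 - 12.

	/* Hypothetical call site; output shown in the comment. */
	dump_page(page, "inspecting a PUD THP");
	/*
	 * page:ffffea0004000000 count:1 mapcount:0 mapping:0000000000000000
	 *     index:0x0 compound_mapcount: 0, order: 18
	 *
	 * order 18 because 2^18 * 4KB = 1GB.
	 */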
From patchwork Fri Feb 15 22:08:37 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815967
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 12/31] mm: stats: Separate PMD THP and PUD THP stats.
Date: Fri, 15 Feb 2019 14:08:37 -0800
Message-Id: <20190215220856.29749-13-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Show PMD THPs and PUD THPs in separate stats: AnonHugePages now counts
only PMD-level THPs, while the new AnonHugePages(1GB) entry counts
PUD-level THPs.

Signed-off-by: Zi Yan
---
 drivers/base/node.c | 5 +++--
 fs/proc/meminfo.c   | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index f21d2235bf97..5d947a17b61b 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -127,6 +127,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       "Node %d SUnreclaim:     %8lu kB\n"
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		       "Node %d AnonHugePages:  %8lu kB\n"
+		       "Node %d AnonHugePages(1GB): %8lu kB\n"
 		       "Node %d ShmemHugePages: %8lu kB\n"
 		       "Node %d ShmemPmdMapped: %8lu kB\n"
 #endif
@@ -150,8 +151,8 @@ static ssize_t node_read_meminfo(struct device *dev,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		       ,
 		       nid, K(node_page_state(pgdat, NR_ANON_THPS) *
-				       HPAGE_PMD_NR) +
-		       K(node_page_state(pgdat, NR_ANON_THPS_PUD) *
+				       HPAGE_PMD_NR),
+		       nid, K(node_page_state(pgdat, NR_ANON_THPS_PUD) *
 				       HPAGE_PUD_NR),
 		       nid, K(node_page_state(pgdat, NR_SHMEM_THPS) *
 				       HPAGE_PMD_NR),
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 9d127e440e4c..44a4d2dbd1d4 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -131,7 +131,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	show_val_kb(m, "AnonHugePages:  ",
-		    global_node_page_state(NR_ANON_THPS) * HPAGE_PMD_NR +
+		    global_node_page_state(NR_ANON_THPS) * HPAGE_PMD_NR);
+	show_val_kb(m, "AnonHugePages(1GB): ",
 		    global_node_page_state(NR_ANON_THPS_PUD) * HPAGE_PUD_NR);
 	show_val_kb(m, "ShmemHugePages: ",
 		    global_node_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR);
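For illustration (all numbers made up): with this patch, /proc/meminfo would
report the 1GB THP footprint on its own line, e.g.:

	AnonHugePages:        524288 kB
	AnonHugePages(1GB):  2097152 kB
	ShmemHugePages:            0 kB

where 2097152 kB would correspond to two resident 1GB THPs.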
From patchwork Fri Feb 15 22:08:38 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815971
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 13/31] mm: thp: 1GB THP copy on write implementation.
Date: Fri, 15 Feb 2019 14:08:38 -0800
Message-Id: <20190215220856.29749-14-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

COW on 1GB THPs will fall back to 2MB THPs if a 1GB THP is not
available.

Signed-off-by: Zi Yan
---
 arch/x86/include/asm/pgalloc.h |   9 +
 include/linux/huge_mm.h        |   5 +
 mm/huge_memory.c               | 319 ++++++++++++++++++++++++++++++++-
 mm/memory.c                    |   2 +-
 4 files changed, 331 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index 6e29ad9b9d7f..ebcb022f6bb9 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -110,6 +110,15 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 
 #define pmd_pgtable(pmd) pmd_page(pmd)
 
+static inline void pud_populate_with_pgtable(struct mm_struct *mm, pud_t *pud,
+					     struct page *pte)
+{
+	unsigned long pfn = page_to_pfn(pte);
+
+	paravirt_alloc_pmd(mm, pfn);
+	set_pud(pud, __pud(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
+}
+
 #if CONFIG_PGTABLE_LEVELS > 2
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c6272e6ffc35..02419fa91e12 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -19,6 +19,7 @@ extern int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 extern void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud);
 extern int do_huge_pud_anonymous_page(struct vm_fault *vmf);
+extern int do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud);
 #else
 static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
 {
@@ -27,6 +28,10 @@ extern int do_huge_pud_anonymous_page(struct vm_fault *vmf)
 {
 	return VM_FAULT_FALLBACK;
 }
+extern int do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud)
+{
+	return VM_FAULT_FALLBACK;
+}
 #endif
 
 extern vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cad4ef01f607..0a006592f3fe 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1284,7 +1284,12 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 {
 	spinlock_t *dst_ptl, *src_ptl;
 	pud_t pud;
-	int ret;
+	pmd_t *pmd_pgtable = NULL;
+	int ret = -ENOMEM;
+
+	pmd_pgtable = pmd_alloc_one_page_with_ptes(vma->vm_mm, addr);
+	if (unlikely(!pmd_pgtable))
+		goto out;
 
 	dst_ptl = pud_lock(dst_mm, dst_pud);
 	src_ptl = pud_lockptr(src_mm, src_pud);
@@ -1292,8 +1297,13 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	ret = -EAGAIN;
 	pud = *src_pud;
-	if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud)))
+	if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) {
+		pmd_free_page_with_ptes(dst_mm, pmd_pgtable);
 		goto out_unlock;
+	}
+
+	if (pud_devmap(pud))
+		pmd_free_page_with_ptes(dst_mm, pmd_pgtable);
 
 	/*
 	 * When page table lock is held, the huge zero pud should not be
@@ -1301,7 +1311,32 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * a page table.
 	 */
 	if (is_huge_zero_pud(pud)) {
-		/* No huge zero pud yet */
+		struct page *zero_page;
+		/*
+		 * get_huge_zero_page() will never allocate a new page here,
+		 * since we already have a zero page to copy. It just takes a
+		 * reference.
+		 */
+		zero_page = mm_get_huge_pud_zero_page(dst_mm);
+		set_huge_pud_zero_page(virt_to_page(pmd_pgtable),
+				dst_mm, vma, addr, dst_pud, zero_page);
+		ret = 0;
+		goto out_unlock;
+	}
+
+	if (pud_trans_huge(pud)) {
+		struct page *src_page;
+		int i;
+
+		src_page = pud_page(pud);
+		VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+		get_page(src_page);
+		page_dup_rmap(src_page, true);
+		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PUD_NR);
+		mm_inc_nr_pmds(dst_mm);
+		for (i = 0; i < (1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER)); i++)
+			mm_inc_nr_ptes(dst_mm);
+		pgtable_trans_huge_pud_deposit(dst_mm, dst_pud,
+				virt_to_page(pmd_pgtable));
 	}
 
 	pudp_set_wrprotect(src_mm, addr, src_pud);
@@ -1312,6 +1347,7 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 out_unlock:
 	spin_unlock(src_ptl);
 	spin_unlock(dst_ptl);
+out:
 	return ret;
 }
 
@@ -1335,6 +1371,283 @@ void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
 unlock:
 	spin_unlock(vmf->ptl);
 }
+
+static int do_huge_pud_wp_page_fallback(struct vm_fault *vmf, pud_t orig_pud,
+					struct page *page)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long haddr = vmf->address & HPAGE_PUD_MASK;
+	struct mem_cgroup *memcg;
+	pgtable_t pgtable, pmd_pgtable;
+	pud_t _pud;
+	int ret = 0, i, j;
+	struct page **pages;
+	struct mmu_notifier_range range;
+
+	pages = kmalloc(sizeof(struct page *) * HPAGE_PUD_NR,
+			GFP_KERNEL);
+	if (unlikely(!pages)) {
+		ret |= VM_FAULT_OOM;
+		goto out;
+	}
+
+	pmd_pgtable = pte_alloc_order(vma->vm_mm, haddr,
+			HPAGE_PUD_ORDER - HPAGE_PMD_ORDER);
+	if (!pmd_pgtable) {
+		ret |= VM_FAULT_OOM;
+		goto out_kfree_pages;
+	}
+
+	for (i = 0; i < (1<<(HPAGE_PUD_ORDER-HPAGE_PMD_ORDER)); i++) {
+		pages[i] = alloc_page_vma_node(GFP_TRANSHUGE, vma,
+					       vmf->address, page_to_nid(page));
+		if (unlikely(!pages[i] ||
+			     mem_cgroup_try_charge(pages[i], vma->vm_mm,
+						   GFP_KERNEL, &memcg, true))) {
+			if (pages[i])
+				put_page(pages[i]);
+			while (--i >= 0) {
+				memcg = (void *)page_private(pages[i]);
+				set_page_private(pages[i], 0);
+				mem_cgroup_cancel_charge(pages[i], memcg,
+							 true);
+				put_page(pages[i]);
+			}
+			kfree(pages);
+			pte_free_order(vma->vm_mm, pmd_pgtable,
+				       HPAGE_PMD_ORDER - HPAGE_PMD_ORDER);
+			ret |= VM_FAULT_OOM;
+			goto out;
+		}
+		count_vm_event(THP_FAULT_ALLOC);
+		set_page_private(pages[i], (unsigned long)memcg);
+		prep_transhuge_page(pages[i]);
+	}
+
+	for (i = 0; i < (1<<(HPAGE_PUD_ORDER-HPAGE_PMD_ORDER)); i++) {
+		for (j = 0; j < HPAGE_PMD_NR; j++) {
+			copy_user_highpage(pages[i] + j, page + i * HPAGE_PMD_NR + j,
+					   haddr + PAGE_SIZE * (i * HPAGE_PMD_NR + j), vma);
+			cond_resched();
+		}
+		__SetPageUptodate(pages[i]);
+	}
+
+	mmu_notifier_range_init(&range, vma->vm_mm, haddr,
+				haddr + HPAGE_PUD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	vmf->ptl = pud_lock(vma->vm_mm, vmf->pud);
+	if (unlikely(!pud_same(*vmf->pud, orig_pud)))
+		goto out_free_pages;
+	VM_BUG_ON_PAGE(!PageHead(page), page);
+
+	/*
+	 * Leave pmd empty until pte is filled note we must notify here as
+	 * concurrent CPU thread might write to new page before the call to
+	 * mmu_notifier_invalidate_range_end() happens which can lead to a
+	 * device seeing memory write in different order than CPU.
+	 *
+	 * See Documentation/vm/mmu_notifier.txt
+	 */
+	pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd);
+
+	pgtable = pgtable_trans_huge_pud_withdraw(vma->vm_mm, vmf->pud);
+	pud_populate_with_pgtable(vma->vm_mm, &_pud, pgtable);
+
+	for (i = 0; i < (1<<(HPAGE_PUD_ORDER-HPAGE_PMD_ORDER));
+	     i++, haddr += (PAGE_SIZE * HPAGE_PMD_NR)) {
+		pmd_t entry;
+
+		entry = mk_huge_pmd(pages[i], vma->vm_page_prot);
+		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+		memcg = (void *)page_private(pages[i]);
+		set_page_private(pages[i], 0);
+		page_add_new_anon_rmap(pages[i], vmf->vma, haddr, true);
+		mem_cgroup_commit_charge(pages[i], memcg, false, true);
+		lru_cache_add_active_or_unevictable(pages[i], vma);
+		vmf->pmd = pmd_offset(&_pud, haddr);
+		VM_BUG_ON(!pmd_none(*vmf->pmd));
+		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, &pmd_pgtable[i]);
+		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
+	}
+	kfree(pages);
+
+	smp_wmb(); /* make pte visible before pmd */
+	pud_populate_with_pgtable(vma->vm_mm, vmf->pud, pgtable);
+	page_remove_rmap(page, true);
+	spin_unlock(vmf->ptl);
+
+	/*
+	 * No need to double call mmu_notifier->invalidate_range() callback as
+	 * the above pmdp_huge_clear_flush_notify() did already call it.
+	 */
+	mmu_notifier_invalidate_range_only_end(&range);
+
+	ret |= VM_FAULT_WRITE;
+	put_page(page);
+
+out:
+	return ret;
+
+out_free_pages:
+	spin_unlock(vmf->ptl);
+	mmu_notifier_invalidate_range_end(&range);
+	for (i = 0; i < (1<<(HPAGE_PUD_ORDER-HPAGE_PMD_ORDER)); i++) {
+		memcg = (void *)page_private(pages[i]);
+		set_page_private(pages[i], 0);
+		mem_cgroup_cancel_charge(pages[i], memcg, true);
+		put_page(pages[i]);
+	}
+out_kfree_pages:
+	kfree(pages);
+	goto out;
+}
+
+int do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct page *page = NULL, *new_page;
+	struct mem_cgroup *memcg;
+	unsigned long haddr = vmf->address & HPAGE_PUD_MASK;
+	struct mmu_notifier_range range;
+	gfp_t huge_gfp;		/* for allocation and charge */
+	int ret = 0;
+
+	vmf->ptl = pud_lockptr(vma->vm_mm, vmf->pud);
+	VM_BUG_ON_VMA(!vma->anon_vma, vma);
+	if (is_huge_zero_pud(orig_pud))
+		goto alloc;
+	spin_lock(vmf->ptl);
+	if (unlikely(!pud_same(*vmf->pud, orig_pud)))
+		goto out_unlock;
+
+	page = pud_page(orig_pud);
+	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
+	/*
+	 * We can only reuse the page if nobody else maps the huge page or it's
+	 * part.
+	 */
+	if (!trylock_page(page)) {
+		get_page(page);
+		spin_unlock(vmf->ptl);
+		lock_page(page);
+		spin_lock(vmf->ptl);
+		if (unlikely(!pud_same(*vmf->pud, orig_pud))) {
+			unlock_page(page);
+			put_page(page);
+			goto out_unlock;
+		}
+		put_page(page);
+	}
+	if (reuse_swap_page(page, NULL)) {
+		pud_t entry;
+
+		entry = pud_mkyoung(orig_pud);
+		entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma);
+		if (pudp_set_access_flags(vma, haddr, vmf->pud, entry, 1))
+			update_mmu_cache_pud(vma, vmf->address, vmf->pud);
+		ret |= VM_FAULT_WRITE;
+		unlock_page(page);
+		goto out_unlock;
+	}
+	unlock_page(page);
+	get_page(page);
+	spin_unlock(vmf->ptl);
+alloc:
+	if (transparent_hugepage_enabled(vma) &&
+	    !transparent_hugepage_debug_cow()) {
+		huge_gfp = alloc_hugepage_direct_gfpmask(vma);
+		new_page = alloc_hugepage_vma(huge_gfp, vma, haddr, HPAGE_PUD_ORDER);
+	} else
+		new_page = NULL;
+
+	if (likely(new_page)) {
+		prep_transhuge_page(new_page);
+	} else {
+		if (!page) {
+			WARN(1, "%s: split_huge_page\n", __func__);
+			split_huge_pud(vma, vmf->pud, vmf->address);
+			ret |= VM_FAULT_FALLBACK;
+		} else {
+			ret = do_huge_pud_wp_page_fallback(vmf, orig_pud, page);
+			if (ret & VM_FAULT_OOM) {
+				WARN(1, "%s: split_huge_page after wp fallback\n",
+				     __func__);
+				split_huge_pud(vma, vmf->pud, vmf->address);
+				ret |= VM_FAULT_FALLBACK;
+			}
+			put_page(page);
+		}
+		count_vm_event(THP_FAULT_FALLBACK_PUD);
+		goto out;
+	}
+
+	if (unlikely(mem_cgroup_try_charge(new_page, vma->vm_mm,
+					   huge_gfp, &memcg, true))) {
+		put_page(new_page);
+		WARN(1, "%s: split_huge_page after mem cgroup failed\n", __func__);
+		split_huge_pud(vma, vmf->pud, vmf->address);
+		if (page)
+			put_page(page);
+		ret |= VM_FAULT_FALLBACK;
+		count_vm_event(THP_FAULT_FALLBACK_PUD);
+		goto out;
+	}
+
+	count_vm_event(THP_FAULT_ALLOC_PUD);
+
+	if (!page)
+		clear_huge_page(new_page, vmf->address, HPAGE_PUD_NR);
+	else
+		copy_user_huge_page(new_page, page, haddr, vma, HPAGE_PUD_NR);
+	__SetPageUptodate(new_page);
+
+	mmu_notifier_range_init(&range, vma->vm_mm, haddr,
+				haddr + HPAGE_PUD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	spin_lock(vmf->ptl);
+	if (page)
+		put_page(page);
+	if (unlikely(!pud_same(*vmf->pud, orig_pud))) {
+		spin_unlock(vmf->ptl);
+		mem_cgroup_cancel_charge(new_page, memcg, true);
+		put_page(new_page);
+		goto out_mn;
+	} else {
+		pud_t entry;
+
+		entry = mk_huge_pud(new_page, vma->vm_page_prot);
+		entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma);
+		pudp_huge_clear_flush_notify(vma, haddr, vmf->pud);
+		page_add_new_anon_rmap(new_page, vma, haddr, true);
+		mem_cgroup_commit_charge(new_page, memcg, false, true);
+		lru_cache_add_active_or_unevictable(new_page, vma);
+		set_pud_at(vma->vm_mm, haddr, vmf->pud, entry);
+		update_mmu_cache_pud(vma, vmf->address, vmf->pud);
+		if (!page) {
+			add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PUD_NR);
+		} else {
+			VM_BUG_ON_PAGE(!PageHead(page), page);
+			page_remove_rmap(page, true);
+			put_page(page);
+		}
+		ret |= VM_FAULT_WRITE;
+	}
+	spin_unlock(vmf->ptl);
+out_mn:
+	/*
+	 * No need to double call mmu_notifier->invalidate_range() callback as
+	 * the above pmdp_huge_clear_flush_notify() did already call it.
+	 */
+	mmu_notifier_invalidate_range_only_end(&range);
+out:
+	return ret;
+out_unlock:
+	spin_unlock(vmf->ptl);
+	return ret;
+}
+
 #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 
 void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd)
diff --git a/mm/memory.c b/mm/memory.c
index 177478d5ee47..3608b5436519 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3722,7 +3722,7 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	/* No support for anonymous transparent PUD pages yet */
 	if (vma_is_anonymous(vmf->vma))
-		return VM_FAULT_FALLBACK;
+		return do_huge_pud_wp_page(vmf, orig_pud);
 	if (vmf->vma->vm_ops->huge_fault)
 		return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
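To summarize the control flow implemented above for the 1GB COW fault, here
is a reader's sketch (not code from the patch): inputs are reduced to
booleans, helper and enum names are illustrative, and 512 is
1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER) on x86_64 with 4KB base pages.

	enum pud_cow_action { COW_REUSE, COW_NEW_1GB, COW_512_X_2MB, COW_SPLIT };

	/* Decision order of do_huge_pud_wp_page(), in fallback order. */
	static enum pud_cow_action pud_cow_policy(int exclusively_mapped,
						  int got_1gb_page,
						  int fallback_oom)
	{
		if (exclusively_mapped)
			return COW_REUSE;	/* reuse_swap_page(): just mark
						 * the PUD entry writable */
		if (got_1gb_page)
			return COW_NEW_1GB;	/* copy into a fresh 1GB THP */
		if (!fallback_oom)
			return COW_512_X_2MB;	/* do_huge_pud_wp_page_fallback():
						 * copy into 512 2MB THPs, one
						 * per PMD slot */
		return COW_SPLIT;		/* last resort: split_huge_pud() */
	}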
From patchwork Fri Feb 15 22:08:39 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815973
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 14/31] mm: thp: handling 1GB THP reference bit.
Date: Fri, 15 Feb 2019 14:08:39 -0800
Message-Id: <20190215220856.29749-15-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Add PUD-level TLB flush ops and teach page_vma_mapped_walk() about 1GB
THPs.

Signed-off-by: Zi Yan
---
 arch/x86/include/asm/pgtable.h |  3 +++
 arch/x86/mm/pgtable.c          | 13 +++++++++++++
 include/asm-generic/pgtable.h  | 14 ++++++++++++++
 include/linux/mmu_notifier.h   | 13 +++++++++++++
 include/linux/rmap.h           |  1 +
 mm/page_vma_mapped.c           | 33 +++++++++++++++++++++++++++++----
 mm/rmap.c                      | 12 +++++++++---
 7 files changed, 82 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index ae3ac49c32ad..f99ce657d282 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1151,6 +1151,9 @@ extern int pudp_test_and_clear_young(struct vm_area_struct *vma,
 extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
 				  unsigned long address, pmd_t *pmdp);
+#define __HAVE_ARCH_PUDP_CLEAR_YOUNG_FLUSH
+extern int pudp_clear_flush_young(struct vm_area_struct *vma,
+				  unsigned long address, pud_t *pudp);
 
 #define pmd_write pmd_write
 static inline int pmd_write(pmd_t pmd)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0a5008690d7c..0edcfa8007cb 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -643,6 +643,19 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
 
 	return young;
 }
+
+int pudp_clear_flush_young(struct vm_area_struct *vma,
+			   unsigned long address, pud_t *pudp)
+{
+	int young;
+
+	VM_BUG_ON(address & ~HPAGE_PUD_MASK);
+
+	young = pudp_test_and_clear_young(vma, address, pudp);
+	if (young)
+		flush_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
+
+	return young;
+}
 #endif
 
 /**
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 0f626d6177c3..682531e0d55c 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -121,6 +121,20 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
+#ifndef __HAVE_ARCH_PUDP_CLEAR_YOUNG_FLUSH
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+extern int pudp_clear_flush_young(struct vm_area_struct *vma,
+				  unsigned long address, pud_t *pudp);
+#else
+int pudp_clear_flush_young(struct vm_area_struct *vma,
+			   unsigned long address, pud_t *pudp)
+{
+	BUILD_BUG();
+	return 0;
+}
+#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address,
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 4050ec1c3b45..6850b9e9b2cb 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -353,6 +353,19 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
 	__young;							\
 })
 
+#define pudp_clear_flush_young_notify(__vma, __address, __pudp)	\
+({									\
+	int __young;							\
+	struct vm_area_struct *___vma = __vma;				\
+	unsigned long ___address = __address;				\
+	__young = pudp_clear_flush_young(___vma, ___address, __pudp);	\
+	__young |= mmu_notifier_clear_flush_young(___vma->vm_mm,	\
+						  ___address,		\
+						  ___address +		\
+						  PUD_SIZE);		\
+	__young;							\
+})
+
 #define ptep_clear_young_notify(__vma, __address, __ptep)		\
 ({									\
 	int __young;							\
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 988d176472df..2b566736e3c2 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -206,6 +206,7 @@ struct page_vma_mapped_walk {
 	struct page *page;
 	struct vm_area_struct *vma;
 	unsigned long address;
+	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 	spinlock_t *ptl;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 11df03e71288..a473553aa9a5 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -141,9 +141,12 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	struct page *page = pvmw->page;
 	pgd_t *pgd;
 	p4d_t *p4d;
-	pud_t *pud;
+	pud_t pude;
 	pmd_t pmde;
 
+	if (!pvmw->pte && !pvmw->pmd && pvmw->pud)
+		return not_found(pvmw);
+
 	/* The only possible pmd mapping has been handled on last iteration */
 	if (pvmw->pmd && !pvmw->pte)
 		return not_found(pvmw);
@@ -171,10 +174,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	p4d = p4d_offset(pgd, pvmw->address);
 	if (!p4d_present(*p4d))
 		return false;
-	pud = pud_offset(p4d, pvmw->address);
-	if (!pud_present(*pud))
+	pvmw->pud = pud_offset(p4d, pvmw->address);
+
+	/*
+	 * Make sure the pud value isn't cached in a register by the
+	 * compiler and used as a stale value after we've observed a
+	 * subsequent update.
+	 */
+	pude = READ_ONCE(*pvmw->pud);
+	if (pud_trans_huge(pude)) {
+		pvmw->ptl = pud_lock(mm, pvmw->pud);
+		if (likely(pud_trans_huge(*pvmw->pud))) {
+			if (pvmw->flags & PVMW_MIGRATION)
+				return not_found(pvmw);
+			if (pud_page(*pvmw->pud) != page)
+				return not_found(pvmw);
+			return true;
+		} else {
+			/* THP pud was split under us: handle on pmd level */
+			spin_unlock(pvmw->ptl);
+			pvmw->ptl = NULL;
+		}
+	} else if (!pud_present(pude))
 		return false;
-	pvmw->pmd = pmd_offset(pud, pvmw->address);
+
+	pvmw->pmd = pmd_offset(pvmw->pud, pvmw->address);
 	/*
 	 * Make sure the pmd value isn't cached in a register by the
 	 * compiler and used as a stale value after we've observed a
@@ -210,6 +234,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	} else if (!pmd_present(pmde)) {
 		return false;
 	}
+
 	if (!map_pte(pvmw))
 		goto next_pte;
 	while (1) {
diff --git a/mm/rmap.c b/mm/rmap.c
index dae66a4329ea..f69d81d4a956 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -789,9 +789,15 @@ static bool page_referenced_one(struct page *page, struct vm_area_struct *vma,
 			referenced++;
 		}
 	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-		if (pmdp_clear_flush_young_notify(vma, address,
-					pvmw.pmd))
-			referenced++;
+		if (pvmw.pmd) {
+			if (pmdp_clear_flush_young_notify(vma, address,
+						pvmw.pmd))
+				referenced++;
+		} else if (pvmw.pud) {
+			if (pudp_clear_flush_young_notify(vma, address,
+						pvmw.pud))
+				referenced++;
+		}
 	} else {
 		/* unexpected pmd-mapped page? */
 		WARN_ON_ONCE(1);
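Restated for the reader (a simplified sketch of the rmap.c hunk above, not a
new kernel API): page_referenced_one() now picks the clear-young-and-flush
primitive that matches the level at which the page is mapped. The real code
also handles the !CONFIG_TRANSPARENT_HUGEPAGE case and mlocked VMAs, which
the sketch omits.

	/* Sketch: mirrors the dispatch in the modified page_referenced_one(). */
	static int referenced_one_level(struct page_vma_mapped_walk *pvmw,
					struct vm_area_struct *vma,
					unsigned long address)
	{
		if (pvmw->pte)		/* 4KB PTE mapping */
			return ptep_clear_flush_young_notify(vma, address,
							     pvmw->pte);
		if (pvmw->pmd)		/* 2MB PMD mapping */
			return pmdp_clear_flush_young_notify(vma, address,
							     pvmw->pmd);
		/* 1GB PUD mapping, via the primitive added by this patch */
		return pudp_clear_flush_young_notify(vma, address, pvmw->pud);
	}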
From patchwork Fri Feb 15 22:08:40 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815979
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 15/31] mm: thp: add 1GB THP split_huge_pud_page() function.
Date: Fri, 15 Feb 2019 14:08:40 -0800
Message-Id: <20190215220856.29749-16-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

split_huge_pud_page() mimics the PMD-level THP split. In addition,
PMDPageInPUD() is used to support PMD-mapped PUD THPs. The mapcount of
a PMD-mapped PUD THP is tracked by sub_compound_mapcount(), which uses
(head_page+3).compound_mapcount, since each base page's _mapcount is
already used for PTE mappings. PagePUDDoubleMap() marks PUD THPs that
are mapped at both PUD and PMD levels. The page_xxx_rmap() functions
gain an extra page order parameter to distinguish the different THP
sizes.
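To picture the bookkeeping described above (an illustration, not part of the
patch; it assumes x86_64 with 4KB base pages and 2MB PMD-sized chunks):

	/*
	 * Mapcounts of a 1GB THP:
	 *
	 *   head[0].compound_mapcount  - PUD (1GB) mappings of the whole THP
	 *   chunk[3].compound_mapcount - PMD (2MB) mappings of each 2MB-aligned
	 *       chunk, via sub_compound_mapcount_ptr(chunk, 1), i.e.
	 *       &chunk[2 + 1]; "chunk" is the first subpage of that 2MB range
	 *       (the commit message's head_page+3 is the first chunk)
	 *   subpage->_mapcount         - PTE (4KB) mappings of each base page
	 */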
Signed-off-by: Zi Yan
---
 arch/x86/include/asm/pgtable.h |  15 +
 include/asm-generic/pgtable.h  |  83 ++++
 include/linux/huge_mm.h        |  31 +-
 include/linux/memcontrol.h     |   5 +
 include/linux/mm.h             |  18 +
 include/linux/page-flags.h     |  79 +++-
 include/linux/rmap.h           |   9 +-
 include/linux/swap.h           |   2 +
 include/linux/vm_event_item.h  |   4 +
 kernel/events/uprobes.c        |   4 +-
 mm/huge_memory.c               | 695 ++++++++++++++++++++++++++++++---
 mm/hugetlb.c                   |   4 +-
 mm/khugepaged.c                |   4 +-
 mm/ksm.c                       |   4 +-
 mm/memcontrol.c                |  13 +
 mm/memory.c                    |  16 +-
 mm/migrate.c                   |   8 +-
 mm/page_alloc.c                |  18 +-
 mm/pgtable-generic.c           |  11 +
 mm/rmap.c                      | 108 +++--
 mm/swap.c                      |  38 ++
 mm/swapfile.c                  |   4 +-
 mm/userfaultfd.c               |   2 +-
 mm/util.c                      |   7 +
 mm/vmstat.c                    |   4 +
 25 files changed, 1079 insertions(+), 107 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f99ce657d282..4a6805f8f128 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1269,6 +1269,21 @@ static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
 }
 #endif /* CONFIG_PAGE_TABLE_ISOLATION */
 
+#ifndef pudp_establish
+#define pudp_establish pudp_establish
+static inline pud_t pudp_establish(struct vm_area_struct *vma,
+		unsigned long address, pud_t *pudp, pud_t pud)
+{
+	if (IS_ENABLED(CONFIG_SMP)) {
+		return xchg(pudp, pud);
+	} else {
+		pud_t old = *pudp;
+		*pudp = pud;
+		return old;
+	}
+}
+#endif
+
 /*
  * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
  *
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 682531e0d55c..1ae33b6590b8 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -346,6 +346,11 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			     pmd_t *pmdp);
 #endif
 
+#ifndef __HAVE_ARCH_PUDP_INVALIDATE
+extern pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address,
+			     pud_t *pudp);
+#endif
+
 #ifndef __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
@@ -941,6 +946,18 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
 }
 #endif
 
+#ifndef pud_read_atomic
+static inline pud_t pud_read_atomic(pud_t *pudp)
+{
+	/*
+	 * Depend on compiler for an atomic pmd read. NOTE: this is
+	 * only going to work, if the pmdval_t isn't larger than
+	 * an unsigned long.
+	 */
+	return *pudp;
+}
+#endif
+
 #ifndef arch_needs_pgtable_deposit
 #define arch_needs_pgtable_deposit() (false)
 #endif
@@ -1032,6 +1049,72 @@ static inline int pmd_trans_unstable(pmd_t *pmd)
 #endif
 }
 
+static inline int pud_none_or_trans_huge_or_clear_bad(pud_t *pud)
+{
+	pud_t pudval = pud_read_atomic(pud);
+	/*
+	 * The barrier will stabilize the pmdval in a register or on
+	 * the stack so that it will stop changing under the code.
+	 *
+	 * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
+	 * pmd_read_atomic is allowed to return a not atomic pmdval
+	 * (for example pointing to an hugepage that has never been
+	 * mapped in the pmd). The below checks will only care about
+	 * the low part of the pmd with 32bit PAE x86 anyway, with the
+	 * exception of pmd_none(). So the important thing is that if
+	 * the low part of the pmd is found null, the high part will
+	 * be also null or the pmd_none() check below would be
+	 * confused.
+ */ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + barrier(); +#endif + /* + * !pud_present() checks for pud migration entries + * + * The complete check would use a pud analogue of is_pmd_migration_entry() + * from linux/swapops.h, but that requires moving the current function and + * pud_trans_unstable() to linux/swapops.h to resolve the dependency, which + * is too much code movement. + * + * !pud_present() is equivalent to such a migration-entry check currently, + * because !pud_present() pages can only be under migration, not swapped + * out. + * + * pud_none() is preserved for future condition checks on pud migration + * entries, and to avoid confusion with this function's name, although it is + * redundant with !pud_present(). + */ + if (pud_none(pudval) || pud_trans_huge(pudval)) + return 1; + if (unlikely(pud_bad(pudval))) { + pud_clear_bad(pud); + return 1; + } + return 0; +} + +/* + * This is a noop if Transparent Hugepage Support is not built into + * the kernel. Otherwise it is equivalent to + * pud_none_or_trans_huge_or_clear_bad(), and shall only be called in + * places that have already verified the pud is not none and want to + * walk the pmds while holding the mmap sem in read mode (write mode + * doesn't need this). If THP is not enabled, the pud can't go away under + * the code even if MADV_DONTNEED runs, but if THP is enabled we need to + * run pud_trans_unstable before walking the pmds after + * split_huge_pud returns (because it may have run when the pud became + * null, but then a page fault can map in a THP and not a + * regular page). + */ +static inline int pud_trans_unstable(pud_t *pud) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + return pud_none_or_trans_huge_or_clear_bad(pud); +#else + return 0; +#endif +} + #ifndef CONFIG_NUMA_BALANCING /* * Technically a PTE can be PROTNONE even when not doing NUMA balancing but diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 02419fa91e12..bd5cc5e65de8 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -178,17 +178,27 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, bool freeze, struct page *page); +bool can_split_huge_pud_page(struct page *page, int *pextra_pins); +int split_huge_pud_page_to_list(struct page *page, struct list_head *list); +static inline int split_huge_pud_page(struct page *page) +{ + return split_huge_pud_page_to_list(page, NULL); +} void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address); + unsigned long address, bool freeze, struct page *page); #define split_huge_pud(__vma, __pud, __address) \ do { \ pud_t *____pud = (__pud); \ if (pud_trans_huge(*____pud) \ || pud_devmap(*____pud)) \ - __split_huge_pud(__vma, __pud, __address); \ + __split_huge_pud(__vma, __pud, __address, \ + false, NULL); \ } while (0) +void split_huge_pud_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct page *page); + extern int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); extern void vma_adjust_trans_huge(struct vm_area_struct *vma, @@ -319,8 +329,25 @@ static inline void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, static inline void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, bool freeze, struct page *page) {} +static inline bool +can_split_huge_pud_page(struct page *page, int *pextra_pins) +{ + BUILD_BUG(); + return false; +} +static inline int +split_huge_pud_page_to_list(struct page *page, struct list_head *list)
+{ + return 0; +} +static inline int split_huge_pud_page(struct page *page) +{ + return 0; +} #define split_huge_pud(__vma, __pmd, __address) \ do { } while (0) +static inline void split_huge_pud_address(struct vm_area_struct *vma, + unsigned long address, bool freeze, struct page *page) {} static inline int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 83ae11cbd12c..fd362559d4b7 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -790,6 +790,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm, #ifdef CONFIG_TRANSPARENT_HUGEPAGE void mem_cgroup_split_huge_fixup(struct page *head); +void mem_cgroup_split_huge_pud_fixup(struct page *head); #endif #else /* CONFIG_MEMCG */ @@ -1098,6 +1099,10 @@ static inline void mem_cgroup_split_huge_fixup(struct page *head) { } +static inline void mem_cgroup_split_huge_pud_fixup(struct page *head) +{ +} + static inline void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx, unsigned long count) diff --git a/include/linux/mm.h b/include/linux/mm.h index d10dc9db2311..af6257d05189 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -652,6 +652,24 @@ static inline int compound_mapcount(struct page *page) return atomic_read(compound_mapcount_ptr(page)) + 1; } +static inline unsigned int compound_order(struct page *page); +static inline atomic_t *sub_compound_mapcount_ptr(struct page *page, int sub_level) +{ + struct page *head = compound_head(page); + + VM_BUG_ON_PAGE(!PageCompound(page), page); + VM_BUG_ON_PAGE(compound_order(head) != HPAGE_PUD_ORDER, page); + VM_BUG_ON_PAGE((page - head) % HPAGE_PMD_NR, page); + VM_BUG_ON_PAGE(sub_level != 1, page); + return &page[2 + sub_level].compound_mapcount; +} + +/* Only works for PUD pages */ +static inline int sub_compound_mapcount(struct page *page) +{ + return atomic_read(sub_compound_mapcount_ptr(page, 1)) + 1; +} + /* * The atomic page->_mapcount, starts from -1: so that transitions * both from it and to it can be tracked, using atomic_inc_and_test diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 39b4494e29f1..480e091f52ac 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -607,6 +607,23 @@ static inline int PageTransTail(struct page *page) return PageTail(page); } +#define HPAGE_PMD_SHIFT PMD_SHIFT +#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT) +#define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER) + +#define HPAGE_PUD_SHIFT PUD_SHIFT +#define HPAGE_PUD_ORDER (HPAGE_PUD_SHIFT-PAGE_SHIFT) +#define HPAGE_PUD_NR (1<<HPAGE_PUD_ORDER) + +/* A PMD-aligned, PMD-sized subpage of a PUD THP */ +static inline int PMDPageInPUD(struct page *page) +{ + struct page *head = compound_head(page); + + return (PageCompound(page) && compound_order(head) == HPAGE_PUD_ORDER && + ((page - head) % HPAGE_PMD_NR == 0)); +} + +/* + * PagePUDDoubleMap indicates that the compound page is mapped with PMDs as + * well as PUDs. + * + * For the page PagePUDDoubleMap means the sub_compound_mapcount in all + * PMD-sized sub-pages is offset up + * by one. This reference will go away with last compound_mapcount. + * + * See also __split_huge_pud_locked() and page_remove_anon_compound_rmap().
+ */ +static inline int PagePUDDoubleMap(struct page *page) +{ + return PageHead(page) && test_bit(PG_double_map, &page[2].flags); +} + +static inline void SetPagePUDDoubleMap(struct page *page) +{ + VM_BUG_ON_PAGE(!PageHead(page), page); + set_bit(PG_double_map, &page[2].flags); +} + +static inline void ClearPagePUDDoubleMap(struct page *page) +{ + VM_BUG_ON_PAGE(!PageHead(page), page); + clear_bit(PG_double_map, &page[2].flags); +} +static inline int TestSetPagePUDDoubleMap(struct page *page) +{ + VM_BUG_ON_PAGE(!PageHead(page), page); + return test_and_set_bit(PG_double_map, &page[2].flags); +} + +static inline int TestClearPagePUDDoubleMap(struct page *page) { VM_BUG_ON_PAGE(!PageHead(page), page); - return test_and_clear_bit(PG_double_map, &page[1].flags); + return test_and_clear_bit(PG_double_map, &page[2].flags); } #else @@ -653,9 +712,13 @@ TESTPAGEFLAG_FALSE(TransHuge) TESTPAGEFLAG_FALSE(TransCompound) TESTPAGEFLAG_FALSE(TransCompoundMap) TESTPAGEFLAG_FALSE(TransTail) +TESTPAGEFLAG_FALSE(PMDPageInPUD) PAGEFLAG_FALSE(DoubleMap) TESTSETFLAG_FALSE(DoubleMap) TESTCLEARFLAG_FALSE(DoubleMap) +PAGEFLAG_FALSE(PUDDoubleMap) + TESTSETFLAG_FALSE(PUDDoubleMap) + TESTCLEARFLAG_FALSE(PUDDoubleMap) #endif /* diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 2b566736e3c2..6adb6e835b30 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -99,6 +99,7 @@ enum ttu_flags { TTU_RMAP_LOCKED = 0x80, /* do not grab rmap lock: * caller holds it */ TTU_SPLIT_FREEZE = 0x100, /* freeze pte under splitting thp */ + TTU_SPLIT_HUGE_PUD = 0x200, /* split huge PUD if any */ }; #ifdef CONFIG_MMU @@ -171,13 +172,13 @@ struct anon_vma *page_get_anon_vma(struct page *page); */ void page_move_anon_rmap(struct page *, struct vm_area_struct *); void page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, bool); + unsigned long, bool, int); void do_page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, int); + unsigned long, int, int); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, bool); + unsigned long, bool, int); void page_add_file_rmap(struct page *, bool); -void page_remove_rmap(struct page *, bool); +void page_remove_rmap(struct page *, bool, int); void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long); diff --git a/include/linux/swap.h b/include/linux/swap.h index 622025ac1461..1a6bac77c854 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -333,6 +333,8 @@ extern void lru_cache_add_anon(struct page *page); extern void lru_cache_add_file(struct page *page); extern void lru_add_page_tail(struct page *page, struct page *page_tail, struct lruvec *lruvec, struct list_head *head); +extern void lru_add_pud_page_tail(struct page *page, struct page *page_tail, + struct lruvec *lruvec, struct list_head *head); extern void activate_page(struct page *); extern void mark_page_accessed(struct page *); extern void lru_add_drain(void); diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 4550667b2274..df619262b1b4 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -85,6 +85,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_FAULT_ALLOC_PUD, THP_FAULT_FALLBACK_PUD, THP_SPLIT_PUD, + THP_SPLIT_PUD_PAGE, + THP_SPLIT_PUD_PAGE_FAILED, + THP_ZERO_PUD_PAGE_ALLOC, + THP_ZERO_PUD_PAGE_ALLOC_FAILED, #endif THP_ZERO_PAGE_ALLOC, THP_ZERO_PAGE_ALLOC_FAILED, diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 
8aef47ee7bfa..e4819fef634f 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -195,7 +195,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, VM_BUG_ON_PAGE(addr != pvmw.address, old_page); get_page(new_page); - page_add_new_anon_rmap(new_page, vma, addr, false); + page_add_new_anon_rmap(new_page, vma, addr, false, 0); mem_cgroup_commit_charge(new_page, memcg, false, false); lru_cache_add_active_or_unevictable(new_page, vma); @@ -209,7 +209,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, set_pte_at_notify(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot)); - page_remove_rmap(old_page, false); + page_remove_rmap(old_page, false, 0); if (!page_mapped(old_page)) try_to_free_swap(old_page); page_vma_mapped_walk_done(&pvmw); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 0a006592f3fe..5f83f4c5eac7 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -121,10 +121,10 @@ static struct page *get_huge_pud_zero_page(void) zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE, HPAGE_PUD_ORDER); if (!zero_page) { - count_vm_event(THP_ZERO_PAGE_ALLOC_FAILED); + count_vm_event(THP_ZERO_PUD_PAGE_ALLOC_FAILED); return NULL; } - count_vm_event(THP_ZERO_PAGE_ALLOC); + count_vm_event(THP_ZERO_PUD_PAGE_ALLOC); preempt_disable(); if (cmpxchg(&huge_pud_zero_page, NULL, zero_page)) { preempt_enable(); @@ -660,7 +660,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, entry = mk_huge_pmd(page, vma->vm_page_prot); entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); - page_add_new_anon_rmap(page, vma, haddr, true); + page_add_new_anon_rmap(page, vma, haddr, true, HPAGE_PMD_ORDER); mem_cgroup_commit_charge(page, memcg, false, true); lru_cache_add_active_or_unevictable(page, vma); pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable); @@ -969,7 +969,7 @@ static int __do_huge_pud_anonymous_page(struct vm_fault *vmf, struct page *page, entry = mk_huge_pud(page, vma->vm_page_prot); entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma); - page_add_new_anon_rmap(page, vma, haddr, true); + page_add_new_anon_rmap(page, vma, haddr, true, HPAGE_PUD_ORDER); mem_cgroup_commit_charge(page, memcg, false, true); lru_cache_add_active_or_unevictable(page, vma); pgtable_trans_huge_pud_deposit(vma->vm_mm, vmf->pud, @@ -1463,7 +1463,7 @@ static int do_huge_pud_wp_page_fallback(struct vm_fault *vmf, pud_t orig_pud, entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); memcg = (void *)page_private(pages[i]); set_page_private(pages[i], 0); - page_add_new_anon_rmap(pages[i], vmf->vma, haddr, true); + page_add_new_anon_rmap(pages[i], vmf->vma, haddr, true, HPAGE_PMD_ORDER); mem_cgroup_commit_charge(pages[i], memcg, false, true); lru_cache_add_active_or_unevictable(pages[i], vma); vmf->pmd = pmd_offset(&_pud, haddr); @@ -1475,7 +1475,7 @@ static int do_huge_pud_wp_page_fallback(struct vm_fault *vmf, pud_t orig_pud, smp_wmb(); /* make pte visible before pmd */ pud_populate_with_pgtable(vma->vm_mm, vmf->pud, pgtable); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PUD_ORDER); spin_unlock(vmf->ptl); /* @@ -1566,13 +1566,13 @@ int do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud) prep_transhuge_page(new_page); } else { if (!page) { - WARN(1, "%s: split_huge_page\n", __func__); + /*WARN(1, "%s: split_huge_page\n", __func__);*/ split_huge_pud(vma, vmf->pud, vmf->address); ret |= VM_FAULT_FALLBACK; } else { ret = do_huge_pud_wp_page_fallback(vmf, orig_pud, page); if (ret & VM_FAULT_OOM) { - 
WARN(1, "%s: split_huge_page after wp fallback\n", __func__); + /*WARN(1, "%s: split_huge_page after wp fallback\n", __func__);*/ split_huge_pud(vma, vmf->pud, vmf->address); ret |= VM_FAULT_FALLBACK; } @@ -1585,7 +1585,7 @@ int do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud) if (unlikely(mem_cgroup_try_charge(new_page, vma->vm_mm, huge_gfp, &memcg, true))) { put_page(new_page); - WARN(1, "%s: split_huge_page after mem cgroup failed\n", __func__); + /*WARN(1, "%s: split_huge_page after mem cgroup failed\n", __func__);*/ split_huge_pud(vma, vmf->pud, vmf->address); if (page) put_page(page); @@ -1620,7 +1620,7 @@ int do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud) entry = mk_huge_pud(new_page, vma->vm_page_prot); entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma); pudp_huge_clear_flush_notify(vma, haddr, vmf->pud); - page_add_new_anon_rmap(new_page, vma, haddr, true); + page_add_new_anon_rmap(new_page, vma, haddr, true, HPAGE_PUD_ORDER); mem_cgroup_commit_charge(new_page, memcg, false, true); lru_cache_add_active_or_unevictable(new_page, vma); set_pud_at(vma->vm_mm, haddr, vmf->pud, entry); @@ -1629,7 +1629,7 @@ int do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud) add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PUD_NR); } else { VM_BUG_ON_PAGE(!PageHead(page), page); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PUD_ORDER); put_page(page); } ret |= VM_FAULT_WRITE; @@ -1748,7 +1748,7 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, entry = maybe_mkwrite(pte_mkdirty(entry), vma); memcg = (void *)page_private(pages[i]); set_page_private(pages[i], 0); - page_add_new_anon_rmap(pages[i], vmf->vma, haddr, false); + page_add_new_anon_rmap(pages[i], vmf->vma, haddr, false, 0); mem_cgroup_commit_charge(pages[i], memcg, false, false); lru_cache_add_active_or_unevictable(pages[i], vma); vmf->pte = pte_offset_map(&_pmd, haddr); @@ -1760,7 +1760,7 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, smp_wmb(); /* make pte visible before pmd */ pmd_populate(vma->vm_mm, vmf->pmd, pgtable); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); spin_unlock(vmf->ptl); /* @@ -1900,7 +1900,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd) entry = mk_huge_pmd(new_page, vma->vm_page_prot); entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd); - page_add_new_anon_rmap(new_page, vma, haddr, true); + page_add_new_anon_rmap(new_page, vma, haddr, true, HPAGE_PMD_ORDER); mem_cgroup_commit_charge(new_page, memcg, false, true); lru_cache_add_active_or_unevictable(new_page, vma); set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry); @@ -1909,7 +1909,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd) add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); } else { VM_BUG_ON_PAGE(!PageHead(page), page); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); put_page(page); } ret |= VM_FAULT_WRITE; @@ -2282,9 +2282,9 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, if (pmd_present(orig_pmd)) { page = pmd_page(orig_pmd); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); - VM_BUG_ON_PAGE(!PageHead(page), page); + VM_BUG_ON_PAGE(!PageHead(page) && !PMDPageInPUD(page), page); } else if (thp_migration_supported()) { swp_entry_t entry; @@ -2560,7 +2560,7 @@ int zap_huge_pud(struct mmu_gather *tlb, 
struct vm_area_struct *vma, if (pud_present(orig_pud)) { page = pud_page(orig_pud); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PUD_ORDER); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); } else @@ -2582,9 +2582,60 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, return 1; } +static void __split_huge_zero_page_pmd(struct vm_area_struct *vma, + unsigned long haddr, pmd_t *pmd); + +static void __split_huge_zero_page_pud(struct vm_area_struct *vma, + unsigned long haddr, pud_t *pud) +{ + struct mm_struct *mm = vma->vm_mm; + pgtable_t pgtable; + pud_t _pud; + int i; + + /* + * Leave the pud empty until the pmds are filled. Note that it is fine + * to delay the notification until mmu_notifier_invalidate_range_end(), + * as we are replacing a write-protected zero pud page with + * write-protected zero pmd pages. + * + * See Documentation/vm/mmu_notifier.rst + */ + pudp_huge_clear_flush(vma, haddr, pud); + + pgtable = pgtable_trans_huge_pud_withdraw(mm, pud); + pud_populate_with_pgtable(mm, &_pud, pgtable); + + for (i = 0; i < (1<<(HPAGE_PUD_ORDER-HPAGE_PMD_ORDER)); + i++, haddr += PMD_SIZE) { + pmd_t *pmd = pmd_offset(&_pud, haddr), entry; + struct page *zero_page = mm_get_huge_zero_page(mm); + + if (unlikely(!zero_page)) { + VM_BUG_ON(1); + __split_huge_zero_page_pmd(vma, haddr, pmd); + continue; + } + + VM_BUG_ON(!pmd_none(*pmd)); + entry = mk_huge_pmd(zero_page, vma->vm_page_prot); + set_pmd_at(mm, haddr, pmd, entry); + } + smp_wmb(); /* make pmds visible before the pud */ + pud_populate_with_pgtable(mm, pud, pgtable); +} + static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, - unsigned long haddr) + unsigned long haddr, bool freeze) { + struct mm_struct *mm = vma->vm_mm; + struct page *page; + pgtable_t pgtable; + pud_t _pud, old_pud; + bool young, write, dirty, soft_dirty; + unsigned long addr; + int i; + VM_BUG_ON(haddr & ~HPAGE_PUD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma); @@ -2592,22 +2643,149 @@ static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, count_vm_event(THP_SPLIT_PUD); - pudp_huge_clear_flush_notify(vma, haddr, pud); + if (!vma_is_anonymous(vma)) { + _pud = pudp_huge_clear_flush_notify(vma, haddr, pud); + /* + * We are going to unmap this huge page. So + * just go ahead and zap it + */ + if (arch_needs_pgtable_deposit()) + zap_pud_deposited_table(mm, pud); + if (vma_is_dax(vma)) + return; + page = pud_page(_pud); + if (!PageReferenced(page) && pud_young(_pud)) + SetPageReferenced(page); + page_remove_rmap(page, true, HPAGE_PUD_ORDER); + put_page(page); + add_mm_counter(mm, MM_FILEPAGES, -HPAGE_PUD_NR); + return; + } else if (is_huge_zero_pud(*pud)) { + /* + * FIXME: Do we want to invalidate secondary mmu by calling + * mmu_notifier_invalidate_range() see comments below inside + * __split_huge_pmd() ? + * + * We are going from a write-protected zero huge page to + * write-protected zero small pages, so it does not seem useful + * to invalidate the secondary mmu at this time.
+ */ + return __split_huge_zero_page_pud(vma, haddr, pud); + } + + /* See the comment above pmdp_invalidate() in __split_huge_pmd_locked() */ + old_pud = pudp_invalidate(vma, haddr, pud); + + page = pud_page(old_pud); + VM_BUG_ON_PAGE(!page_count(page), page); + page_ref_add(page, (1<<(HPAGE_PUD_ORDER-HPAGE_PMD_ORDER)) - 1); + if (pud_dirty(old_pud)) + SetPageDirty(page); + write = pud_write(old_pud); + young = pud_young(old_pud); + dirty = pud_dirty(old_pud); + soft_dirty = pud_soft_dirty(old_pud); + + pgtable = pgtable_trans_huge_pud_withdraw(mm, pud); + pud_populate_with_pgtable(mm, &_pud, pgtable); + + for (i = 0, addr = haddr; i < HPAGE_PUD_NR; + i += HPAGE_PMD_NR, addr += PMD_SIZE) { + pmd_t entry, *pmd; + /* + * Note that NUMA hinting access restrictions are not + * transferred to avoid any possibility of altering + * permissions across VMAs. + */ + if (freeze) { + swp_entry_t swp_entry; + + swp_entry = make_migration_entry(page + i, write); + entry = swp_entry_to_pmd(swp_entry); + if (soft_dirty) + entry = pmd_swp_mksoft_dirty(entry); + } else { + entry = mk_huge_pmd(page + i, READ_ONCE(vma->vm_page_prot)); + entry = maybe_pmd_mkwrite(entry, vma); + if (!write) + entry = pmd_wrprotect(entry); + if (!young) + entry = pmd_mkold(entry); + if (soft_dirty) + entry = pmd_mksoft_dirty(entry); + } + pmd = pmd_offset(&_pud, addr); + VM_BUG_ON(!pmd_none(*pmd)); + set_pmd_at(mm, addr, pmd, entry); + /* distinguish between pud compound_mapcount and pmd compound_mapcount */ + if (atomic_inc_and_test(sub_compound_mapcount_ptr(&page[i], 1))) + /* first pmd-mapped pud page */ + __inc_node_page_state(page, NR_ANON_THPS); + } + + /* + * Set PG_double_map before dropping compound_mapcount to avoid + * false-negative page_mapped(). + */ + if (compound_mapcount(page) > 1 && !TestSetPagePUDDoubleMap(page)) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + /* distinguish between pud compound_mapcount and pmd compound_mapcount */ + atomic_inc(sub_compound_mapcount_ptr(&page[i], 1)); + } + + if (atomic_add_negative(-1, compound_mapcount_ptr(page))) { + /* Last compound_mapcount is gone. */ + __dec_node_page_state(page, NR_ANON_THPS_PUD); + if (TestClearPagePUDDoubleMap(page)) { + /* No need for the mapcount reference anymore */ + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + /* distinguish between pud compound_mapcount and pmd compound_mapcount */ + atomic_dec(sub_compound_mapcount_ptr(&page[i], 1)); + } + } + + smp_wmb(); /* make pmds visible before the pud */ + pud_populate_with_pgtable(mm, pud, pgtable); + + if (freeze) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) { + /*page_remove_rmap(page + i, true, HPAGE_PMD_ORDER);*/ + atomic_dec(sub_compound_mapcount_ptr(&page[i], 1)); + __dec_node_page_state(page, NR_ANON_THPS); + __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, -HPAGE_PMD_NR); + put_page(page + i); + } + } } void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) + unsigned long address, bool freeze, struct page *page) { spinlock_t *ptl; + struct mm_struct *mm = vma->vm_mm; + unsigned long haddr = address & HPAGE_PUD_MASK; struct mmu_notifier_range range; mmu_notifier_range_init(&range, vma->vm_mm, address & HPAGE_PUD_MASK, (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); mmu_notifier_invalidate_range_start(&range); - ptl = pud_lock(vma->vm_mm, pud); - if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) + ptl = pud_lock(mm, pud); + + /* + * If the caller asks us to set up migration entries, we need a page + * to check the pud against.
Otherwise we can end up replacing the wrong page. + */ + VM_BUG_ON(freeze && !page); + if (page && page != pud_page(*pud)) goto out; - __split_huge_pud_locked(vma, pud, range.start); + + if (pud_trans_huge(*pud)) { + page = pud_page(*pud); + if (PageMlocked(page)) + clear_page_mlock(page); + } else if (unlikely(!pud_devmap(*pud))) + goto out; + __split_huge_pud_locked(vma, pud, haddr, freeze); out: spin_unlock(ptl); @@ -2617,6 +2795,369 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, */ mmu_notifier_invalidate_range_only_end(&range); } + +void split_huge_pud_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct page *page) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + + pgd = pgd_offset(vma->vm_mm, address); + if (!pgd_present(*pgd)) + return; + + p4d = p4d_offset(pgd, address); + if (!p4d_present(*p4d)) + return; + + pud = pud_offset(p4d, address); + + __split_huge_pud(vma, pud, address, freeze, page); +} + +static void freeze_pud_page(struct page *page) +{ + enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | + TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PUD; + bool unmap_success; + + VM_BUG_ON_PAGE(!PageHead(page), page); + + if (PageAnon(page)) + ttu_flags |= TTU_SPLIT_FREEZE; + + unmap_success = try_to_unmap(page, ttu_flags); + VM_BUG_ON_PAGE(!unmap_success, page); +} + +static void unfreeze_pud_page(struct page *page) +{ + int i; + + VM_BUG_ON(!PageTransHuge(page)); + if (compound_order(page) == HPAGE_PUD_ORDER) { + remove_migration_ptes(page, page, true); + } else if (compound_order(page) == HPAGE_PMD_ORDER) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + remove_migration_ptes(page + i, page + i, true); + } else + VM_BUG_ON_PAGE(1, page); +} + +static void __split_huge_pud_page_tail(struct page *head, int tail, + struct lruvec *lruvec, struct list_head *list) +{ + struct page *page_tail = head + tail; + /*int page_tail_mapcount = sub_compound_mapcount(page_tail);*/ + + VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail); + + /*atomic_set(sub_compound_mapcount_ptr(page_tail, 1), -1);*/ + + clear_compound_head(page_tail); + prep_compound_page(page_tail, HPAGE_PMD_ORDER); + prep_transhuge_page(page_tail); + + /* move sub PMD page mapcount */ + /*atomic_set(compound_mapcount_ptr(page_tail), page_tail_mapcount);*/ + /* + * tail_page->_refcount is zero and not changing from under us. But + * get_page_unless_zero() may be running from under us on the + * tail_page. If we used atomic_set() below instead of atomic_inc() or + * atomic_add(), we would then run atomic_set() concurrently with + * get_page_unless_zero(), and atomic_set() is implemented in C not + * using locked ops. spin_unlock on x86 sometimes uses locked ops + * because of PPro errata 66, 92, so unless somebody can guarantee + * atomic_set() here would be safe on all archs (and not only on x86), + * it's safer to use atomic_inc()/atomic_add(). + */ + if (PageAnon(head) && !PageSwapCache(head)) { + page_ref_inc(page_tail); + } else { + VM_BUG_ON(1); + /* Additional pin to radix tree */ + page_ref_add(page_tail, 2); + } + + page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; + page_tail->flags |= (head->flags & + ((1L << PG_referenced) | + (1L << PG_swapbacked) | + (1L << PG_swapcache) | + (1L << PG_mlocked) | + (1L << PG_uptodate) | + (1L << PG_active) | + (1L << PG_locked) | + (1L << PG_unevictable) | + (1L << PG_dirty) | + /* preserve THP */ + (1L << PG_head))); + + /* + * After clearing PageTail the gup refcount can be released.
+ * Page flags also must be visible before we make the page non-compound. + */ + smp_wmb(); + + if (page_is_young(head)) + set_page_young(page_tail); + if (page_is_idle(head)) + set_page_idle(page_tail); + + /* ->mapping in first tail page is compound_mapcount */ + VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING, + page_tail); + page_tail->mapping = head->mapping; + + page_tail->index = head->index + tail; + page_cpupid_xchg_last(page_tail, page_cpupid_last(head)); + lru_add_pud_page_tail(head, page_tail, lruvec, list); +} + +static void __split_huge_pud_page(struct page *page, struct list_head *list, + unsigned long flags) +{ + struct page *head = compound_head(page); + struct zone *zone = page_zone(head); + struct lruvec *lruvec; + pgoff_t end = -1; + int i; + + lruvec = mem_cgroup_page_lruvec(head, zone->zone_pgdat); + + /* complete memcg work before adding pages to LRU */ + mem_cgroup_split_huge_pud_fixup(head); + + if (!PageAnon(page)) { + VM_BUG_ON(1); + end = DIV_ROUND_UP(i_size_read(head->mapping->host), PAGE_SIZE); + } + + for (i = HPAGE_PUD_NR - HPAGE_PMD_NR; i >= 1; i -= HPAGE_PMD_NR) { + __split_huge_pud_page_tail(head, i, lruvec, list); + /* Some pages can be beyond i_size: drop them from page cache */ + if (head[i].index >= end) { + VM_BUG_ON(1); + __ClearPageDirty(head + i); + __delete_from_page_cache(head + i, NULL); + if (IS_ENABLED(CONFIG_SHMEM) && PageSwapBacked(head)) + shmem_uncharge(head->mapping->host, 1); + put_page(head + i); + } + } + /* reset head page order */ + prep_compound_page(head, HPAGE_PMD_ORDER); + prep_transhuge_page(head); + + /* See comment in __split_huge_page_tail() */ + if (PageAnon(head)) { + /* Additional pin to radix tree of swap cache */ + if (PageSwapCache(head)) { + VM_BUG_ON(1); + page_ref_add(head, 2); + } else + page_ref_inc(head); + } else { + VM_BUG_ON(1); + /* Additional pin to radix tree */ + page_ref_add(head, 2); + xa_unlock(&head->mapping->i_pages); + } + + spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags); + + unfreeze_pud_page(head); + + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) { + struct page *subpage = head + i; + + if (subpage == page) + continue; + unlock_page(subpage); + + /* + * Subpages may be freed if there wasn't any mapping + * like if add_to_swap() is running on a lru page that + * had its mapping zapped. And freeing these pages + * requires taking the lru_lock so we do the put_page + * of the tail pages after the split is complete. + */ + put_page(subpage); + } +} +/* Racy check whether the huge page can be split */ +bool can_split_huge_pud_page(struct page *page, int *pextra_pins) +{ + int extra_pins; + + /* Additional pins from radix tree */ + if (PageAnon(page)) + extra_pins = PageSwapCache(page) ? HPAGE_PUD_NR : 0; + else + extra_pins = HPAGE_PUD_NR; + if (pextra_pins) + *pextra_pins = extra_pins; + return total_mapcount(page) == page_count(page) - extra_pins - 1; +} + +/* + * This function splits a PUD-sized huge page into PMD-sized subpages. @page + * can point to any subpage of the huge page to split. Split doesn't change + * the position of @page. + * + * The caller must hold a pin on @page, otherwise the split fails with -EBUSY. + * The huge page must be locked. + * + * If @list is null, tail pages will be added to the LRU list; otherwise, to @list. + * + * Both head page and tail pages will inherit mapping, flags, and so on from + * the hugepage. + * + * The GUP pin and PG_locked are transferred to @page. The remaining subpages + * can be freed if they are not mapped. + * + * Returns 0 if the hugepage is split successfully.
+ * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under + * us. + */ +int split_huge_pud_page_to_list(struct page *page, struct list_head *list) +{ + struct page *head = compound_head(page); + struct pglist_data *pgdata = NODE_DATA(page_to_nid(head)); + struct anon_vma *anon_vma = NULL; + struct address_space *mapping = NULL; + int count, mapcount, extra_pins, ret; + bool mlocked; + unsigned long flags; + + VM_BUG_ON_PAGE(is_huge_zero_page(page), page); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(!PageCompound(page), page); + + if (PageWriteback(page)) + return -EBUSY; + + if (PageAnon(head)) { + /* + * The caller does not necessarily hold an mmap_sem that would + * prevent the anon_vma disappearing so we first take a + * reference to it and then lock the anon_vma for write. This + * is similar to page_lock_anon_vma_read except the write lock + * is taken to serialise against parallel split or collapse + * operations. + */ + anon_vma = page_get_anon_vma(head); + if (!anon_vma) { + ret = -EBUSY; + goto out; + } + mapping = NULL; + anon_vma_lock_write(anon_vma); + } else { + VM_BUG_ON(1); + mapping = head->mapping; + + /* Truncated ? */ + if (!mapping) { + ret = -EBUSY; + goto out; + } + + anon_vma = NULL; + i_mmap_lock_read(mapping); + } + + /* + * Racy check whether we can split the page, before freeze_pud_page() + * splits PUDs + */ + if (!can_split_huge_pud_page(head, &extra_pins)) { + ret = -EBUSY; + goto out_unlock; + } + + mlocked = PageMlocked(page); + freeze_pud_page(head); + VM_BUG_ON_PAGE(compound_mapcount(head), head); + + /* Make sure the page is not on a per-CPU pagevec, as that takes a pin */ + if (mlocked) + lru_add_drain(); + + /* prevent PageLRU from going away from under us, and freeze lru stats */ + spin_lock_irqsave(zone_lru_lock(page_zone(head)), flags); + + if (mapping) { + void **pslot; + + VM_BUG_ON(1); + + xa_lock(&mapping->i_pages); + pslot = radix_tree_lookup_slot(&mapping->i_pages, + page_index(head)); + /* + * Check if the head page is present in radix tree. + * We assume all tail pages are present too, if the head is there. + */ + if (radix_tree_deref_slot_protected(pslot, + &mapping->i_pages.xa_lock) != head) + goto fail; + } + + /* Prevent deferred_split_scan() touching ->_refcount */ + spin_lock(&pgdata->split_queue_lock); + count = page_count(head); + mapcount = total_mapcount(head); + if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { + if (!list_empty(page_deferred_list(head))) { + pgdata->split_queue_len--; + list_del(page_deferred_list(head)); + } + if (mapping) { + VM_BUG_ON(1); + __dec_node_page_state(page, NR_SHMEM_THPS); + } + spin_unlock(&pgdata->split_queue_lock); + __split_huge_pud_page(page, list, flags); + if (PageSwapCache(head)) { + swp_entry_t entry = { .val = page_private(head) }; + + VM_BUG_ON(1); + + ret = split_swap_cluster(entry); + } else + ret = 0; + } else { + if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { + pr_alert("total_mapcount: %u, page_count(): %u\n", + mapcount, count); + if (PageTail(page)) + dump_page(head, NULL); + dump_page(page, "total_mapcount(head) > 0"); + VM_BUG_ON(1); + } + spin_unlock(&pgdata->split_queue_lock); +fail: + if (mapping) { + VM_BUG_ON(1); + xa_unlock(&mapping->i_pages); + } + spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags); + unfreeze_pud_page(head); + ret = -EBUSY; + } + +out_unlock: + if (anon_vma) { + anon_vma_unlock_write(anon_vma); + put_anon_vma(anon_vma); + } + if (mapping) + i_mmap_unlock_read(mapping); +out: + count_vm_event(!ret ?
THP_SPLIT_PUD_PAGE : THP_SPLIT_PUD_PAGE_FAILED); + return ret; +} #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma, @@ -2687,7 +3228,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, set_page_dirty(page); if (!PageReferenced(page) && pmd_young(_pmd)) SetPageReferenced(page); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); put_page(page); add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; @@ -2787,12 +3328,19 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, * Set PG_double_map before dropping compound_mapcount to avoid * false-negative page_mapped(). */ - if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) { + if (((PMDPageInPUD(page) && + sub_compound_mapcount(page) > + (1 + PagePUDDoubleMap(compound_head(page)))) || + compound_mapcount(page) > 1) + && !TestSetPageDoubleMap(page)) { for (i = 0; i < HPAGE_PMD_NR; i++) atomic_inc(&page[i]._mapcount); } - if (atomic_add_negative(-1, compound_mapcount_ptr(page))) { + if ((PMDPageInPUD(page) && + atomic_add_negative(-(1 + PagePUDDoubleMap(compound_head(page))), + sub_compound_mapcount_ptr(page, 1))) || + atomic_add_negative(-1, compound_mapcount_ptr(page))) { /* Last compound_mapcount is gone. */ __dec_node_page_state(page, NR_ANON_THPS); if (TestClearPageDoubleMap(page)) { @@ -2807,7 +3355,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, if (freeze) { for (i = 0; i < HPAGE_PMD_NR; i++) { - page_remove_rmap(page + i, false); + page_remove_rmap(page + i, false, 0); put_page(page + i); } } @@ -2892,6 +3440,11 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, * previously contain an hugepage: check if we need to split * an huge pmd. */ + if (start & ~HPAGE_PUD_MASK && + (start & HPAGE_PUD_MASK) >= vma->vm_start && + (start & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE <= vma->vm_end) + split_huge_pud_address(vma, start, false, NULL); + if (start & ~HPAGE_PMD_MASK && (start & HPAGE_PMD_MASK) >= vma->vm_start && (start & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= vma->vm_end) @@ -2902,6 +3455,11 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, * previously contain an hugepage: check if we need to split * an huge pmd. 
*/ + if (end & ~HPAGE_PUD_MASK && + (end & HPAGE_PUD_MASK) >= vma->vm_start && + (end & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE <= vma->vm_end) + split_huge_pud_address(vma, end, false, NULL); + if (end & ~HPAGE_PMD_MASK && (end & HPAGE_PMD_MASK) >= vma->vm_start && (end & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= vma->vm_end) @@ -2916,6 +3474,11 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, struct vm_area_struct *next = vma->vm_next; unsigned long nstart = next->vm_start; nstart += adjust_next << PAGE_SHIFT; + if (nstart & ~HPAGE_PUD_MASK && + (nstart & HPAGE_PUD_MASK) >= next->vm_start && + (nstart & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE <= next->vm_end) + split_huge_pud_address(next, nstart, false, NULL); + if (nstart & ~HPAGE_PMD_MASK && (nstart & HPAGE_PMD_MASK) >= next->vm_start && (nstart & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= next->vm_end) @@ -3084,12 +3647,23 @@ int total_mapcount(struct page *page) if (PageHuge(page)) return compound; ret = compound; - for (i = 0; i < HPAGE_PMD_NR; i++) - ret += atomic_read(&page[i]._mapcount) + 1; + /* if PMD, read all base pages; if PUD, read the sub_compound_mapcount() */ + if (compound_order(page) == HPAGE_PMD_ORDER) { + for (i = 0; i < hpage_nr_pages(page); i++) + ret += atomic_read(&page[i]._mapcount) + 1; + } else if (compound_order(page) == HPAGE_PUD_ORDER) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + ret += sub_compound_mapcount(&page[i]); + for (i = 0; i < hpage_nr_pages(page); i++) + ret += atomic_read(&page[i]._mapcount) + 1; + } else + VM_BUG_ON_PAGE(1, page); /* File pages has compound_mapcount included in _mapcount */ + /* both PUD and PMD have HPAGE_PMD_NR sub pages */ if (!PageAnon(page)) return ret - compound * HPAGE_PMD_NR; - if (PageDoubleMap(page)) + /* both PUD and PMD have HPAGE_PMD_NR sub pages */ + if (PagePUDDoubleMap(page) || PageDoubleMap(page)) ret -= HPAGE_PMD_NR; return ret; } @@ -3135,13 +3709,38 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount) page = compound_head(page); _total_mapcount = ret = 0; - for (i = 0; i < HPAGE_PMD_NR; i++) { - mapcount = atomic_read(&page[i]._mapcount) + 1; - ret = max(ret, mapcount); - _total_mapcount += mapcount; - } - if (PageDoubleMap(page)) { + /* if PMD, read all base pages; if PUD, read the sub_compound_mapcount() */ + if (compound_order(page) == HPAGE_PMD_ORDER) { + for (i = 0; i < hpage_nr_pages(page); i++) { + mapcount = atomic_read(&page[i]._mapcount) + 1; + ret = max(ret, mapcount); + _total_mapcount += mapcount; + } + } else if (compound_order(page) == HPAGE_PUD_ORDER) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) { + int j; + + mapcount = sub_compound_mapcount(&page[i]); + ret = max(ret, mapcount); + _total_mapcount += mapcount; + + /* Triple mapped at base page size */ + for (j = 0; j < HPAGE_PMD_NR; j++) { + mapcount = atomic_read(&page[i + j]._mapcount) + 1; + ret = max(ret, mapcount); + _total_mapcount += mapcount; + } + + if (PageDoubleMap(&page[i])) { + ret -= 1; + _total_mapcount -= HPAGE_PMD_NR; + } + } + } else + VM_BUG_ON_PAGE(1, page); + if (PageDoubleMap(page) || PagePUDDoubleMap(page)) { ret -= 1; + /* both PUD and PMD have HPAGE_PMD_NR sub pages */ _total_mapcount -= HPAGE_PMD_NR; } mapcount = compound_mapcount(page); @@ -3360,6 +3959,9 @@ static unsigned long deferred_split_count(struct shrinker *shrink, return READ_ONCE(pgdata->split_queue_len); } +#define deferred_list_entry(x) (compound_head(list_entry((void *)x, \ + struct page, mapping))) + static unsigned long deferred_split_scan(struct shrinker *shrink, struct shrink_control
*sc) { @@ -3372,8 +3974,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink, spin_lock_irqsave(&pgdata->split_queue_lock, flags); /* Take pin on all head pages to avoid freeing them under us */ list_for_each_safe(pos, next, &pgdata->split_queue) { - page = list_entry((void *)pos, struct page, mapping); - page = compound_head(page); + page = deferred_list_entry(pos); if (get_page_unless_zero(page)) { list_move(page_deferred_list(page), &list); } else { @@ -3387,12 +3988,18 @@ static unsigned long deferred_split_scan(struct shrinker *shrink, spin_unlock_irqrestore(&pgdata->split_queue_lock, flags); list_for_each_safe(pos, next, &list) { - page = list_entry((void *)pos, struct page, mapping); + page = deferred_list_entry(pos); if (!trylock_page(page)) goto next; /* split_huge_page() removes page from list on success */ - if (!split_huge_page(page)) - split++; + if (compound_order(page) == HPAGE_PUD_ORDER) { + if (!split_huge_pud_page(page)) + split++; + } else if (compound_order(page) == HPAGE_PMD_ORDER) { + if (!split_huge_page(page)) + split++; + } else + VM_BUG_ON_PAGE(1, page); unlock_page(page); next: put_page(page); @@ -3499,7 +4106,7 @@ void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, if (pmd_soft_dirty(pmdval)) pmdswp = pmd_swp_mksoft_dirty(pmdswp); set_pmd_at(mm, address, pvmw->pmd, pmdswp); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); put_page(page); } @@ -3525,7 +4132,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) flush_cache_range(vma, mmun_start, mmun_start + HPAGE_PMD_SIZE); if (PageAnon(new)) - page_add_anon_rmap(new, vma, mmun_start, true); + page_add_anon_rmap(new, vma, mmun_start, true, HPAGE_PMD_ORDER); else page_add_file_rmap(new, true); set_pmd_at(mm, mmun_start, pvmw->pmd, pmde); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index afef61656c1e..0db6c31440e8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3418,7 +3418,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma, set_page_dirty(page); hugetlb_count_sub(pages_per_huge_page(h), mm); - page_remove_rmap(page, true); + page_remove_rmap(page, true, huge_page_order(h)); spin_unlock(ptl); tlb_remove_page_size(tlb, page, huge_page_size(h)); @@ -3643,7 +3643,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, mmu_notifier_invalidate_range(mm, range.start, range.end); set_huge_pte_at(mm, haddr, ptep, make_huge_pte(vma, new_page, 1)); - page_remove_rmap(old_page, true); + page_remove_rmap(old_page, true, huge_page_order(h)); hugepage_add_new_anon_rmap(new_page, vma, haddr); /* Make the old page be freed below */ new_page = old_page; diff --git a/mm/khugepaged.c b/mm/khugepaged.c index aedaa9f75806..3acfddcba714 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -674,7 +674,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, * superfluous. 
*/ pte_clear(vma->vm_mm, address, _pte); - page_remove_rmap(src_page, false); + page_remove_rmap(src_page, false, 0); spin_unlock(ptl); free_page_and_swap_cache(src_page); } @@ -1073,7 +1073,7 @@ static void collapse_huge_page(struct mm_struct *mm, spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); - page_add_new_anon_rmap(new_page, vma, address, true); + page_add_new_anon_rmap(new_page, vma, address, true, HPAGE_PMD_ORDER); mem_cgroup_commit_charge(new_page, memcg, false, true); lru_cache_add_active_or_unevictable(new_page, vma); pgtable_trans_huge_deposit(mm, pmd, pgtable); diff --git a/mm/ksm.c b/mm/ksm.c index dc1ec06b71a0..68f1d0f8be22 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1154,7 +1154,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, */ if (!is_zero_pfn(page_to_pfn(kpage))) { get_page(kpage); - page_add_anon_rmap(kpage, vma, addr, false); + page_add_anon_rmap(kpage, vma, addr, false, 0); newpte = mk_pte(kpage, vma->vm_page_prot); } else { newpte = pte_mkspecial(pfn_pte(page_to_pfn(kpage), @@ -1178,7 +1178,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, ptep_clear_flush(vma, addr, ptep); set_pte_at_notify(mm, addr, ptep, newpte); - page_remove_rmap(page, false); + page_remove_rmap(page, false, 0); if (!page_mapped(page)) try_to_free_swap(page); put_page(page); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index af7f18b32389..ae3ff6a4da8c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2678,6 +2678,19 @@ void mem_cgroup_split_huge_fixup(struct page *head) __mod_memcg_state(head->mem_cgroup, MEMCG_RSS_HUGE, -HPAGE_PMD_NR); } + +void mem_cgroup_split_huge_pud_fixup(struct page *head) +{ + int i; + + if (mem_cgroup_disabled()) + return; + + for (i = HPAGE_PMD_NR; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + head[i].mem_cgroup = head->mem_cgroup; + + /*__mod_memcg_state(head->mem_cgroup, MEMCG_RSS_HUGE, -HPAGE_PUD_NR);*/ +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_MEMCG_SWAP diff --git a/mm/memory.c b/mm/memory.c index 3608b5436519..c875cc1a2600 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1088,7 +1088,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, mark_page_accessed(page); } rss[mm_counter(page)]--; - page_remove_rmap(page, false); + page_remove_rmap(page, false, 0); if (unlikely(page_mapcount(page) < 0)) print_bad_pte(vma, addr, ptent, page); if (unlikely(__tlb_remove_page(tlb, page))) { @@ -1116,7 +1116,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); rss[mm_counter(page)]--; - page_remove_rmap(page, false); + page_remove_rmap(page, false, 0); put_page(page); continue; } @@ -2300,7 +2300,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * thread doing COW. */ ptep_clear_flush_notify(vma, vmf->address, vmf->pte); - page_add_new_anon_rmap(new_page, vma, vmf->address, false); + page_add_new_anon_rmap(new_page, vma, vmf->address, false, 0); mem_cgroup_commit_charge(new_page, memcg, false, false); lru_cache_add_active_or_unevictable(new_page, vma); /* @@ -2333,7 +2333,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * mapcount is visible. So transitively, TLBs to * old page will be flushed before it can be reused. */ - page_remove_rmap(old_page, false); + page_remove_rmap(old_page, false, 0); } /* Free the old page.. 
*/ @@ -2816,11 +2816,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* ksm created a completely new copy */ if (unlikely(page != swapcache && swapcache)) { - page_add_new_anon_rmap(page, vma, vmf->address, false); + page_add_new_anon_rmap(page, vma, vmf->address, false, 0); mem_cgroup_commit_charge(page, memcg, false, false); lru_cache_add_active_or_unevictable(page, vma); } else { - do_page_add_anon_rmap(page, vma, vmf->address, exclusive); + do_page_add_anon_rmap(page, vma, vmf->address, exclusive, 0); mem_cgroup_commit_charge(page, memcg, true, false); activate_page(page); } @@ -2967,7 +2967,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) } inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); - page_add_new_anon_rmap(page, vma, vmf->address, false); + page_add_new_anon_rmap(page, vma, vmf->address, false, 0); mem_cgroup_commit_charge(page, memcg, false, false); lru_cache_add_active_or_unevictable(page, vma); setpte: @@ -3241,7 +3241,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg, /* copy-on-write page */ if (write && !(vma->vm_flags & VM_SHARED)) { inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); - page_add_new_anon_rmap(page, vma, vmf->address, false); + page_add_new_anon_rmap(page, vma, vmf->address, false, 0); mem_cgroup_commit_charge(page, memcg, false, false); lru_cache_add_active_or_unevictable(page, vma); } else { diff --git a/mm/migrate.c b/mm/migrate.c index b8c79aa62134..f7e5d88210ee 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -268,7 +268,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte); if (PageAnon(new)) - page_add_anon_rmap(new, vma, pvmw.address, false); + page_add_anon_rmap(new, vma, pvmw.address, false, 0); else page_add_file_rmap(new, false); } @@ -2067,7 +2067,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, page_ref_unfreeze(page, 2); mlock_migrate_page(new_page, page); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED); spin_unlock(ptl); @@ -2297,7 +2297,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, * drop page refcount. Page won't be freed, as we took * a reference just above. 
*/ - page_remove_rmap(page, false); + page_remove_rmap(page, false, 0); put_page(page); if (pte_present(pte)) @@ -2688,7 +2688,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } inc_mm_counter(mm, MM_ANONPAGES); - page_add_new_anon_rmap(page, vma, addr, false); + page_add_new_anon_rmap(page, vma, addr, false, 0); mem_cgroup_commit_charge(page, memcg, false, false); if (!is_zone_device_page(page)) lru_cache_add_active_or_unevictable(page, vma); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a3b295ea7348..dbcccc022b30 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -626,6 +626,9 @@ void prep_compound_page(struct page *page, unsigned int order) set_compound_head(p, page); } atomic_set(compound_mapcount_ptr(page), -1); + if (order == HPAGE_PUD_ORDER) + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + atomic_set(sub_compound_mapcount_ptr(&page[i], 1), -1); } #ifdef CONFIG_DEBUG_PAGEALLOC @@ -1001,6 +1004,13 @@ static int free_tail_pages_check(struct page *head_page, struct page *page) */ break; default: + /* sub_compound_mapcount_ptr is stored here */ + if (compound_order(head_page) == HPAGE_PUD_ORDER && + (page - head_page) % HPAGE_PMD_NR == 3) { + if (unlikely(atomic_read(&page->compound_mapcount) != -1)) + bad_page(page, "nonzero sub_compound_mapcount", 0); + break; + } if (page->mapping != TAIL_MAPPING) { bad_page(page, "corrupted mapping in tail page", 0); goto out; @@ -1041,8 +1051,14 @@ static __always_inline bool free_pages_prepare(struct page *page, VM_BUG_ON_PAGE(compound && compound_order(page) != order, page); - if (compound) + if (compound) { ClearPageDoubleMap(page); + if (order == HPAGE_PUD_ORDER) { + ClearPagePUDDoubleMap(page); + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + ClearPageDoubleMap(&page[i]); + } + } for (i = 1; i < (1 << order); i++) { if (compound) bad += free_tail_pages_check(page, page + i); diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 0b79568fba1c..95af1d67f209 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -236,6 +236,17 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, } #endif +#ifndef __HAVE_ARCH_PUDP_INVALIDATE +pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address, + pud_t *pudp) +{ + pud_t old = pudp_establish(vma, address, pudp, pud_mknotpresent(*pudp)); + + flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); + return old; +} +#endif + #ifndef pmdp_collapse_flush pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) diff --git a/mm/rmap.c b/mm/rmap.c index f69d81d4a956..79908cfc518a 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1097,9 +1097,9 @@ static void __page_check_anon_rmap(struct page *page, * (but PageKsm is never downgraded to PageAnon). */ void page_add_anon_rmap(struct page *page, - struct vm_area_struct *vma, unsigned long address, bool compound) + struct vm_area_struct *vma, unsigned long address, bool compound, int order) { - do_page_add_anon_rmap(page, vma, address, compound ? RMAP_COMPOUND : 0); + do_page_add_anon_rmap(page, vma, address, compound ? RMAP_COMPOUND : 0, order); } /* @@ -1108,7 +1108,7 @@ void page_add_anon_rmap(struct page *page, * Everybody else should continue to use page_add_anon_rmap above.
*/ void do_page_add_anon_rmap(struct page *page, - struct vm_area_struct *vma, unsigned long address, int flags) + struct vm_area_struct *vma, unsigned long address, int flags, int order) { bool compound = flags & RMAP_COMPOUND; bool first; @@ -1117,7 +1117,18 @@ void do_page_add_anon_rmap(struct page *page, atomic_t *mapcount; VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(!PageTransHuge(page), page); - mapcount = compound_mapcount_ptr(page); + if (compound_order(page) == HPAGE_PUD_ORDER) { + if (order == HPAGE_PUD_ORDER) { + mapcount = compound_mapcount_ptr(page); + } else if (order == HPAGE_PMD_ORDER) { + VM_BUG_ON(!PMDPageInPUD(page)); + mapcount = sub_compound_mapcount_ptr(page, 1); + } else + VM_BUG_ON(1); + } else if (compound_order(page) == HPAGE_PMD_ORDER) { + mapcount = compound_mapcount_ptr(page); + } else + VM_BUG_ON(1); first = atomic_inc_and_test(mapcount); } else { first = atomic_inc_and_test(&page->_mapcount); @@ -1132,7 +1143,7 @@ void do_page_add_anon_rmap(struct page *page, * disabled. */ if (compound) { - if (nr == HPAGE_PMD_NR) + if (order == HPAGE_PMD_ORDER) __inc_node_page_state(page, NR_ANON_THPS); else __inc_node_page_state(page, NR_ANON_THPS_PUD); @@ -1164,7 +1175,7 @@ void do_page_add_anon_rmap(struct page *page, * Page does not have to be locked. */ void page_add_new_anon_rmap(struct page *page, - struct vm_area_struct *vma, unsigned long address, bool compound) + struct vm_area_struct *vma, unsigned long address, bool compound, int order) { int nr = compound ? hpage_nr_pages(page) : 1; @@ -1174,10 +1185,15 @@ void page_add_new_anon_rmap(struct page *page, VM_BUG_ON_PAGE(!PageTransHuge(page), page); /* increment count (starts at -1) */ atomic_set(compound_mapcount_ptr(page), 0); - if (nr == HPAGE_PMD_NR) - __inc_node_page_state(page, NR_ANON_THPS); - else + if (order == HPAGE_PUD_ORDER) { + VM_BUG_ON(compound_order(page) != HPAGE_PUD_ORDER); + /* Anon THP always mapped first with PMD */ __inc_node_page_state(page, NR_ANON_THPS_PUD); + } else if (order == HPAGE_PMD_ORDER) { + VM_BUG_ON(compound_order(page) != HPAGE_PMD_ORDER); + __inc_node_page_state(page, NR_ANON_THPS); + } else + VM_BUG_ON(1); } else { /* Anon THP always mapped first with PMD */ VM_BUG_ON_PAGE(PageTransCompound(page), page); @@ -1268,12 +1284,40 @@ static void page_remove_file_rmap(struct page *page, bool compound) unlock_page_memcg(page); } -static void page_remove_anon_compound_rmap(struct page *page) +static void page_remove_anon_compound_rmap(struct page *page, int order) { - int i, nr; - - if (!atomic_add_negative(-1, compound_mapcount_ptr(page))) - return; + int i, nr = 0; + struct page *head = compound_head(page); + + if (compound_order(head) == HPAGE_PUD_ORDER) { + if (order == HPAGE_PMD_ORDER) { + VM_BUG_ON(!PMDPageInPUD(page)); + if (atomic_add_negative(-1, sub_compound_mapcount_ptr(page, 1))) { + if (TestClearPageDoubleMap(page)) { + /* + * Subpages can be mapped with PTEs too. Check how many of + * them are still mapped.
+ */ + for (i = 0; i < hpage_nr_pages(head); i++) { + if (atomic_add_negative(-1, &head[i]._mapcount)) + nr++; + } + } + __dec_node_page_state(page, NR_ANON_THPS); + } + nr += HPAGE_PMD_NR; + __mod_node_page_state(page_pgdat(head), NR_ANON_MAPPED, -nr); + return; + } else { + VM_BUG_ON(order != HPAGE_PUD_ORDER); + if (!atomic_add_negative(-1, compound_mapcount_ptr(page))) + return; + } + } else if (compound_order(head) == HPAGE_PMD_ORDER) { + if (!atomic_add_negative(-1, compound_mapcount_ptr(page))) + return; + } else + VM_BUG_ON_PAGE(1, page); /* Hugepages are not counted in NR_ANON_PAGES for now. */ if (unlikely(PageHuge(page))) @@ -1282,30 +1326,44 @@ static void page_remove_anon_compound_rmap(struct page *page) return; - if (hpage_nr_pages(page) == HPAGE_PMD_NR) + if (order == HPAGE_PMD_ORDER) __dec_node_page_state(page, NR_ANON_THPS); - else + else if (order == HPAGE_PUD_ORDER) __dec_node_page_state(page, NR_ANON_THPS_PUD); + else + VM_BUG_ON(1); - if (TestClearPageDoubleMap(page)) { + /* PMD-mapped PUD THP is handled above */ + if (TestClearPagePUDDoubleMap(head)) { + VM_BUG_ON(!(compound_order(head) == HPAGE_PUD_ORDER || head == page)); + /* + * Subpages can be mapped with PMDs too. Check how many of + * them are still mapped. + */ + for (i = 0, nr = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) { + if (atomic_add_negative(-1, sub_compound_mapcount_ptr(&head[i], 1))) + nr += HPAGE_PMD_NR; + } + } else if (TestClearPageDoubleMap(head)) { + VM_BUG_ON(compound_order(head) != HPAGE_PMD_ORDER); /* * Subpages can be mapped with PTEs too. Check how many of * them are still mapped. */ - for (i = 0, nr = 0; i < hpage_nr_pages(page); i++) { - if (atomic_add_negative(-1, &page[i]._mapcount)) + for (i = 0, nr = 0; i < hpage_nr_pages(head); i++) { + if (atomic_add_negative(-1, &head[i]._mapcount)) nr++; } } else { - nr = hpage_nr_pages(page); + nr = hpage_nr_pages(head); } if (unlikely(PageMlocked(page))) clear_page_mlock(page); if (nr) { - __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, -nr); - deferred_split_huge_page(page); + __mod_node_page_state(page_pgdat(head), NR_ANON_MAPPED, -nr); + deferred_split_huge_page(head); } } @@ -1316,13 +1374,13 @@ static void page_remove_anon_compound_rmap(struct page *page) * * The caller needs to hold the pte lock. */ -void page_remove_rmap(struct page *page, bool compound) +void page_remove_rmap(struct page *page, bool compound, int order) { if (!PageAnon(page)) return page_remove_file_rmap(page, compound); if (compound) - return page_remove_anon_compound_rmap(page); + return page_remove_anon_compound_rmap(page, order); /* page still mapped by someone else?
@@ -1316,13 +1374,13 @@ static void page_remove_anon_compound_rmap(struct page *page)
 *
 * The caller needs to hold the pte lock.
 */
-void page_remove_rmap(struct page *page, bool compound)
+void page_remove_rmap(struct page *page, bool compound, int order)
 {
 	if (!PageAnon(page))
 		return page_remove_file_rmap(page, compound);

 	if (compound)
-		return page_remove_anon_compound_rmap(page);
+		return page_remove_anon_compound_rmap(page, order);

 	/* page still mapped by someone else? */
 	if (!atomic_add_negative(-1, &page->_mapcount))
@@ -1672,7 +1730,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 *
 		 * See Documentation/vm/mmu_notifier.rst
 		 */
-		page_remove_rmap(subpage, PageHuge(page));
+		page_remove_rmap(subpage, PageHuge(page), 0);
 		put_page(page);
 	}

diff --git a/mm/swap.c b/mm/swap.c
index 4929bc1be60e..79de59875280 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -851,6 +851,44 @@ void lru_add_page_tail(struct page *page, struct page *page_tail,
 	if (!PageUnevictable(page))
 		update_page_reclaim_stat(lruvec, file, PageActive(page_tail));
 }
+
+/* used by __split_pud_huge_page_tail() */
+void lru_add_pud_page_tail(struct page *page, struct page *page_tail,
+			   struct lruvec *lruvec, struct list_head *list)
+{
+	const int file = 0;
+
+	VM_BUG_ON_PAGE(!PageHead(page), page);
+	VM_BUG_ON_PAGE(PageLRU(page_tail), page);
+	VM_BUG_ON(NR_CPUS != 1 &&
+		  !spin_is_locked(&lruvec_pgdat(lruvec)->lru_lock));
+
+	if (!list)
+		SetPageLRU(page_tail);
+
+	if (likely(PageLRU(page)))
+		list_add_tail(&page_tail->lru, &page->lru);
+	else if (list) {
+		/* page reclaim is reclaiming a huge page */
+		get_page(page_tail);
+		list_add_tail(&page_tail->lru, list);
+	} else {
+		struct list_head *list_head;
+		/*
+		 * Head page has not yet been counted, as an hpage,
+		 * so we must account for each subpage individually.
+		 *
+		 * Use the standard add function to put page_tail on the list,
+		 * but then correct its position so they all end up in order.
+		 */
+		add_page_to_lru_list(page_tail, lruvec, page_lru(page_tail));
+		list_head = page_tail->lru.prev;
+		list_move_tail(&page_tail->lru, list_head);
+	}
+
+	if (!PageUnevictable(page))
+		update_page_reclaim_stat(lruvec, file, PageActive(page_tail));
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

 static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
diff --git a/mm/swapfile.c b/mm/swapfile.c
index dbac1d49469d..742caaea2aa5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1775,10 +1775,10 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	set_pte_at(vma->vm_mm, addr, pte,
 		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	if (page == swapcache) {
-		page_add_anon_rmap(page, vma, addr, false);
+		page_add_anon_rmap(page, vma, addr, false, 0);
 		mem_cgroup_commit_charge(page, memcg, true, false);
 	} else { /* ksm created a completely new copy */
-		page_add_new_anon_rmap(page, vma, addr, false);
+		page_add_new_anon_rmap(page, vma, addr, false, 0);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index d59b5a73dfb3..e49537f6000e 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -90,7 +90,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release_uncharge_unlock;

 	inc_mm_counter(dst_mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
+	page_add_new_anon_rmap(page, dst_vma, dst_addr, false, 0);
 	mem_cgroup_commit_charge(page, memcg, false, false);
 	lru_cache_add_active_or_unevictable(page, dst_vma);
diff --git a/mm/util.c b/mm/util.c
index 1ea055138043..1b1b6dd386d1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -536,8 +536,15 @@ struct address_space *page_mapping_file(struct page *page)
 int __page_mapcount(struct page *page)
 {
 	int ret;
+	struct page *head = compound_head(page);

 	ret = atomic_read(&page->_mapcount) + 1;
+	if (compound_order(head) == HPAGE_PUD_ORDER) {
+		struct page *sub_compound_page = head +
+			(((page - head) / HPAGE_PMD_NR) * HPAGE_PMD_NR);
+
+		ret += sub_compound_mapcount(sub_compound_page);
+	}
 	/*
 	 * For file THP page->_mapcount contains total number of mapping
 	 * of the page: no need to look into compound_mapcount.
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 25a88693e417..1d185cf748a6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1263,6 +1263,10 @@ const char * const vmstat_text[] = {
 	"thp_fault_alloc_pud",
 	"thp_fault_fallback_pud",
 	"thp_split_pud",
+	"thp_split_pud_page",
+	"thp_split_pud_page_failed",
+	"thp_zero_pud_page_alloc",
+	"thp_zero_pud_page_alloc_failed",
 #endif
 	"thp_zero_page_alloc",
 	"thp_zero_page_alloc_failed",
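A small userspace model (not from the patch) of the subpage-to-chunk arithmetic that __page_mapcount() uses above: each PTE-mapped subpage of a PUD THP also inherits the sub-compound mapcount of the PMD-sized chunk it falls in. The HPAGE_PMD_NR value assumes x86_64 with 4KB pages.

#include <stdio.h>

#define HPAGE_PMD_NR 512	/* assumption: 4KB pages per 2MB chunk */

/* Mirrors: head + (((page - head) / HPAGE_PMD_NR) * HPAGE_PMD_NR) */
static long sub_compound_head_index(long subpage_index)
{
	return (subpage_index / HPAGE_PMD_NR) * HPAGE_PMD_NR;
}

int main(void)
{
	long idx[] = { 0, 5, 511, 512, 1300, 262143 };
	for (unsigned i = 0; i < sizeof(idx) / sizeof(idx[0]); i++)
		printf("subpage %6ld -> chunk head %6ld\n",
		       idx[i], sub_compound_head_index(idx[i]));
	return 0;
}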
From patchwork Fri Feb 15 22:08:41 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815975
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 16/31] mm: thp: check compound_mapcount of PMD-mapped PUD THPs at free time.
Date: Fri, 15 Feb 2019 14:08:41 -0800
Message-Id: <20190215220856.29749-17-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

The PMD mapcounts of a PUD THP should be zero when the page is freed.
Signed-off-by: Zi Yan
---
 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dbcccc022b30..b87a2ca0a97c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1007,8 +1007,10 @@ static int free_tail_pages_check(struct page *head_page, struct page *page)
 		/* sub_compound_map_ptr store here */
 		if (compound_order(head_page) == HPAGE_PUD_ORDER &&
 		    (page - head_page) % HPAGE_PMD_NR == 3) {
-			if (unlikely(atomic_read(&page->compound_mapcount) != -1))
+			if (unlikely(atomic_read(&page->compound_mapcount) != -1)) {
+				pr_err("sub_compound_mapcount: %d\n",
+				       atomic_read(&page->compound_mapcount) + 1);
 				bad_page(page, "nonzero sub_compound_mapcount", 0);
+			}
 			break;
 		}
 		if (page->mapping != TAIL_MAPPING) {
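The check above encodes where this series keeps the per-chunk mapcount: in the page struct at offset 3 within each PMD-sized block of a PUD THP's tail pages. A userspace model (not from the patch; sizes assume x86_64 with 4KB pages) of which tail indices the free-time check visits:

#include <stdio.h>

#define HPAGE_PMD_NR 512		/* assumption: 2MB / 4KB */
#define HPAGE_PUD_NR (512 * 512)	/* assumption: 1GB / 4KB */

/* Mirrors "(page - head_page) % HPAGE_PMD_NR == 3": the 4th page of
 * every PMD-sized chunk of a PUD THP carries a sub-compound mapcount. */
static int holds_sub_compound_mapcount(long tail_index)
{
	return tail_index % HPAGE_PMD_NR == 3;
}

int main(void)
{
	int shown = 0;
	for (long i = 0; i < HPAGE_PUD_NR && shown < 4; i++) {
		if (holds_sub_compound_mapcount(i)) {
			printf("tail page %ld stores a sub_compound_mapcount\n", i);
			shown++;
		}
	}
	return 0;
}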
From patchwork Fri Feb 15 22:08:42 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815977
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 17/31] mm: thp: split properly PMD-mapped PUD THP to PTE-mapped PUD THP.
Date: Fri, 15 Feb 2019 14:08:42 -0800
Message-Id: <20190215220856.29749-18-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

The page count increase needs to go to the head of the PUD page.

Signed-off-by: Zi Yan
---
 mm/huge_memory.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5f83f4c5eac7..bbdbc9ae06bf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3198,7 +3198,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long haddr, bool freeze)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	struct page *page;
+	struct page *page, *head;
 	pgtable_t pgtable;
 	pmd_t old_pmd, _pmd;
 	bool young, write, soft_dirty, pmd_migration = false;
@@ -3285,7 +3285,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		soft_dirty = pmd_soft_dirty(old_pmd);
 	}
 	VM_BUG_ON_PAGE(!page_count(page), page);
-	page_ref_add(page, HPAGE_PMD_NR - 1);
+	head = compound_head(page);
+	page_ref_add(head, HPAGE_PMD_NR - 1);

 	/*
 	 * Withdraw the table only after we mark the pmd entry invalid.
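The fix matters because, for a PMD mapping inside a PUD THP, `page` points at a tail page while reference counts live only on the compound head. A minimal userspace model (not from the patch) of that rule:

#include <stdio.h>

struct page {
	int refcount;		/* meaningful only on the head page */
	struct page *head;	/* stand-in for compound_head(); self for a head */
};

/* Mirrors page_ref_add(compound_head(page), n): tail pages never carry
 * their own reference count, so the add must land on the head. */
static void page_ref_add(struct page *page, int n)
{
	page->head->refcount += n;
}

int main(void)
{
	struct page pud_thp[3] = { 0 };	/* head + two stand-in tail pages */
	for (int i = 0; i < 3; i++)
		pud_thp[i].head = &pud_thp[0];
	pud_thp[0].refcount = 1;

	page_ref_add(&pud_thp[2], 511);	/* e.g. split of one PMD-sized chunk */
	printf("head refcount: %d, tail refcount: %d\n",
	       pud_thp[0].refcount, pud_thp[2].refcount);
	return 0;
}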
From patchwork Fri Feb 15 22:08:43 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815981
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 18/31] mm: page_vma_walk: teach it about PMD-mapped PUD THP.
Date: Fri, 15 Feb 2019 14:08:43 -0800
Message-Id: <20190215220856.29749-19-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

We now have PMD-mapped PUD THP and PTE-mapped PUD THP, so page_vma_mapped_walk() should handle them properly.

Signed-off-by: Zi Yan
---
 mm/page_vma_mapped.c | 116 ++++++++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 34 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index a473553aa9a5..fde47dae0b9c 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -52,6 +52,22 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 	return true;
 }

+static bool map_pmd(struct page_vma_mapped_walk *pvmw)
+{
+	pmd_t pmde;
+
+	pvmw->pmd = pmd_offset(pvmw->pud, pvmw->address);
+	pmde = READ_ONCE(*pvmw->pmd);
+	if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
+		pvmw->ptl = pmd_lock(pvmw->vma->vm_mm, pvmw->pmd);
+		return true;
+	} else if (!pmd_present(pmde))
+		return false;
+
+	pvmw->ptl = pmd_lock(pvmw->vma->vm_mm, pvmw->pmd);
+	return true;
+}
+
 static inline bool pfn_in_hpage(struct page *hpage, unsigned long pfn)
 {
 	unsigned long hpage_pfn = page_to_pfn(hpage);
@@ -111,6 +127,38 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 	return pfn_in_hpage(pvmw->page, pfn);
 }

+/* 0: not mapped, 1: pmd_page, 2: pmd */
+static int check_pmd(struct page_vma_mapped_walk *pvmw)
+{
+	unsigned long pfn;
+
+	if (likely(pmd_trans_huge(*pvmw->pmd))) {
+		if (pvmw->flags & PVMW_MIGRATION)
+			return 0;
+		pfn = pmd_pfn(*pvmw->pmd);
+		if (!pfn_in_hpage(pvmw->page, pfn))
+			return 0;
+		return 1;
+	} else if (!pmd_present(*pvmw->pmd)) {
+		if (thp_migration_supported()) {
+			if (!(pvmw->flags & PVMW_MIGRATION))
+				return 0;
+			if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
+				swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd);
+
+				pfn = migration_entry_to_pfn(entry);
+				if (!pfn_in_hpage(pvmw->page, pfn))
+					return 0;
+				return 1;
+			}
+		}
+		return 0;
+	}
+	/* THP pmd was split under us: handle on pte level */
+	spin_unlock(pvmw->ptl);
+	pvmw->ptl = NULL;
+	return 2;
+}
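check_pmd() above returns a small tri-state rather than a bool. A compact userspace model (not from the patch) of how the caller is expected to dispatch on it:

#include <stdio.h>

/* Mirrors the "0: not mapped, 1: pmd_page, 2: pmd" contract of check_pmd() */
enum pmd_check { PMD_NOT_MAPPED = 0, PMD_MAPS_PAGE = 1, PMD_SPLIT_TO_PTES = 2 };

static const char *dispatch(enum pmd_check res)
{
	switch (res) {
	case PMD_MAPS_PAGE:
		return "report a match at PMD level";
	case PMD_SPLIT_TO_PTES:
		return "drop to PTE level (THP was split under us)";
	default:
		return "advance to the next PMD";
	}
}

int main(void)
{
	for (int r = 0; r <= 2; r++)
		printf("check_pmd() == %d -> %s\n", r, dispatch((enum pmd_check)r));
	return 0;
}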
 /**
  * page_vma_mapped_walk - check if @pvmw->page is mapped in @pvmw->vma at
  * @pvmw->address
@@ -142,14 +190,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t pude;
-	pmd_t pmde;
+	int pmd_res;

 	if (!pvmw->pte && !pvmw->pmd && pvmw->pud)
 		return not_found(pvmw);

 	/* The only possible pmd mapping has been handled on last iteration */
 	if (pvmw->pmd && !pvmw->pte)
-		return not_found(pvmw);
+		goto next_pmd;

 	if (pvmw->pte)
 		goto next_pte;
@@ -198,43 +246,43 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	} else if (!pud_present(pude))
 		return false;

-	pvmw->pmd = pmd_offset(pvmw->pud, pvmw->address);
-	/*
-	 * Make sure the pmd value isn't cached in a register by the
-	 * compiler and used as a stale value after we've observed a
-	 * subsequent update.
-	 */
-	pmde = READ_ONCE(*pvmw->pmd);
-	if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
-		pvmw->ptl = pmd_lock(mm, pvmw->pmd);
-		if (likely(pmd_trans_huge(*pvmw->pmd))) {
-			if (pvmw->flags & PVMW_MIGRATION)
-				return not_found(pvmw);
-			if (pmd_page(*pvmw->pmd) != page)
-				return not_found(pvmw);
+	if (!map_pmd(pvmw))
+		goto next_pmd;
+	/* pmd locked after map_pmd */
+	while (1) {
+		pmd_res = check_pmd(pvmw);
+		if (pmd_res == 1) /* pmd_page */
 			return true;
-		} else if (!pmd_present(*pvmw->pmd)) {
-			if (thp_migration_supported()) {
-				if (!(pvmw->flags & PVMW_MIGRATION))
-					return not_found(pvmw);
-				if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
-					swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd);
-
-					if (migration_entry_to_page(entry) != page)
-						return not_found(pvmw);
-					return true;
+		else if (pmd_res == 2) /* pmd entry */
+			goto pte_level;
+next_pmd:
+		/* Only PMD-mapped PUD THP has next pmd */
+		if (!(PageTransHuge(pvmw->page) && compound_order(pvmw->page) == HPAGE_PUD_ORDER))
+			return not_found(pvmw);
+		do {
+			pvmw->address += HPAGE_PMD_SIZE;
+			if (pvmw->address >= pvmw->vma->vm_end ||
+			    pvmw->address >=
+			    __vma_address(pvmw->page, pvmw->vma) +
+			    hpage_nr_pages(pvmw->page) * PAGE_SIZE)
+				return not_found(pvmw);
+			/* Did we cross page table boundary? */
+			if (pvmw->address % PUD_SIZE == 0) {
+				if (pvmw->ptl) {
+					spin_unlock(pvmw->ptl);
+					pvmw->ptl = NULL;
 				}
+				goto restart;
+			} else {
+				pvmw->pmd++;
 			}
-			return not_found(pvmw);
-		} else {
-			/* THP pmd was split under us: handle on pte level */
-			spin_unlock(pvmw->ptl);
-			pvmw->ptl = NULL;
-		}
-	} else if (!pmd_present(pmde)) {
-		return false;
+		} while (pmd_none(*pvmw->pmd));
+
+		if (!pvmw->ptl)
+			pvmw->ptl = pmd_lock(mm, pvmw->pmd);
 	}
+pte_level:
 	if (!map_pte(pvmw))
 		goto next_pte;
 	while (1) {
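The next_pmd loop above advances through the PMD entries that can map one PUD THP, restarting the walk whenever the address crosses a PUD_SIZE boundary. A userspace model (not from the patch; sizes are the usual x86_64 values) of that stepping:

#include <stdio.h>

#define HPAGE_PMD_SIZE (2UL << 20)	/* assumption: 2MB */
#define PUD_SIZE       (1UL << 30)	/* assumption: 1GB */

int main(void)
{
	unsigned long address = 0x40000000UL;	/* PUD-aligned start */
	int steps, crossings = 0;

	/* Walk two PUD-sized ranges one PMD step at a time, as next_pmd does. */
	for (steps = 0; steps < 1024; steps++) {
		address += HPAGE_PMD_SIZE;
		if (address % PUD_SIZE == 0)
			crossings++;	/* crossed a page-table boundary: restart walk */
	}
	printf("%d PMD steps, %d page-table boundary crossings\n", steps, crossings);
	return 0;
}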
From patchwork Fri Feb 15 22:08:44 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815983
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 19/31] mm: thp: 1GB THP support in try_to_unmap().
Date: Fri, 15 Feb 2019 14:08:44 -0800
Message-Id: <20190215220856.29749-20-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Unmap subpages of different-sized THPs properly in the try_to_unmap() function.

Signed-off-by: Zi Yan
---
 mm/migrate.c |   2 +-
 mm/rmap.c    | 140 +++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index f7e5d88210ee..7deb64d75adb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -223,7 +223,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 	/* PMD-mapped THP migration entry */
-	if (!pvmw.pte) {
+	if (!pvmw.pte && pvmw.pmd) {
 		VM_BUG_ON_PAGE(PageHuge(page) || !PageTransCompound(page), page);
 		remove_migration_pmd(&pvmw, new);
 		continue;
diff --git a/mm/rmap.c b/mm/rmap.c
index 79908cfc518a..39f446a6775d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1031,7 +1031,7 @@ void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma)
 * __page_set_anon_rmap - set up new anonymous rmap
 * @page: Page or Hugepage to add to rmap
 * @vma: VM area to add page to.
- * @address: User virtual address of the mapping
+ * @address: User virtual address of the mapping
 * @exclusive: the page is exclusively owned by the current process
 */
 static void __page_set_anon_rmap(struct page *page,
@@ -1423,7 +1423,9 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		.address = address,
 	};
 	pte_t pteval;
-	struct page *subpage;
+	pmd_t pmdval;
+	pud_t pudval;
+	struct page *subpage = NULL;
 	bool ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)arg;
@@ -1436,6 +1438,11 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	    is_zone_device_page(page) && !is_device_private_page(page))
 		return true;

+	if (flags & TTU_SPLIT_HUGE_PUD) {
+		split_huge_pud_address(vma, address,
+				       flags & TTU_SPLIT_FREEZE, page);
+	}
+
 	if (flags & TTU_SPLIT_HUGE_PMD) {
 		split_huge_pmd_address(vma, address,
 				       flags & TTU_SPLIT_FREEZE, page);
@@ -1465,7 +1472,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	while (page_vma_mapped_walk(&pvmw)) {
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 		/* PMD-mapped THP migration entry */
-		if (!pvmw.pte && (flags & TTU_MIGRATION)) {
+		if (!pvmw.pte && pvmw.pmd && (flags & TTU_MIGRATION)) {
 			VM_BUG_ON_PAGE(PageHuge(page) || !PageTransCompound(page), page);

 			set_pmd_migration_entry(&pvmw, page);
@@ -1497,9 +1504,14 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		}

 		/* Unexpected PMD-mapped THP? */
-		VM_BUG_ON_PAGE(!pvmw.pte, page);

-		subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
+		if (pvmw.pte)
+			subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
+		else if (!pvmw.pte && pvmw.pmd)
+			subpage = page - page_to_pfn(page) + pmd_pfn(*pvmw.pmd);
+		else if (!pvmw.pte && !pvmw.pmd && pvmw.pud)
+			subpage = page - page_to_pfn(page) + pud_pfn(*pvmw.pud);
+		VM_BUG_ON(!subpage);
 		address = pvmw.address;

 		if (PageHuge(page)) {
@@ -1556,16 +1568,26 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		}

 		if (!(flags & TTU_IGNORE_ACCESS)) {
-			if (ptep_clear_flush_young_notify(vma, address,
-						pvmw.pte)) {
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
+			if ((pvmw.pte &&
+			     ptep_clear_flush_young_notify(vma, address, pvmw.pte)) ||
+			    ((!pvmw.pte && pvmw.pmd) &&
+			     pmdp_clear_flush_young_notify(vma, address, pvmw.pmd)) ||
+			    ((!pvmw.pte && !pvmw.pmd && pvmw.pud) &&
+			     pudp_clear_flush_young_notify(vma, address, pvmw.pud))
+			    ) {
+				ret = false;
+				page_vma_mapped_walk_done(&pvmw);
+				break;
 			}
 		}

 		/* Nuke the page table entry. */
-		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		if (pvmw.pte)
+			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		else if (!pvmw.pte && pvmw.pmd)
+			flush_cache_page(vma, address, pmd_pfn(*pvmw.pmd));
+		else if (!pvmw.pte && !pvmw.pmd && pvmw.pud)
+			flush_cache_page(vma, address, pud_pfn(*pvmw.pud));
 		if (should_defer_flush(mm, flags)) {
 			/*
 			 * We clear the PTE but do not flush so potentially
@@ -1575,16 +1597,34 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			 * transition on a cached TLB entry is written through
 			 * and traps if the PTE is unmapped.
 			 */
-			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+			if (pvmw.pte) {
+				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+
+				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+			} else if (!pvmw.pte && pvmw.pmd) {
+				pmdval = pmdp_huge_get_and_clear(mm, address, pvmw.pmd);

-			set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+				set_tlb_ubc_flush_pending(mm, pmd_dirty(pmdval));
+			} else if (!pvmw.pte && !pvmw.pmd && pvmw.pud) {
+				pudval = pudp_huge_get_and_clear(mm, address, pvmw.pud);
+
+				set_tlb_ubc_flush_pending(mm, pud_dirty(pudval));
+			}
 		} else {
-			pteval = ptep_clear_flush(vma, address, pvmw.pte);
+			if (pvmw.pte)
+				pteval = ptep_clear_flush(vma, address, pvmw.pte);
+			else if (!pvmw.pte && pvmw.pmd)
+				pmdval = pmdp_huge_clear_flush(vma, address, pvmw.pmd);
+			else if (!pvmw.pte && !pvmw.pmd && pvmw.pud)
+				pudval = pudp_huge_clear_flush(vma, address, pvmw.pud);
 		}

 		/* Move the dirty bit to the page. Now the pte is gone. */
-		if (pte_dirty(pteval))
-			set_page_dirty(page);
+		if ((pvmw.pte && pte_dirty(pteval)) ||
+		    ((!pvmw.pte && pvmw.pmd) && pmd_dirty(pmdval)) ||
+		    ((!pvmw.pte && !pvmw.pmd && pvmw.pud) && pud_dirty(pudval))
+		    )
+			set_page_dirty(page);

 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);
@@ -1620,33 +1660,57 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		} else if (IS_ENABLED(CONFIG_MIGRATION) &&
 				(flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))) {
 			swp_entry_t entry;
-			pte_t swp_pte;

-			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				ret = false;
-				page_vma_mapped_walk_done(&pvmw);
-				break;
-			}
+			if (pvmw.pte) {
+				pte_t swp_pte;

-			/*
-			 * Store the pfn of the page in a special migration
-			 * pte. do_swap_page() will wait until the migration
-			 * pte is removed and then restart fault handling.
-			 */
-			entry = make_migration_entry(subpage,
-					pte_write(pteval));
-			swp_pte = swp_entry_to_pte(entry);
-			if (pte_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			set_pte_at(mm, address, pvmw.pte, swp_pte);
-			/*
-			 * No need to invalidate here it will synchronize on
-			 * against the special swap migration pte.
-			 */
+				if (arch_unmap_one(mm, vma, address, pteval) < 0) {
+					set_pte_at(mm, address, pvmw.pte, pteval);
+					ret = false;
+					page_vma_mapped_walk_done(&pvmw);
+					break;
+				}
+
+				/*
+				 * Store the pfn of the page in a special migration
+				 * pte. do_swap_page() will wait until the migration
+				 * pte is removed and then restart fault handling.
+				 */
+				entry = make_migration_entry(subpage,
+						pte_write(pteval));
+				swp_pte = swp_entry_to_pte(entry);
+				if (pte_soft_dirty(pteval))
+					swp_pte = pte_swp_mksoft_dirty(swp_pte);
+				set_pte_at(mm, address, pvmw.pte, swp_pte);
+				/*
+				 * No need to invalidate here it will synchronize on
+				 * against the special swap migration pte.
+				 */
+			} else if (!pvmw.pte && pvmw.pmd) {
+				pmd_t swp_pmd;
+				/*
+				 * Store the pfn of the page in a special migration
+				 * pte. do_swap_page() will wait until the migration
+				 * pte is removed and then restart fault handling.
+				 */
+				entry = make_migration_entry(subpage,
+						pmd_write(pmdval));
+				swp_pmd = swp_entry_to_pmd(entry);
+				if (pmd_soft_dirty(pmdval))
+					swp_pmd = pmd_swp_mksoft_dirty(swp_pmd);
+				set_pmd_at(mm, address, pvmw.pmd, swp_pmd);
+				/*
+				 * No need to invalidate here it will synchronize on
+				 * against the special swap migration pte.
+				 */
+			} else if (!pvmw.pte && !pvmw.pmd && pvmw.pud) {
+				VM_BUG_ON(1);
+			}
 		} else if (PageAnon(page)) {
 			swp_entry_t entry = { .val = page_private(subpage) };
 			pte_t swp_pte;
+
+			VM_BUG_ON(!pvmw.pte);
 			/*
 			 * Store the swap location in the pte.
 			 * See handle_pte_fault() ...
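The pattern repeated throughout the hunk above is a three-way dispatch on which level of the page-table walk produced a match: pvmw.pte, else pvmw.pmd, else pvmw.pud. A compact userspace model (not from the patch) of that selection:

#include <stdio.h>
#include <stddef.h>

struct walk_state {	/* stand-in for struct page_vma_mapped_walk */
	void *pte, *pmd, *pud;
};

/* Mirrors the if/else-if chains in try_to_unmap_one(): exactly one level
 * is acted on, chosen from the finest populated pointer downwards. */
static const char *mapping_level(const struct walk_state *w)
{
	if (w->pte)
		return "PTE (4KB)";
	if (w->pmd)
		return "PMD (2MB)";
	if (w->pud)
		return "PUD (1GB)";
	return "none";
}

int main(void)
{
	int x;
	struct walk_state pte_hit = { &x, NULL, NULL };
	struct walk_state pmd_hit = { NULL, &x, NULL };
	struct walk_state pud_hit = { NULL, NULL, &x };

	printf("%s\n%s\n%s\n", mapping_level(&pte_hit),
	       mapping_level(&pmd_hit), mapping_level(&pud_hit));
	return 0;
}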
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 20/31] mm: thp: split 1GB THPs at page reclaim.
Date: Fri, 15 Feb 2019 14:08:45 -0800
Message-Id: <20190215220856.29749-21-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

We cannot swap 1GB THPs, so split them before swapping them out.

Signed-off-by: Zi Yan
---
 mm/swap_slots.c |  2 ++
 mm/vmscan.c     | 55 ++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 63a7b4563a57..797c804ff905 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -315,6 +315,8 @@ swp_entry_t get_swap_page(struct page *page)
 	entry.val = 0;
 
 	if (PageTransHuge(page)) {
+		if (compound_order(page) == HPAGE_PUD_ORDER)
+			return entry;
 		if (IS_ENABLED(CONFIG_THP_SWAP))
 			get_swap_pages(1, &entry, HPAGE_PMD_NR);
 		goto out;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a714c4f800e9..a2a91c1d3dae 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1288,25 +1288,47 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				if (!(sc->gfp_mask & __GFP_IO))
 					goto keep_locked;
 				if (PageTransHuge(page)) {
-					/* cannot split THP, skip it */
-					if (!can_split_huge_page(page, NULL))
-						goto activate_locked;
-					/*
-					 * Split pages without a PMD map right
-					 * away. Chances are some or all of the
-					 * tail pages can be freed without IO.
-					 */
-					if (!compound_mapcount(page) &&
-					    split_huge_page_to_list(page,
-								    page_list))
+					if (compound_order(page) == HPAGE_PUD_ORDER) {
+						/* cannot split THP, skip it */
+						if (!can_split_huge_pud_page(page, NULL))
+							goto activate_locked;
+						/*
+						 * Split pages without a PMD map right
+						 * away. Chances are some or all of the
+						 * tail pages can be freed without IO.
+						 */
+						if (!compound_mapcount(page) &&
+						    split_huge_pud_page_to_list(page,
+								page_list))
+							goto activate_locked;
+					}
+					if (compound_order(page) == HPAGE_PMD_ORDER) {
+						/* cannot split THP, skip it */
+						if (!can_split_huge_page(page, NULL))
+							goto activate_locked;
+						/*
+						 * Split pages without a PMD map right
+						 * away. Chances are some or all of the
+						 * tail pages can be freed without IO.
+						 */
+						if (!compound_mapcount(page) &&
+						    split_huge_page_to_list(page,
+								page_list))
+							goto activate_locked;
+					}
+				}
+				if (compound_order(page) == HPAGE_PUD_ORDER) {
+					if (split_huge_pud_page_to_list(page,
+							page_list))
 						goto activate_locked;
 				}
 
 				if (!add_to_swap(page)) {
 					if (!PageTransHuge(page))
 						goto activate_locked;
 					/* Fallback to swap normal pages */
+					VM_BUG_ON_PAGE(compound_order(page) != HPAGE_PMD_ORDER, page);
 					if (split_huge_page_to_list(page,
-								    page_list))
+							page_list))
 						goto activate_locked;
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 					count_vm_event(THP_SWPOUT_FALLBACK);
@@ -1321,6 +1343,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				mapping = page_mapping(page);
 			}
 		} else if (unlikely(PageTransHuge(page))) {
+			VM_BUG_ON_PAGE(compound_order(page) != HPAGE_PMD_ORDER, page);
 			/* Split file THP */
 			if (split_huge_page_to_list(page, page_list))
 				goto keep_locked;
@@ -1333,8 +1356,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (page_mapped(page)) {
 			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
 
-			if (unlikely(PageTransHuge(page)))
-				flags |= TTU_SPLIT_HUGE_PMD;
+			if (unlikely(PageTransHuge(page))) {
+				if (compound_order(page) == HPAGE_PMD_ORDER)
+					flags |= TTU_SPLIT_HUGE_PMD;
+				else if (compound_order(page) == HPAGE_PUD_ORDER)
+					flags |= TTU_SPLIT_HUGE_PUD;
+			}
 			if (!try_to_unmap(page, flags)) {
 				nr_unmap_fail++;
 				goto activate_locked;
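The reclaim path above never swaps a 1GB THP directly: it is split first, and a PMD-sized THP that cannot be swapped as a unit is split too, bumping THP_SWPOUT_FALLBACK. A small userspace observer of that behaviour, as a sketch that relies only on the long-standing thp_swpout and thp_swpout_fallback fields of /proc/vmstat (nothing from this series is assumed):

/* thpswap.c -- print THP swap-out and fallback-split counters.
 * Build: cc -o thpswap thpswap.c; run before and after heavy reclaim. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char name[64];
	unsigned long long val;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	while (fscanf(f, "%63s %llu", name, &val) == 2) {
		if (!strcmp(name, "thp_swpout") ||
		    !strcmp(name, "thp_swpout_fallback"))
			printf("%s = %llu\n", name, val);
	}
	fclose(f);
	return 0;
}

A rising thp_swpout_fallback count under memory pressure indicates THPs being split before swap-out, which with this patch now covers the 1GB case as well.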
[64.147.123.25]) by mx.google.com with ESMTPS id y19si2766935qki.241.2019.02.15.14.09.32 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 15 Feb 2019 14:09:32 -0800 (PST) Received-SPF: pass (google.com: domain of zi.yan@sent.com designates 64.147.123.25 as permitted sender) client-ip=64.147.123.25; Authentication-Results: mx.google.com; dkim=pass header.i=@sent.com header.s=fm2 header.b=VFUKyEKq; dkim=pass header.i=@messagingengine.com header.s=fm2 header.b=L9R6iOLb; spf=pass (google.com: domain of zi.yan@sent.com designates 64.147.123.25 as permitted sender) smtp.mailfrom=zi.yan@sent.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sent.com Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id 3178C3058; Fri, 15 Feb 2019 17:09:31 -0500 (EST) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Fri, 15 Feb 2019 17:09:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm2; bh=Mql9ve4UDV93V XmSDRn0eYbPSMabzYnGVtBZe6Hc3wQ=; b=VFUKyEKqzz/0iG/0dF5ekrZ9RwQod gNG5oHcb2NvSIDd+Rl9isNZlF5VgGKlJ4Olj2aicC4M/9Sn+EoR7fV4DQl3SJUd2 DgWLETjIDsPKPR0OUdEJEURAHMG0FexsIKnOZEuuAZtaZ+xk94F2LsZlJwoKiiMV UUR3X1EpJwT2JHKKicMLPPi6+H7Y3yvc2EZZLXndyFSkRwVsMwKGNOnxV6fldG4e DApV/ZDSmQmUpI90aeLEYN8aFquAr0SUpa0Swn8154uPWdbzgcyeXBQnYqf3tE1o kC5+4JN2B8jhE3zm9mTPJ9n7dgz3QEHhYXy4DI9j9Zmn4a5fH3nRfAKdw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; bh=Mql9ve4UDV93VXmSDRn0eYbPSMabzYnGVtBZe6Hc3wQ=; b=L9R6iOLb +1fx/XQVSYC4Jd+HRI9FMFrNQAJ2x8avz1sCrPrCicNu0wDk/XmNq/CojPSXTl6X wmQUTCjl7MEjMlmBz2jvN95txuTU8IxNOVLmOhE9GJdpdevWQnli7p6mG0hudAu4 vaOcgB6DH6XFpX0/6PGv1OM/aDZbokapTVvm/+wjJ8oz4DOvHKBYxpEzCp8SYAso 6Li+nN4pM7dz9276IONc2xX7cfy3hx4fgp7dqL0RT14bIqQNLZxtjxKzxo15MD8t 1r/EZwsP3oCmVrgSCsSYMwkmnr9SiNju/BV2HYNBSi4HVaBwle/5G/hqxWO7Fukd HCEMDHd5Idjofg== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedtledruddtjedgudehkecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfhuthenuceurghilhhouhhtmecu fedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujfgurhephffvufffkf fojghfrhgggfestdekredtredttdenucfhrhhomhepkghiucgjrghnuceoiihirdihrghn sehsvghnthdrtghomheqnecukfhppedvudeirddvvdekrdduuddvrddvvdenucfrrghrrg hmpehmrghilhhfrhhomhepiihirdihrghnsehsvghnthdrtghomhenucevlhhushhtvghr ufhiiigvpeduke X-ME-Proxy: Received: from nvrsysarch5.nvidia.com (thunderhill.nvidia.com [216.228.112.22]) by mail.messagingengine.com (Postfix) with ESMTPA id 37CFEE46AD; Fri, 15 Feb 2019 17:09:29 -0500 (EST) From: Zi Yan To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Dave Hansen , Michal Hocko , "Kirill A . Shutemov" , Andrew Morton , Vlastimil Babka , Mel Gorman , John Hubbard , Mark Hairgrove , Nitin Gupta , David Nellans , Zi Yan Subject: [RFC PATCH 21/31] mm: thp: 1GB zero page shrinker. 
Date: Fri, 15 Feb 2019 14:08:46 -0800
Message-Id: <20190215220856.29749-22-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Remove the 1GB zero page when we are under memory pressure.

Signed-off-by: Zi Yan
---
 mm/huge_memory.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbdbc9ae06bf..41adc103ead1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -207,6 +207,32 @@ static struct shrinker huge_zero_page_shrinker = {
 	.seeks = DEFAULT_SEEKS,
 };
 
+static unsigned long shrink_huge_pud_zero_page_count(struct shrinker *shrink,
+					struct shrink_control *sc)
+{
+	/* we can free zero page only if last reference remains */
+	return atomic_read(&huge_pud_zero_refcount) == 1 ? HPAGE_PUD_NR : 0;
+}
+
+static unsigned long shrink_huge_pud_zero_page_scan(struct shrinker *shrink,
+				       struct shrink_control *sc)
+{
+	if (atomic_cmpxchg(&huge_pud_zero_refcount, 1, 0) == 1) {
+		struct page *zero_page = xchg(&huge_pud_zero_page, NULL);
+		BUG_ON(zero_page == NULL);
+		__free_pages(zero_page, compound_order(zero_page));
+		return HPAGE_PUD_NR;
+	}
+
+	return 0;
+}
+
+static struct shrinker huge_pud_zero_page_shrinker = {
+	.count_objects = shrink_huge_pud_zero_page_count,
+	.scan_objects = shrink_huge_pud_zero_page_scan,
+	.seeks = DEFAULT_SEEKS,
+};
+
 #ifdef CONFIG_SYSFS
 static ssize_t enabled_show(struct kobject *kobj,
 			    struct kobj_attribute *attr, char *buf)
@@ -474,6 +500,9 @@ static int __init hugepage_init(void)
 	err = register_shrinker(&huge_zero_page_shrinker);
 	if (err)
 		goto err_hzp_shrinker;
+	err = register_shrinker(&huge_pud_zero_page_shrinker);
+	if (err)
+		goto err_hpzp_shrinker;
 	err = register_shrinker(&deferred_split_shrinker);
 	if (err)
 		goto err_split_shrinker;
@@ -496,6 +525,8 @@ static int __init hugepage_init(void)
 err_khugepaged:
 	unregister_shrinker(&deferred_split_shrinker);
 err_split_shrinker:
+	unregister_shrinker(&huge_pud_zero_page_shrinker);
+err_hpzp_shrinker:
 	unregister_shrinker(&huge_zero_page_shrinker);
 err_hzp_shrinker:
 	khugepaged_destroy();
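The scan callback frees the PUD zero page only when the shrinker holds the last reference, using atomic_cmpxchg() so a concurrent user cannot race with the free. A standalone model of that idiom (C11 stdatomic stands in for the kernel atomics; the page and refcount here are illustrative, only the compare-and-exchange pattern mirrors the patch):

/* cmpxchg-free idiom modelled on shrink_huge_pud_zero_page_scan(). */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

static atomic_int refcount = 1;	/* 1 == allocated, currently unused */
static void *zero_page;		/* stand-in for the 1GB zero page */

static int shrink_scan(void)
{
	int expected = 1;

	/* Drop 1 -> 0 only if nobody else holds a reference right now. */
	if (atomic_compare_exchange_strong(&refcount, &expected, 0)) {
		free(zero_page);
		zero_page = NULL;
		return 1;	/* the kernel would report HPAGE_PUD_NR freed */
	}
	return 0;		/* still referenced; nothing reclaimed */
}

int main(void)
{
	zero_page = calloc(1, 4096);
	printf("first scan freed: %d\n", shrink_scan());
	printf("second scan freed: %d\n", shrink_scan());
	return 0;
}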
From patchwork Fri Feb 15 22:08:47 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815989
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 22/31] mm: thp: 1GB THP follow_p*d_page() support.
Date: Fri, 15 Feb 2019 14:08:47 -0800
Message-Id: <20190215220856.29749-23-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Add follow_page support for 1GB THPs.

Signed-off-by: Zi Yan
---
 include/linux/huge_mm.h | 11 +++++++
 mm/gup.c                | 60 +++++++++++++++++++++++++++++++++-
 mm/huge_memory.c        | 73 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 142 insertions(+), 2 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index bd5cc5e65de8..b1acada9ce8c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -20,6 +20,10 @@ extern int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 extern void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud);
 extern int do_huge_pud_anonymous_page(struct vm_fault *vmf);
 extern int do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud);
+extern struct page *follow_trans_huge_pud(struct vm_area_struct *vma,
+					  unsigned long addr,
+					  pud_t *pud,
+					  unsigned int flags);
 #else
 static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
 {
@@ -32,6 +36,13 @@ extern int do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud)
 {
 	return VM_FAULT_FALLBACK;
 }
+struct page *follow_trans_huge_pud(struct vm_area_struct *vma,
+				   unsigned long addr,
+				   pud_t *pud,
+				   unsigned int flags)
+{
+	return NULL;
+}
 #endif
 
 extern vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd);
diff --git a/mm/gup.c b/mm/gup.c
index 05acd7e2eb22..0ad0509b03fc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -348,10 +348,68 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma,
 		if (page)
 			return page;
 	}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+	if (likely(!pud_trans_huge(*pud))) {
+		if (unlikely(pud_bad(*pud)))
+			return no_page_table(vma, flags);
+		return follow_pmd_mask(vma, address, pud, flags, ctx);
+	}
+
+	ptl = pud_lock(mm, pud);
+
+	if (unlikely(!pud_trans_huge(*pud))) {
+		spin_unlock(ptl);
+		if (unlikely(pud_bad(*pud)))
+			return no_page_table(vma, flags);
+		return follow_pmd_mask(vma, address, pud, flags, ctx);
+	}
+
+	if (flags & FOLL_SPLIT) {
+		int ret;
+		pmd_t *pmd = NULL;
+
+		page = pud_page(*pud);
+		if (is_huge_zero_page(page)) {
+
+			spin_unlock(ptl);
+			ret = 0;
+			split_huge_pud(vma, pud, address);
+			pmd = pmd_offset(pud, address);
+			split_huge_pmd(vma, pmd, address);
+			if (pmd_trans_unstable(pmd))
+				ret = -EBUSY;
+		} else {
+			get_page(page);
+			spin_unlock(ptl);
+			lock_page(page);
+			ret = split_huge_pud_page(page);
+			if (!ret)
+				ret = split_huge_page(page);
+			else {
+				unlock_page(page);
+				put_page(page);
+				goto out;
+			}
+			unlock_page(page);
+			put_page(page);
+			if (pud_none(*pud))
+				return no_page_table(vma, flags);
+			pmd = pmd_offset(pud, address);
+		}
+out:
+		return ret ? ERR_PTR(ret) :
+			follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
+	}
+	page = follow_trans_huge_pud(vma, address, pud, flags);
+	spin_unlock(ptl);
+	ctx->page_mask = HPAGE_PUD_NR - 1;
+	return page;
+#else
 	if (unlikely(pud_bad(*pud)))
 		return no_page_table(vma, flags);
-	return follow_pmd_mask(vma, address, pud, flags, ctx);
+#endif
 }
 
 static struct page *follow_p4d_mask(struct vm_area_struct *vma,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 41adc103ead1..191261771452 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1309,6 +1309,77 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
 	return page;
 }
 
+/*
+ * FOLL_FORCE can write to even unwritable pmd's, but only
+ * after we've gone through a COW cycle and they are dirty.
+ */
+static inline bool can_follow_write_pud(pud_t pud, unsigned int flags)
+{
+	return pud_write(pud) ||
+	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pud_dirty(pud));
+}
+
+struct page *follow_trans_huge_pud(struct vm_area_struct *vma,
+				   unsigned long addr,
+				   pud_t *pud,
+				   unsigned int flags)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct page *page = NULL;
+
+	assert_spin_locked(pud_lockptr(mm, pud));
+
+	if (flags & FOLL_WRITE && !can_follow_write_pud(*pud, flags))
+		goto out;
+
+	/* Avoid dumping huge zero page */
+	if ((flags & FOLL_DUMP) && is_huge_zero_pud(*pud))
+		return ERR_PTR(-EFAULT);
+
+	/* Full NUMA hinting faults to serialise migration in fault paths */
+	/*&& pud_protnone(*pmd)*/
+	if ((flags & FOLL_NUMA))
+		goto out;
+
+	page = pud_page(*pud);
+	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+	if (flags & FOLL_TOUCH)
+		touch_pud(vma, addr, pud, flags);
+	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
+		/*
+		 * We don't mlock() pte-mapped THPs. This way we can avoid
+		 * leaking mlocked pages into non-VM_LOCKED VMAs.
+		 *
+		 * For anon THP:
+		 *
+		 * We do the same thing as PMD-level THP.
+		 *
+		 * For file THP:
+		 *
+		 * No support yet.
+		 *
+		 */
+
+		if (PageAnon(page) && compound_mapcount(page) != 1)
+			goto skip_mlock;
+		if (PagePUDDoubleMap(page) || !page->mapping)
+			goto skip_mlock;
+		if (!trylock_page(page))
+			goto skip_mlock;
+		lru_add_drain();
+		if (page->mapping && !PagePUDDoubleMap(page))
+			mlock_vma_page(page);
+		unlock_page(page);
+	}
+skip_mlock:
+	page += (addr & ~HPAGE_PUD_MASK) >> PAGE_SHIFT;
+	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
+	if (flags & FOLL_GET)
+		get_page(page);
+
+out:
+	return page;
+}
 
 int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pud_t *dst_pud, pud_t *src_pud, unsigned long addr,
 		  struct vm_area_struct *vma)
@@ -1991,7 +2062,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 		goto out;
 
 	page = pmd_page(*pmd);
-	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page) && !PMDPageInPUD(page), page);
 	if (flags & FOLL_TOUCH)
 		touch_pmd(vma, addr, pmd, flags);
 	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
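Under FOLL_SPLIT the new path tears a 1GB THP down in two stages: split_huge_pud_page() yields PMD-sized pieces, then split_huge_page() yields base pages. The order arithmetic of that cascade, as a standalone sketch (x86-64 orders assumed; the kernel functions named in the comments are the ones this series adds, the code itself is only a model):

/* Two-stage THP split cascade, modelled on the FOLL_SPLIT path above. */
#include <stdio.h>

#define HPAGE_PUD_ORDER 18	/* 1GB on x86-64 */
#define HPAGE_PMD_ORDER 9	/* 2MB on x86-64 */

static unsigned int split_once(unsigned int order)
{
	if (order == HPAGE_PUD_ORDER)
		return HPAGE_PMD_ORDER;	/* split_huge_pud_page() */
	if (order == HPAGE_PMD_ORDER)
		return 0;		/* split_huge_page() */
	return 0;			/* already a base page */
}

int main(void)
{
	unsigned int order = HPAGE_PUD_ORDER;

	while (order) {
		unsigned int next = split_once(order);

		printf("order %2u -> order %u: %u pieces\n",
		       order, next, 1u << (order - next));
		order = next;
	}
	return 0;
}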
From patchwork Fri Feb 15 22:08:48 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815991
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 23/31] mm: support 1GB THP pagemap support.
Date: Fri, 15 Feb 2019 14:08:48 -0800
Message-Id: <20190215220856.29749-24-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Print page flags properly for PUD-mapped 1GB THPs in /proc/<pid>/pagemap.

Signed-off-by: Zi Yan
---
 fs/proc/task_mmu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f0ec9edab2f3..ccf8ce760283 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1373,6 +1373,45 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 	return err;
 }
 
+static int pagemap_pud_range(pud_t *pudp, unsigned long addr, unsigned long end,
+			     struct mm_walk *walk)
+{
+	struct vm_area_struct *vma = walk->vma;
+	struct pagemapread *pm = walk->private;
+	int err = 0;
+	u64 flags = 0, frame = 0;
+	pud_t pud = *pudp;
+	struct page *page = NULL;
+
+	if (vma->vm_flags & VM_SOFTDIRTY)
+		flags |= PM_SOFT_DIRTY;
+
+	if (pud_present(pud)) {
+		page = pud_page(pud);
+
+		flags |= PM_PRESENT;
+		if (pud_soft_dirty(pud))
+			flags |= PM_SOFT_DIRTY;
+		if (pm->show_pfn)
+			frame = pud_pfn(pud) +
+				((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	}
+
+	if (page && page_mapcount(page) == 1)
+		flags |= PM_MMAP_EXCLUSIVE;
+
+	for (; addr != end; addr += PAGE_SIZE) {
+		pagemap_entry_t pme = make_pme(frame, flags);
+
+		err = add_to_pagemap(addr, &pme, pm);
+		if (err)
+			break;
+		if (pm->show_pfn && (flags & PM_PRESENT))
+			frame++;
+	}
+	return err;
+}
+
 #ifdef CONFIG_HUGETLB_PAGE
 /* This function walks within one hugetlb entry in the single call */
 static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
@@ -1479,6 +1518,9 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
 	if (!pm.buffer)
 		goto out_mm;
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	pagemap_walk.pud_entry = pagemap_pud_range;
+#endif
 	pagemap_walk.pmd_entry = pagemap_pmd_range;
 	pagemap_walk.pte_hole = pagemap_pte_hole;
 #ifdef CONFIG_HUGETLB_PAGE
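pagemap_pud_range() emits one 64-bit entry per base page of the PUD mapping, in the existing pagemap format. A userspace reader of that format (standard /proc/<pid>/pagemap ABI; nothing specific to this series):

/* Decode the pagemap entry for one of our own addresses.
 * Bits: 63 present, 56 exclusive, 55 soft-dirty, 0-54 PFN. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	static char probe = 1;		/* any mapped byte will do */
	long psz = sysconf(_SC_PAGESIZE);
	uintptr_t addr = (uintptr_t)&probe;
	uint64_t entry;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0) {
		perror("pagemap");
		return 1;
	}
	/* one 8-byte entry per page, indexed by virtual page number */
	if (pread(fd, &entry, sizeof(entry),
		  addr / psz * sizeof(entry)) != sizeof(entry))
		return 1;
	printf("present=%llu exclusive=%llu soft-dirty=%llu pfn=0x%llx\n",
	       (unsigned long long)(entry >> 63 & 1),
	       (unsigned long long)(entry >> 56 & 1),
	       (unsigned long long)(entry >> 55 & 1),
	       (unsigned long long)(entry & ((1ULL << 55) - 1)));
	close(fd);
	return 0;
}

The PFN field reads back as zero without CAP_SYS_ADMIN on recent kernels; the flag bits are always visible.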
From patchwork Fri Feb 15 22:08:49 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815993
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 24/31] sysctl: add an option to only print the head page virtual address.
Date: Fri, 15 Feb 2019 14:08:49 -0800
Message-Id: <20190215220856.29749-25-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

It can help distinguish between PUD-mapped, PMD-mapped, and PTE-mapped THPs.

Signed-off-by: Zi Yan
---
 fs/proc/task_mmu.c |  7 +++++--
 kernel/sysctl.c    | 11 +++++++++++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ccf8ce760283..5106d5a07576 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -27,6 +27,9 @@
 #define SEQ_PUT_DEC(str, val) \
 		seq_put_decimal_ull_width(m, str, (val) << (PAGE_SHIFT-10), 8)
+
+int only_print_head_pfn;
+
 void task_mem(struct seq_file *m, struct mm_struct *mm)
 {
 	unsigned long text, lib, swap, anon, file, shmem;
@@ -1308,7 +1311,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 			flags |= PM_SOFT_DIRTY;
 		if (pm->show_pfn)
 			frame = pmd_pfn(pmd) +
-				((addr & ~PMD_MASK) >> PAGE_SHIFT);
+				(only_print_head_pfn ? 0 : ((addr & ~PMD_MASK) >> PAGE_SHIFT));
 	}
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 	else if (is_swap_pmd(pmd)) {
@@ -1394,7 +1397,7 @@ static int pagemap_pud_range(pud_t *pudp, unsigned long addr, unsigned long end,
 			flags |= PM_SOFT_DIRTY;
 		if (pm->show_pfn)
 			frame = pud_pfn(pud) +
-				((addr & ~PMD_MASK) >> PAGE_SHIFT);
+				(only_print_head_pfn ? 0 : ((addr & ~PUD_MASK) >> PAGE_SHIFT));
 	}
 
 	if (page && page_mapcount(page) == 1)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6bf0be1af7e0..762535a2c7d1 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -122,6 +122,8 @@ extern int vma_no_repeat_defrag;
 extern int num_breakout_chunks;
 extern int defrag_size_threshold;
 
+extern int only_print_head_pfn;
+
 /* Constants used for minimum and maximum */
 #ifdef CONFIG_LOCKUP_DETECTOR
 static int sixty = 60;
@@ -1750,6 +1752,15 @@ static struct ctl_table vm_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= &zero,
 	},
+	{
+		.procname	= "only_print_head_pfn",
+		.data		= &only_print_head_pfn,
+		.maxlen		= sizeof(only_print_head_pfn),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 	{ }
 };
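With the sysctl set, every subpage of a PMD- or PUD-mapped THP reports the head PFN, so a THP appears as one run of identical frames in pagemap. A sketch of detecting such runs (the control path /proc/sys/vm/only_print_head_pfn is an assumption inferred from the vm_table entry above; enable it with sysctl -w vm.only_print_head_pfn=1 before running):

/* Count runs of identical PFNs over a freshly faulted region. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PFN_MASK ((1ULL << 55) - 1)

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t len = 16 * (size_t)psz;
	char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int fd = open("/proc/self/pagemap", O_RDONLY);
	uint64_t prev_pfn = ~0ULL;
	int runs = 0;
	size_t i;

	if (region == MAP_FAILED || fd < 0)
		return 1;
	for (i = 0; i < len; i += psz) {
		uint64_t entry;
		uintptr_t addr = (uintptr_t)(region + i);

		region[i] = 1;	/* fault the page in */
		if (pread(fd, &entry, sizeof(entry),
			  addr / psz * sizeof(entry)) != sizeof(entry))
			return 1;
		if ((entry & PFN_MASK) != prev_pfn)
			runs++;
		prev_pfn = entry & PFN_MASK;
	}
	printf("distinct PFN runs over %zu pages: %d\n",
	       len / (size_t)psz, runs);
	close(fd);
	return 0;
}

A single run covering a whole PUD-sized range would identify one 1GB THP; a distinct PFN per page indicates PTE mappings.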
From patchwork Fri Feb 15 22:08:50 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815995
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, Kirill A. Shutemov, Andrew Morton, Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta, David Nellans, Zi Yan
Subject: [RFC PATCH 25/31] mm: thp: add a knob to enable/disable 1GB THPs.
Date: Fri, 15 Feb 2019 14:08:50 -0800
Message-Id: <20190215220856.29749-26-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Add a knob to enable/disable 1GB THPs. Toggling it does not affect
existing 1GB THPs; it works like the existing knob for 2MB THPs.

Signed-off-by: Zi Yan
---
 include/linux/huge_mm.h | 14 ++++++++++++++
 mm/huge_memory.c        | 42 ++++++++++++++++++++++++++++++++++++++++-
 mm/memory.c             |  2 +-
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b1acada9ce8c..687c7d59df8b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -84,6 +84,8 @@ enum transparent_hugepage_flag {
 #ifdef CONFIG_DEBUG_VM
 	TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG,
 #endif
+	TRANSPARENT_PUD_HUGEPAGE_FLAG,
+	TRANSPARENT_PUD_HUGEPAGE_REQ_MADV_FLAG,
 };

 struct kobject;
@@ -146,6 +148,18 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 }

 bool transparent_hugepage_enabled(struct vm_area_struct *vma);
+static inline bool transparent_pud_hugepage_enabled(struct vm_area_struct *vma)
+{
+	if (transparent_hugepage_enabled(vma)) {
+		if (transparent_hugepage_flags & (1 << TRANSPARENT_PUD_HUGEPAGE_FLAG))
+			return true;
+		if (transparent_hugepage_flags &
+				(1 << TRANSPARENT_PUD_HUGEPAGE_REQ_MADV_FLAG))
+			return !!(vma->vm_flags & VM_HUGEPAGE);
+	}
+
+	return false;
+}

 #define transparent_hugepage_use_zero_page()				\
 	(transparent_hugepage_flags &					\
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 191261771452..fa3e12b17621 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -50,9 +50,11 @@ unsigned long transparent_hugepage_flags __read_mostly =
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS
 	(1<ptl);
 alloc:
-	if (transparent_hugepage_enabled(vma) &&
+	if (transparent_pud_hugepage_enabled(vma) &&
 	    !transparent_hugepage_debug_cow()) {
 		huge_gfp = alloc_hugepage_direct_gfpmask(vma);
 		new_page = alloc_hugepage_vma(huge_gfp, vma, haddr, HPAGE_PUD_ORDER);
diff --git a/mm/memory.c b/mm/memory.c
index c875cc1a2600..5b8ad19cc439 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3859,7 +3859,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	vmf.pud = pud_alloc(mm, p4d, address);
 	if (!vmf.pud)
 		return VM_FAULT_OOM;
-	if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
+	if (pud_none(*vmf.pud) && transparent_pud_hugepage_enabled(vma)) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
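With the new transparent_pud_hugepage_enabled() check, a VMA is only
eligible for a 1GB (PUD) THP fault when the knob is set to always, or
when it is in its madvise mode and the VMA carries VM_HUGEPAGE. As a
usage illustration only (not part of the patch, and assuming the knob's
madvise mode mirrors the existing 2MB
/sys/kernel/mm/transparent_hugepage/enabled semantics), a process would
opt a region in like this:

	/*
	 * Illustrative userspace sketch, assuming x86_64's 1GB PUD size
	 * and 2MB-knob-like semantics for the new 1GB knob.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#define PUD_SIZE (1UL << 30)	/* 1GB on x86_64 */

	int main(void)
	{
		/* Over-allocate so a 1GB-aligned start address exists. */
		size_t len = 2 * PUD_SIZE;
		char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (raw == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		char *aligned = (char *)(((unsigned long)raw + PUD_SIZE - 1)
					 & ~(PUD_SIZE - 1));

		/*
		 * In madvise mode this sets VM_HUGEPAGE, so
		 * transparent_pud_hugepage_enabled() returns true for the VMA.
		 */
		if (madvise(aligned, PUD_SIZE, MADV_HUGEPAGE))
			perror("madvise");

		memset(aligned, 0, PUD_SIZE);	/* fault in; may allocate a PUD THP */
		return 0;
	}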
From patchwork Fri Feb 15 22:08:51 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815997
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A. Shutemov", Andrew Morton,
 Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta,
 David Nellans, Zi Yan
Subject: [RFC PATCH 26/31] mm: thp: promote PTE-mapped THP to PMD-mapped THP.
Date: Fri, 15 Feb 2019 14:08:51 -0800
Message-Id: <20190215220856.29749-27-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

First promote 512 base pages to a PTE-mapped THP, then promote the
PTE-mapped THP to a PMD-mapped THP.

Signed-off-by: Zi Yan
---
 include/linux/khugepaged.h |   1 +
 mm/filemap.c               |   8 +
 mm/huge_memory.c           | 419 +++++++++++++++++++++++++++++++++++++
 mm/internal.h              |   6 +
 mm/khugepaged.c            |   2 +-
 5 files changed, 435 insertions(+), 1 deletion(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 082d1d2a5216..675c5ee99698 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -55,6 +55,7 @@ static inline int khugepaged_enter(struct vm_area_struct *vma,
 		return -ENOMEM;
 	return 0;
 }
+void release_pte_pages(pte_t *pte, pte_t *_pte);
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 static inline int khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 9f5e323e883e..54babad945ad 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1236,6 +1236,14 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem

 #endif

+void __unlock_page(struct page *page)
+{
+	BUILD_BUG_ON(PG_waiters != 7);
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
+		wake_up_page_bit(page, PG_locked);
+}
+
 /**
  * unlock_page - unlock a locked page
  * @page: the page
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fa3e12b17621..f856f7e39095 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4284,3 +4284,422 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 	update_mmu_cache_pmd(vma, address, pvmw->pmd);
 }
 #endif
+
+/* promote HPAGE_PMD_SIZE range into a PMD map.
+ * mmap_sem needs to be down_write.
+ */
+int promote_huge_pmd_address(struct vm_area_struct *vma, unsigned long haddr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t *pmd, _pmd;
+	pte_t *pte, *_pte;
+	spinlock_t *pmd_ptl, *pte_ptl;
+	struct mmu_notifier_range range;
+	pgtable_t pgtable;
+	struct page *page, *head;
+	unsigned long address = haddr;
+	int ret = -EBUSY;
+
+	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
+
+	if (haddr < vma->vm_start || (haddr + HPAGE_PMD_SIZE) > vma->vm_end)
+		return -EINVAL;
+
+	pmd = mm_find_pmd(mm, haddr);
+	if (!pmd || pmd_trans_huge(*pmd))
+		goto out;
+
+	anon_vma_lock_write(vma->anon_vma);
+
+	pte = pte_offset_map(pmd, haddr);
+	pte_ptl = pte_lockptr(mm, pmd);
+
+	head = page = vm_normal_page(vma, haddr, *pte);
+	if (!page || !PageTransCompound(page))
+		goto out_unlock;
+	VM_BUG_ON(page != compound_head(page));
+	lock_page(head);
+
+	mmu_notifier_range_init(&range, mm, haddr, haddr + HPAGE_PMD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+	pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
+	/*
+	 * After this gup_fast can't run anymore. This also removes
+	 * any huge TLB entry from the CPU so we won't allow
+	 * huge and small TLB entries for the same virtual address
+	 * to avoid the risk of CPU bugs in that area.
+	 */
+
+	_pmd = pmdp_collapse_flush(vma, haddr, pmd);
+	spin_unlock(pmd_ptl);
+	mmu_notifier_invalidate_range_end(&range);
+
+	/* remove ptes */
+	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
+	     _pte++, page++, address += PAGE_SIZE) {
+		pte_t pteval = *_pte;
+
+		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
+			pr_err("pte none or zero pfn during pmd promotion\n");
+			if (is_zero_pfn(pte_pfn(pteval))) {
+				/*
+				 * ptl mostly unnecessary.
+				 */
+				spin_lock(pte_ptl);
+				/*
+				 * paravirt calls inside pte_clear here are
+				 * superfluous.
+				 */
+				pte_clear(vma->vm_mm, address, _pte);
+				spin_unlock(pte_ptl);
+			}
+		} else {
+			/*
+			 * ptl mostly unnecessary, but preempt has to
+			 * be disabled to update the per-cpu stats
+			 * inside page_remove_rmap().
+			 */
+			spin_lock(pte_ptl);
+			/*
+			 * paravirt calls inside pte_clear here are
+			 * superfluous.
+			 */
+			pte_clear(vma->vm_mm, address, _pte);
+			atomic_dec(&page->_mapcount);
+			/*page_remove_rmap(page, false, 0);*/
+			if (atomic_read(&page->_mapcount) > -1) {
+				SetPageDoubleMap(head);
+				pr_info("page double mapped");
+			}
+			spin_unlock(pte_ptl);
+		}
+	}
+	page_ref_sub(head, HPAGE_PMD_NR - 1);
+
+	pte_unmap(pte);
+	pgtable = pmd_pgtable(_pmd);
+
+	_pmd = mk_huge_pmd(head, vma->vm_page_prot);
+	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
+
+	/*
+	 * spin_lock() below is not the equivalent of smp_wmb(), so
+	 * this is needed to avoid the copy_huge_page writes to become
+	 * visible after the set_pmd_at() write.
+	 */
+	smp_wmb();
+
+	spin_lock(pmd_ptl);
+	VM_BUG_ON(!pmd_none(*pmd));
+	atomic_inc(compound_mapcount_ptr(head));
+	__inc_node_page_state(head, NR_ANON_THPS);
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
+	set_pmd_at(mm, haddr, pmd, _pmd);
+	update_mmu_cache_pmd(vma, haddr, pmd);
+	spin_unlock(pmd_ptl);
+	unlock_page(head);
+	ret = 0;
+
+out_unlock:
+	anon_vma_unlock_write(vma->anon_vma);
+out:
+	return ret;
+}
+
+/* Racy check whether the huge page can be split */
+static bool can_promote_huge_page(struct page *page)
+{
+	int extra_pins;
+
+	/* Additional pins from radix tree */
+	if (PageAnon(page))
+		extra_pins = PageSwapCache(page) ? 1 : 0;
+	else
+		return false;
+	if (PageSwapCache(page))
+		return false;
+	if (PageWriteback(page))
+		return false;
+	return total_mapcount(page) == page_count(page) - extra_pins - 1;
+}
+
+/* write a __promote_huge_page_isolate(struct vm_area_struct *vma,
+ * unsigned long address, pte_t *pte) to isolate all subpages into a list,
+ * then call promote_list_to_huge_page() to promote in-place
+ */
+
+static int __promote_huge_page_isolate(struct vm_area_struct *vma,
+		unsigned long haddr, pte_t *pte,
+		struct page **head, struct list_head *subpage_list)
+{
+	struct page *page = NULL;
+	pte_t *_pte;
+	bool writable = false;
+	unsigned long address = haddr;
+
+	*head = NULL;
+	lru_add_drain();
+	for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
+	     _pte++, address += PAGE_SIZE) {
+		pte_t pteval = *_pte;
+
+		if (pte_none(pteval) || (pte_present(pteval) &&
+				is_zero_pfn(pte_pfn(pteval))))
+			goto out;
+		if (!pte_present(pteval))
+			goto out;
+		page = vm_normal_page(vma, address, pteval);
+		if (unlikely(!page))
+			goto out;
+
+		if (address == haddr) {
+			*head = page;
+			if (page_to_pfn(page) & ((1<lru, subpage_list);
+			VM_BUG_ON_PAGE(!PageLocked(p), p);
+		}
+		return 1;
+	} else {
+		/*result = SCAN_PAGE_RO;*/
+	}
+
+out:
+	release_pte_pages(pte, _pte);
+	return 0;
+}
+
+/*
+ * This function promotes normal pages into a huge page. @list points to all
+ * subpages of the huge page to promote, @head points to the head page.
+ *
+ * The caller must hold a pin on the pages on @list, otherwise promotion
+ * fails with -EBUSY. All subpages must be locked.
+ *
+ * Both head page and tail pages will inherit mapping, flags, and so on from
+ * the hugepage.
+ *
+ * The GUP pin and PG_locked are transferred to @page.
+ *
+ * Returns 0 if the hugepage is promoted successfully.
+ * Returns -EBUSY if any subpage is pinned or if anon_vma disappeared from
+ * under us.
+ */
+int promote_list_to_huge_page(struct page *head, struct list_head *list)
+{
+	struct anon_vma *anon_vma = NULL;
+	int ret = 0;
+	DECLARE_BITMAP(subpage_bitmap, HPAGE_PMD_NR);
+	struct page *subpage;
+	int i;
+
+	/* no file-backed page support yet */
+	if (PageAnon(head)) {
+		/*
+		 * The caller does not necessarily hold an mmap_sem that would
+		 * prevent the anon_vma disappearing, so we first take a
+		 * reference to it and then lock the anon_vma for write. This
+		 * is similar to page_lock_anon_vma_read except the write lock
+		 * is taken to serialise against parallel split or collapse
+		 * operations.
+		 */
+		anon_vma = page_get_anon_vma(head);
+		if (!anon_vma) {
+			ret = -EBUSY;
+			goto out;
+		}
+		anon_vma_lock_write(anon_vma);
+	} else
+		return -EBUSY;
+
+	/* Racy check each subpage to see if any has extra pin */
+	list_for_each_entry(subpage, list, lru) {
+		if (can_promote_huge_page(subpage))
+			bitmap_set(subpage_bitmap, subpage - head, 1);
+	}
+	/* Proceed only if none of subpages has extra pin. */
+	if (!bitmap_full(subpage_bitmap, HPAGE_PMD_NR)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	list_for_each_entry(subpage, list, lru) {
+		enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
+			TTU_RMAP_LOCKED;
+		bool unmap_success;
+
+		if (PageAnon(subpage))
+			ttu_flags |= TTU_SPLIT_FREEZE;
+
+		unmap_success = try_to_unmap(subpage, ttu_flags);
+		VM_BUG_ON_PAGE(!unmap_success, subpage);
+	}
+
+	/* Take care of migration wait list:
+	 * make compound page first, since it is impossible to move waiting
+	 * process from subpage queues to the head page queue.
+	 */
+	set_compound_page_dtor(head, COMPOUND_PAGE_DTOR);
+	set_compound_order(head, HPAGE_PMD_ORDER);
+	__SetPageHead(head);
+	for (i = 1; i < HPAGE_PMD_NR; i++) {
+		struct page *p = head + i;
+
+		p->index = 0;
+		p->mapping = TAIL_MAPPING;
+		p->mem_cgroup = NULL;
+		ClearPageActive(p);
+		/* move subpage refcount to head page */
+		page_ref_add(head, page_count(p) - 1);
+		set_page_count(p, 0);
+		set_compound_head(p, head);
+	}
+	atomic_set(compound_mapcount_ptr(head), -1);
+	prep_transhuge_page(head);
+
+	remap_page(head);
+
+	if (!mem_cgroup_disabled())
+		mod_memcg_state(head->mem_cgroup, MEMCG_RSS_HUGE, HPAGE_PMD_NR);
+
+	for (i = 1; i < HPAGE_PMD_NR; i++) {
+		struct page *subpage = head + i;
+		__unlock_page(subpage);
+	}
+
+	INIT_LIST_HEAD(&head->lru);
+	unlock_page(head);
+	putback_lru_page(head);
+
+	mod_node_page_state(page_pgdat(head),
+			    NR_ISOLATED_ANON + page_is_file_cache(head), -HPAGE_PMD_NR);
+out_unlock:
+	if (anon_vma) {
+		anon_vma_unlock_write(anon_vma);
+		put_anon_vma(anon_vma);
+	}
+out:
+	return ret;
+}
+
+static int promote_huge_page_isolate(struct vm_area_struct *vma,
+		unsigned long haddr,
+		struct page **head, struct list_head *subpage_list)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t *pmd;
+	pte_t *pte;
+	spinlock_t *pte_ptl;
+	int ret = -EBUSY;
+
+	pmd = mm_find_pmd(mm, haddr);
+	if (!pmd || pmd_trans_huge(*pmd))
+		goto out;
+
+	anon_vma_lock_write(vma->anon_vma);
+
+	pte = pte_offset_map(pmd, haddr);
+	pte_ptl = pte_lockptr(mm, pmd);
+
+	spin_lock(pte_ptl);
+	ret = __promote_huge_page_isolate(vma, haddr, pte, head, subpage_list);
+	spin_unlock(pte_ptl);
+
+	if (unlikely(!ret)) {
+		pte_unmap(pte);
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+	ret = 0;
+	/*
+	 * All pages are isolated and locked so anon_vma rmap
+	 * can't run anymore.
+	 */
+out_unlock:
+	anon_vma_unlock_write(vma->anon_vma);
+out:
+	return ret;
+}
+
+/* assume mmap_sem is down_write, wrapper for madvise */
+int promote_huge_page_address(struct vm_area_struct *vma, unsigned long haddr)
+{
+	LIST_HEAD(subpage_list);
+	struct page *head;
+
+	if (haddr & (HPAGE_PMD_SIZE - 1))
+		return -EINVAL;
+
+	if (haddr < vma->vm_start || (haddr + HPAGE_PMD_SIZE) > vma->vm_end)
+		return -EINVAL;
+
+	if (promote_huge_page_isolate(vma, haddr, &head, &subpage_list))
+		return -EBUSY;
+
+	return promote_list_to_huge_page(head, &subpage_list);
+}
diff --git a/mm/internal.h b/mm/internal.h
index 70a6ef603e5b..c5e5a0f1cc58 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -581,4 +581,10 @@ int expand_free_page(struct zone *zone, struct page *buddy_head,
 void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
 							unsigned int alloc_flags);

+void __unlock_page(struct page *page);
+
+int promote_huge_pmd_address(struct vm_area_struct *vma, unsigned long haddr);
+
+int promote_huge_page_address(struct vm_area_struct *vma, unsigned long haddr);
+
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3acfddcba714..ff059353ebc3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -508,7 +508,7 @@ static void release_pte_page(struct page *page)
 	putback_lru_page(page);
 }

-static void release_pte_pages(pte_t *pte, pte_t *_pte)
+void release_pte_pages(pte_t *pte, pte_t *_pte)
 {
 	while (--_pte >= pte) {
 		pte_t pteval = *_pte;
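The two exported entry points are meant to compose: promote_huge_page_address()
first reassembles 512 base pages into one PTE-mapped compound page, and
promote_huge_pmd_address() then collapses the 512 PTEs into a single PMD
mapping, matching the two steps named in the commit message. A hypothetical
in-kernel caller (not part of this patch; the wrapper name is an assumption)
might wire them up as:

	/*
	 * Hypothetical caller sketch. mmap_sem must be held for write and
	 * haddr must be a PMD-aligned address inside vma, as both entry
	 * points check.
	 */
	static int madvise_promote_pmd_thp(struct vm_area_struct *vma,
					   unsigned long haddr)
	{
		int ret;

		/* Step 1: 512 base pages -> one PTE-mapped compound page. */
		ret = promote_huge_page_address(vma, haddr);
		if (ret)
			return ret;	/* -EINVAL: bad range; -EBUSY: extra pins */

		/* Step 2: collapse the 512 PTEs into a single PMD mapping. */
		return promote_huge_pmd_address(vma, haddr);
	}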
From patchwork Fri Feb 15 22:08:52 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10815999
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A. Shutemov", Andrew Morton,
 Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta,
 David Nellans, Zi Yan
Subject: [RFC PATCH 27/31] mm: thp: promote PMD-mapped PUD pages to PUD-mapped PUD pages.
Date: Fri, 15 Feb 2019 14:08:52 -0800
Message-Id: <20190215220856.29749-28-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

First promote 512 PMD-mapped THPs to a PMD-mapped PUD THP, then promote
a PMD-mapped PUD THP to a PUD-mapped PUD THP.

Signed-off-by: Zi Yan
---
 arch/x86/include/asm/pgalloc.h |   2 +
 include/asm-generic/pgtable.h  |  10 +
 mm/huge_memory.c               | 497 ++++++++++++++++++++++++++++++++-
 mm/internal.h                  |   2 +
 mm/pgtable-generic.c           |  20 ++
 mm/rmap.c                      |  23 +-
 6 files changed, 540 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index ebcb022f6bb9..153a6749f92b 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -119,6 +119,8 @@ static inline void pud_populate_with_pgtable(struct mm_struct *mm, pud_t *pud,
 	set_pud(pud, __pud(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
 }

+#define pud_pgtable(pud) pud_page(pud)
+
 #if CONFIG_PGTABLE_LEVELS > 2
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 1ae33b6590b8..9984c75d64ce 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -302,6 +302,8 @@ static inline void pudp_set_wrprotect(struct mm_struct *mm,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 				 unsigned long address, pmd_t *pmdp);
+extern pud_t pudp_collapse_flush(struct vm_area_struct *vma,
+				 unsigned long address, pud_t *pudp);
 #else
 static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 					unsigned long address,
@@ -310,7 +312,15 @@ static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
 	BUILD_BUG();
 	return *pmdp;
 }
+static inline pud_t pudp_collapse_flush(struct vm_area_struct *vma,
+					unsigned long address,
+					pud_t *pudp)
+{
+	BUILD_BUG();
+	return *pudp;
+}
 #define pmdp_collapse_flush pmdp_collapse_flush
+#define pudp_collapse_flush pudp_collapse_flush
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f856f7e39095..67fd1821f4dc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2958,7 +2958,7 @@ void split_huge_pud_address(struct vm_area_struct *vma, unsigned long address,
 	__split_huge_pud(vma, pud, address, freeze, page);
 }

-static void freeze_pud_page(struct page *page)
+static void unmap_pud_page(struct page *page)
 {
 	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
 		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PUD;
@@ -2973,7 +2973,7 @@ static void freeze_pud_page(struct page *page)
 	VM_BUG_ON_PAGE(!unmap_success, page);
 }

-static void unfreeze_pud_page(struct page *page)
+static void remap_pud_page(struct page *page)
 {
 	int i;

@@ -3109,7 +3109,7 @@ static void __split_huge_pud_page(struct page *page, struct list_head *list,

 	spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);

-	unfreeze_pud_page(head);
+	remap_pud_page(head);

 	for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) {
 		struct page *subpage = head + i;
@@ -3210,7 +3210,7 @@ int split_huge_pud_page_to_list(struct page *page, struct list_head *list)
 	}

 	/*
-	 * Racy check if we can split the page, before freeze_pud_page() will
+	 * Racy check if we can split the page, before unmap_pud_page() will
 	 * split PUDs
 	 */
 	if (!can_split_huge_pud_page(head, &extra_pins)) {
@@ -3219,7 +3219,7 @@ int split_huge_pud_page_to_list(struct page *page, struct list_head *list)
 	}

 	mlocked = PageMlocked(page);
-	freeze_pud_page(head);
+	unmap_pud_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);

 	/* Make sure the page is not on per-CPU pagevec as it takes pin */
@@ -3285,7 +3285,7 @@ int split_huge_pud_page_to_list(struct page *page, struct list_head *list)
 			xa_unlock(&mapping->i_pages);
 		}
 		spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
-		unfreeze_pud_page(head);
+		remap_pud_page(head);
 		ret = -EBUSY;
 	}

@@ -4703,3 +4703,488 @@ int promote_huge_page_address(struct vm_area_struct *vma, unsigned long haddr)

 	return promote_list_to_huge_page(head, &subpage_list);
 }
+
+static pud_t *mm_find_pud(struct mm_struct *mm, unsigned long address)
+{
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud = NULL;
+	pud_t pude;
+
+	pgd = pgd_offset(mm, address);
+	if (!pgd_present(*pgd))
+		goto out;
+
+	p4d = p4d_offset(pgd, address);
+	if (!p4d_present(*p4d))
+		goto out;
+
+	pud = pud_offset(p4d, address);
+
+	pude = *pud;
+	barrier();
+	if (!pud_present(pude) || pud_trans_huge(pude))
+		pud = NULL;
+out:
+	return pud;
+}
+
+/* promote HPAGE_PUD_SIZE range into a PUD map.
+ * mmap_sem needs to be down_write.
+ */
+int promote_huge_pud_address(struct vm_area_struct *vma, unsigned long haddr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pud_t *pud, _pud;
+	pmd_t *pmd, *_pmd;
+	spinlock_t *pud_ptl, *pmd_ptl;
+	struct mmu_notifier_range range;
+	pgtable_t pgtable;
+	struct page *page, *head;
+	unsigned long address = haddr;
+	int ret = -EBUSY;
+
+	VM_BUG_ON(haddr & ~HPAGE_PUD_MASK);
+
+	if (haddr < vma->vm_start || (haddr + HPAGE_PUD_SIZE) > vma->vm_end)
+		return -EINVAL;
+
+	pud = mm_find_pud(mm, haddr);
+	if (!pud)
+		goto out;
+
+	anon_vma_lock_write(vma->anon_vma);
+
+	pmd = pmd_offset(pud, haddr);
+	pmd_ptl = pmd_lockptr(mm, pmd);
+
+	head = page = vm_normal_page_pmd(vma, haddr, *pmd);
+	if (!page || !PageTransCompound(page) ||
+	    compound_order(page) != HPAGE_PUD_ORDER)
+		goto out_unlock;
+	VM_BUG_ON(head != compound_head(page));
+	lock_page(head);
+
+	mmu_notifier_range_init(&range, mm, haddr, haddr + HPAGE_PUD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+	pud_ptl = pud_lock(mm, pud);
+	/*
+	 * After this gup_fast can't run anymore. This also removes
+	 * any huge TLB entry from the CPU so we won't allow
+	 * huge and small TLB entries for the same virtual address
+	 * to avoid the risk of CPU bugs in that area.
+	 */
+
+	_pud = pudp_collapse_flush(vma, haddr, pud);
+	spin_unlock(pud_ptl);
+	mmu_notifier_invalidate_range_end(&range);
+
+	/* remove ptes */
+	for (_pmd = pmd; _pmd < pmd + (1<<(HPAGE_PUD_ORDER-HPAGE_PMD_ORDER));
+	     _pmd++, page += HPAGE_PMD_NR, address += HPAGE_PMD_SIZE) {
+		pmd_t pmdval = *_pmd;
+
+		if (pmd_none(pmdval) || is_zero_pfn(pmd_pfn(pmdval))) {
+			if (is_zero_pfn(pmd_pfn(pmdval))) {
+				/*
+				 * ptl mostly unnecessary.
+				 */
+				spin_lock(pmd_ptl);
+				/*
+				 * paravirt calls inside pte_clear here are
+				 * superfluous.
+				 */
+				pmd_clear(_pmd);
+				spin_unlock(pmd_ptl);
+			}
+		} else {
+			/*
+			 * ptl mostly unnecessary, but preempt has to
+			 * be disabled to update the per-cpu stats
+			 * inside page_remove_rmap().
+			 */
+			spin_lock(pmd_ptl);
+			/*
+			 * paravirt calls inside pte_clear here are
+			 * superfluous.
+			 */
+			pmd_clear(_pmd);
+			atomic_dec(sub_compound_mapcount_ptr(page, 1));
+			__dec_node_page_state(page, NR_ANON_THPS);
+			spin_unlock(pmd_ptl);
+		}
+	}
+	page_ref_sub(head, (1<<(HPAGE_PUD_ORDER-HPAGE_PMD_ORDER)) - 1);
+
+	pgtable = pud_pgtable(_pud);
+
+	_pud = mk_huge_pud(head, vma->vm_page_prot);
+	_pud = maybe_pud_mkwrite(pud_mkdirty(_pud), vma);
+
+	/*
+	 * spin_lock() below is not the equivalent of smp_wmb(), so
+	 * this is needed to avoid the copy_huge_page writes to become
+	 * visible after the set_pmd_at() write.
+	 */
+	smp_wmb();
+
+	spin_lock(pud_ptl);
+	BUG_ON(!pud_none(*pud));
+	pgtable_trans_huge_pud_deposit(mm, pud, pgtable);
+	set_pud_at(mm, haddr, pud, _pud);
+	update_mmu_cache_pud(vma, haddr, pud);
+	__inc_node_page_state(head, NR_ANON_THPS_PUD);
+	atomic_inc(compound_mapcount_ptr(head));
+	spin_unlock(pud_ptl);
+	unlock_page(head);
+	ret = 0;
+
+out_unlock:
+	anon_vma_unlock_write(vma->anon_vma);
+out:
+	return ret;
+}
+
+/* Racy check whether the huge page can be split */
+static bool can_promote_huge_pud_page(struct page *page)
+{
+	int extra_pins;
+
+	/* Additional pins from radix tree */
+	if (PageAnon(page))
+		extra_pins = PageSwapCache(page) ? 1 : 0;
+	else
+		return false;
+	if (PageSwapCache(page))
+		return false;
+	if (PageWriteback(page))
+		return false;
+	return total_mapcount(page) == page_count(page) - extra_pins - 1;
+}
+
+
+static void release_pmd_page(struct page *page)
+{
+	mod_node_page_state(page_pgdat(page),
+			    NR_ISOLATED_ANON + page_is_file_cache(page),
+			    -hpage_nr_pages(page));
+	unlock_page(page);
+	putback_lru_page(page);
+}
+
+void release_pmd_pages(pmd_t *pmd, pmd_t *_pmd)
+{
+	while (--_pmd >= pmd) {
+		pmd_t pmdval = *_pmd;
+
+		if (!pmd_none(pmdval) && !is_zero_pfn(pmd_pfn(pmdval)))
+			release_pmd_page(pmd_page(pmdval));
+	}
+}
+
+/* write a __promote_huge_page_isolate(struct vm_area_struct *vma,
+ * unsigned long address, pte_t *pte) to isolate all subpages into a list,
+ * then call promote_list_to_huge_page() to promote in-place
+ */
+
+static int __promote_huge_pud_page_isolate(struct vm_area_struct *vma,
+		unsigned long haddr, pmd_t *pmd,
+		struct page **head, struct list_head *subpage_list)
+{
+	struct page *page = NULL;
+	pmd_t *_pmd;
+	bool writable = false;
+	unsigned long address = haddr;
+
+	*head = NULL;
+
+	lru_add_drain();
+	for (_pmd = pmd; _pmd < pmd+PTRS_PER_PMD;
+	     _pmd++, address += HPAGE_PMD_SIZE) {
+		pmd_t pmdval = *_pmd;
+
+		if (pmd_none(pmdval) || (pmd_trans_huge(pmdval) &&
+				is_zero_pfn(pmd_pfn(pmdval))))
+			goto out;
+		if (!pmd_present(pmdval))
+			goto out;
+		page = vm_normal_page_pmd(vma, address, pmdval);
+		if (unlikely(!page))
+			goto out;
+
+		if (address == haddr) {
+			*head = page;
+			if (page_to_pfn(page) & ((1<lru, subpage_list);
+			VM_BUG_ON_PAGE(!PageLocked(p), p);
+		}
+		return 1;
+	} else {
+		/*result = SCAN_PAGE_RO;*/
+	}
+
+out:
+	release_pmd_pages(pmd, _pmd);
+	return 0;
+}
+
+static int promote_huge_pud_page_isolate(struct vm_area_struct *vma,
+		unsigned long haddr,
+		struct page **head, struct list_head *subpage_list)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pud_t *pud;
+	pmd_t *pmd;
+	spinlock_t *pmd_ptl;
+	int ret = -EBUSY;
+
+	pud = mm_find_pud(mm, haddr);
+	if (!pud)
+		goto out;
+
+	anon_vma_lock_write(vma->anon_vma);
+
+	pmd = pmd_offset(pud, haddr);
+	if (!pmd)
+		goto out_unlock;
+	pmd_ptl = pmd_lockptr(mm, pmd);
+
+	spin_lock(pmd_ptl);
+	ret = __promote_huge_pud_page_isolate(vma, haddr, pmd, head, subpage_list);
+	spin_unlock(pmd_ptl);
+
+	if (unlikely(!ret)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+	ret = 0;
+	/*
+	 * All pages are isolated and locked so anon_vma rmap
+	 * can't run anymore.
+	 */
+out_unlock:
+	anon_vma_unlock_write(vma->anon_vma);
+out:
+	return ret;
+}
+
+/*
+ * This function promotes normal pages into a huge page. @list points to all
+ * subpages of the huge page to promote, @head points to the head page.
+ *
+ * The caller must hold a pin on the pages on @list, otherwise promotion
+ * fails with -EBUSY. All subpages must be locked.
+ *
+ * Both head page and tail pages will inherit mapping, flags, and so on from
+ * the hugepage.
+ *
+ * The GUP pin and PG_locked are transferred to @page.
+ *
+ * Returns 0 if the hugepage is promoted successfully.
+ * Returns -EBUSY if any subpage is pinned or if anon_vma disappeared from
+ * under us.
+ */
+int promote_list_to_huge_pud_page(struct page *head, struct list_head *list)
+{
+	struct anon_vma *anon_vma = NULL;
+	int ret = 0;
+	DECLARE_BITMAP(subpage_bitmap, HPAGE_PMD_NR);
+	struct page *subpage;
+	int i;
+
+	/* no file-backed page support yet */
+	if (PageAnon(head)) {
+		/*
+		 * The caller does not necessarily hold an mmap_sem that would
+		 * prevent the anon_vma disappearing, so we first take a
+		 * reference to it and then lock the anon_vma for write. This
+		 * is similar to page_lock_anon_vma_read except the write lock
+		 * is taken to serialise against parallel split or collapse
+		 * operations.
+		 */
+		anon_vma = page_get_anon_vma(head);
+		if (!anon_vma) {
+			ret = -EBUSY;
+			goto out;
+		}
+		anon_vma_lock_write(anon_vma);
+	} else {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	/* Racy check each subpage to see if any has extra pin */
+	list_for_each_entry(subpage, list, lru) {
+		if (can_promote_huge_pud_page(subpage))
+			bitmap_set(subpage_bitmap, (subpage - head)/HPAGE_PMD_NR, 1);
+	}
+	/* Proceed only if none of subpages has extra pin. */
+	if (!bitmap_full(subpage_bitmap, HPAGE_PMD_NR)) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	list_for_each_entry(subpage, list, lru) {
+		enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
+			TTU_RMAP_LOCKED;
+		bool unmap_success;
+		struct pglist_data *pgdata = NULL;
+
+		if (PageAnon(subpage))
+			ttu_flags |= TTU_SPLIT_FREEZE;
+
+		unmap_success = try_to_unmap(subpage, ttu_flags);
+		VM_BUG_ON_PAGE(!unmap_success, subpage);
+
+		/* remove subpages from page_deferred_list */
+		pgdata = NODE_DATA(page_to_nid(subpage));
+		spin_lock(&pgdata->split_queue_lock);
+		if (!list_empty(page_deferred_list(subpage))) {
+			pgdata->split_queue_len--;
+			list_del_init(page_deferred_list(subpage));
+		}
+		spin_unlock(&pgdata->split_queue_lock);
+	}
+
+	/*first_compound_mapcount = compound_mapcount(head);*/
+	/* Take care of migration wait list:
+	 * make compound page first, since it is impossible to move waiting
+	 * process from subpage queues to the head page queue.
+	 */
+	set_compound_page_dtor(head, COMPOUND_PAGE_DTOR);
+	set_compound_order(head, HPAGE_PUD_ORDER);
+	__SetPageHead(head);
+	list_del(&head->lru);
+	for (i = 1; i < HPAGE_PUD_NR; i++) {
+		struct page *p = head + i;
+
+		if (i % HPAGE_PMD_NR == 0) {
+			list_del(&p->lru);
+			/* move subpage refcount to head page */
+			page_ref_add(head, page_count(p) - 1);
+		}
+		p->index = 0;
+		p->mapping = TAIL_MAPPING;
+		p->mem_cgroup = NULL;
+		ClearPageActive(p);
+		set_page_count(p, 0);
+		set_compound_head(p, head);
+	}
+	atomic_set(compound_mapcount_ptr(head), -1);
+	for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR)
+		atomic_set(sub_compound_mapcount_ptr(&head[i], 1), -1);
+	prep_transhuge_page(head);
+	/* Set first PMD-mapped page sub_compound_mapcount */
+
+	remap_pud_page(head);
+
+	for (i = HPAGE_PMD_NR; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) {
+		struct page *subpage = head + i;
+
+		__unlock_page(subpage);
+	}
+
+	INIT_LIST_HEAD(&head->lru);
+	unlock_page(head);
+	putback_lru_page(head);
+
+	mod_node_page_state(page_pgdat(head),
+			    NR_ISOLATED_ANON + page_is_file_cache(head), -HPAGE_PUD_NR);
+out_unlock:
+	if (anon_vma) {
+		anon_vma_unlock_write(anon_vma);
+		put_anon_vma(anon_vma);
+	}
+out:
+	while (!list_empty(list)) {
+		struct page *p = list_first_entry(list, struct page, lru);
+		list_del(&p->lru);
+		unlock_page(p);
+		putback_lru_page(p);
+	}
+	return ret;
+}
+
+/* assume mmap_sem is down_write, wrapper for madvise */
+int promote_huge_pud_page_address(struct vm_area_struct *vma, unsigned long haddr)
+{
+	LIST_HEAD(subpage_list);
+	struct page *head;
+
+	if (haddr & (HPAGE_PUD_SIZE - 1))
+		return -EINVAL;
+	if (haddr < vma->vm_start || (haddr + HPAGE_PUD_SIZE) > vma->vm_end)
+		return -EINVAL;
+
+	if (promote_huge_pud_page_isolate(vma, haddr, &head, &subpage_list))
+		return -EBUSY;
+
+	return promote_list_to_huge_pud_page(head, &subpage_list);
+}
diff --git a/mm/internal.h b/mm/internal.h
index c5e5a0f1cc58..6d5ebcdcde4c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -584,7 +584,9 @@ void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
 void __unlock_page(struct page *page);

 int promote_huge_pmd_address(struct vm_area_struct *vma, unsigned long haddr);
+int promote_huge_pud_address(struct vm_area_struct *vma, unsigned long haddr);

 int promote_huge_page_address(struct vm_area_struct *vma, unsigned long haddr);
+int promote_huge_pud_page_address(struct vm_area_struct *vma, unsigned long haddr);

 #endif /* __MM_INTERNAL_H */
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 95af1d67f209..99c4fb526c04 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -266,4 +266,24 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
 	return pmd;
 }
 #endif
+
+#ifndef pudp_collapse_flush
+pud_t pudp_collapse_flush(struct vm_area_struct *vma, unsigned long address,
+			  pud_t *pudp)
+{
+	/*
+	 * pud and hugepage pte format are same. So we could
+	 * use the same function.
+	 */
+	pud_t pud;
+
+	VM_BUG_ON(address & ~HPAGE_PUD_MASK);
+	VM_BUG_ON(pud_trans_huge(*pudp));
+	pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp);
+
+	/* collapse entails shooting down ptes not pmd */
+	flush_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
+	return pud;
+}
+#endif
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
diff --git a/mm/rmap.c b/mm/rmap.c
index 39f446a6775d..49ccbf0cfe4d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1112,12 +1112,13 @@ void do_page_add_anon_rmap(struct page *page,
 {
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
+	struct page *head = compound_head(page);

 	if (compound) {
 		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
-		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-		if (compound_order(page) == HPAGE_PUD_ORDER) {
+		VM_BUG_ON_PAGE(!PMDPageInPUD(page) && !PageTransHuge(page), page);
+		if (compound_order(head) == HPAGE_PUD_ORDER) {
 			if (order == HPAGE_PUD_ORDER) {
 				mapcount = compound_mapcount_ptr(page);
 			} else if (order == HPAGE_PMD_ORDER) {
@@ -1125,7 +1126,7 @@ void do_page_add_anon_rmap(struct page *page,
 				mapcount = sub_compound_mapcount_ptr(page, 1);
 			} else
 				VM_BUG_ON(1);
-		} else if (compound_order(page) == HPAGE_PMD_ORDER) {
+		} else if (compound_order(head) == HPAGE_PMD_ORDER) {
 			mapcount = compound_mapcount_ptr(page);
 		} else
 			VM_BUG_ON(1);
@@ -1135,7 +1136,8 @@ void do_page_add_anon_rmap(struct page *page,
 	}

 	if (first) {
-		int nr = compound ? hpage_nr_pages(page) : 1;
+		/*int nr = compound ? hpage_nr_pages(page) : 1;*/
+		int nr = 1<vm_flags & VM_LOCKED))
@@ -1505,12 +1508,16 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,

 		/* Unexpected PMD-mapped THP? */

-		if (pvmw.pte)
+		if (pvmw.pte) {
 			subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
-		else if (!pvmw.pte && pvmw.pmd)
+			order = 0;
+		} else if (!pvmw.pte && pvmw.pmd) {
 			subpage = page - page_to_pfn(page) + pmd_pfn(*pvmw.pmd);
-		else if (!pvmw.pte && !pvmw.pmd && pvmw.pud)
+			order = HPAGE_PMD_ORDER;
+		} else if (!pvmw.pte && !pvmw.pmd && pvmw.pud) {
 			subpage = page - page_to_pfn(page) + pud_pfn(*pvmw.pud);
+			order = HPAGE_PUD_ORDER;
+		}
 		VM_BUG_ON(!subpage);
 		address = pvmw.address;
@@ -1794,7 +1801,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 *
 		 * See Documentation/vm/mmu_notifier.rst
 		 */
-		page_remove_rmap(subpage, PageHuge(page), 0);
+		page_remove_rmap(subpage, PageHuge(page) || order >= HPAGE_PMD_ORDER, order);

 		put_page(page);
 	}
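This mirrors the PMD-level promotion one level up the page table:
promote_huge_pud_page_address() merges the PMD-sized THPs in the range into
one PUD-sized compound page, and promote_huge_pud_address() then collapses
the PMD entries into a single PUD entry. On x86_64,
HPAGE_PUD_ORDER - HPAGE_PMD_ORDER = 18 - 9 = 9, so a PUD covers
1 << 9 = 512 PMDs, which is where the loop bounds above and the "512" in the
commit message come from. A hypothetical caller sketch (not part of this
patch; the wrapper name is an assumption):

	/*
	 * Hypothetical caller sketch: the PUD-level analogue of the PMD
	 * promotion in patch 26.  mmap_sem held for write; haddr must be
	 * PUD-aligned and inside vma.
	 */
	static int madvise_promote_pud_thp(struct vm_area_struct *vma,
					   unsigned long haddr)
	{
		int ret;

		/* Step 1: 512 PMD-mapped THPs -> one PUD-sized compound page. */
		ret = promote_huge_pud_page_address(vma, haddr);
		if (ret)
			return ret;

		/* Step 2: collapse the 512 PMDs into a single PUD mapping. */
		return promote_huge_pud_address(vma, haddr);
	}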
From patchwork Fri Feb 15 22:08:53 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10816001
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A. Shutemov", Andrew Morton,
 Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta,
 David Nellans, Zi Yan
Subject: [RFC PATCH 28/31] mm: vmstats: add page promotion stats.
Date: Fri, 15 Feb 2019 14:08:53 -0800
Message-Id: <20190215220856.29749-29-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Count all four types of page promotion.

Signed-off-by: Zi Yan
---
 include/linux/vm_event_item.h | 4 ++++
 mm/huge_memory.c              | 8 ++++++++
 mm/vmstat.c                   | 4 ++++
 3 files changed, 16 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index df619262b1b4..f352e5cbfc9c 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -81,6 +81,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		THP_SPLIT_PAGE_FAILED,
 		THP_DEFERRED_SPLIT_PAGE,
 		THP_SPLIT_PMD,
+		THP_PROMOTE_PMD,
+		THP_PROMOTE_PAGE,
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 		THP_FAULT_ALLOC_PUD,
 		THP_FAULT_FALLBACK_PUD,
@@ -89,6 +91,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		THP_SPLIT_PUD_PAGE_FAILED,
 		THP_ZERO_PUD_PAGE_ALLOC,
 		THP_ZERO_PUD_PAGE_ALLOC_FAILED,
+		THP_PROMOTE_PUD,
+		THP_PROMOTE_PUD_PAGE,
 #endif
 		THP_ZERO_PAGE_ALLOC,
 		THP_ZERO_PAGE_ALLOC_FAILED,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 67fd1821f4dc..911463c98bcc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4403,6 +4403,8 @@ int promote_huge_pmd_address(struct vm_area_struct *vma, unsigned long haddr)
 out_unlock:
 	anon_vma_unlock_write(vma->anon_vma);
 out:
+	if (!ret)
+		count_vm_event(THP_PROMOTE_PMD);
 	return ret;
 }
@@ -4644,6 +4646,8 @@ int promote_list_to_huge_page(struct page *head, struct list_head *list)
 		put_anon_vma(anon_vma);
 	}
 out:
+	if (!ret)
+		count_vm_event(THP_PROMOTE_PAGE);
 	return ret;
 }
@@ -4842,6 +4846,8 @@ int promote_huge_pud_address(struct vm_area_struct *vma, unsigned long haddr)
 out_unlock:
 	anon_vma_unlock_write(vma->anon_vma);
 out:
+	if (!ret)
+		count_vm_event(THP_PROMOTE_PUD);
 	return ret;
 }
@@ -5169,6 +5175,8 @@ int promote_list_to_huge_pud_page(struct page *head, struct list_head *list)
 		unlock_page(p);
 		putback_lru_page(p);
 	}
+	if (!ret)
+		count_vm_event(THP_PROMOTE_PUD_PAGE);
 	return ret;
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1d185cf748a6..7dd1ff5805ef 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1259,6 +1259,8 @@ const char * const vmstat_text[] = {
 	"thp_split_page_failed",
 	"thp_deferred_split_page",
 	"thp_split_pmd",
+	"thp_promote_pmd",
+	"thp_promote_page",
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 	"thp_fault_alloc_pud",
 	"thp_fault_fallback_pud",
@@ -1267,6 +1269,8 @@ const char * const vmstat_text[] = {
 	"thp_split_pud_page_failed",
 	"thp_zero_pud_page_alloc",
 	"thp_zero_pud_page_alloc_failed",
+	"thp_promote_pud",
+	"thp_promote_pud_page",
 #endif
 	"thp_zero_page_alloc",
 	"thp_zero_page_alloc_failed",
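The four counters surface in /proc/vmstat next to the existing THP events. As a
quick way to watch them, a minimal userspace sketch -- assuming a kernel with
this series applied; the counter names are exactly the vmstat_text[] strings
added above:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[128];
	FILE *f = fopen("/proc/vmstat", "r");	/* counters appear here */

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* prints thp_promote_pmd, thp_promote_page, thp_promote_pud and
	 * thp_promote_pud_page (the PUD pair only exists with
	 * CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) */
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "thp_promote_", 12))
			fputs(line, stdout);
	fclose(f);
	return 0;
}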
From patchwork Fri Feb 15 22:08:54 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10816003
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A . Shutemov", Andrew Morton,
 Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta,
 David Nellans, Zi Yan
Subject: [RFC PATCH 29/31] mm: madvise: add madvise options to split PMD and
 PUD THPs.
Date: Fri, 15 Feb 2019 14:08:54 -0800
Message-Id: <20190215220856.29749-30-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Signed-off-by: Zi Yan
---
 include/uapi/asm-generic/mman-common.h |  12 +++
 mm/madvise.c                           | 106 +++++++++++++++++++++++++
 2 files changed, 118 insertions(+)

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index d1ec94a1970d..33db8b6a2ce0 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -69,6 +69,18 @@
 #define MADV_MEMDEFRAG	20	/* Worth backing with hugepages */
 #define MADV_NOMEMDEFRAG	21	/* Not worth backing with hugepages */
 
+#define MADV_SPLITHUGEPAGE	24	/* Split huge page in range once */
+#define MADV_PROMOTEHUGEPAGE	25	/* Promote range into huge page */
+
+#define MADV_SPLITHUGEMAP	26	/* Split huge page table entry in range once */
+#define MADV_PROMOTEHUGEMAP	27	/* Promote range into huge page table entry */
+
+#define MADV_SPLITHUGEPUDPAGE	28	/* Split huge page in range once */
+#define MADV_PROMOTEHUGEPUDPAGE	29	/* Promote range into huge page */
+
+#define MADV_SPLITHUGEPUDMAP	30	/* Split huge page table entry in range once */
+#define MADV_PROMOTEHUGEPUDMAP	31	/* Promote range into huge page table entry */
+
 /* compatibility flags */
 #define MAP_FILE	0
diff --git a/mm/madvise.c b/mm/madvise.c
index 9cef96d633e8..be3818c06e17 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -624,6 +624,95 @@ static long madvise_memdefrag(struct vm_area_struct *vma,
 	*prev = vma;
 	return memdefrag_madvise(vma, &vma->vm_flags, behavior);
 }
+
+static long madvise_split_promote_hugepage(struct vm_area_struct *vma,
+			struct vm_area_struct **prev,
+			unsigned long start, unsigned long end, int behavior)
+{
+	struct page *page;
+	unsigned long addr = start, haddr;
+	int ret = 0;
+	*prev = vma;
+
+	while (addr < end && !ret) {
+		switch (behavior) {
+		case MADV_SPLITHUGEMAP:
+			split_huge_pmd_address(vma, addr, false, NULL);
+			addr += HPAGE_PMD_SIZE;
+			break;
+		case MADV_SPLITHUGEPUDMAP:
+			split_huge_pud_address(vma, addr, false, NULL);
+			addr += HPAGE_PUD_SIZE;
+			break;
+		case MADV_SPLITHUGEPAGE:
+			page = follow_page(vma, addr, FOLL_GET);
+			if (page) {
+				lock_page(page);
+				if (split_huge_page(page)) {
+					pr_debug("%s: fail to split page\n", __func__);
+					ret = -EBUSY;
+				}
+				unlock_page(page);
+				put_page(page);
+			} else
+				ret = -ENODEV;
+			addr += HPAGE_PMD_SIZE;
+			break;
+		case MADV_SPLITHUGEPUDPAGE:
+			page = follow_page(vma, addr, FOLL_GET);
+			if (page) {
+				lock_page(page);
+				if (split_huge_pud_page(page)) {
+					pr_debug("%s: fail to split pud page\n", __func__);
+					ret = -EBUSY;
+				}
+				unlock_page(page);
+				put_page(page);
+			} else
+				ret = -ENODEV;
+			addr += HPAGE_PUD_SIZE;
+			break;
+		case MADV_PROMOTEHUGEMAP:
+			haddr = addr & HPAGE_PMD_MASK;
+			if (haddr >= start && (haddr + HPAGE_PMD_SIZE) <= end)
+				promote_huge_pmd_address(vma, haddr);
+			else
+				ret = -ENODEV;
+			addr += HPAGE_PMD_SIZE;
+			break;
+		case MADV_PROMOTEHUGEPUDMAP:
+			haddr = addr & HPAGE_PUD_MASK;
+			if (haddr >= start && (haddr + HPAGE_PUD_SIZE) <= end)
+				promote_huge_pud_address(vma, haddr);
+			else
+				ret = -ENODEV;
+			addr += HPAGE_PUD_SIZE;
+			break;
+		case MADV_PROMOTEHUGEPAGE:
+			haddr = addr & HPAGE_PMD_MASK;
+			if (haddr >= start && (haddr + HPAGE_PMD_SIZE) <= end)
+				promote_huge_page_address(vma, haddr);
+			else
+				ret = -ENODEV;
+			addr += HPAGE_PMD_SIZE;
+			break;
+		case MADV_PROMOTEHUGEPUDPAGE:
+			haddr = addr & HPAGE_PUD_MASK;
+			if (haddr >= start && (haddr + HPAGE_PUD_SIZE) <= end)
+				promote_huge_pud_page_address(vma, haddr);
+			else
+				ret = -ENODEV;
+			addr += HPAGE_PUD_SIZE;
+			break;
+		default:
+			ret = -EINVAL;
+			break;
+		}
+	}
+
+	return ret;
+}
+
 #ifdef CONFIG_MEMORY_FAILURE
 /*
  * Error injection support for memory error handling.
@@ -708,6 +797,15 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	case MADV_MEMDEFRAG:
 	case MADV_NOMEMDEFRAG:
 		return madvise_memdefrag(vma, prev, start, end, behavior);
+	case MADV_SPLITHUGEPAGE:
+	case MADV_PROMOTEHUGEPAGE:
+	case MADV_SPLITHUGEMAP:
+	case MADV_PROMOTEHUGEMAP:
+	case MADV_SPLITHUGEPUDPAGE:
+	case MADV_PROMOTEHUGEPUDPAGE:
+	case MADV_SPLITHUGEPUDMAP:
+	case MADV_PROMOTEHUGEPUDMAP:
+		return madvise_split_promote_hugepage(vma, prev, start, end, behavior);
 	default:
 		return madvise_behavior(vma, prev, start, end, behavior);
 	}
@@ -744,6 +842,14 @@ madvise_behavior_valid(int behavior)
 #endif
 	case MADV_MEMDEFRAG:
 	case MADV_NOMEMDEFRAG:
+	case MADV_SPLITHUGEPAGE:
+	case MADV_PROMOTEHUGEPAGE:
+	case MADV_SPLITHUGEMAP:
+	case MADV_PROMOTEHUGEMAP:
+	case MADV_SPLITHUGEPUDPAGE:
+	case MADV_PROMOTEHUGEPUDPAGE:
+	case MADV_SPLITHUGEPUDMAP:
+	case MADV_PROMOTEHUGEPUDMAP:
 		return true;
 	default:
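To illustrate the intended use of the new advice values, a hedged userspace
sketch -- it assumes a kernel with this series applied, and defines the MADV_*
numbers locally (mirroring the mman-common.h additions above) in case
installed headers predate the patch:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PROMOTEHUGEPAGE
#define MADV_SPLITHUGEPAGE	24	/* values from the patch above */
#define MADV_PROMOTEHUGEPAGE	25
#endif

#define LEN (2UL << 20)			/* one PMD-sized (2MB) chunk */

int main(void)
{
	void *buf;

	/* 2MB-aligned anonymous memory, so the range can back one PMD THP */
	if (posix_memalign(&buf, LEN, LEN))
		return 1;
	memset(buf, 1, LEN);		/* fault the pages in first */

	/* collapse the range into a huge page in place... */
	if (madvise(buf, LEN, MADV_PROMOTEHUGEPAGE))
		perror("madvise(MADV_PROMOTEHUGEPAGE)");
	/* ...and split it back into base pages */
	if (madvise(buf, LEN, MADV_SPLITHUGEPAGE))
		perror("madvise(MADV_SPLITHUGEPAGE)");
	return 0;
}

Note that the promotion cases above require the PMD/PUD-aligned subrange to
fall entirely inside [start, end), so an unaligned or short range yields
-ENODEV.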
From patchwork Fri Feb 15 22:08:55 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10816005
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A . Shutemov", Andrew Morton,
 Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta,
 David Nellans, Zi Yan
Subject: [RFC PATCH 30/31] mm: mem_defrag: thp: PMD THP and PUD THP in-place
 promotion support.
Date: Fri, 15 Feb 2019 14:08:55 -0800
Message-Id: <20190215220856.29749-31-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

PMD THPs get PMD page table entry promotion as well. PUD THPs get PUD
page table entry promotion only when the toggle is on; it is off by
default, since 1GB THPs perform poorly owing to the shortage of 1GB
TLB entries.

Signed-off-by: Zi Yan
---
 mm/mem_defrag.c | 79 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 73 insertions(+), 6 deletions(-)

diff --git a/mm/mem_defrag.c b/mm/mem_defrag.c
index 4d458b125c95..d7a579924d12 100644
--- a/mm/mem_defrag.c
+++ b/mm/mem_defrag.c
@@ -56,6 +56,7 @@ struct defrag_result_stats {
 	unsigned long dst_non_lru_failed;
 	unsigned long dst_non_moveable_failed;
 	unsigned long not_defrag_vpn;
+	unsigned int aligned_max_order;
 };
 
 enum {
@@ -689,6 +690,10 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma,
 
 		page_size = get_contig_page_size(scan_page);
 
+		if (compound_order(compound_head(scan_page)) == HPAGE_PUD_ORDER) {
+			defrag_stats->aligned_max_order = HPAGE_PUD_ORDER;
+			goto quit_defrag;
+		}
 		/* PTE-mapped THP not allowed */
 		if ((scan_page == compound_head(scan_page)) &&
 		    PageTransHuge(scan_page) && !PageHuge(scan_page))
@@ -714,6 +719,8 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* already in the contiguous pos */
 		if (page_dist == (long long)(scan_page - anchor_page)) {
 			defrag_stats->aligned += (page_size/PAGE_SIZE);
+			defrag_stats->aligned_max_order = max(defrag_stats->aligned_max_order,
+					compound_order(scan_page));
 			continue;
 		} else {
 			/* migrate pages according to the anchor pages in the vma. */
 			struct page *dest_page = anchor_page + page_dist;
@@ -901,6 +908,10 @@ int defrag_address_range(struct mm_struct *mm, struct vm_area_struct *vma,
 			} else { /* exchange */
 				int err = -EBUSY;
 
+				if (compound_order(compound_head(dest_page)) == HPAGE_PUD_ORDER) {
+					defrag_stats->aligned_max_order = HPAGE_PUD_ORDER;
+					goto quit_defrag;
+				}
 				/* PTE-mapped THP not allowed */
 				if ((dest_page == compound_head(dest_page)) &&
 				    PageTransHuge(dest_page) && !PageHuge(dest_page))
@@ -1486,10 +1497,13 @@ static int kmem_defragd_scan_mm(struct defrag_scan_control *sc)
 			up_read(&vma->vm_mm->mmap_sem);
 		} else if (sc->action == MEM_DEFRAG_DO_DEFRAG) {
 			/* go to nearest 1GB aligned address */
+			unsigned long defrag_begin = *scan_address;
 			unsigned long defrag_end = min_t(unsigned long,
 					(*scan_address + HPAGE_PUD_SIZE) & HPAGE_PUD_MASK,
 					vend);
 			int defrag_result;
+			int nr_fails_in_1gb_range = 0;
+			int skip_promotion = 0;
 
 			anchor_node = get_anchor_page_node_from_vma(vma, *scan_address);
@@ -1583,14 +1597,47 @@ static int kmem_defragd_scan_mm(struct defrag_scan_control *sc)
 				 * skip the page which cannot be defragged and restart
 				 * from the next page
 				 */
-				if (defrag_stats.not_defrag_vpn &&
-				    defrag_stats.not_defrag_vpn < defrag_sub_chunk_end) {
+				if (defrag_stats.not_defrag_vpn) {
 					VM_BUG_ON(defrag_sub_chunk_end != defrag_end &&
 						defrag_stats.not_defrag_vpn > defrag_sub_chunk_end);
-
-					*scan_address = defrag_stats.not_defrag_vpn;
-					defrag_stats.not_defrag_vpn = 0;
-					goto continue_defrag;
+					find_anchor_pages_in_vma(mm, vma, defrag_stats.not_defrag_vpn);
+
+					nr_fails_in_1gb_range += 1;
+					if (defrag_stats.not_defrag_vpn < defrag_sub_chunk_end) {
+						/* reset and continue */
+						*scan_address = defrag_stats.not_defrag_vpn;
+						defrag_stats.not_defrag_vpn = 0;
+						goto continue_defrag;
+					}
+				} else {
+					/* defrag works for the whole chunk,
+					 * promote to THP in place
+					 */
+					if (!defrag_result &&
+					    /* skip existing THPs */
+					    defrag_stats.aligned_max_order < HPAGE_PMD_ORDER &&
+					    !(*scan_address & (HPAGE_PMD_SIZE-1)) &&
+					    !(defrag_sub_chunk_end & (HPAGE_PMD_SIZE-1))) {
+						int ret = 0;
+						/* find a range to promote pmd */
+						down_write(&mm->mmap_sem);
+						ret = promote_huge_page_address(vma, *scan_address);
+						if (!ret) {
+							/*
+							 * promote to 2MB THP successful, but it is
+							 * still PTE pointed
+							 */
+							/* promote PTE-mapped THP to PMD-mapped */
+							promote_huge_pmd_address(vma, *scan_address);
+						}
+						up_write(&mm->mmap_sem);
+					}
+					/* skip PUD pages */
+					if (defrag_stats.aligned_max_order == HPAGE_PUD_ORDER) {
+						*scan_address = defrag_end;
+						skip_promotion = 1;
+						continue;
+					}
 				}
 
 				/* Done with current 2MB chunk */
@@ -1606,6 +1653,26 @@ static int kmem_defragd_scan_mm(struct defrag_scan_control *sc)
 				}
 			}
 
+			/* defrag works for the whole chunk, promote to PUD THP in place */
+			if (!nr_fails_in_1gb_range &&
+			    !skip_promotion && /* avoid existing THP */
+			    !(defrag_begin & (HPAGE_PUD_SIZE-1)) &&
+			    !(defrag_end & (HPAGE_PUD_SIZE-1))) {
+				int ret = 0;
+				/* find a range to promote pud */
+				down_write(&mm->mmap_sem);
+				ret = promote_huge_pud_page_address(vma, defrag_begin);
+				if (!ret) {
+					/*
+					 * promote to 1GB THP successful, but it is
+					 * still PMD pointed
+					 */
+					/* promote PMD-mapped THP to PUD-mapped */
+					if (mem_defrag_promote_1gb_thp)
+						promote_huge_pud_address(vma, defrag_begin);
+				}
+				up_write(&mm->mmap_sem);
+			}
 		}
 	}
 done_one_vma:
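Stripped of the defrag bookkeeping, the in-place promotion added here is a
two-stage operation per chunk. A condensed sketch (a restatement of the flow
above, not the patch itself) for one 2MB chunk:

/*
 * Condensed sketch of the PMD promotion flow from kmem_defragd_scan_mm()
 * above; the helpers are the ones this series adds in mm/huge_memory.c.
 */
static void promote_pmd_chunk(struct mm_struct *mm,
			      struct vm_area_struct *vma,
			      unsigned long addr)
{
	/* only fully aligned, fully defragmented chunks qualify */
	if (addr & (HPAGE_PMD_SIZE - 1))
		return;

	down_write(&mm->mmap_sem);
	/* stage 1: collect the 2MB range into one compound page
	 * (its mapping is still a run of PTEs) */
	if (!promote_huge_page_address(vma, addr))
		/* stage 2: collapse those PTEs into a single PMD */
		promote_huge_pmd_address(vma, addr);
	up_write(&mm->mmap_sem);
}

The 1GB path is symmetric (promote_huge_pud_page_address() followed by
promote_huge_pud_address()), gated by a toggle that the next patch reworks
into a bitmask.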
From patchwork Fri Feb 15 22:08:56 2019
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 10816007
From: Zi Yan
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Dave Hansen, Michal Hocko, "Kirill A . Shutemov", Andrew Morton,
 Vlastimil Babka, Mel Gorman, John Hubbard, Mark Hairgrove, Nitin Gupta,
 David Nellans, Zi Yan
Subject: [RFC PATCH 31/31] sysctl: toggle to promote PUD-mapped 1GB THP or not.
Date: Fri, 15 Feb 2019 14:08:56 -0800
Message-Id: <20190215220856.29749-32-zi.yan@sent.com>
In-Reply-To: <20190215220856.29749-1-zi.yan@sent.com>
References: <20190215220856.29749-1-zi.yan@sent.com>
Reply-To: ziy@nvidia.com

From: Zi Yan

Only promote PMD THPs by default.

Signed-off-by: Zi Yan
---
 kernel/sysctl.c | 11 +++++++++++
 mm/mem_defrag.c | 17 +++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 762535a2c7d1..20263d2c39b9 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -121,6 +121,7 @@ extern int vma_scan_threshold_type;
 extern int vma_no_repeat_defrag;
 extern int num_breakout_chunks;
 extern int defrag_size_threshold;
+extern int mem_defrag_promote_thp;
 
 extern int only_print_head_pfn;
 
@@ -135,6 +136,7 @@ static int zero;
 static int __maybe_unused one = 1;
 static int __maybe_unused two = 2;
 static int __maybe_unused four = 4;
+static int __maybe_unused fifteen = 15;
 static unsigned long one_ul = 1;
 static int one_hundred = 100;
 static int one_thousand = 1000;
@@ -1761,6 +1763,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one,
 	},
+	{
+		.procname	= "mem_defrag_promote_thp",
+		.data		= &mem_defrag_promote_thp,
+		.maxlen		= sizeof(mem_defrag_promote_thp),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &fifteen,
+	},
 	{ }
 };
diff --git a/mm/mem_defrag.c b/mm/mem_defrag.c
index d7a579924d12..7cfa99351925 100644
--- a/mm/mem_defrag.c
+++ b/mm/mem_defrag.c
@@ -64,12 +64,18 @@ enum {
 	VMA_THRESHOLD_TYPE_SIZE,
 };
 
+#define PROMOTE_PMD_MAP		(0x8)
+#define PROMOTE_PMD_PAGE	(0x4)
+#define PROMOTE_PUD_MAP		(0x2)
+#define PROMOTE_PUD_PAGE	(0x1)
+
 int num_breakout_chunks;
 int vma_scan_percentile = 100;
 int vma_scan_threshold_type = VMA_THRESHOLD_TYPE_TIME;
 int vma_no_repeat_defrag;
 int kmem_defragd_always;
 int defrag_size_threshold = 5;
+int mem_defrag_promote_thp = (PROMOTE_PMD_MAP|PROMOTE_PMD_PAGE);
 
 static DEFINE_SPINLOCK(kmem_defragd_mm_lock);
 
 #define MM_SLOTS_HASH_BITS 10
@@ -1613,7 +1619,8 @@ static int kmem_defragd_scan_mm(struct defrag_scan_control *sc)
 					/* defrag works for the whole chunk,
 					 * promote to THP in place
 					 */
-					if (!defrag_result &&
+					if ((mem_defrag_promote_thp & PROMOTE_PMD_PAGE) &&
+					    !defrag_result &&
 					    /* skip existing THPs */
 					    defrag_stats.aligned_max_order < HPAGE_PMD_ORDER &&
 					    !(*scan_address & (HPAGE_PMD_SIZE-1)) &&
@@ -1628,7 +1635,8 @@ static int kmem_defragd_scan_mm(struct defrag_scan_control *sc)
 						 * still PTE pointed
 						 */
 						/* promote PTE-mapped THP to PMD-mapped */
-						promote_huge_pmd_address(vma, *scan_address);
+						if (mem_defrag_promote_thp & PROMOTE_PMD_MAP)
+							promote_huge_pmd_address(vma, *scan_address);
 					}
 					up_write(&mm->mmap_sem);
 				}
@@ -1654,7 +1662,8 @@ static int kmem_defragd_scan_mm(struct defrag_scan_control *sc)
 			}
 
 			/* defrag works for the whole chunk, promote to PUD THP in place */
-			if (!nr_fails_in_1gb_range &&
+			if ((mem_defrag_promote_thp & PROMOTE_PUD_PAGE) &&
+			    !nr_fails_in_1gb_range &&
 			    !skip_promotion && /* avoid existing THP */
 			    !(defrag_begin & (HPAGE_PUD_SIZE-1)) &&
 			    !(defrag_end & (HPAGE_PUD_SIZE-1))) {
@@ -1668,7 +1677,7 @@ static int kmem_defragd_scan_mm(struct defrag_scan_control *sc)
 					 * still PMD pointed
 					 */
 					/* promote PMD-mapped THP to PUD-mapped */
-					if (mem_defrag_promote_1gb_thp)
+					if (mem_defrag_promote_thp & PROMOTE_PUD_MAP)
 						promote_huge_pud_address(vma, defrag_begin);
 				}
 				up_write(&mm->mmap_sem);
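For completeness, a hedged sketch of driving the new knob from userspace --
the bit values mirror the PROMOTE_* defines above, and the default of
PROMOTE_PMD_MAP|PROMOTE_PMD_PAGE (0xc) means PMD-only promotion:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* sysctl file registered in vm_table[] above */
	int fd = open("/proc/sys/vm/mem_defrag_promote_thp", O_WRONLY);

	if (fd < 0) {
		perror("open");		/* kernel without this series? */
		return 1;
	}
	/* 15 = 0xf: PMD map|page plus PUD map|page, i.e. also
	 * assemble and map 1GB THPs in place */
	if (write(fd, "15", 2) != 2)
		perror("write");
	close(fd);
	return 0;
}

Equivalently: echo 15 > /proc/sys/vm/mem_defrag_promote_thp.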