From patchwork Wed Sep 2 18:06:13 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 11751479
From: Zi Yan
To: linux-mm@kvack.org, Roman Gushchin
Cc: Rik van Riel, "Kirill A. Shutemov", Matthew Wilcox, Shakeel Butt,
    Yang Shi, David Nellans, linux-kernel@vger.kernel.org, Zi Yan
Subject: [RFC PATCH 01/16] mm: add pagechain container for storing multiple pages.
Date: Wed, 2 Sep 2020 14:06:13 -0400
Message-Id: <20200902180628.4052244-2-zi.yan@sent.com>
In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com>
References: <20200902180628.4052244-1-zi.yan@sent.com>

From: Zi Yan

When depositing page table pages for 1GB THPs, we need 512 PTE pages +
1 PMD page. Instead of counting and depositing 513 pages, we can use the
PMD page as a leader page and chain the remaining 512 PTE pages with
->lru. This, however, prevents us from depositing PMD pages with ->lru,
which is currently used for depositing PTE pages for 2MB THPs. So add a
new pagechain container for PMD pages.
Signed-off-by: Zi Yan
---
 include/linux/pagechain.h | 73 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 include/linux/pagechain.h

diff --git a/include/linux/pagechain.h b/include/linux/pagechain.h
new file mode 100644
index 000000000000..be536142b413
--- /dev/null
+++ b/include/linux/pagechain.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * include/linux/pagechain.h
+ *
+ * In many places it is efficient to batch an operation up against multiple
+ * pages. A pagechain is a multipage container which is used for that.
+ */
+
+#ifndef _LINUX_PAGECHAIN_H
+#define _LINUX_PAGECHAIN_H
+
+#include <linux/types.h>
+
+/* 14 pointers + two long's align the pagechain structure to a power of two */
+#define PAGECHAIN_SIZE	13
+
+struct page;
+
+struct pagechain {
+	struct list_head list;
+	unsigned int nr;
+	struct page *pages[PAGECHAIN_SIZE];
+};
+
+static inline void pagechain_init(struct pagechain *pchain)
+{
+	pchain->nr = 0;
+	INIT_LIST_HEAD(&pchain->list);
+}
+
+static inline void pagechain_reinit(struct pagechain *pchain)
+{
+	pchain->nr = 0;
+}
+
+static inline unsigned int pagechain_count(struct pagechain *pchain)
+{
+	return pchain->nr;
+}
+
+static inline unsigned int pagechain_space(struct pagechain *pchain)
+{
+	return PAGECHAIN_SIZE - pchain->nr;
+}
+
+static inline bool pagechain_empty(struct pagechain *pchain)
+{
+	return pchain->nr == 0;
+}
+
+/*
+ * Add a page to a pagechain. Returns the number of slots still available.
+ */
+static inline unsigned int pagechain_deposit(struct pagechain *pchain, struct page *page)
+{
+	VM_BUG_ON(!pagechain_space(pchain));
+	pchain->pages[pchain->nr++] = page;
+	return pagechain_space(pchain);
+}
+
+static inline struct page *pagechain_withdraw(struct pagechain *pchain)
+{
+	if (!pagechain_count(pchain))
+		return NULL;
+	return pchain->pages[--pchain->nr];
+}
+
+void __init pagechain_cache_init(void);
+struct pagechain *pagechain_alloc(void);
+void pagechain_free(struct pagechain *pchain);
+
+#endif /* _LINUX_PAGECHAIN_H */
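A rough usage sketch (not part of the patch) of how the deposit/withdraw
pair is meant to be driven. It mirrors the pgtable_trans_huge_pud_deposit()
and _withdraw() helpers added later in this series; the "chains" list head
and the example_* function names are placeholders, and error handling is
kept minimal:

	/* Illustrative only: keep PTE pages on a FIFO list of pagechains. */
	static void example_deposit_pte_page(struct list_head *chains,
					     struct page *pte_page)
	{
		struct pagechain *chain;

		chain = list_first_entry_or_null(chains, struct pagechain, list);
		if (!chain || !pagechain_space(chain)) {
			chain = pagechain_alloc();
			if (!chain)
				return;	/* a real caller must handle OOM */
			list_add(&chain->list, chains);
		}
		pagechain_deposit(chain, pte_page);
	}

	static struct page *example_withdraw_pte_page(struct list_head *chains)
	{
		struct pagechain *chain;

		chain = list_first_entry_or_null(chains, struct pagechain, list);
		if (!chain)
			return NULL;
		if (pagechain_empty(chain)) {
			/* drop the exhausted container and try the next one */
			list_del(&chain->list);
			pagechain_free(chain);
			chain = list_first_entry_or_null(chains, struct pagechain, list);
			if (!chain)
				return NULL;
		}
		return pagechain_withdraw(chain);
	}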
From patchwork Wed Sep 2 18:06:14 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 11751481

From: Zi Yan
To: linux-mm@kvack.org, Roman Gushchin
Cc: Rik van Riel, "Kirill A. Shutemov", Matthew Wilcox, Shakeel Butt,
    Yang Shi, David Nellans, linux-kernel@vger.kernel.org, Zi Yan
Subject: [RFC PATCH 02/16] mm: thp: 1GB anonymous page implementation.
Date: Wed, 2 Sep 2020 14:06:14 -0400
Message-Id: <20200902180628.4052244-3-zi.yan@sent.com>
In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com>
References: <20200902180628.4052244-1-zi.yan@sent.com>

From: Zi Yan

This adds 1GB THP support for anonymous pages. Applications can get 1GB
pages during page faults when their VMAs are larger than 1GB. For read
faults, a shared 1GB zero THP is used for all readers.

Signed-off-by: Zi Yan
---
 arch/x86/include/asm/pgalloc.h |  59 +++++++++++
 arch/x86/include/asm/pgtable.h |   2 +
 arch/x86/mm/pgtable.c          |  25 +++++
 drivers/base/node.c            |   3 +
 fs/proc/meminfo.c              |   2 +
 include/linux/huge_mm.h        |  13 ++-
 include/linux/mm.h             |   4 +
 include/linux/mm_types.h       |   1 +
 include/linux/mmzone.h         |   1 +
 include/linux/pgtable.h        |   3 +
 include/linux/vm_event_item.h  |   3 +
 kernel/fork.c                  |   5 +
 mm/huge_memory.c               | 188 +++++++++++++++++++++++++++++++--
 mm/memory.c                    |  29 ++++-
 mm/page_alloc.c                |   3 +-
 mm/pgtable-generic.c           |  45 ++++++++
 mm/rmap.c                      |  30 ++++--
 mm/vmstat.c                    |   4 +
 18 files changed, 396 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index 62ad61d6fefc..fae13467d3e1 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -52,6 +52,18 @@
 extern pgd_t *pgd_alloc(struct mm_struct *);
 extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
 extern pgtable_t pte_alloc_one(struct mm_struct *);
+extern pgtable_t pte_alloc_order(struct mm_struct *, unsigned long, int);
+
+static inline void pte_free_order(struct mm_struct *mm, struct page *pte,
+		int order)
+{
+	int i;
+
+	for (i = 0; i < (1<<order); i++) {
+		pgtable_pte_page_dtor(&pte[i]);
+		__free_page(&pte[i]);
+	}
+}
 #if CONFIG_PGTABLE_LEVELS > 2
+static inline pmd_t *pmd_alloc_one_page_with_ptes(struct mm_struct *mm, unsigned long addr)
+{
+	pgtable_t pte_pgtables;
+	pmd_t *pmd;
+	spinlock_t *pmd_ptl;
+	int i;
+
+	pte_pgtables = pte_alloc_order(mm, addr,
+		HPAGE_PUD_ORDER - HPAGE_PMD_ORDER);
+	if (!pte_pgtables)
+		return NULL;
+
+	pmd = pmd_alloc_one(mm, addr);
+	if (unlikely(!pmd)) {
+		pte_free_order(mm, pte_pgtables,
+			HPAGE_PUD_ORDER - HPAGE_PMD_ORDER);
+		return NULL;
+	}
+	pmd_ptl = pmd_lock(mm, pmd);
+
+	for (i = 0; i < (1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER)); i++)
+		pgtable_trans_huge_deposit(mm, pmd, pte_pgtables + i);
+
spin_unlock(pmd_ptl); + + return pmd; +} + +static inline void pmd_free_page_with_ptes(struct mm_struct *mm, pmd_t *pmd) +{ + spinlock_t *pmd_ptl; + int i; + + BUG_ON((unsigned long)pmd & (PAGE_SIZE-1)); + pmd_ptl = pmd_lock(mm, pmd); + + for (i = 0; i < (1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER)); i++) { + pgtable_t pte_pgtable; + + pte_pgtable = pgtable_trans_huge_withdraw(mm, pmd); + pte_free(mm, pte_pgtable); + } + + spin_unlock(pmd_ptl); + pmd_free(mm, pmd); +} + extern void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd); static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd, diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 5e0dcc20614d..26255cac78c0 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1141,6 +1141,8 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long return native_pmdp_get_and_clear(pmdp); } +#define mk_pud(page, pgprot) pfn_pud(page_to_pfn(page), (pgprot)) + #define __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr, pud_t *pudp) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index dfd82f51ba66..7be73aee6183 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -33,6 +33,31 @@ pgtable_t pte_alloc_one(struct mm_struct *mm) return __pte_alloc_one(mm, __userpte_alloc_gfp); } +pgtable_t pte_alloc_order(struct mm_struct *mm, unsigned long address, int order) +{ + struct page *pte; + int i; + + pte = alloc_pages(__userpte_alloc_gfp, order); + if (!pte) + return NULL; + split_page(pte, order); + for (i = 1; i < (1 << order); i++) + set_page_private(pte + i, 0); + + for (i = 0; i < (1<= 0) { + pgtable_pte_page_dtor(&pte[i]); + __free_page(&pte[i]); + } + return NULL; + } + } + return pte; +} + static int __init setup_userpte(char *arg) { if (!arg) diff --git a/drivers/base/node.c b/drivers/base/node.c index 508b80f6329b..f11b4d88911c 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -428,6 +428,7 @@ static ssize_t node_read_meminfo(struct device *dev, "Node %d SUnreclaim: %8lu kB\n" #ifdef CONFIG_TRANSPARENT_HUGEPAGE "Node %d AnonHugePages: %8lu kB\n" + "Node %d AnonHugePUDPages: %8lu kB\n" "Node %d ShmemHugePages: %8lu kB\n" "Node %d ShmemPmdMapped: %8lu kB\n" "Node %d FileHugePages: %8lu kB\n" @@ -457,6 +458,8 @@ static ssize_t node_read_meminfo(struct device *dev, , nid, K(node_page_state(pgdat, NR_ANON_THPS) * HPAGE_PMD_NR), + nid, K(node_page_state(pgdat, NR_ANON_THPS_PUD) * + HPAGE_PUD_NR), nid, K(node_page_state(pgdat, NR_SHMEM_THPS) * HPAGE_PMD_NR), nid, K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) * diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index 887a5532e449..b60e0c241015 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -130,6 +130,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v) #ifdef CONFIG_TRANSPARENT_HUGEPAGE show_val_kb(m, "AnonHugePages: ", global_node_page_state(NR_ANON_THPS) * HPAGE_PMD_NR); + show_val_kb(m, "AnonHugePUDPages: ", + global_node_page_state(NR_ANON_THPS_PUD) * HPAGE_PUD_NR); show_val_kb(m, "ShmemHugePages: ", global_node_page_state(NR_SHMEM_THPS) * HPAGE_PMD_NR); show_val_kb(m, "ShmemPmdMapped: ", diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 8a8bc46a2432..7528652400e4 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -18,10 +18,15 @@ extern int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, #ifdef 
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD extern void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud); +extern int do_huge_pud_anonymous_page(struct vm_fault *vmf); #else static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) { } +extern int do_huge_pud_anonymous_page(struct vm_fault *vmf) +{ + return VM_FAULT_FALLBACK; +} #endif extern vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd); @@ -115,6 +120,9 @@ extern struct kobj_attribute shmem_enabled_attr; #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT) #define HPAGE_PMD_NR (1< #include #include +#include struct mempolicy; struct anon_vma; @@ -2184,6 +2185,7 @@ static inline void pgtable_init(void) { ptlock_cache_init(); pgtable_cache_init(); + pagechain_cache_init(); } static inline bool pgtable_pte_page_ctor(struct page *page) @@ -2316,6 +2318,8 @@ static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud) return ptl; } +#define pud_huge_pte(mm, pud) ((mm)->pud_huge_pte) + extern void __init pagecache_init(void); extern void __init free_area_init_memoryless_node(int nid); extern void free_initmem(void); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 496c3ff97cce..4c1839366af4 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -513,6 +513,7 @@ struct mm_struct { #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS pgtable_t pmd_huge_pte; /* protected by page_table_lock */ #endif + struct list_head pud_huge_pte; /* protected by page_table_lock */ #ifdef CONFIG_NUMA_BALANCING /* * numa_next_scan is the next time that the PTEs will be marked diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 0a404552ecc1..3a8f54a2c5a7 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -196,6 +196,7 @@ enum node_stat_item { NR_FILE_THPS, NR_FILE_PMDMAPPED, NR_ANON_THPS, + NR_ANON_THPS_PUD, NR_VMSCAN_WRITE, NR_VMSCAN_IMMEDIATE, /* Prioritise for reclaim when writeback ends */ NR_DIRTIED, /* page dirtyings since bootup */ diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index e8cbc2e795d5..255275d5b73e 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -462,10 +462,13 @@ static inline pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT extern void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, pgtable_t pgtable); +extern void pgtable_trans_huge_pud_deposit(struct mm_struct *mm, pud_t *pudp, + pgtable_t pgtable); #endif #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp); +extern pgtable_t pgtable_trans_huge_pud_withdraw(struct mm_struct *mm, pud_t *pudp); #endif #ifdef CONFIG_TRANSPARENT_HUGEPAGE diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 2e6ca53b9bbd..a3f1093a55bb 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -92,6 +92,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_DEFERRED_SPLIT_PAGE, THP_SPLIT_PMD, #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD + THP_FAULT_ALLOC_PUD, + THP_FAULT_FALLBACK_PUD, + THP_FAULT_FALLBACK_PUD_CHARGE, THP_SPLIT_PUD, #endif THP_ZERO_PAGE_ALLOC, diff --git a/kernel/fork.c b/kernel/fork.c index 3f281814a3d3..842fdc4ae5fc 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -663,6 +663,10 @@ static void check_mm(struct mm_struct *mm) #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS VM_BUG_ON_MM(mm->pmd_huge_pte, mm); #endif + 
VM_BUG_ON_MM(!list_empty(&mm->pud_huge_pte) && + !pagechain_empty(list_first_entry(&mm->pud_huge_pte, + struct pagechain, list)), + mm); } #define allocate_mm() (kmem_cache_alloc(mm_cachep, GFP_KERNEL)) @@ -1023,6 +1027,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS mm->pmd_huge_pte = NULL; #endif + INIT_LIST_HEAD(&mm->pud_huge_pte); mm_init_uprobes_state(mm); if (current->mm) { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 90733cefa528..ec3847392208 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -933,6 +933,112 @@ vm_fault_t vmf_insert_pfn_pud_prot(struct vm_fault *vmf, pfn_t pfn, return VM_FAULT_NOPAGE; } EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud_prot); + +static int __do_huge_pud_anonymous_page(struct vm_fault *vmf, struct page *page, + gfp_t gfp) +{ + struct vm_area_struct *vma = vmf->vma; + pmd_t *pmd_pgtable; + unsigned long haddr = vmf->address & HPAGE_PUD_MASK; + int ret = 0; + + VM_BUG_ON_PAGE(!PageCompound(page), page); + + if (mem_cgroup_charge(page, vma->vm_mm, gfp)) { + put_page(page); + count_vm_event(THP_FAULT_FALLBACK_PUD); + count_vm_event(THP_FAULT_FALLBACK_CHARGE); + return VM_FAULT_FALLBACK; + } + cgroup_throttle_swaprate(page, gfp); + + pmd_pgtable = pmd_alloc_one_page_with_ptes(vma->vm_mm, haddr); + if (unlikely(!pmd_pgtable)) { + ret = VM_FAULT_OOM; + goto release; + } + + clear_huge_page(page, vmf->address, HPAGE_PUD_NR); + /* + * The memory barrier inside __SetPageUptodate makes sure that + * clear_huge_page writes become visible before the set_pmd_at() + * write. + */ + __SetPageUptodate(page); + + vmf->ptl = pud_lock(vma->vm_mm, vmf->pud); + if (unlikely(!pud_none(*vmf->pud))) { + goto unlock_release; + } else { + pud_t entry; + int i; + + ret = check_stable_address_space(vma->vm_mm); + if (ret) + goto unlock_release; + + /* Deliver the page fault to userland */ + if (userfaultfd_missing(vma)) { + vm_fault_t ret2; + + spin_unlock(vmf->ptl); + put_page(page); + pmd_free_page_with_ptes(vma->vm_mm, pmd_pgtable); + ret2 = handle_userfault(vmf, VM_UFFD_MISSING); + VM_BUG_ON(ret2 & VM_FAULT_FALLBACK); + return ret2; + } + + entry = mk_huge_pud(page, vma->vm_page_prot); + entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma); + page_add_new_anon_rmap(page, vma, haddr, true); + lru_cache_add_inactive_or_unevictable(page, vma); + pgtable_trans_huge_pud_deposit(vma->vm_mm, vmf->pud, + virt_to_page(pmd_pgtable)); + set_pud_at(vma->vm_mm, haddr, vmf->pud, entry); + add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PUD_NR); + mm_inc_nr_pmds(vma->vm_mm); + for (i = 0; i < (1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER)); i++) + mm_inc_nr_ptes(vma->vm_mm); + spin_unlock(vmf->ptl); + count_vm_event(THP_FAULT_ALLOC_PUD); + } + + return 0; +unlock_release: + spin_unlock(vmf->ptl); +release: + if (pmd_pgtable) + pmd_free_page_with_ptes(vma->vm_mm, pmd_pgtable); + put_page(page); + return ret; + +} + +int do_huge_pud_anonymous_page(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + gfp_t gfp; + struct page *page; + unsigned long haddr = vmf->address & HPAGE_PUD_MASK; + + if (haddr < vma->vm_start || haddr + HPAGE_PUD_SIZE > vma->vm_end) + return VM_FAULT_FALLBACK; + if (unlikely(anon_vma_prepare(vma))) + return VM_FAULT_OOM; + if (unlikely(khugepaged_enter(vma, vma->vm_flags))) + return VM_FAULT_OOM; + + gfp = alloc_hugepage_direct_gfpmask(vma); + page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PUD_ORDER); + if (unlikely(!page)) { + 
count_vm_event(THP_FAULT_FALLBACK_PUD); + return VM_FAULT_FALLBACK; + } + prep_transhuge_page(page); + return __do_huge_pud_anonymous_page(vmf, page, gfp); +} + #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ static void touch_pmd(struct vm_area_struct *vma, unsigned long addr, @@ -1159,7 +1265,12 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, { spinlock_t *dst_ptl, *src_ptl; pud_t pud; - int ret; + pmd_t *pmd_pgtable = NULL; + int ret = -ENOMEM; + + pmd_pgtable = pmd_alloc_one_page_with_ptes(vma->vm_mm, addr); + if (unlikely(!pmd_pgtable)) + goto out; dst_ptl = pud_lock(dst_mm, dst_pud); src_ptl = pud_lockptr(src_mm, src_pud); @@ -1167,16 +1278,28 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = -EAGAIN; pud = *src_pud; + + /* only transparent huge pud page needs extra page table pages for + * possible huge page split */ + if (!pud_trans_huge(pud)) + pmd_free_page_with_ptes(dst_mm, pmd_pgtable); + if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud))) goto out_unlock; - /* - * When page table lock is held, the huge zero pud should not be - * under splitting since we don't split the page itself, only pud to - * a page table. - */ - if (is_huge_zero_pud(pud)) { - /* No huge zero pud yet */ + if (pud_trans_huge(pud)) { + struct page *src_page; + int i; + + src_page = pud_page(pud); + VM_BUG_ON_PAGE(!PageHead(src_page), src_page); + get_page(src_page); + page_dup_rmap(src_page, true); + add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PUD_NR); + mm_inc_nr_pmds(dst_mm); + for (i = 0; i < (1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER)); i++) + mm_inc_nr_ptes(dst_mm); + pgtable_trans_huge_pud_deposit(dst_mm, dst_pud, virt_to_page(pmd_pgtable)); } pudp_set_wrprotect(src_mm, addr, src_pud); @@ -1187,6 +1310,7 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, out_unlock: spin_unlock(src_ptl); spin_unlock(dst_ptl); +out: return ret; } @@ -1887,11 +2011,27 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma) } #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD +static inline void zap_pud_deposited_table(struct mm_struct *mm, pud_t *pud) +{ + pgtable_t pgtable; + int i; + + pgtable = pgtable_trans_huge_pud_withdraw(mm, pud); + pmd_free_page_with_ptes(mm, (pmd_t *)page_address(pgtable)); + + mm_dec_nr_pmds(mm); + for (i = 0; i < (1<<(HPAGE_PUD_ORDER - HPAGE_PMD_ORDER)); i++) + mm_dec_nr_ptes(mm); +} + int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, unsigned long addr) { + pud_t orig_pud; spinlock_t *ptl; + tlb_change_page_size(tlb, HPAGE_PUD_SIZE); + ptl = __pud_trans_huge_lock(pud, vma); if (!ptl) return 0; @@ -1901,14 +2041,40 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, * pgtable_trans_huge_withdraw after finishing pudp related * operations. 
*/ - pudp_huge_get_and_clear_full(tlb->mm, addr, pud, tlb->fullmm); + orig_pud = pudp_huge_get_and_clear_full(tlb->mm, addr, pud, + tlb->fullmm); tlb_remove_pud_tlb_entry(tlb, pud, addr); if (vma_is_special_huge(vma)) { spin_unlock(ptl); /* No zero page support yet */ + } else if (is_huge_zero_pud(orig_pud)) { + zap_pud_deposited_table(tlb->mm, pud); + spin_unlock(ptl); + tlb_remove_page_size(tlb, pud_page(orig_pud), HPAGE_PUD_SIZE); } else { - /* No support for anonymous PUD pages yet */ - BUG(); + struct page *page = NULL; + int flush_needed = 1; + + if (pud_present(orig_pud)) { + page = pud_page(orig_pud); + page_remove_rmap(page, true); + VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); + VM_BUG_ON_PAGE(!PageHead(page), page); + } else + WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!"); + + if (PageAnon(page)) { + zap_pud_deposited_table(tlb->mm, pud); + add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PUD_NR); + } else { + if (arch_needs_pgtable_deposit()) + zap_pud_deposited_table(tlb->mm, pud); + add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PUD_NR); + } + + spin_unlock(ptl); + if (flush_needed) + tlb_remove_page_size(tlb, page, HPAGE_PUD_SIZE); } return 1; } diff --git a/mm/memory.c b/mm/memory.c index fb5463153351..6f86294438fd 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4147,14 +4147,13 @@ static vm_fault_t create_huge_pud(struct vm_fault *vmf) defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) /* No support for anonymous transparent PUD pages yet */ if (vma_is_anonymous(vmf->vma)) - goto split; + return do_huge_pud_anonymous_page(vmf); if (vmf->vma->vm_ops->huge_fault) { vm_fault_t ret = vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD); if (!(ret & VM_FAULT_FALLBACK)) return ret; } -split: /* COW or write-notify not handled on PUD level: split pud.*/ __split_huge_pud(vmf->vma, vmf->pud, vmf->address); #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -5098,3 +5097,29 @@ void ptlock_free(struct page *page) kmem_cache_free(page_ptl_cachep, page->ptl); } #endif + +static struct kmem_cache *pagechain_cachep; + +void __init pagechain_cache_init(void) +{ + pagechain_cachep = kmem_cache_create("pagechain", + sizeof(struct pagechain), 0, SLAB_PANIC, NULL); +} + +struct pagechain *pagechain_alloc(void) +{ + struct pagechain *chain; + + chain = kmem_cache_alloc(pagechain_cachep, GFP_ATOMIC); + + if (!chain) + return NULL; + + pagechain_init(chain); + return chain; +} + +void pagechain_free(struct pagechain *pchain) +{ + kmem_cache_free(pagechain_cachep, pchain); +} diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0d9f9bd0e06c..763acbed66f1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5443,7 +5443,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask) K(node_page_state(pgdat, NR_SHMEM_THPS) * HPAGE_PMD_NR), K(node_page_state(pgdat, NR_SHMEM_PMDMAPPED) * HPAGE_PMD_NR), - K(node_page_state(pgdat, NR_ANON_THPS) * HPAGE_PMD_NR), + K(node_page_state(pgdat, NR_ANON_THPS) * HPAGE_PMD_NR + + node_page_state(pgdat, NR_ANON_THPS_PUD) * HPAGE_PUD_NR), #endif K(node_page_state(pgdat, NR_WRITEBACK_TEMP)), node_page_state(pgdat, NR_KERNEL_STACK_KB), diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 9578db83e312..ef218b0f5d74 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -10,6 +10,7 @@ #include #include #include +#include #include /* @@ -170,6 +171,23 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp, list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru); pmd_huge_pte(mm, pmdp) = pgtable; } + +void 
pgtable_trans_huge_pud_deposit(struct mm_struct *mm, pud_t *pudp, + pgtable_t pgtable) +{ + struct pagechain *chain = NULL; + + assert_spin_locked(pud_lockptr(mm, pudp)); + /* FIFO */ + chain = list_first_entry_or_null(&pud_huge_pte(mm, pudp), + struct pagechain, list); + + if (!chain || !pagechain_space(chain)) { + chain = pagechain_alloc(); + list_add(&chain->list, &pud_huge_pte(mm, pudp)); + } + pagechain_deposit(chain, pgtable); +} #endif #ifndef __HAVE_ARCH_PGTABLE_WITHDRAW @@ -188,6 +206,33 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) list_del(&pgtable->lru); return pgtable; } + +pgtable_t pgtable_trans_huge_pud_withdraw(struct mm_struct *mm, pud_t *pudp) +{ + pgtable_t pgtable; + struct pagechain *chain = NULL; + + assert_spin_locked(pud_lockptr(mm, pudp)); + + /* FIFO */ +retry: + chain = list_first_entry_or_null(&pud_huge_pte(mm, pudp), + struct pagechain, list); + + if (!chain) + return NULL; + + if (pagechain_empty(chain)) { + if (list_is_singular(&chain->list)) + return NULL; + list_del(&chain->list); + pagechain_free(chain); + goto retry; + } + + pgtable = pagechain_withdraw(chain); + return pgtable; +} #endif #ifndef __HAVE_ARCH_PMDP_INVALIDATE diff --git a/mm/rmap.c b/mm/rmap.c index 9425260774a1..10195a2421cf 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -726,6 +726,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) pgd_t *pgd; p4d_t *p4d; pud_t *pud; + pud_t pude; pmd_t *pmd = NULL; pmd_t pmde; @@ -738,7 +739,10 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) goto out; pud = pud_offset(p4d, address); - if (!pud_present(*pud)) + + pude = *pud; + barrier(); + if (!pud_present(pude) || pud_trans_huge(pude)) goto out; pmd = pmd_offset(pud, address); @@ -1033,7 +1037,7 @@ void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma) * __page_set_anon_rmap - set up new anonymous rmap * @page: Page or Hugepage to add to rmap * @vma: VM area to add page to. - * @address: User virtual address of the mapping + * @address: User virtual address of the mapping * @exclusive: the page is exclusively owned by the current process */ static void __page_set_anon_rmap(struct page *page, @@ -1137,8 +1141,12 @@ void do_page_add_anon_rmap(struct page *page, * pte lock(a spinlock) is held, which implies preemption * disabled. */ - if (compound) - __inc_lruvec_page_state(page, NR_ANON_THPS); + if (compound) { + if (nr == HPAGE_PMD_NR) + __inc_lruvec_page_state(page, NR_ANON_THPS); + else + __inc_lruvec_page_state(page, NR_ANON_THPS_PUD); + } __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr); } @@ -1180,7 +1188,10 @@ void page_add_new_anon_rmap(struct page *page, if (hpage_pincount_available(page)) atomic_set(compound_pincount_ptr(page), 0); - __inc_lruvec_page_state(page, NR_ANON_THPS); + if (nr == HPAGE_PMD_NR) + __inc_lruvec_page_state(page, NR_ANON_THPS); + else + __inc_lruvec_page_state(page, NR_ANON_THPS_PUD); } else { /* Anon THP always mapped first with PMD */ VM_BUG_ON_PAGE(PageTransCompound(page), page); @@ -1286,14 +1297,17 @@ static void page_remove_anon_compound_rmap(struct page *page) if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) return; - __dec_lruvec_page_state(page, NR_ANON_THPS); + if (thp_nr_pages(page) == HPAGE_PMD_NR) + __dec_lruvec_page_state(page, NR_ANON_THPS); + else + __dec_lruvec_page_state(page, NR_ANON_THPS_PUD); if (TestClearPageDoubleMap(page)) { /* * Subpages can be mapped with PTEs too. Check how many of * them are still mapped. 
 	 */
-	for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0, nr = 0; i < thp_nr_pages(page); i++) {
 		if (atomic_add_negative(-1, &page[i]._mapcount))
 			nr++;
 	}
@@ -1306,7 +1320,7 @@ static void page_remove_anon_compound_rmap(struct page *page)
 		if (nr && nr < HPAGE_PMD_NR)
 			deferred_split_huge_page(page);
 	} else {
-		nr = HPAGE_PMD_NR;
+		nr = thp_nr_pages(page);
 	}
 
 	if (unlikely(PageMlocked(page)))
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 06fd13ebc2b8..3a01212b652c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1209,6 +1209,7 @@ const char * const vmstat_text[] = {
 	"nr_file_hugepages",
 	"nr_file_pmdmapped",
 	"nr_anon_transparent_hugepages",
+	"nr_anon_transparent_pud_hugepages",
 	"nr_vmscan_write",
 	"nr_vmscan_immediate_reclaim",
 	"nr_dirtied",
@@ -1325,6 +1326,9 @@ const char * const vmstat_text[] = {
 	"thp_deferred_split_page",
 	"thp_split_pmd",
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+	"thp_fault_alloc_pud",
+	"thp_fault_fallback_pud",
+	"thp_fault_fallback_pud_charge",
 	"thp_split_pud",
 #endif
 	"thp_zero_page_alloc",
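As a rough userspace illustration (not part of the series) of the new fault
path: with a VMA larger than 1GB, the first touch of a 1GB-aligned, 1GB-sized
range can be served by do_huge_pud_anonymous_page() above. Whether a 1GB THP
is actually used still depends on a free 1GB page being available and on the
enablement knobs added later in the series:

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#define GB (1UL << 30)

	int main(void)
	{
		/* Reserve 2GB so a 1GB-aligned, 1GB-sized window fits inside. */
		char *buf = mmap(NULL, 2 * GB, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		char *aligned;

		if (buf == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		aligned = (char *)(((unsigned long)buf + GB - 1) & ~(GB - 1));
		/* First write fault in the aligned window may map one 1GB THP. */
		memset(aligned, 1, GB);
		return 0;
	}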
Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 03/16] mm: proc: add 1GB THP kpageflag. Date: Wed, 2 Sep 2020 14:06:15 -0400 Message-Id: <20200902180628.4052244-4-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: 2FFBD18229835 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan Bit 27 is used to identify 1GB THP. Signed-off-by: Zi Yan --- fs/proc/page.c | 2 ++ include/uapi/linux/kernel-page-flags.h | 2 ++ 2 files changed, 4 insertions(+) diff --git a/fs/proc/page.c b/fs/proc/page.c index f3b39a7d2bf3..e4e2ad3612c9 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -161,6 +161,8 @@ u64 stable_page_flags(struct page *page) u |= BIT_ULL(KPF_ZERO_PAGE); u |= BIT_ULL(KPF_THP); } + if (compound_order(head) == HPAGE_PUD_ORDER) + u |= 1 << KPF_PUD_THP; } else if (is_zero_pfn(page_to_pfn(page))) u |= BIT_ULL(KPF_ZERO_PAGE); diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h index 6f2f2720f3ac..cdeb33ab655c 100644 --- a/include/uapi/linux/kernel-page-flags.h +++ b/include/uapi/linux/kernel-page-flags.h @@ -36,5 +36,7 @@ #define KPF_ZERO_PAGE 24 #define KPF_IDLE 25 #define KPF_PGTABLE 26 +#define KPF_PUD_THP 27 + #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */ From patchwork Wed Sep 2 18:06:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751483 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 879FF722 for ; Wed, 2 Sep 2020 18:06:45 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 445E6206EB for ; Wed, 2 Sep 2020 18:06:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="JoTG8PpH"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="jFA2qhOu" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 445E6206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id AD0F9900018; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 7A42C90001A; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3D4B9900012; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0029.hostedemail.com [216.40.44.29]) by kanga.kvack.org (Postfix) with ESMTP id DC678900018 for ; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by 
From patchwork Wed Sep 2 18:06:16 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 11751483

From: Zi Yan
To: linux-mm@kvack.org, Roman Gushchin
Cc: Rik van Riel, "Kirill A. Shutemov", Matthew Wilcox, Shakeel Butt,
    Yang Shi, David Nellans, linux-kernel@vger.kernel.org, Zi Yan
Subject: [RFC PATCH 04/16] mm: thp: 1GB THP copy on write implementation.
Date: Wed, 2 Sep 2020 14:06:16 -0400
Message-Id: <20200902180628.4052244-5-zi.yan@sent.com>
In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com>
References: <20200902180628.4052244-1-zi.yan@sent.com>

From: Zi Yan

COW on 1GB THPs will fall back to 2MB THPs if a 1GB THP is not available.

Signed-off-by: Zi Yan
---
 arch/x86/include/asm/pgalloc.h |  9 ++++++
 include/linux/huge_mm.h        |  5 ++++
 mm/huge_memory.c               | 54 ++++++++++++++++++++++++++++++++++
 mm/memory.c                    |  2 +-
 mm/swapfile.c                  |  4 ++-
 5 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index fae13467d3e1..31221269c387 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -98,6 +98,15 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 
 #define pmd_pgtable(pmd) pmd_page(pmd)
 
+static inline void pud_populate_with_pgtable(struct mm_struct *mm, pud_t *pud,
+		struct page *pte)
+{
+	unsigned long pfn = page_to_pfn(pte);
+
+	paravirt_alloc_pmd(mm, pfn);
+	set_pud(pud, __pud(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
+}
+
 #if CONFIG_PGTABLE_LEVELS > 2
 static inline pmd_t *pmd_alloc_one_page_with_ptes(struct mm_struct *mm, unsigned long addr)
 {
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7528652400e4..0c20a8ea6911 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -19,6 +19,7 @@ extern int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 extern void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud);
 extern int do_huge_pud_anonymous_page(struct vm_fault *vmf);
+extern vm_fault_t do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud);
 #else
 static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
 {
@@ -27,6 +28,10 @@ extern int do_huge_pud_anonymous_page(struct vm_fault *vmf)
 {
 	return VM_FAULT_FALLBACK;
 }
+extern vm_fault_t do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud)
+{
+	return VM_FAULT_FALLBACK;
+}
 #endif
 
 extern vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ec3847392208..6da9b02501b7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1334,6 +1334,60 @@ void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
 unlock:
 	spin_unlock(vmf->ptl);
 }
+
+vm_fault_t do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct page *page = NULL;
+	unsigned long haddr = vmf->address & HPAGE_PUD_MASK;
+
+	vmf->ptl = pud_lockptr(vma->vm_mm, vmf->pud);
+	VM_BUG_ON_VMA(!vma->anon_vma, vma);
+
+	if (is_huge_zero_pud(orig_pud))
fallback; + + spin_lock(vmf->ptl); + + if (unlikely(!pud_same(*vmf->pud, orig_pud))) { + spin_unlock(vmf->ptl); + return 0; + } + + page = pud_page(orig_pud); + VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page); + + /* Lock page for reuse_swap_page() */ + if (!trylock_page(page)) { + get_page(page); + spin_unlock(vmf->ptl); + lock_page(page); + spin_lock(vmf->ptl); + if (unlikely(!pud_same(*vmf->pud, orig_pud))) { + unlock_page(page); + put_page(page); + return 0; + } + put_page(page); + } + if (reuse_swap_page(page, NULL)) { + pud_t entry; + + entry = pud_mkyoung(orig_pud); + entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma); + if (pudp_set_access_flags(vma, haddr, vmf->pud, entry, 1)) + update_mmu_cache_pud(vma, vmf->address, vmf->pud); + unlock_page(page); + spin_unlock(vmf->ptl); + return VM_FAULT_WRITE; + } + unlock_page(page); + spin_unlock(vmf->ptl); +fallback: + __split_huge_pud(vma, vmf->pud, vmf->address); + return VM_FAULT_FALLBACK; +} + #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd) diff --git a/mm/memory.c b/mm/memory.c index 6f86294438fd..b88587256bc1 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4165,7 +4165,7 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud) #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* No support for anonymous transparent PUD pages yet */ if (vma_is_anonymous(vmf->vma)) - return VM_FAULT_FALLBACK; + return do_huge_pud_wp_page(vmf, orig_pud); if (vmf->vma->vm_ops->huge_fault) return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD); #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ diff --git a/mm/swapfile.c b/mm/swapfile.c index 20012c0c0252..e3f771c2ad83 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1635,7 +1635,9 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount, /* hugetlbfs shouldn't call it */ VM_BUG_ON_PAGE(PageHuge(page), page); - if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!PageTransCompound(page))) { + if (!IS_ENABLED(CONFIG_THP_SWAP) || + unlikely(compound_order(compound_head(page)) == HPAGE_PUD_ORDER) || + likely(!PageTransCompound(page))) { mapcount = page_trans_huge_mapcount(page, total_mapcount); if (PageSwapCache(page)) swapcount = page_swapcount(page); From patchwork Wed Sep 2 18:06:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751485 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4D02C14E3 for ; Wed, 2 Sep 2020 18:06:48 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 02AD5206EB for ; Wed, 2 Sep 2020 18:06:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="cP6uhp21"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="VOjyCc0p" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 02AD5206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3AE1690001E; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, 
from userid 40) id 3340290001A; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A3E8090001B; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0093.hostedemail.com [216.40.44.93]) by kanga.kvack.org (Postfix) with ESMTP id 72261900019 for ; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 1ED3C3632 for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-FDA: 77218901508.16.store62_3e0ea17270a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin16.hostedemail.com (Postfix) with ESMTP id 01EBA100E6903 for ; Wed, 2 Sep 2020 18:06:33 +0000 (UTC) X-Spam-Summary: 1,0,0,df951e3cf4b9341e,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:1:2:41:355:379:541:800:960:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1605:1730:1747:1777:1792:1801:2198:2199:2393:2559:2562:2895:3138:3139:3140:3141:3142:3865:3868:3870:4049:4250:4321:4605:5007:6119:6120:6261:6630:6653:7576:7901:7903:10004:11026:11473:11657:11658:11914:12043:12291:12296:12438:12555:12679:12683:12895:12986:13894:13972:14096:21080:21220:21433:21451:21627:21990:30003:30054:30064:30070,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-66.100.201.100 62.18.0.100;04yfak7h4o9bc5a16qi3tjywybb5soc5ys9oyfzb91uqaqrhh855jeiobdsddf7.rd7dkhx56d5finu9g5xgeuh9iesmisi1ce6rhziy8ttbfegriag7xwc5e3jsbsi.6-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: store62_3e0ea17270a2 X-Filterd-Recvd-Size: 10309 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf25.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:33 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id C87BE5C01D8; Wed, 2 Sep 2020 14:06:32 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:32 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=ws1j/mKPYHXFd qyNwo3xjFGDqqtwldYO4m1W2BzXBgU=; b=cP6uhp21LZjEK3HPsSfdk3nRply0h woG1gTdXGPQbDfCEYGO6Hhb10dFnSp8VVRKe9+1IVV5V7bhLhF1PPK792xs36ZUt L8E3joubwWsCF++JHZSGekQRv+9IrgJ/ioygr3uIGpJPuseNqP9ZJa3MnU8k9bVX CQgI35Z03RXq4oLm+FjkyQc+5B84fccBeknsbX0hNy24xWg+5a1mgDsz+Bt1fB2R 0VtdIPdZ0eJJNMhK2sO8B/SGs5oFWo4c9aP/+LQ9MuSLOs3Q225GIlqBOtAFj97P 7Mp+sLPOuu8B9w/8y0k61iJlueJwx/1tPA1IsaXMq0KicxURdmxidpT0Q== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=ws1j/mKPYHXFdqyNwo3xjFGDqqtwldYO4m1W2BzXBgU=; b=VOjyCc0p eT9nNtSZ0qxyKFx3RSDvo+copPgeDQitDQBqeXndq3dt9cigisVB2HU2cHG/QyvJ /d0/wf8Oed8TnJJtHMnix3eQqKi7IbkXMKj9ShSTbuFAV0anLPIGU0E5PJQC3ebV W1BvW0y7DFLYdArzDLjDBwh307KyNw1G0rPUyaRu+N9C6VadKVgzuCZh5c1Qc2p/ gQwySLUjhFrEZbFjwm+d1+yOuaeZQwCvHkmbEHQmGY7uGuvT0/vp0Dg4fvV/eGkg 
cWA+j8Lpi4iotTKfSCsejg+4bCjIX28hHYTIBsNvkEX4ALPQb6JuzdUuzUxbZQaC IfpXdivBEOBLmA== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpeiiihdrhigrnhesshgvnhhtrdgtohhm X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id E1AA130600A6; Wed, 2 Sep 2020 14:06:31 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 05/16] mm: thp: handling 1GB THP reference bit. Date: Wed, 2 Sep 2020 14:06:17 -0400 Message-Id: <20200902180628.4052244-6-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: 01EBA100E6903 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan Add PUD-level TLB flush ops and teach page_vma_mapped_walk about 1GB THPs. Signed-off-by: Zi Yan --- arch/x86/include/asm/pgtable.h | 3 +++ arch/x86/mm/pgtable.c | 13 +++++++++++++ include/linux/mmu_notifier.h | 13 +++++++++++++ include/linux/pgtable.h | 14 ++++++++++++++ include/linux/rmap.h | 1 + mm/page_vma_mapped.c | 33 +++++++++++++++++++++++++++++---- mm/rmap.c | 12 +++++++++--- 7 files changed, 82 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 26255cac78c0..15334f5ba172 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1127,6 +1127,9 @@ extern int pudp_test_and_clear_young(struct vm_area_struct *vma, extern int pmdp_clear_flush_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); +#define __HAVE_ARCH_PUDP_CLEAR_YOUNG_FLUSH +extern int pudp_clear_flush_young(struct vm_area_struct *vma, + unsigned long address, pud_t *pudp); #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 7be73aee6183..e4a2dffcc418 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -633,6 +633,19 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma, return young; } +int pudp_clear_flush_young(struct vm_area_struct *vma, + unsigned long address, pud_t *pudp) +{ + int young; + + VM_BUG_ON(address & ~HPAGE_PUD_MASK); + + young = pudp_test_and_clear_young(vma, address, pudp); + if (young) + flush_tlb_range(vma, address, address + HPAGE_PUD_SIZE); + + return young; +} #endif /** diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index b8200782dede..4ffa179e654f 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -557,6 +557,19 @@ static inline void mmu_notifier_range_init_migrate( __young; \ }) +#define
pudp_clear_flush_young_notify(__vma, __address, __pudp) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = pudp_clear_flush_young(___vma, ___address, __pudp); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address, \ + ___address + \ + PUD_SIZE); \ + __young; \ +}) + #define ptep_clear_young_notify(__vma, __address, __ptep) \ ({ \ int __young; \ diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 255275d5b73e..8ef358c386af 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -240,6 +240,20 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif +#ifndef __HAVE_ARCH_PUDP_CLEAR_YOUNG_FLUSH +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD +extern int pudp_clear_flush_young(struct vm_area_struct *vma, + unsigned long address, pud_t *pudp); +#else +int pudp_clear_flush_young(struct vm_area_struct *vma, + unsigned long address, pud_t *pudp) +{ + BUILD_BUG(); + return 0; +} +#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ +#endif + #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 3a6adfa70fb0..0af61dd193d2 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -206,6 +206,7 @@ struct page_vma_mapped_walk { struct page *page; struct vm_area_struct *vma; unsigned long address; + pud_t *pud; pmd_t *pmd; pte_t *pte; spinlock_t *ptl; diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 5e77b269c330..d9d39ec06e21 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -145,9 +145,12 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) struct page *page = pvmw->page; pgd_t *pgd; p4d_t *p4d; - pud_t *pud; + pud_t pude; pmd_t pmde; + if (!pvmw->pte && !pvmw->pmd && pvmw->pud) + return not_found(pvmw); + /* The only possible pmd mapping has been handled on last iteration */ if (pvmw->pmd && !pvmw->pte) return not_found(pvmw); @@ -174,10 +177,31 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) p4d = p4d_offset(pgd, pvmw->address); if (!p4d_present(*p4d)) return false; - pud = pud_offset(p4d, pvmw->address); - if (!pud_present(*pud)) + pvmw->pud = pud_offset(p4d, pvmw->address); + + /* + * Make sure the pud value isn't cached in a register by the + * compiler and used as a stale value after we've observed a + * subsequent update. 
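[Aside for readers less familiar with this idiom: the point of taking one READ_ONCE() snapshot of the pud, as the walk below does, is that every later check is made against the same value even if the entry changes concurrently. The same snapshot-then-decide pattern in ordinary userspace C; this is only an illustrative sketch, and entry_t and load_once() are invented stand-ins for pud_t and READ_ONCE():

#include <stdio.h>

typedef unsigned long entry_t;

/* Invented stand-in for READ_ONCE(): force exactly one load of the slot. */
#define load_once(x) (*(volatile entry_t *)&(x))

static entry_t slot;    /* stands in for *pvmw->pud, may change under us */

static void inspect(void)
{
    /* Snapshot once, then decide only on the snapshot, never on the
     * live slot, so two checks can never observe two different values. */
    entry_t e = load_once(slot);

    if (e & 1)
        printf("present: %#lx\n", e);
    else
        printf("not present\n");
}

int main(void)
{
    slot = 0x200001;    /* toy "present" entry */
    inspect();
    slot = 0;           /* cleared, as if by a concurrent split */
    inspect();
    return 0;
}

The kernel code additionally retakes the pud lock and re-checks the entry before acting on it, which the sketch does not model.]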
+ */ + pude = READ_ONCE(*pvmw->pud); + if (pud_trans_huge(pude)) { + pvmw->ptl = pud_lock(mm, pvmw->pud); + if (likely(pud_trans_huge(*pvmw->pud))) { + if (pvmw->flags & PVMW_MIGRATION) + return not_found(pvmw); + if (pud_page(*pvmw->pud) != page) + return not_found(pvmw); + return true; + } else { + /* THP pud was split under us: handle on pmd level */ + spin_unlock(pvmw->ptl); + pvmw->ptl = NULL; + } + } else if (!pud_present(pude)) return false; - pvmw->pmd = pmd_offset(pud, pvmw->address); + + pvmw->pmd = pmd_offset(pvmw->pud, pvmw->address); /* * Make sure the pmd value isn't cached in a register by the * compiler and used as a stale value after we've observed a @@ -213,6 +237,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) } else if (!pmd_present(pmde)) { return false; } + if (!map_pte(pvmw)) goto next_pte; while (1) { diff --git a/mm/rmap.c b/mm/rmap.c index 10195a2421cf..77cec0658b76 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -803,9 +803,15 @@ static bool page_referenced_one(struct page *page, struct vm_area_struct *vma, referenced++; } } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { - if (pmdp_clear_flush_young_notify(vma, address, - pvmw.pmd)) - referenced++; + if (pvmw.pmd) { + if (pmdp_clear_flush_young_notify(vma, address, + pvmw.pmd)) + referenced++; + } else if (pvmw.pud) { + if (pudp_clear_flush_young_notify(vma, address, + pvmw.pud)) + referenced++; + } } else { /* unexpected pmd-mapped page? */ WARN_ON_ONCE(1); From patchwork Wed Sep 2 18:06:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751495 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C07DE109B for ; Wed, 2 Sep 2020 18:07:00 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 495C4206EB for ; Wed, 2 Sep 2020 18:07:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="P8P7CfEl"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="MkeIbM+2" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 495C4206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 81F80900021; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6C27F900022; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D96EC900022; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0069.hostedemail.com [216.40.44.69]) by kanga.kvack.org (Postfix) with ESMTP id 4B65A900019 for ; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 0B645181AEF10 for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-FDA: 77218901550.21.face57_1c169f3270a2 Received: from 
filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id D2843180442C0 for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-Spam-Summary: 1,0,0,c9c6d992501c5911,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:355:379:541:960:966:968:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1605:1730:1747:1777:1792:1801:1981:2194:2196:2198:2199:2200:2201:2393:2559:2562:2693:2731:2890:2895:2898:2901:2924:2926:3138:3139:3140:3141:3142:3165:3369:3865:3866:3867:3868:3870:3871:3872:3874:4042:4250:4303:4321:4385:4605:5007:6120:6261:6653:7576:7875:7901:7903:7904:8603:8660:8957:9010:9592:10004:10226:11026:11232:11657:11914:12043:12114:12291:12295:12296:12438:12555:12679:12683:12895:12986:13141:13148:13230:13894:21063:21080:21433:21450:21451:21611:21627:21740:21939:21966:21987:21990:30001:30003:30034:30054:30064:30070:30079:30090,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-62.18.0.100 66.100.201.100;04ygaczerozq47u8ee3jeuauqafz5ocsd9j897fcxooiw1cbg19n8guxjssko7w.jnc9hmhjdefmgipioch8rcjdyfqooi1h4njuakf3gy9bynecoz14s5kqzk6ydcz.k-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,Do mainCach X-HE-Tag: face57_1c169f3270a2 X-Filterd-Recvd-Size: 65130 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf48.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:33 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 8F0C55C0124; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=VqjbKpwSxWZRQ y6eLruyGy2XZPSqw79Q7BA1v8F56IU=; b=P8P7CfEloHLWcv67cAe3Of3TnCn7S f/Nhqk+npLVLSQbxlR7oIAg1/kCbMyi1ni0oB78W4AT1DndApXMnJI8BFcl3cnpJ Kpq5CXcG3LtjKGVO+epc2OviDrLJv7AlFuBP2n8SA9uthGznOqUEV84lkPKtHzru WSJTq3uH9BYIOpwfSQgck5X4pk3z5Au09fXIrOf77JGy/typ6GOSHz1uUKEnpnva nI7eJ1R348lpvcZwGq63jqmZrxA88RTlKQEBLsgPMOBy8AqPVyLj5xzvE/nOYcyZ dEZy0MTjAtqGWZT4eZgWAr+bCe4tymZhLbcMDTSt1v9v4A/MWEqZxzu+A== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=VqjbKpwSxWZRQy6eLruyGy2XZPSqw79Q7BA1v8F56IU=; b=MkeIbM+2 TUIhWNqYZuNW4jx96eAMxGrJ4KNy4SF/6kj/BMHdL9Lzbj9JUrO69zSI5ogt9p/u J2zB79jD0VuGoqSqaVb9B4KmcLPspgfQsm/9LCysPlH3ejrarX1o6ULvbaUTMll9 u7adAKKZmwPmM0gpgnK0Opaq0eYJ1vYHCzcyTcFlBvhYUyAkMR2tNL8U8trYv/8v QdTvAxF/6ZF+zOsA2IoRkLAWT4LswWGqsgMIePu61cvx0KWh12MMKh+o+LwHrSnf QSRjM4jOoF6P+F8imd59WMScnvvfaUbmhvlp08Shb6Q57OwoyMxgS7mK2E1XKS19 qAlTcFBs+pV55A== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpeiiihdrhigrnhesshgvnhhtrdgtohhm X-ME-Proxy: Received: from 
nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id 2943130605C8; Wed, 2 Sep 2020 14:06:32 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 06/16] mm: thp: add 1GB THP split_huge_pud_page() function. Date: Wed, 2 Sep 2020 14:06:18 -0400 Message-Id: <20200902180628.4052244-7-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: D2843180442C0 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan It mimics PMD-level THP split. In addition, to support PMD-mapped PUD THP, PMDPageInPUD() is used. For the mapcount of PMD-mapped PUD THP, sub_compound_mapcount() is used, which uses (head_page+3).compound_mapcount, since each base page's mapcount is used for PTE mapping. PagePUDDoubleMap() is used for both PUD-mapped and PMD-mapped PUD THPs. page_xxx_rmap() functions now have an extra page order parameter to distinguish different THP sizes. Signed-off-by: Zi Yan --- arch/x86/include/asm/pgtable.h | 21 ++ include/linux/huge_mm.h | 31 +- include/linux/memcontrol.h | 5 + include/linux/mm.h | 25 +- include/linux/page-flags.h | 47 +++ include/linux/pgtable.h | 17 ++ include/linux/rmap.h | 9 +- include/linux/swap.h | 2 + include/linux/vm_event_item.h | 4 + kernel/events/uprobes.c | 4 +- mm/huge_memory.c | 536 +++++++++++++++++++++++++++++++-- mm/hugetlb.c | 4 +- mm/khugepaged.c | 6 +- mm/ksm.c | 4 +- mm/memcontrol.c | 13 + mm/memory.c | 18 +- mm/migrate.c | 10 +- mm/page_alloc.c | 20 +- mm/pgtable-generic.c | 11 + mm/rmap.c | 106 +++++-- mm/swap.c | 31 ++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 2 +- mm/util.c | 16 +- mm/vmstat.c | 4 + 25 files changed, 852 insertions(+), 98 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 15334f5ba172..fe4600256bc7 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -630,6 +630,12 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd) __pgprot(pmd_flags(pmd) & ~(_PAGE_PRESENT|_PAGE_PROTNONE))); } +static inline pud_t pud_mknotpresent(pud_t pud) +{ + return pfn_pud(pud_pfn(pud), + __pgprot(pud_flags(pud) & ~(_PAGE_PRESENT|_PAGE_PROTNONE))); +} + static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) @@ -1246,6 +1252,21 @@ static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp) } #endif /* CONFIG_PAGE_TABLE_ISOLATION */ +#ifndef pudp_establish +#define pudp_establish pudp_establish +static inline pud_t pudp_establish(struct vm_area_struct *vma, + unsigned long address, pud_t *pudp, pud_t pud) +{ + if (IS_ENABLED(CONFIG_SMP)) { + return xchg(pudp, pud); + } else { + pud_t old = *pudp; + *pudp = pud; + return old; + } +} +#endif + /* * clone_pgd_range(pgd_t *dst, pgd_t *src, int count); * diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 0c20a8ea6911..589e5af5a1c2 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -227,17 +227,27 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, void split_huge_pmd_address(struct 
vm_area_struct *vma, unsigned long address, bool freeze, struct page *page); +bool can_split_huge_pud_page(struct page *page, int *pextra_pins); +int split_huge_pud_page_to_list(struct page *page, struct list_head *list); +static inline int split_huge_pud_page(struct page *page) +{ + return split_huge_pud_page_to_list(page, NULL); +} void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address); + unsigned long address, bool freeze, struct page *page); #define split_huge_pud(__vma, __pud, __address) \ do { \ pud_t *____pud = (__pud); \ if (pud_trans_huge(*____pud) \ || pud_devmap(*____pud)) \ - __split_huge_pud(__vma, __pud, __address); \ + __split_huge_pud(__vma, __pud, __address, \ + false, NULL); \ } while (0) +void split_huge_pud_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct page *page); + extern int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); extern void vma_adjust_trans_huge(struct vm_area_struct *vma, @@ -427,8 +437,25 @@ static inline void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, static inline void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, bool freeze, struct page *page) {} +static inline bool +can_split_huge_pud_page(struct page *page, int *pextra_pins) +{ + BUILD_BUG(); + return false; +} +static inline int +split_huge_pud_page_to_list(struct page *page, struct list_head *list) +{ + return 0; +} +static inline int split_huge_pud_page(struct page *page) +{ + return 0; +} #define split_huge_pud(__vma, __pmd, __address) \ do { } while (0) +static inline void split_huge_pud_address(struct vm_area_struct *vma, + unsigned long address, bool freeze, struct page *page) {} static inline int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index d0b036123c6a..3ccff298d4b2 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -929,6 +929,7 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm, #ifdef CONFIG_TRANSPARENT_HUGEPAGE void mem_cgroup_split_huge_fixup(struct page *head); +void mem_cgroup_split_huge_pud_fixup(struct page *head); #endif #else /* CONFIG_MEMCG */ @@ -1261,6 +1262,10 @@ static inline void mem_cgroup_split_huge_fixup(struct page *head) { } +static inline void mem_cgroup_split_huge_pud_fixup(struct page *head) +{ +} + static inline void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx, unsigned long count) diff --git a/include/linux/mm.h b/include/linux/mm.h index cb1ccf804404..8a85d96ab7e5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -797,6 +797,24 @@ static inline int compound_mapcount(struct page *page) return head_compound_mapcount(page); } +static inline unsigned int compound_order(struct page *page); +static inline atomic_t *sub_compound_mapcount_ptr(struct page *page, int sub_level) +{ + struct page *head = compound_head(page); + + VM_BUG_ON_PAGE(!PageCompound(page), page); + VM_BUG_ON_PAGE(compound_order(head) != HPAGE_PUD_ORDER, page); + VM_BUG_ON_PAGE((page - head) % HPAGE_PMD_NR, page); + VM_BUG_ON_PAGE(sub_level != 1, page); + return &page[2 + sub_level].compound_mapcount; +} + +/* Only works for PUD pages */ +static inline int sub_compound_mapcount(struct page *page) +{ + return atomic_read(sub_compound_mapcount_ptr(page, 1)) + 1; +} + /* * The atomic page->_mapcount, starts from -1: so that transitions * both from it and to it can be tracked, using 
atomic_inc_and_test @@ -889,13 +907,6 @@ static inline void destroy_compound_page(struct page *page) compound_page_dtors[page[1].compound_dtor](page); } -static inline unsigned int compound_order(struct page *page) -{ - if (!PageHead(page)) - return 0; - return page[1].compound_order; -} - static inline bool hpage_pincount_available(struct page *page) { /* diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index fbbb841a9346..cdca0165d2db 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -235,6 +235,9 @@ static inline void page_init_poison(struct page *page, size_t size) * * PF_SECOND: * the page flag is stored in the first tail page. + * + * PF_THIRD: + * the page flag is stored in the second tail page. */ #define PF_POISONED_CHECK(page) ({ \ VM_BUG_ON_PGFLAGS(PagePoisoned(page), page); \ @@ -253,6 +256,9 @@ static inline void page_init_poison(struct page *page, size_t size) #define PF_SECOND(page, enforce) ({ \ VM_BUG_ON_PGFLAGS(!PageHead(page), page); \ PF_POISONED_CHECK(&page[1]); }) +#define PF_THIRD(page, enforce) ({ \ + VM_BUG_ON_PGFLAGS(!PageHead(page), page); \ + PF_POISONED_CHECK(&page[2]); }) /* * Macros to create function definitions for page flags @@ -674,6 +680,29 @@ static inline int PageTransTail(struct page *page) return PageTail(page); } +#define HPAGE_PMD_SHIFT PMD_SHIFT +#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT) +#define HPAGE_PMD_NR (1<_mapcount in all sub-PMD pages is + * offset up by one. This reference will go away with last sub_compound_mapcount. + * + * See also __split_huge_pud_locked() and page_remove_anon_compound_rmap(). + */ +PAGEFLAG(PUDDoubleMap, double_map, PF_THIRD) + TESTSCFLAG(PUDDoubleMap, double_map, PF_THIRD) #else TESTPAGEFLAG_FALSE(TransHuge) TESTPAGEFLAG_FALSE(TransCompound) TESTPAGEFLAG_FALSE(TransCompoundMap) TESTPAGEFLAG_FALSE(TransTail) +TESTPAGEFLAG_FALSE(PMDPageInPUD) PAGEFLAG_FALSE(DoubleMap) TESTSCFLAG_FALSE(DoubleMap) +PAGEFLAG_FALSE(PUDDoubleMap) + TESTSETFLAG_FALSE(PUDDoubleMap) #endif /* diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 8ef358c386af..7acf218a8879 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -505,6 +505,11 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); #endif +#ifndef __HAVE_ARCH_PUDP_INVALIDATE +extern pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address, + pud_t *pudp); +#endif + #ifndef __HAVE_ARCH_PTE_SAME static inline int pte_same(pte_t pte_a, pte_t pte_b) { @@ -1158,6 +1163,18 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp) } #endif +#ifndef pud_read_atomic +static inline pud_t pud_read_atomic(pud_t *pudp) +{ + /* + * Depend on compiler for an atomic pmd read. NOTE: this is + * only going to work, if the pmdval_t isn't larger than + * an unsigned long. 
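[To make the counter placement in this patch concrete: a 1GB (PUD-order) THP keeps its mapcounts in several different tail pages of the compound page. The following is a self-contained model in plain userspace C; the struct and field names are invented for illustration, only the indices mirror sub_compound_mapcount_ptr() and the PF_THIRD placement above, and 4KB base pages, 2MB PMDs and 1GB PUDs are assumed as on x86_64:

#include <assert.h>
#include <stdio.h>

#define HPAGE_PMD_NR 512              /* 4KB pages per 2MB chunk */
#define HPAGE_PUD_NR (512 * 512)      /* 4KB pages per 1GB page  */

/* Invented stand-in for struct page, only the fields this sketch needs. */
struct toy_page {
    int _mapcount;          /* PTE (4KB) mappings of this base page, biased by -1   */
    int compound_mapcount;  /* meaningful only in specific tail pages, biased by -1 */
    int pud_double_map;     /* stand-in for the PagePUDDoubleMap() bit              */
};

static struct toy_page thp[HPAGE_PUD_NR];   /* head page is thp[0] */

/* Where the counters live for a 1GB compound page under this series:
 *   thp[1].compound_mapcount         - PUD (1GB) mappings, as for 2MB THPs
 *   thp[2].pud_double_map            - mapped both as 1GB and as 2MB pieces
 *   thp[k*512 + 3].compound_mapcount - PMD (2MB) mappings of chunk k,
 *                                      i.e. sub_compound_mapcount()       */
static int sub_compound_mapcount(int chunk)
{
    /* Counters start at -1, so the stored value is (mappings - 1). */
    return thp[chunk * HPAGE_PMD_NR + 3].compound_mapcount + 1;
}

int main(void)
{
    int k;

    for (k = 0; k < HPAGE_PUD_NR; k++)
        thp[k]._mapcount = thp[k].compound_mapcount = -1;

    /* One PMD-level mapping of chunk 7, e.g. after a PUD split. */
    thp[7 * HPAGE_PMD_NR + 3].compound_mapcount++;

    assert(sub_compound_mapcount(7) == 1);
    assert(sub_compound_mapcount(0) == 0);
    printf("chunk 7 is PMD-mapped %d time(s)\n", sub_compound_mapcount(7));
    return 0;
}

The "(head_page+3).compound_mapcount" wording in the commit message corresponds to chunk 0 of this layout; the other chunks use the same offset within their own 2MB region.]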
+ */ + return *pudp; +} +#endif + #ifndef arch_needs_pgtable_deposit #define arch_needs_pgtable_deposit() (false) #endif diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 0af61dd193d2..c43da5919354 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -99,6 +99,7 @@ enum ttu_flags { TTU_RMAP_LOCKED = 0x80, /* do not grab rmap lock: * caller holds it */ TTU_SPLIT_FREEZE = 0x100, /* freeze pte under splitting thp */ + TTU_SPLIT_HUGE_PUD = 0x200, /* split huge PUD if any */ }; #ifdef CONFIG_MMU @@ -171,13 +172,13 @@ struct anon_vma *page_get_anon_vma(struct page *page); */ void page_move_anon_rmap(struct page *, struct vm_area_struct *); void page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, bool); + unsigned long, bool, int); void do_page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, int); + unsigned long, int, int); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, bool); + unsigned long, bool, int); void page_add_file_rmap(struct page *, bool); -void page_remove_rmap(struct page *, bool); +void page_remove_rmap(struct page *, bool, int); void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long); diff --git a/include/linux/swap.h b/include/linux/swap.h index 5c48713221fe..871c62211ecd 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -340,6 +340,8 @@ extern void lru_note_cost_page(struct page *); extern void lru_cache_add(struct page *); extern void lru_add_page_tail(struct page *page, struct page *page_tail, struct lruvec *lruvec, struct list_head *head); +extern void lru_add_pud_page_tail(struct page *page, struct page *page_tail, + struct lruvec *lruvec, struct list_head *head); extern void mark_page_accessed(struct page *); extern void lru_add_drain(void); extern void lru_add_drain_cpu(int cpu); diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index a3f1093a55bb..b336de64586c 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -96,6 +96,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_FAULT_FALLBACK_PUD, THP_FAULT_FALLBACK_PUD_CHARGE, THP_SPLIT_PUD, + THP_SPLIT_PUD_PAGE, + THP_SPLIT_PUD_PAGE_FAILED, + THP_ZERO_PUD_PAGE_ALLOC, + THP_ZERO_PUD_PAGE_ALLOC_FAILED, #endif THP_ZERO_PAGE_ALLOC, THP_ZERO_PAGE_ALLOC_FAILED, diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 0e18aaf23a7b..834b350a49f6 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -183,7 +183,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, if (new_page) { get_page(new_page); - page_add_new_anon_rmap(new_page, vma, addr, false); + page_add_new_anon_rmap(new_page, vma, addr, false, 0); lru_cache_add_inactive_or_unevictable(new_page, vma); } else /* no new page, just dec_mm_counter for old_page */ @@ -200,7 +200,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, set_pte_at_notify(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot)); - page_remove_rmap(old_page, false); + page_remove_rmap(old_page, false, 0); if (!page_mapped(old_page)) try_to_free_swap(old_page); page_vma_mapped_walk_done(&pvmw); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 6da9b02501b7..398f1b52f789 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -618,7 +618,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, entry = mk_huge_pmd(page, vma->vm_page_prot); entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); - 
page_add_new_anon_rmap(page, vma, haddr, true); + page_add_new_anon_rmap(page, vma, haddr, true, HPAGE_PMD_ORDER); lru_cache_add_inactive_or_unevictable(page, vma); pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable); set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry); @@ -991,7 +991,7 @@ static int __do_huge_pud_anonymous_page(struct vm_fault *vmf, struct page *page, entry = mk_huge_pud(page, vma->vm_page_prot); entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma); - page_add_new_anon_rmap(page, vma, haddr, true); + page_add_new_anon_rmap(page, vma, haddr, true, HPAGE_PUD_ORDER); lru_cache_add_inactive_or_unevictable(page, vma); pgtable_trans_huge_pud_deposit(vma->vm_mm, vmf->pud, virt_to_page(pmd_pgtable)); @@ -1384,7 +1384,7 @@ vm_fault_t do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud) unlock_page(page); spin_unlock(vmf->ptl); fallback: - __split_huge_pud(vma, vmf->pud, vmf->address); + __split_huge_pud(vma, vmf->pud, vmf->address, false, NULL); return VM_FAULT_FALLBACK; } @@ -1825,9 +1825,9 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, if (pmd_present(orig_pmd)) { page = pmd_page(orig_pmd); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); - VM_BUG_ON_PAGE(!PageHead(page), page); + VM_BUG_ON_PAGE(!PageHead(page) && !PMDPageInPUD(page), page); } else if (thp_migration_supported()) { swp_entry_t entry; @@ -2111,7 +2111,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, if (pud_present(orig_pud)) { page = pud_page(orig_pud); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PUD_ORDER); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); } else @@ -2134,8 +2134,16 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, } static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, - unsigned long haddr) + unsigned long haddr, bool freeze) { + struct mm_struct *mm = vma->vm_mm; + struct page *page; + pgtable_t pgtable; + pud_t _pud, old_pud; + bool young, write, dirty, soft_dirty; + unsigned long addr; + int i; + VM_BUG_ON(haddr & ~HPAGE_PUD_MASK); VM_BUG_ON_VMA(vma->vm_start > haddr, vma); VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma); @@ -2143,23 +2151,141 @@ static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud, count_vm_event(THP_SPLIT_PUD); - pudp_huge_clear_flush_notify(vma, haddr, pud); + if (!vma_is_anonymous(vma)) { + _pud = pudp_huge_clear_flush_notify(vma, haddr, pud); + /* + * We are going to unmap this huge page. 
So + * just go ahead and zap it + */ + if (arch_needs_pgtable_deposit()) + zap_pud_deposited_table(mm, pud); + if (vma_is_dax(vma)) + return; + page = pud_page(_pud); + if (!PageReferenced(page) && pud_young(_pud)) + SetPageReferenced(page); + page_remove_rmap(page, true, HPAGE_PUD_ORDER); + put_page(page); + add_mm_counter(mm, MM_FILEPAGES, -HPAGE_PUD_NR); + return; + } + + /* See the comment above pmdp_invalidate() in __split_huge_pmd_locked() */ + old_pud = pudp_invalidate(vma, haddr, pud); + + page = pud_page(old_pud); + VM_BUG_ON_PAGE(!page_count(page), page); + page_ref_add(page, (1<<(HPAGE_PUD_ORDER-HPAGE_PMD_ORDER)) - 1); + if (pud_dirty(old_pud)) + SetPageDirty(page); + write = pud_write(old_pud); + young = pud_young(old_pud); + dirty = pud_dirty(old_pud); + soft_dirty = pud_soft_dirty(old_pud); + + pgtable = pgtable_trans_huge_pud_withdraw(mm, pud); + pud_populate_with_pgtable(mm, &_pud, pgtable); + + for (i = 0, addr = haddr; i < HPAGE_PUD_NR; + i += HPAGE_PMD_NR, addr += PMD_SIZE) { + pmd_t entry, *pmd; + /* + * Note that NUMA hinting access restrictions are not + * transferred to avoid any possibility of altering + * permissions across VMAs. + */ + if (freeze) { + swp_entry_t swp_entry; + + swp_entry = make_migration_entry(page + i, write); + entry = swp_entry_to_pmd(swp_entry); + if (soft_dirty) + entry = pmd_swp_mksoft_dirty(entry); + } else { + entry = mk_huge_pmd(page + i, READ_ONCE(vma->vm_page_prot)); + entry = maybe_pmd_mkwrite(entry, vma); + if (!write) + entry = pmd_wrprotect(entry); + if (!young) + entry = pmd_mkold(entry); + if (soft_dirty) + entry = pmd_mksoft_dirty(entry); + } + pmd = pmd_offset(&_pud, addr); + VM_BUG_ON(!pmd_none(*pmd)); + set_pmd_at(mm, addr, pmd, entry); + /* distinguish between pud compound_mapcount and pmd compound_mapcount */ + if (atomic_inc_and_test(sub_compound_mapcount_ptr(&page[i], 1))) { + /* first pmd-mapped pud page */ + lock_page_memcg(page); + __inc_lruvec_page_state(page, NR_ANON_THPS); + unlock_page_memcg(page); + } + } + + /* + * Set PG_double_map before dropping compound_mapcount to avoid + * false-negative page_mapped(). + */ + if (compound_mapcount(page) > 1 && !TestSetPagePUDDoubleMap(page)) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + /* distinguish between pud compound_mapcount and pmd compound_mapcount */ + atomic_inc(sub_compound_mapcount_ptr(&page[i], 1)); + } + + lock_page_memcg(page); + if (atomic_add_negative(-1, compound_mapcount_ptr(page))) { + /* Last compound_mapcount is gone. 
*/ + __dec_lruvec_page_state(page, NR_ANON_THPS_PUD); + if (TestClearPagePUDDoubleMap(page)) { + /* No need in mapcount reference anymore */ + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + /* distinguish between pud compound_mapcount and pmd compound_mapcount */ + atomic_dec(sub_compound_mapcount_ptr(&page[i], 1)); + } + } + unlock_page_memcg(page); + + smp_wmb(); /* make pte visible before pmd */ + pud_populate_with_pgtable(mm, pud, pgtable); + + if (freeze) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) { + page_remove_rmap(page + i, true, HPAGE_PMD_ORDER); + put_page(page + i); + } + } } void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, - unsigned long address) + unsigned long address, bool freeze, struct page *page) { spinlock_t *ptl; + struct mm_struct *mm = vma->vm_mm; + unsigned long haddr = address & HPAGE_PUD_MASK; struct mmu_notifier_range range; mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, address & HPAGE_PUD_MASK, (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE); mmu_notifier_invalidate_range_start(&range); - ptl = pud_lock(vma->vm_mm, pud); - if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud))) + ptl = pud_lock(mm, pud); + + /* + * If caller asks to setup a migration entries, we need a page to check + * pmd against. Otherwise we can end up replacing wrong page. + */ + VM_BUG_ON(freeze && !page); + if (page && page != pud_page(*pud)) goto out; - __split_huge_pud_locked(vma, pud, range.start); + + if (pud_trans_huge(*pud)) { + page = pud_page(*pud); + if (PageMlocked(page)) + clear_page_mlock(page); + } else if (unlikely(!pud_devmap(*pud))) + goto out; + __split_huge_pud_locked(vma, pud, haddr, freeze); out: spin_unlock(ptl); @@ -2169,6 +2295,281 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, */ mmu_notifier_invalidate_range_only_end(&range); } + +void split_huge_pud_address(struct vm_area_struct *vma, unsigned long address, + bool freeze, struct page *page) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + + pgd = pgd_offset(vma->vm_mm, address); + if (!pgd_present(*pgd)) + return; + + p4d = p4d_offset(pgd, address); + if (!p4d_present(*p4d)) + return; + + pud = pud_offset(p4d, address); + + __split_huge_pud(vma, pud, address, freeze, page); +} + +static void unmap_pud_page(struct page *page) +{ + enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | + TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PUD; + bool unmap_success; + + VM_BUG_ON_PAGE(!PageHead(page), page); + + if (PageAnon(page)) + ttu_flags |= TTU_SPLIT_FREEZE; + + unmap_success = try_to_unmap(page, ttu_flags); + VM_BUG_ON_PAGE(!unmap_success, page); +} + +static void remap_pud_page(struct page *page) +{ + int i; + + VM_BUG_ON(!PageTransHuge(page)); + if (compound_order(page) == HPAGE_PUD_ORDER) { + remove_migration_ptes(page, page, true); + } else if (compound_order(page) == HPAGE_PMD_ORDER) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + remove_migration_ptes(page + i, page + i, true); + } else + VM_BUG_ON_PAGE(1, page); +} + +static void __split_huge_pud_page_tail(struct page *head, int tail, + struct lruvec *lruvec, struct list_head *list) +{ + struct page *page_tail = head + tail; + + VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail); + + /* + * Clone page flags before unfreezing refcount. + * + * After successful get_page_unless_zero() might follow flags change, + * for example lock_page() which set PG_waiters. 
+ */ + + page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; + page_tail->flags |= (head->flags & + ((1L << PG_referenced) | + (1L << PG_swapbacked) | + (1L << PG_swapcache) | + (1L << PG_mlocked) | + (1L << PG_uptodate) | + (1L << PG_active) | + (1L << PG_locked) | + (1L << PG_unevictable) | + (1L << PG_dirty) | + /* preserve THP */ + (1L << PG_head))); + + /* ->mapping in first tail page is compound_mapcount */ + VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING, + page_tail); + page_tail->mapping = head->mapping; + page_tail->index = head->index + tail; + + /* Page flags also must be visible before we make the page PMD-compound. */ + smp_wmb(); + + clear_compound_head(page_tail); + prep_compound_page(page_tail, HPAGE_PMD_ORDER); + prep_transhuge_page(page_tail); + + /* Finally unfreeze refcount. Additional reference from page cache. */ + page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) || + PageSwapCache(head))); + + if (page_is_young(head)) + set_page_young(page_tail); + if (page_is_idle(head)) + set_page_idle(page_tail); + + page_cpupid_xchg_last(page_tail, page_cpupid_last(head)); + lru_add_pud_page_tail(head, page_tail, lruvec, list); +} + +static void __split_huge_pud_page(struct page *page, struct list_head *list, + unsigned long flags) +{ + struct page *head = compound_head(page); + pg_data_t *pgdat = page_pgdat(head); + struct lruvec *lruvec; + int i; + + lruvec = mem_cgroup_page_lruvec(head, pgdat); + + /* complete memcg works before add pages to LRU */ + mem_cgroup_split_huge_pud_fixup(head); + + /* no file-back page support yet */ + VM_BUG_ON(!PageAnon(page)); + + for (i = HPAGE_PUD_NR - HPAGE_PMD_NR; i >= 1; i -= HPAGE_PMD_NR) { + __split_huge_pud_page_tail(head, i, lruvec, list); + } + /* reset head page order */ + prep_compound_page(head, HPAGE_PMD_ORDER); + prep_transhuge_page(head); + + page_ref_inc(head); + + spin_unlock_irqrestore(&pgdat->lru_lock, flags); + + remap_pud_page(head); + + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) { + struct page *subpage = head + i; + + if (subpage == page) + continue; + unlock_page(subpage); + + /* + * Subpages may be freed if there wasn't any mapping + * like if add_to_swap() is running on a lru page that + * had its mapping zapped. And freeing these pages + * requires taking the lru_lock so we do the put_page + * of the tail pages after the split is complete. + */ + put_page(subpage); + } +} +/* Racy check whether the huge page can be split */ +bool can_split_huge_pud_page(struct page *page, int *pextra_pins) +{ + int extra_pins; + + VM_BUG_ON(!PageAnon(page)); + + extra_pins = PageSwapCache(page) ? HPAGE_PUD_NR : 0; + + if (pextra_pins) + *pextra_pins = extra_pins; + return total_mapcount(page) == page_count(page) - extra_pins - 1; +} + +/* + * This function splits huge page into normal pages. @page can point to any + * subpage of huge page to split. Split doesn't change the position of @page. + * + * Only caller must hold pin on the @page, otherwise split fails with -EBUSY. + * The huge page must be locked. + * + * If @list is null, tail pages will be added to LRU list, otherwise, to @list. + * + * Both head page and tail pages will inherit mapping, flags, and so on from + * the hugepage. + * + * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if + * they are not mapped. + * + * Returns 0 if the hugepage is split successfully. + * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under + * us. 
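[The pin test in can_split_huge_pud_page() above is plain reference counting: every mapping holds one reference, the caller's own pin holds one more, and the swap cache (if any) holds HPAGE_PUD_NR extra, so any reference beyond that must belong to someone else, for example get_user_pages(), and the split is refused. A numeric illustration in plain C with toy numbers and no swap cache assumed:

#include <stdbool.h>
#include <stdio.h>

/* The comparison can_split_huge_pud_page() makes, written out. */
static bool can_split(int total_mapcount, int page_count, int extra_pins)
{
    return total_mapcount == page_count - extra_pins - 1;
}

int main(void)
{
    /* Mapped once, one reference held by the caller, no swap cache: OK. */
    printf("splittable: %d\n", can_split(1, 2, 0));

    /* Same page, but a foreign pin (e.g. GUP) is outstanding: refused. */
    printf("splittable: %d\n", can_split(1, 3, 0));
    return 0;
}
]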
+ */ +int split_huge_pud_page_to_list(struct page *page, struct list_head *list) +{ + struct page *head = compound_head(page); + struct pglist_data *pgdata = NODE_DATA(page_to_nid(head)); + struct deferred_split *ds_queue = get_deferred_split_queue(head); + struct anon_vma *anon_vma = NULL; + struct address_space *mapping = NULL; + int count, mapcount, extra_pins, ret; + bool mlocked; + unsigned long flags; + + VM_BUG_ON_PAGE(is_huge_zero_page(page), page); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(!PageCompound(page), page); + VM_BUG_ON_PAGE(!PageAnon(page), page); + + if (PageWriteback(page)) + return -EBUSY; + + /* + * The caller does not necessarily hold an mmap_sem that would + * prevent the anon_vma disappearing so we first we take a + * reference to it and then lock the anon_vma for write. This + * is similar to page_lock_anon_vma_read except the write lock + * is taken to serialise against parallel split or collapse + * operations. + */ + anon_vma = page_get_anon_vma(head); + if (!anon_vma) { + ret = -EBUSY; + goto out; + } + mapping = NULL; + anon_vma_lock_write(anon_vma); + /* + * Racy check if we can split the page, before unmap_pud_page() will + * split PUDs + */ + if (!can_split_huge_pud_page(head, &extra_pins)) { + ret = -EBUSY; + goto out_unlock; + } + + mlocked = PageMlocked(page); + unmap_pud_page(head); + VM_BUG_ON_PAGE(compound_mapcount(head), head); + + /* Make sure the page is not on per-CPU pagevec as it takes pin */ + if (mlocked) + lru_add_drain(); + + /* prevent PageLRU to go away from under us, and freeze lru stats */ + spin_lock_irqsave(&pgdata->lru_lock, flags); + + /* Prevent deferred_split_scan() touching ->_refcount */ + spin_lock(&ds_queue->split_queue_lock); + count = page_count(head); + mapcount = total_mapcount(head); + if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) { + if (!list_empty(page_deferred_list(head))) { + ds_queue->split_queue_len--; + list_del(page_deferred_list(head)); + } + if (mapping) { + __dec_node_page_state(page, NR_SHMEM_THPS); + } + spin_unlock(&ds_queue->split_queue_lock); + __split_huge_pud_page(page, list, flags); + ret = 0; + } else { + if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) { + pr_alert("total_mapcount: %u, page_count(): %u\n", + mapcount, count); + if (PageTail(page)) + dump_page(head, NULL); + dump_page(page, "total_mapcount(head) > 0"); + } + spin_unlock(&ds_queue->split_queue_lock); + spin_unlock_irqrestore(&pgdata->lru_lock, flags); + remap_pud_page(head); + ret = -EBUSY; + } + +out_unlock: + if (anon_vma) { + anon_vma_unlock_write(anon_vma); + put_anon_vma(anon_vma); + } +out: + count_vm_event(!ret ? 
THP_SPLIT_PUD_PAGE : THP_SPLIT_PUD_PAGE_FAILED); + return ret; +} #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma, @@ -2209,7 +2610,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, unsigned long haddr, bool freeze) { struct mm_struct *mm = vma->vm_mm; - struct page *page; + struct page *page, *head; pgtable_t pgtable; pmd_t old_pmd, _pmd; bool young, write, soft_dirty, pmd_migration = false, uffd_wp = false; @@ -2239,7 +2640,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, set_page_dirty(page); if (!PageReferenced(page) && pmd_young(_pmd)) SetPageReferenced(page); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); put_page(page); add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); return; @@ -2298,7 +2699,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, uffd_wp = pmd_uffd_wp(old_pmd); } VM_BUG_ON_PAGE(!page_count(page), page); - page_ref_add(page, HPAGE_PMD_NR - 1); + head = compound_head(page); + page_ref_add(head, HPAGE_PMD_NR - 1); /* * Withdraw the table only after we mark the pmd entry invalid. @@ -2344,14 +2746,24 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, /* * Set PG_double_map before dropping compound_mapcount to avoid * false-negative page_mapped(). + * Don't set it if the PUD page is mapped at PUD level, since + * page_mapped() is true in that case. */ - if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) { + if (((PMDPageInPUD(page) && + sub_compound_mapcount(page) > + (1 + PagePUDDoubleMap(compound_head(page)))) || + (!PMDPageInPUD(page) && + compound_mapcount(page) > 1)) + && !TestSetPageDoubleMap(page)) { for (i = 0; i < HPAGE_PMD_NR; i++) atomic_inc(&page[i]._mapcount); } lock_page_memcg(page); - if (atomic_add_negative(-1, compound_mapcount_ptr(page))) { + if ((PMDPageInPUD(page) && + atomic_add_negative(-1, sub_compound_mapcount_ptr(page, 1))) || + (!PMDPageInPUD(page) && + atomic_add_negative(-1, compound_mapcount_ptr(page)))) { /* Last compound_mapcount is gone. */ __dec_lruvec_page_state(page, NR_ANON_THPS); if (TestClearPageDoubleMap(page)) { @@ -2367,7 +2779,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, if (freeze) { for (i = 0; i < HPAGE_PMD_NR; i++) { - page_remove_rmap(page + i, false); + page_remove_rmap(page + i, false, 0); put_page(page + i); } } @@ -2478,6 +2890,11 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, * previously contain an hugepage: check if we need to split * an huge pmd. */ + if (start & ~HPAGE_PUD_MASK && + (start & HPAGE_PUD_MASK) >= vma->vm_start && + (start & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE <= vma->vm_end) + split_huge_pud_address(vma, start, false, NULL); + if (start & ~HPAGE_PMD_MASK && (start & HPAGE_PMD_MASK) >= vma->vm_start && (start & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= vma->vm_end) @@ -2488,6 +2905,11 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, * previously contain an hugepage: check if we need to split * an huge pmd. 
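[The PUD-level checks added here follow the same shape as the existing PMD-level ones below them: split only if the boundary is not already 1GB aligned and the surrounding 1GB region lies entirely inside the VMA. The mask arithmetic with concrete numbers, as a runnable sketch assuming x86_64 1GB alignment; the addresses are arbitrary examples:

#include <stdio.h>

#define HPAGE_PUD_SIZE (1UL << 30)
#define HPAGE_PUD_MASK (~(HPAGE_PUD_SIZE - 1))

static int needs_pud_split(unsigned long boundary,
                           unsigned long vm_start, unsigned long vm_end)
{
    return (boundary & ~HPAGE_PUD_MASK) &&                  /* not 1GB aligned        */
           (boundary & HPAGE_PUD_MASK) >= vm_start &&       /* 1GB region starts and  */
           (boundary & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE <= vm_end; /* ends inside VMA */
}

int main(void)
{
    unsigned long vm_start = 0x40000000UL, vm_end = 0x100000000UL;

    printf("%d\n", needs_pud_split(0x80000000UL, vm_start, vm_end));  /* aligned: 0   */
    printf("%d\n", needs_pud_split(0x80200000UL, vm_start, vm_end));  /* unaligned: 1 */
    return 0;
}
]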
*/ + if (end & ~HPAGE_PUD_MASK && + (end & HPAGE_PUD_MASK) >= vma->vm_start && + (end & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE <= vma->vm_end) + split_huge_pud_address(vma, end, false, NULL); + if (end & ~HPAGE_PMD_MASK && (end & HPAGE_PMD_MASK) >= vma->vm_start && (end & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= vma->vm_end) @@ -2502,6 +2924,11 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, struct vm_area_struct *next = vma->vm_next; unsigned long nstart = next->vm_start; nstart += adjust_next << PAGE_SHIFT; + if (nstart & ~HPAGE_PUD_MASK && + (nstart & HPAGE_PUD_MASK) >= next->vm_start && + (nstart & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE <= next->vm_end) + split_huge_pud_address(next, nstart, false, NULL); + if (nstart & ~HPAGE_PMD_MASK && (nstart & HPAGE_PMD_MASK) >= next->vm_start && (nstart & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE <= next->vm_end) @@ -2691,12 +3118,23 @@ int total_mapcount(struct page *page) if (PageHuge(page)) return compound; ret = compound; - for (i = 0; i < HPAGE_PMD_NR; i++) - ret += atomic_read(&page[i]._mapcount) + 1; + /* if PMD, read all base page, if PUD, read the sub_compound_mapcount()*/ + if (compound_order(page) == HPAGE_PMD_ORDER) { + for (i = 0; i < thp_nr_pages(page); i++) + ret += atomic_read(&page[i]._mapcount) + 1; + } else if (compound_order(page) == HPAGE_PUD_ORDER) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + ret += sub_compound_mapcount(&page[i]); + for (i = 0; i < thp_nr_pages(page); i++) + ret += atomic_read(&page[i]._mapcount) + 1; + } else + VM_BUG_ON_PAGE(1, page); /* File pages has compound_mapcount included in _mapcount */ + /* both PUD and PMD has HPAGE_PMD_NR sub pages */ if (!PageAnon(page)) return ret - compound * HPAGE_PMD_NR; - if (PageDoubleMap(page)) + /* both PUD and PMD has HPAGE_PMD_NR sub pages */ + if (PagePUDDoubleMap(page) || PageDoubleMap(page)) ret -= HPAGE_PMD_NR; return ret; } @@ -2742,13 +3180,38 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount) page = compound_head(page); _total_mapcount = ret = 0; - for (i = 0; i < HPAGE_PMD_NR; i++) { - mapcount = atomic_read(&page[i]._mapcount) + 1; - ret = max(ret, mapcount); - _total_mapcount += mapcount; - } - if (PageDoubleMap(page)) { + /* if PMD, read all base page, if PUD, read the sub_compound_mapcount()*/ + if (compound_order(page) == HPAGE_PMD_ORDER) { + for (i = 0; i < thp_nr_pages(page); i++) { + mapcount = atomic_read(&page[i]._mapcount) + 1; + ret = max(ret, mapcount); + _total_mapcount += mapcount; + } + } else if (compound_order(page) == HPAGE_PUD_ORDER) { + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) { + int j; + + mapcount = sub_compound_mapcount(&page[i]); + ret = max(ret, mapcount); + _total_mapcount += mapcount; + + /* Triple mapped at base page size */ + for (j = 0; j < HPAGE_PMD_NR; j++) { + mapcount = atomic_read(&page[i + j]._mapcount) + 1; + ret = max(ret, mapcount); + _total_mapcount += mapcount; + } + + if (PageDoubleMap(&page[i])) { + ret -= 1; + _total_mapcount -= HPAGE_PMD_NR; + } + } + } else + VM_BUG_ON_PAGE(1, page); + if (PageDoubleMap(page) || PagePUDDoubleMap(page)) { ret -= 1; + /* both PUD and PMD has HPAGE_PMD_NR sub pages */ _total_mapcount -= HPAGE_PMD_NR; } mapcount = compound_mapcount(page); @@ -2994,6 +3457,9 @@ static unsigned long deferred_split_count(struct shrinker *shrink, return READ_ONCE(ds_queue->split_queue_len); } +#define deferred_list_entry(x) (compound_head(list_entry((void *)x, \ + struct page, mapping))) + static unsigned long deferred_split_scan(struct shrinker *shrink, struct shrink_control *sc) 
{ @@ -3027,12 +3493,18 @@ static unsigned long deferred_split_scan(struct shrinker *shrink, spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); list_for_each_safe(pos, next, &list) { - page = list_entry((void *)pos, struct page, mapping); + page = deferred_list_entry(pos); if (!trylock_page(page)) goto next; /* split_huge_page() removes page from list on success */ - if (!split_huge_page(page)) - split++; + if (compound_order(page) == HPAGE_PUD_ORDER) { + if (!split_huge_pud_page(page)) + split++; + } else if (compound_order(page) == HPAGE_PMD_ORDER) { + if (!split_huge_page(page)) + split++; + } else + VM_BUG_ON_PAGE(1, page); unlock_page(page); next: put_page(page); @@ -3135,7 +3607,7 @@ void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, if (pmd_soft_dirty(pmdval)) pmdswp = pmd_swp_mksoft_dirty(pmdswp); set_pmd_at(mm, address, pvmw->pmd, pmdswp); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); put_page(page); } @@ -3161,7 +3633,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) flush_cache_range(vma, mmun_start, mmun_start + HPAGE_PMD_SIZE); if (PageAnon(new)) - page_add_anon_rmap(new, vma, mmun_start, true); + page_add_anon_rmap(new, vma, mmun_start, true, HPAGE_PMD_ORDER); else page_add_file_rmap(new, true); set_pmd_at(mm, mmun_start, pvmw->pmd, pmde); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 27a51b202d1f..4113d7b66fee 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3993,7 +3993,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma, set_page_dirty(page); hugetlb_count_sub(pages_per_huge_page(h), mm); - page_remove_rmap(page, true); + page_remove_rmap(page, true, huge_page_order(h)); spin_unlock(ptl); tlb_remove_page_size(tlb, page, huge_page_size(h)); @@ -4218,7 +4218,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, mmu_notifier_invalidate_range(mm, range.start, range.end); set_huge_pte_at(mm, haddr, ptep, make_huge_pte(vma, new_page, 1)); - page_remove_rmap(old_page, true); + page_remove_rmap(old_page, true, huge_page_order(h)); hugepage_add_new_anon_rmap(new_page, vma, haddr); set_page_huge_active(new_page); /* Make the old page be freed below */ diff --git a/mm/khugepaged.c b/mm/khugepaged.c index e749e568e1ea..84ce39652282 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -762,7 +762,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, * superfluous. 
*/ pte_clear(vma->vm_mm, address, _pte); - page_remove_rmap(src_page, false); + page_remove_rmap(src_page, false, 0); spin_unlock(ptl); free_page_and_swap_cache(src_page); } @@ -1172,7 +1172,7 @@ static void collapse_huge_page(struct mm_struct *mm, spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); - page_add_new_anon_rmap(new_page, vma, address, true); + page_add_new_anon_rmap(new_page, vma, address, true, HPAGE_PMD_ORDER); lru_cache_add_inactive_or_unevictable(new_page, vma); pgtable_trans_huge_deposit(mm, pmd, pgtable); set_pmd_at(mm, address, pmd, _pmd); @@ -1475,7 +1475,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) if (pte_none(*pte)) continue; page = vm_normal_page(vma, addr, *pte); - page_remove_rmap(page, false); + page_remove_rmap(page, false, HPAGE_PMD_ORDER); } pte_unmap_unlock(start_pte, ptl); diff --git a/mm/ksm.c b/mm/ksm.c index 0aa2247bddd7..d778b4d1b626 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1153,7 +1153,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, */ if (!is_zero_pfn(page_to_pfn(kpage))) { get_page(kpage); - page_add_anon_rmap(kpage, vma, addr, false); + page_add_anon_rmap(kpage, vma, addr, false, 0); newpte = mk_pte(kpage, vma->vm_page_prot); } else { newpte = pte_mkspecial(pfn_pte(page_to_pfn(kpage), @@ -1177,7 +1177,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, ptep_clear_flush(vma, addr, ptep); set_pte_at_notify(mm, addr, ptep, newpte); - page_remove_rmap(page, false); + page_remove_rmap(page, false, 0); if (!page_mapped(page)) try_to_free_swap(page); put_page(page); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index dc892a3c4b17..5d5be3b7c739 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3232,6 +3232,19 @@ void mem_cgroup_split_huge_fixup(struct page *head) head[i].mem_cgroup = memcg; } } + +void mem_cgroup_split_huge_pud_fixup(struct page *head) +{ + int i; + + if (mem_cgroup_disabled()) + return; + + for (i = HPAGE_PMD_NR; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + head[i].mem_cgroup = head->mem_cgroup; + + /*__mod_memcg_state(head->mem_cgroup, MEMCG_RSS_HUGE, -HPAGE_PUD_NR);*/ +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #ifdef CONFIG_MEMCG_SWAP diff --git a/mm/memory.c b/mm/memory.c index b88587256bc1..184d8eb2d060 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1090,7 +1090,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, mark_page_accessed(page); } rss[mm_counter(page)]--; - page_remove_rmap(page, false); + page_remove_rmap(page, false, 0); if (unlikely(page_mapcount(page) < 0)) print_bad_pte(vma, addr, ptent, page); if (unlikely(__tlb_remove_page(tlb, page))) { @@ -1118,7 +1118,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); rss[mm_counter(page)]--; - page_remove_rmap(page, false); + page_remove_rmap(page, false, 0); put_page(page); continue; } @@ -2725,7 +2725,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * thread doing COW. */ ptep_clear_flush_notify(vma, vmf->address, vmf->pte); - page_add_new_anon_rmap(new_page, vma, vmf->address, false); + page_add_new_anon_rmap(new_page, vma, vmf->address, false, 0); lru_cache_add_inactive_or_unevictable(new_page, vma); /* * We call the notify macro here because, when using secondary @@ -2757,7 +2757,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * mapcount is visible. So transitively, TLBs to * old page will be flushed before it can be reused. 
*/ - page_remove_rmap(old_page, false); + page_remove_rmap(old_page, false, 0); } /* Free the old page.. */ @@ -3273,10 +3273,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* ksm created a completely new copy */ if (unlikely(page != swapcache && swapcache)) { - page_add_new_anon_rmap(page, vma, vmf->address, false); + page_add_new_anon_rmap(page, vma, vmf->address, false, 0); lru_cache_add_inactive_or_unevictable(page, vma); } else { - do_page_add_anon_rmap(page, vma, vmf->address, exclusive); + do_page_add_anon_rmap(page, vma, vmf->address, exclusive, 0); } swap_free(entry); @@ -3420,7 +3420,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) } inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); - page_add_new_anon_rmap(page, vma, vmf->address, false); + page_add_new_anon_rmap(page, vma, vmf->address, false, 0); lru_cache_add_inactive_or_unevictable(page, vma); setpte: set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry); @@ -3678,7 +3678,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page) /* copy-on-write page */ if (write && !(vma->vm_flags & VM_SHARED)) { inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); - page_add_new_anon_rmap(page, vma, vmf->address, false); + page_add_new_anon_rmap(page, vma, vmf->address, false, 0); lru_cache_add_inactive_or_unevictable(page, vma); } else { inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page)); @@ -4155,7 +4155,7 @@ static vm_fault_t create_huge_pud(struct vm_fault *vmf) return ret; } /* COW or write-notify not handled on PUD level: split pud.*/ - __split_huge_pud(vmf->vma, vmf->pud, vmf->address); + split_huge_pud(vmf->vma, vmf->pud, vmf->address); #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ return VM_FAULT_FALLBACK; } diff --git a/mm/migrate.c b/mm/migrate.c index 0b945c8031be..be0e80b32686 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -270,7 +270,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte); if (PageAnon(new)) - page_add_anon_rmap(new, vma, pvmw.address, false); + page_add_anon_rmap(new, vma, pvmw.address, false, 0); else page_add_file_rmap(new, false); } @@ -2194,7 +2194,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, * new page and page_add_new_anon_rmap guarantee the copy is * visible before the pagetable update. */ - page_add_anon_rmap(new_page, vma, start, true); + page_add_anon_rmap(new_page, vma, start, true, HPAGE_PMD_ORDER); /* * At this point the pmd is numa/protnone (i.e. non present) and the TLB * has already been flushed globally. So no TLB can be currently @@ -2211,7 +2211,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, page_ref_unfreeze(page, 2); mlock_migrate_page(new_page, page); - page_remove_rmap(page, true); + page_remove_rmap(page, true, HPAGE_PMD_ORDER); set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED); spin_unlock(ptl); @@ -2455,7 +2455,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, * drop page refcount. Page won't be freed, as we took * a reference just above. 
*/ - page_remove_rmap(page, false); + page_remove_rmap(page, false, 0); put_page(page); if (pte_present(pte)) @@ -2940,7 +2940,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, goto unlock_abort; inc_mm_counter(mm, MM_ANONPAGES); - page_add_new_anon_rmap(page, vma, addr, false); + page_add_new_anon_rmap(page, vma, addr, false, 0); if (!is_zone_device_page(page)) lru_cache_add_inactive_or_unevictable(page, vma); get_page(page); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 763acbed66f1..97a4c7e4a579 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -679,6 +679,9 @@ void prep_compound_page(struct page *page, unsigned int order) atomic_set(compound_mapcount_ptr(page), -1); if (hpage_pincount_available(page)) atomic_set(compound_pincount_ptr(page), 0); + if (order == HPAGE_PUD_ORDER) + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + atomic_set(sub_compound_mapcount_ptr(&page[i], 1), -1); } #ifdef CONFIG_DEBUG_PAGEALLOC @@ -1132,6 +1135,15 @@ static int free_tail_pages_check(struct page *head_page, struct page *page) */ break; default: + /* sub_compound_map_ptr store here */ + if (compound_order(head_page) == HPAGE_PUD_ORDER && + (page - head_page) % HPAGE_PMD_NR == 3) { + if (unlikely(atomic_read(&page->compound_mapcount) != -1)) { + pr_err("sub_compound_mapcount: %d\n", atomic_read(&page->compound_mapcount) + 1); + bad_page(page, "nonzero sub_compound_mapcount"); + } + break; + } if (page->mapping != TAIL_MAPPING) { bad_page(page, "corrupted mapping in tail page"); goto out; @@ -1183,8 +1195,14 @@ static __always_inline bool free_pages_prepare(struct page *page, VM_BUG_ON_PAGE(compound && compound_order(page) != order, page); - if (compound) + if (compound) { ClearPageDoubleMap(page); + if (order == HPAGE_PUD_ORDER) { + ClearPagePUDDoubleMap(page); + for (i = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) + ClearPageDoubleMap(&page[i]); + } + } for (i = 1; i < (1 << order); i++) { if (compound) bad += free_tail_pages_check(page, page + i); diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index ef218b0f5d74..a8529afc55e5 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -245,6 +245,17 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, } #endif +#ifndef __HAVE_ARCH_PUDP_INVALIDATE +pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address, + pud_t *pudp) +{ + pud_t old = pudp_establish(vma, address, pudp, pud_mknotpresent(*pudp)); + + flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE); + return old; +} +#endif + #ifndef pmdp_collapse_flush pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) diff --git a/mm/rmap.c b/mm/rmap.c index 77cec0658b76..0bbaaa891b3c 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1108,9 +1108,9 @@ static void __page_check_anon_rmap(struct page *page, * (but PageKsm is never downgraded to PageAnon). */ void page_add_anon_rmap(struct page *page, - struct vm_area_struct *vma, unsigned long address, bool compound) + struct vm_area_struct *vma, unsigned long address, bool compound, int order) { - do_page_add_anon_rmap(page, vma, address, compound ? RMAP_COMPOUND : 0); + do_page_add_anon_rmap(page, vma, address, compound ? RMAP_COMPOUND : 0, order); } /* @@ -1119,7 +1119,7 @@ void page_add_anon_rmap(struct page *page, * Everybody else should continue to use page_add_anon_rmap above. 
*/ void do_page_add_anon_rmap(struct page *page, - struct vm_area_struct *vma, unsigned long address, int flags) + struct vm_area_struct *vma, unsigned long address, int flags, int order) { bool compound = flags & RMAP_COMPOUND; bool first; @@ -1130,10 +1130,21 @@ void do_page_add_anon_rmap(struct page *page, VM_BUG_ON_PAGE(!PageLocked(page), page); if (compound) { - atomic_t *mapcount; + atomic_t *mapcount = NULL; VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(!PageTransHuge(page), page); - mapcount = compound_mapcount_ptr(page); + if (compound_order(page) == HPAGE_PUD_ORDER) { + if (order == HPAGE_PUD_ORDER) { + mapcount = compound_mapcount_ptr(page); + } else if (order == HPAGE_PMD_ORDER) { + VM_BUG_ON(!PMDPageInPUD(page)); + mapcount = sub_compound_mapcount_ptr(page, 1); + } else + VM_BUG_ON(1); + } else if (compound_order(page) == HPAGE_PMD_ORDER) { + mapcount = compound_mapcount_ptr(page); + } else + VM_BUG_ON(1); first = atomic_inc_and_test(mapcount); } else { first = atomic_inc_and_test(&page->_mapcount); @@ -1148,7 +1159,7 @@ void do_page_add_anon_rmap(struct page *page, * disabled. */ if (compound) { - if (nr == HPAGE_PMD_NR) + if (order == HPAGE_PMD_ORDER) __inc_lruvec_page_state(page, NR_ANON_THPS); else __inc_lruvec_page_state(page, NR_ANON_THPS_PUD); @@ -1181,7 +1192,7 @@ void do_page_add_anon_rmap(struct page *page, * Page does not have to be locked. */ void page_add_new_anon_rmap(struct page *page, - struct vm_area_struct *vma, unsigned long address, bool compound) + struct vm_area_struct *vma, unsigned long address, bool compound, int order) { int nr = compound ? thp_nr_pages(page) : 1; @@ -1194,10 +1205,15 @@ void page_add_new_anon_rmap(struct page *page, if (hpage_pincount_available(page)) atomic_set(compound_pincount_ptr(page), 0); - if (nr == HPAGE_PMD_NR) - __inc_lruvec_page_state(page, NR_ANON_THPS); - else + if (order == HPAGE_PUD_ORDER) { + VM_BUG_ON(compound_order(page) != HPAGE_PUD_ORDER); + /* Anon THP always mapped first with PMD */ __inc_lruvec_page_state(page, NR_ANON_THPS_PUD); + } else if (order == HPAGE_PMD_ORDER) { + VM_BUG_ON(compound_order(page) != HPAGE_PMD_ORDER); + __inc_lruvec_page_state(page, NR_ANON_THPS); + } else + VM_BUG_ON(1); } else { /* Anon THP always mapped first with PMD */ VM_BUG_ON_PAGE(PageTransCompound(page), page); @@ -1289,12 +1305,40 @@ static void page_remove_file_rmap(struct page *page, bool compound) clear_page_mlock(page); } -static void page_remove_anon_compound_rmap(struct page *page) +static void page_remove_anon_compound_rmap(struct page *page, int order) { - int i, nr; - - if (!atomic_add_negative(-1, compound_mapcount_ptr(page))) - return; + int i, nr = 0; + struct page *head = compound_head(page); + + if (compound_order(head) == HPAGE_PUD_ORDER) { + if (order == HPAGE_PMD_ORDER) { + VM_BUG_ON(!PMDPageInPUD(page)); + if (atomic_add_negative(-1, sub_compound_mapcount_ptr(page, 1))) { + if (TestClearPageDoubleMap(page)) { + /* + * Subpages can be mapped with PTEs too. Check how many of + * them are still mapped. 
+ */ + for (i = 0; i < thp_nr_pages(head); i++) { + if (atomic_add_negative(-1, &head[i]._mapcount)) + nr++; + } + } + __dec_node_page_state(page, NR_ANON_THPS); + } + nr += HPAGE_PMD_NR; + __mod_node_page_state(page_pgdat(head), NR_ANON_MAPPED, -nr); + return; + } else { + VM_BUG_ON(order != HPAGE_PUD_ORDER); + if (!atomic_add_negative(-1, compound_mapcount_ptr(page))) + return; + } + } else if (compound_order(head) == HPAGE_PMD_ORDER) { + if (!atomic_add_negative(-1, compound_mapcount_ptr(page))) + return; + } else + VM_BUG_ON_PAGE(1, page); /* Hugepages are not counted in NR_ANON_PAGES for now. */ if (unlikely(PageHuge(page))) @@ -1303,12 +1347,26 @@ static void page_remove_anon_compound_rmap(struct page *page) if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) return; - if (thp_nr_pages(page) == HPAGE_PMD_NR) + if (order == HPAGE_PMD_ORDER) __dec_lruvec_page_state(page, NR_ANON_THPS); - else + else if (order == HPAGE_PUD_ORDER) __dec_lruvec_page_state(page, NR_ANON_THPS_PUD); + else + VM_BUG_ON(1); - if (TestClearPageDoubleMap(page)) { + /* PMD-mapped PUD THP is handled above */ + if (TestClearPagePUDDoubleMap(head)) { + VM_BUG_ON(!(compound_order(head) == HPAGE_PUD_ORDER || head == page)); + /* + * Subpages can be mapped with PMDs too. Check how many of + * them are still mapped. + */ + for (i = 0, nr = 0; i < HPAGE_PUD_NR; i += HPAGE_PMD_NR) { + if (atomic_add_negative(-1, sub_compound_mapcount_ptr(&head[i], 1))) + nr += HPAGE_PMD_NR; + } + } else if (TestClearPageDoubleMap(head)) { + VM_BUG_ON(compound_order(head) != HPAGE_PMD_ORDER); /* * Subpages can be mapped with PTEs too. Check how many of * them are still mapped. @@ -1332,8 +1390,10 @@ static void page_remove_anon_compound_rmap(struct page *page) if (unlikely(PageMlocked(page))) clear_page_mlock(page); - if (nr) - __mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr); + if (nr) { + __mod_lruvec_page_state(head, NR_ANON_MAPPED, -nr); + deferred_split_huge_page(head); + } } /** @@ -1343,7 +1403,7 @@ static void page_remove_anon_compound_rmap(struct page *page) * * The caller needs to hold the pte lock. 
*/ -void page_remove_rmap(struct page *page, bool compound) +void page_remove_rmap(struct page *page, bool compound, int order) { lock_page_memcg(page); @@ -1353,7 +1413,7 @@ void page_remove_rmap(struct page *page, bool compound) } if (compound) { - page_remove_anon_compound_rmap(page); + page_remove_anon_compound_rmap(page, order); goto out; } @@ -1734,7 +1794,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, * * See Documentation/vm/mmu_notifier.rst */ - page_remove_rmap(subpage, PageHuge(page)); + page_remove_rmap(subpage, PageHuge(page), 0); put_page(page); } diff --git a/mm/swap.c b/mm/swap.c index 999a84dbe12c..b70631c71171 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -964,6 +964,37 @@ void lru_add_page_tail(struct page *page, struct page *page_tail, page_lru(page_tail)); } } + +/* used by __split_pud_huge_page_tail() */ +void lru_add_pud_page_tail(struct page *page, struct page *page_tail, + struct lruvec *lruvec, struct list_head *list) +{ + VM_BUG_ON_PAGE(!PageHead(page), page); + VM_BUG_ON_PAGE(PageLRU(page_tail), page); + VM_BUG_ON(NR_CPUS != 1 && + !spin_is_locked(&lruvec_pgdat(lruvec)->lru_lock)); + + if (!list) + SetPageLRU(page_tail); + + if (likely(PageLRU(page))) + list_add_tail(&page_tail->lru, &page->lru); + else if (list) { + /* page reclaim is reclaiming a huge page */ + get_page(page_tail); + list_add_tail(&page_tail->lru, list); + } else { + /* + * Head page has not yet been counted, as an hpage, + * so we must account for each subpage individually. + * + * Put page_tail on the list at the correct position + * so they all end up in order. + */ + add_page_to_lru_list_tail(page_tail, lruvec, + page_lru(page_tail)); + } +} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, diff --git a/mm/swapfile.c b/mm/swapfile.c index e3f771c2ad83..285edbcb5e22 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1921,9 +1921,9 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd, set_pte_at(vma->vm_mm, addr, pte, pte_mkold(mk_pte(page, vma->vm_page_prot))); if (page == swapcache) { - page_add_anon_rmap(page, vma, addr, false); + page_add_anon_rmap(page, vma, addr, false, 0); } else { /* ksm created a completely new copy */ - page_add_new_anon_rmap(page, vma, addr, false); + page_add_new_anon_rmap(page, vma, addr, false, 0); lru_cache_add_inactive_or_unevictable(page, vma); } swap_free(entry); diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 9a3d451402d7..9b31d9beaa46 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -122,7 +122,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm, goto out_release_uncharge_unlock; inc_mm_counter(dst_mm, MM_ANONPAGES); - page_add_new_anon_rmap(page, dst_vma, dst_addr, false); + page_add_new_anon_rmap(page, dst_vma, dst_addr, false, 0); lru_cache_add_inactive_or_unevictable(page, dst_vma); set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); diff --git a/mm/util.c b/mm/util.c index bb902f5a6582..410f1ca0932a 100644 --- a/mm/util.c +++ b/mm/util.c @@ -713,17 +713,27 @@ struct address_space *page_mapping_file(struct page *page) int __page_mapcount(struct page *page) { int ret; + struct page *head = compound_head(page); + /* base page mapping */ ret = atomic_read(&page->_mapcount) + 1; + + /* PMDInPUD mapping */ + if (compound_order(head) == HPAGE_PUD_ORDER) { + struct page *sub_compound_page = head + + (((page - head) / HPAGE_PMD_NR) * HPAGE_PMD_NR); + + ret += sub_compound_mapcount(sub_compound_page); + } /* * For file THP page->_mapcount 
contains total number of mapping * of the page: no need to look into compound_mapcount. */ if (!PageAnon(page) && !PageHuge(page)) return ret; - page = compound_head(page); - ret += atomic_read(compound_mapcount_ptr(page)) + 1; - if (PageDoubleMap(page)) + /* highest compound mapping */ + ret += atomic_read(compound_mapcount_ptr(head)) + 1; + if (PageDoubleMap(head)) ret--; return ret; } diff --git a/mm/vmstat.c b/mm/vmstat.c index 3a01212b652c..dc7c2cec9102 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1330,6 +1330,10 @@ const char * const vmstat_text[] = { "thp_fault_fallback_pud", "thp_fault_fallback_pud_charge", "thp_split_pud", + "thp_split_pud_page", + "thp_split_pud_page_failed", + "thp_zero_pud_page_alloc", + "thp_zero_pud_page_alloc_failed", #endif "thp_zero_page_alloc", "thp_zero_page_alloc_failed", From patchwork Wed Sep 2 18:06:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751487 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7374814E3 for ; Wed, 2 Sep 2020 18:06:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3F521206EB for ; Wed, 2 Sep 2020 18:06:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="KyXUVz3o"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="Soqx4+uH" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3F521206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7712890001A; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 55BBB900012; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2EA71900012; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0185.hostedemail.com [216.40.44.185]) by kanga.kvack.org (Postfix) with ESMTP id 9C8E5900012 for ; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 58D5F3633 for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-FDA: 77218901508.26.alley55_3918661270a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 263A21804B66E for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-Spam-Summary: 
1,0,0,6d54c49fd11787b1,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1535:1543:1711:1730:1747:1777:1792:2393:2559:2562:2693:3138:3139:3140:3141:3142:3355:3865:3868:3874:4041:4118:4321:4605:5007:6119:6120:6261:6653:7576:7903:8603:8957:10004:11026:11473:11658:11914:12043:12291:12296:12438:12555:12679:12683:12895:13894:14096:14110:14181:14721:21080:21450:21627:21796:21990:30036:30054:30064,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-66.100.201.100 62.18.0.100;04ygwnbn7m45k9a7cyn4pbf1or1hiyptkaqbpc437ymwdokz1kxkxm3y6tqef3p.q7yp7jg8pudc8gfet39as73thhyc6iyuiuz66k3bh3aureeef97n61ff7fniixx.k-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: alley55_3918661270a2 X-Filterd-Recvd-Size: 7233 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf21.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:33 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 5A0DF5C01B7; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=uV0fK8A8kyNXU 7VUiy/PxZphd7Odkhk2JVfsAnagHMA=; b=KyXUVz3o8pbUikkX3gIY6OzDUrVdo lJZPkDDssk92G83YzNS6IudfjKbKwektFnszCCSdcOL3QPzd6Ripk0pa+okmTBJS 9YutRA/C57/UZyWE2neOW1jmWfWpw/tSUKGGwBCzCrJg3QWaDdtz14BTyIreDlQD Y767g4Tthf9MM5uzf4BCcrZXmM7dB9P/RB8Blil0Z92zfZQf9unYNM1aBhCq+cr4 SsZ/oVPyxZUsgurQjkKR3Ei68wjp5+sUXo9eZLn45WIE3c3C6B9tiJLXhFk3sYsS KGHbApaOLdL6qh4SqZVF0suQZkPs9EGla7uypGM7hDmE4SJEbkzW9pLAw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=uV0fK8A8kyNXU7VUiy/PxZphd7Odkhk2JVfsAnagHMA=; b=Soqx4+uH CG+o25MOr/QSIRCN3hrUPo6jC71tJuT4TYMcPJcaSW8LPYAU9ZsbE0YaLrGgk6uN WmZDVhqOCQDLFuaEFU6xYjyD9i42lq3EkpAK3uPqYHqSwsGQAl5OgVnsAhlL7AFe eFIz1as9E9UUovorMa8QaFh4ldY2oHqJ69/VmsvljpvRSqd+xieYvVrbGtrLGhe+ Q8nCMvNKK25MOjC/WWhgyUAAF66y7zj0AN7CFO3VgNIpj+j6c031TFdJZjWs3cB/ 3zL7p35gv7gMDaVoqvNcWsonCzI3O51j+cl50SMKZ9RaBEIdzOZ+jU3+26uB6e5p UUcmwjbcSKnQUw== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpeiiihdrhigrnhesshgvnhhtrdgtohhm X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id 6C61D30600A9; Wed, 2 Sep 2020 14:06:32 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . 
Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 07/16] mm: stats: make smap stats understand PUD THPs. Date: Wed, 2 Sep 2020 14:06:19 -0400 Message-Id: <20200902180628.4052244-8-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: 263A21804B66E X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan Signed-off-by: Zi Yan --- fs/proc/task_mmu.c | 63 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 58 insertions(+), 5 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 7fc9b3cc48d3..2ff80a9c8b57 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -430,10 +430,9 @@ static void smaps_page_accumulate(struct mem_size_stats *mss, } static void smaps_account(struct mem_size_stats *mss, struct page *page, - bool compound, bool young, bool dirty, bool locked) + unsigned long size, bool young, bool dirty, bool locked) { - int i, nr = compound ? compound_nr(page) : 1; - unsigned long size = nr * PAGE_SIZE; + int i, nr = size / PAGE_SIZE; /* * First accumulate quantities that depend only on |size| and the type @@ -536,7 +535,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, if (!page) return; - smaps_account(mss, page, false, pte_young(*pte), pte_dirty(*pte), locked); + smaps_account(mss, page, PAGE_SIZE, pte_young(*pte), pte_dirty(*pte), locked); } #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -567,8 +566,44 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr, /* pass */; else mss->file_thp += HPAGE_PMD_SIZE; - smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd), locked); + smaps_account(mss, page, HPAGE_PMD_SIZE, pmd_young(*pmd), + pmd_dirty(*pmd), locked); } + +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD +static void smaps_pud_entry(pud_t *pud, unsigned long addr, + struct mm_walk *walk) +{ + struct mem_size_stats *mss = walk->private; + struct vm_area_struct *vma = walk->vma; + bool locked = !!(vma->vm_flags & VM_LOCKED); + struct page *page = NULL; + + if (pud_present(*pud)) { + /* FOLL_DUMP will return -EFAULT on huge zero page */ + page = follow_trans_huge_pud(vma, addr, pud, FOLL_DUMP); + } + if (IS_ERR_OR_NULL(page)) + return; + if (PageAnon(page)) + mss->anonymous_thp += HPAGE_PUD_SIZE; + else if (PageSwapBacked(page)) + mss->shmem_thp += HPAGE_PUD_SIZE; + else if (is_zone_device_page(page)) + /* pass */; + else + mss->file_thp += HPAGE_PUD_SIZE; + smaps_account(mss, page, HPAGE_PUD_SIZE, pud_young(*pud), + pud_dirty(*pud), locked); +} +#else +static void smaps_pud_entry(pud_t *pud, unsigned long addr, + struct mm_walk *walk) +{ +} +#endif + + #else static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr, struct mm_walk *walk) @@ -576,6 +611,23 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr, } #endif +static int smaps_pud_range(pud_t *pud, unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + struct vm_area_struct *vma = walk->vma; + spinlock_t *ptl; + + ptl = pud_trans_huge_lock(pud, vma); + if (ptl) { + smaps_pud_entry(pud, addr, walk); + spin_unlock(ptl); + walk->action = ACTION_CONTINUE; + } + + cond_resched(); + return 0; +} + static int 
smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) { @@ -713,6 +765,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, #endif /* HUGETLB_PAGE */ static const struct mm_walk_ops smaps_walk_ops = { + .pud_entry = smaps_pud_range, .pmd_entry = smaps_pte_range, .hugetlb_entry = smaps_hugetlb_range, }; From patchwork Wed Sep 2 18:06:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751489 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B8EDA109A for ; Wed, 2 Sep 2020 18:06:52 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7775C206EB for ; Wed, 2 Sep 2020 18:06:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="BHObYdXY"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="jpg7/LhL" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7775C206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C5FEA900020; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id B9C91900012; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3AAA290001D; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0251.hostedemail.com [216.40.44.251]) by kanga.kvack.org (Postfix) with ESMTP id 009CF900019 for ; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id A60273632 for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-FDA: 77218901508.02.frog63_1a050ef270a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id 74A1B10097AA6 for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-Spam-Summary: 1,0,0,cf974f67558ddeb5,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:41:69:355:379:541:800:960:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1535:1544:1605:1711:1730:1747:1777:1792:2198:2199:2393:2559:2562:2895:3138:3139:3140:3141:3142:3865:3868:3870:3871:4119:5007:6119:6120:6261:6653:7576:7875:7903:8957:9592:10004:11026:11473:11658:11914:12043:12291:12296:12438:12555:12679:12683:12895:13894:13972:14096:14110:14181:14721:21080:21451:21627:21990:30003:30054:30064:30070,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-66.100.201.100 62.18.0.100;04yrmk91sdyhmejs1xnxwaab37hbgop33cb6ucxitt376upr57qgjupsmcadzpi.wpf11dsc8bbbnamhzsk1fieqwyepdtor7iz8k9cq8kpg67o3o8wu9mx8qiepn3u.r-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: frog63_1a050ef270a2 X-Filterd-Recvd-Size: 8806 Received: from 
out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf32.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:33 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id BB2405C0062; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=CxP3EmNT0ija5 U7GfdqrSeZNSEhYP1PIMWR4kSDGPAs=; b=BHObYdXYSlL9I8rrCWWVt9kcsmhrn GWMe5SZQRUF5gSXWFvFcx/YGAahOtaBu6DYGucDGmLl2vpl+g1hZgc/ocbvjeNjh E7SYML3I0DqIF4zgI2zDNRd1RX0uwE/MJBjlDxI5an28o+imB0McBkDoEfJ91t95 zpR5SJG8JkBaIEwU+UT5qlb3vHZOzPyu+7dILFrLdsy5oJtoqekeoCPmt+JH/v08 T8+zAjqkGxqT7PyVeQmApMq2Twr0meBoEh253bhlqYRtMgu2MB0Mv/+9GtILw8ZC JIIFZ6acVDDV9WCQUiNlHa7Lal18MHoPaJ5Jv0AagFHpOArkNvgJIbMuQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=CxP3EmNT0ija5U7GfdqrSeZNSEhYP1PIMWR4kSDGPAs=; b=jpg7/LhL OvW0kfEFjAFU+9wGt9zmb/UbPlEsZY6nPgryB9dpJX5yvM6/wtM7sAnTsCoaytPy KKYZvBJEysK+IXlp114Bp0a54qsgeu+Hq2nMsv3A9aQV92QvojZ5/z8omcuZaGrE 5wGpvbf99ZDsn/ot1Z93A5mfJfTXvgZJ6VwAdymghqIFKdoJMkxr7wzw9TYjSfXm ldblC/uK/L/HMdacAbsABt/1P2lSjQG0c1/dmDvdudHhAgwOalG0jP7tX35/UWYl N7IpZecICXpAzZNSIejCqvRj3NcJv+s1h5g/fBBvDIuk0cRowu4cENGxgSyqnu4x LXe2zmxVgj12Cg== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpeiiihdrhigrnhesshgvnhhtrdgtohhm X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id A5D2E3060067; Wed, 2 Sep 2020 14:06:32 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 08/16] mm: page_vma_walk: teach it about PMD-mapped PUD THP. Date: Wed, 2 Sep 2020 14:06:20 -0400 Message-Id: <20200902180628.4052244-9-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: 74A1B10097AA6 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan We now have PMD-mapped PUD THP and PTE-mapped PUD THP, page_vma_walk should handle them properly. 
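A minimal usage sketch of what the reworked walker reports to its callers; this is a reader-added illustration, not part of the patch, and it assumes the pvmw fields and the HPAGE_* constants introduced earlier in this series:

static void count_mapping_levels(struct page *page, struct vm_area_struct *vma,
				 unsigned long address)
{
	struct page_vma_mapped_walk pvmw = {
		.page = page,
		.vma = vma,
		.address = address,
	};
	int pte_maps = 0, pmd_maps = 0, pud_maps = 0;

	/* Each iteration reports one mapping of @page in @vma. */
	while (page_vma_mapped_walk(&pvmw)) {
		if (pvmw.pte)
			pte_maps++;	/* PTE-mapped subpage */
		else if (pvmw.pmd)
			pmd_maps++;	/* a PMD THP, or one PMD inside a PUD THP */
		else if (pvmw.pud)
			pud_maps++;	/* a whole PUD-mapped 1GB THP */
	}
	pr_info("pte=%d pmd=%d pud=%d\n", pte_maps, pmd_maps, pud_maps);
}

try_to_unmap_one() in patch 09 of this series dispatches on the same pvmw.pte/pvmw.pmd/pvmw.pud fields to pick the matching unmap primitive.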
Signed-off-by: Zi Yan --- mm/page_vma_mapped.c | 116 ++++++++++++++++++++++++++++++------------- 1 file changed, 82 insertions(+), 34 deletions(-) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index d9d39ec06e21..549e296287fd 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -52,6 +52,22 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw) return true; } +static bool map_pmd(struct page_vma_mapped_walk *pvmw) +{ + pmd_t pmde; + + pvmw->pmd = pmd_offset(pvmw->pud, pvmw->address); + pmde = READ_ONCE(*pvmw->pmd); + if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { + pvmw->ptl = pmd_lock(pvmw->vma->vm_mm, pvmw->pmd); + return true; + } else if (!pmd_present(pmde)) + return false; + + pvmw->ptl = pmd_lock(pvmw->vma->vm_mm, pvmw->pmd); + return true; +} + static inline bool pfn_is_match(struct page *page, unsigned long pfn) { unsigned long page_pfn = page_to_pfn(page); @@ -115,6 +131,38 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw) return pfn_is_match(pvmw->page, pfn); } +/* 0: not mapped, 1: pmd_page, 2: pmd */ +static int check_pmd(struct page_vma_mapped_walk *pvmw) +{ + unsigned long pfn; + + if (likely(pmd_trans_huge(*pvmw->pmd))) { + if (pvmw->flags & PVMW_MIGRATION) + return 0; + pfn = pmd_pfn(*pvmw->pmd); + if (!pfn_is_match(pvmw->page, pfn)) + return 0; + return 1; + } else if (!pmd_present(*pvmw->pmd)) { + if (thp_migration_supported()) { + if (!(pvmw->flags & PVMW_MIGRATION)) + return 0; + if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) { + swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd); + + pfn = migration_entry_to_pfn(entry); + if (!pfn_is_match(pvmw->page, pfn)) + return 0; + return 1; + } + } + return 0; + } + /* THP pmd was split under us: handle on pte level */ + spin_unlock(pvmw->ptl); + pvmw->ptl = NULL; + return 2; +} /** * page_vma_mapped_walk - check if @pvmw->page is mapped in @pvmw->vma at * @pvmw->address @@ -146,14 +194,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) pgd_t *pgd; p4d_t *p4d; pud_t pude; - pmd_t pmde; + int pmd_res; if (!pvmw->pte && !pvmw->pmd && pvmw->pud) return not_found(pvmw); /* The only possible pmd mapping has been handled on last iteration */ if (pvmw->pmd && !pvmw->pte) - return not_found(pvmw); + goto next_pmd; if (pvmw->pte) goto next_pte; @@ -201,43 +249,43 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) } else if (!pud_present(pude)) return false; - pvmw->pmd = pmd_offset(pvmw->pud, pvmw->address); - /* - * Make sure the pmd value isn't cached in a register by the - * compiler and used as a stale value after we've observed a - * subsequent update. 
- */ - pmde = READ_ONCE(*pvmw->pmd); - if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) { - pvmw->ptl = pmd_lock(mm, pvmw->pmd); - if (likely(pmd_trans_huge(*pvmw->pmd))) { - if (pvmw->flags & PVMW_MIGRATION) - return not_found(pvmw); - if (pmd_page(*pvmw->pmd) != page) - return not_found(pvmw); + if (!map_pmd(pvmw)) + goto next_pmd; + /* pmd locked after map_pmd */ + while (1) { + pmd_res = check_pmd(pvmw); + if (pmd_res == 1) /* pmd_page */ return true; - } else if (!pmd_present(*pvmw->pmd)) { - if (thp_migration_supported()) { - if (!(pvmw->flags & PVMW_MIGRATION)) - return not_found(pvmw); - if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) { - swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd); - - if (migration_entry_to_page(entry) != page) - return not_found(pvmw); - return true; + else if (pmd_res == 2) /* pmd entry */ + goto pte_level; +next_pmd: + /* Only PMD-mapped PUD THP has next pmd */ + if (!(PageTransHuge(pvmw->page) && compound_order(pvmw->page) == HPAGE_PUD_ORDER)) + return not_found(pvmw); + do { + pvmw->address += HPAGE_PMD_SIZE; + if (pvmw->address >= pvmw->vma->vm_end || + pvmw->address >= + __vma_address(pvmw->page, pvmw->vma) + + thp_nr_pages(pvmw->page) * PAGE_SIZE) + return not_found(pvmw); + /* Did we cross page table boundary? */ + if (pvmw->address % PUD_SIZE == 0) { + if (pvmw->ptl) { + spin_unlock(pvmw->ptl); + pvmw->ptl = NULL; } + goto restart; + } else { + pvmw->pmd++; } - return not_found(pvmw); - } else { - /* THP pmd was split under us: handle on pte level */ - spin_unlock(pvmw->ptl); - pvmw->ptl = NULL; - } - } else if (!pmd_present(pmde)) { - return false; + } while (pmd_none(*pvmw->pmd)); + + if (!pvmw->ptl) + pvmw->ptl = pmd_lock(mm, pvmw->pmd); } +pte_level: if (!map_pte(pvmw)) goto next_pte; while (1) { From patchwork Wed Sep 2 18:06:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751491 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 362C8109A for ; Wed, 2 Sep 2020 18:06:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DC8A4206EB for ; Wed, 2 Sep 2020 18:06:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="er9QpnC9"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="Rl9EvIJW" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DC8A4206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 019E090001D; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id C5C5790001F; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4DCCB90001F; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0231.hostedemail.com [216.40.44.231]) by kanga.kvack.org (Postfix) with ESMTP id 0723890001A 
for ; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id C37B9180AD806 for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-FDA: 77218901508.22.neck50_3615489270a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin22.hostedemail.com (Postfix) with ESMTP id 9D86618038E67 for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-Spam-Summary: 1,0,0,3a99d9154ccf11b7,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:1:41:69:355:379:541:800:960:966:968:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1605:1730:1747:1777:1792:2194:2196:2198:2199:2200:2201:2393:2559:2562:2636:2693:2731:2895:2898:2904:3138:3139:3140:3141:3142:3865:3866:3867:3868:3870:3871:3872:4037:4250:4321:4385:5007:6117:6119:6120:6261:6653:7576:7875:7901:7903:8957:10004:11026:11232:11473:11658:11914:12043:12291:12296:12438:12555:12679:12683:12895:13894:14096:14110:21080:21324:21433:21627:21990:30003:30025:30054:30064:30070,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-62.18.0.100 66.100.201.100;04y8xyreau959wufxzh6boo3yu6dzypsbrww3ryefemtfxnhbyiommyu1qipkbq.cwbjs7wbtpg5deprn1usbdcznybb9efcf3tfwu8kh8f9omg8j4hjditzco99f5o.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: neck50_3615489270a2 X-Filterd-Recvd-Size: 14206 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf23.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:33 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id C19645C0163; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=b9eSgdBVc6KPh Gvl15otgB0JUvsTDCKSyaNNO5dyWDg=; b=er9QpnC9pzM1R7WWQiY5NxWp9AHDD MaNd48f0CiyA/HkajR0zAjmnLP7JT8fC9Wi6B3SrQH41vdRx8uBGbjf6bymNnVzl GL9JLzRPPcMRxdBHxcnfZGJKlsZV8eBBThqNmYpNY7sl0SbC+bfUcvUOBJff9Tvy H9+50K4VxJ9UKMC5vxHoko3AT1Qm/cFE0il4BQZ2zeLSBWr81iM/5grQ3GiatvUL SBDi03Hwgj3v04PccCt5e37EkkIvQ7MVRFPYK93ClkN+5Rk0BmaFGQqzAOOZljBJ BLRSh0s1UGLqifZiIY+12BepFUkexOWclMpDRa42Ge2azkRxhXxzFp0OQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=b9eSgdBVc6KPhGvl15otgB0JUvsTDCKSyaNNO5dyWDg=; b=Rl9EvIJW 8nBI4WIKOFKe4Y+iElnDPRJiFX3vCfp3M/nlpT3zG9LUhR3S6ylBE8/iNTHimfT3 tUtt1X/4/Z13K1MCYsGOGhAy+K0l30GVolETvOB9e3F/JZCmH7uzbiuEDFs2ZhzH pymZK7ArXPIkWTifRNwCjSG7BMj4Ehy1qO+uKHVE5+ydxY6AgiWgfHfg6b7bOuXI K7s0vc84PWZxA8j6fHLAr+6J8UFF7PVnnvJNMZS1CWY9CYRz3FKpgWD0M0CcnAdF POX5rsH0cNPHsLXkpgeEyS/ebjZw1YbosxaPiTjoFmv40LdEmAB8oygNsNaXmzZ7 vfixVQmNaJ72EA== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu 
jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpeiiihdrhigrnhesshgvnhhtrdgtohhm X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id DEA5D30600A6; Wed, 2 Sep 2020 14:06:32 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 09/16] mm: thp: 1GB THP support in try_to_unmap(). Date: Wed, 2 Sep 2020 14:06:21 -0400 Message-Id: <20200902180628.4052244-10-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: 9D86618038E67 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan Unmap different subpages in different sized THPs properly in the try_to_unmap() function. Signed-off-by: Zi Yan --- mm/migrate.c | 2 +- mm/rmap.c | 159 +++++++++++++++++++++++++++++++++++++-------------- 2 files changed, 116 insertions(+), 45 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index be0e80b32686..df069a55722e 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -225,7 +225,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION /* PMD-mapped THP migration entry */ - if (!pvmw.pte) { + if (!pvmw.pte && pvmw.pmd) { VM_BUG_ON_PAGE(PageHuge(page) || !PageTransCompound(page), page); remove_migration_pmd(&pvmw, new); continue; diff --git a/mm/rmap.c b/mm/rmap.c index 0bbaaa891b3c..6c788abdb0b9 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1123,6 +1123,7 @@ void do_page_add_anon_rmap(struct page *page, { bool compound = flags & RMAP_COMPOUND; bool first; + struct page *head = compound_head(page); if (unlikely(PageKsm(page))) lock_page_memcg(page); @@ -1132,8 +1133,8 @@ void do_page_add_anon_rmap(struct page *page, if (compound) { atomic_t *mapcount = NULL; VM_BUG_ON_PAGE(!PageLocked(page), page); - VM_BUG_ON_PAGE(!PageTransHuge(page), page); - if (compound_order(page) == HPAGE_PUD_ORDER) { + VM_BUG_ON_PAGE(!PMDPageInPUD(page) && !PageTransHuge(page), page); + if (compound_order(head) == HPAGE_PUD_ORDER) { if (order == HPAGE_PUD_ORDER) { mapcount = compound_mapcount_ptr(page); } else if (order == HPAGE_PMD_ORDER) { @@ -1141,7 +1142,7 @@ void do_page_add_anon_rmap(struct page *page, mapcount = sub_compound_mapcount_ptr(page, 1); } else VM_BUG_ON(1); - } else if (compound_order(page) == HPAGE_PMD_ORDER) { + } else if (compound_order(head) == HPAGE_PMD_ORDER) { mapcount = compound_mapcount_ptr(page); } else VM_BUG_ON(1); @@ -1151,7 +1152,8 @@ void do_page_add_anon_rmap(struct page *page, } if (first) { - int nr = compound ? thp_nr_pages(page) : 1; + /* int nr = compound ? 
thp_nr_pages(page) : 1; */ + int nr = 1<vm_flags & VM_LOCKED)) @@ -1473,6 +1478,11 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, is_zone_device_page(page) && !is_device_private_page(page)) return true; + if (flags & TTU_SPLIT_HUGE_PUD) { + split_huge_pud_address(vma, address, + flags & TTU_SPLIT_FREEZE, page); + } + if (flags & TTU_SPLIT_HUGE_PMD) { split_huge_pmd_address(vma, address, flags & TTU_SPLIT_FREEZE, page); @@ -1505,7 +1515,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, while (page_vma_mapped_walk(&pvmw)) { #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION /* PMD-mapped THP migration entry */ - if (!pvmw.pte && (flags & TTU_MIGRATION)) { + if (!pvmw.pte && pvmw.pmd && (flags & TTU_MIGRATION)) { VM_BUG_ON_PAGE(PageHuge(page) || !PageTransCompound(page), page); set_pmd_migration_entry(&pvmw, page); @@ -1537,9 +1547,18 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, } /* Unexpected PMD-mapped THP? */ - VM_BUG_ON_PAGE(!pvmw.pte, page); - subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); + if (pvmw.pte) { + subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); + order = 0; + } else if (!pvmw.pte && pvmw.pmd) { + subpage = page - page_to_pfn(page) + pmd_pfn(*pvmw.pmd); + order = HPAGE_PMD_ORDER; + } else if (!pvmw.pte && !pvmw.pmd && pvmw.pud) { + subpage = page - page_to_pfn(page) + pud_pfn(*pvmw.pud); + order = HPAGE_PUD_ORDER; + } + VM_BUG_ON(!subpage); address = pvmw.address; if (PageHuge(page)) { @@ -1617,16 +1636,26 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, } if (!(flags & TTU_IGNORE_ACCESS)) { - if (ptep_clear_flush_young_notify(vma, address, - pvmw.pte)) { - ret = false; - page_vma_mapped_walk_done(&pvmw); - break; + if ((pvmw.pte && + ptep_clear_flush_young_notify(vma, address, pvmw.pte)) || + ((!pvmw.pte && pvmw.pmd) && + pmdp_clear_flush_young_notify(vma, address, pvmw.pmd)) || + ((!pvmw.pte && !pvmw.pmd && pvmw.pud) && + pudp_clear_flush_young_notify(vma, address, pvmw.pud)) + ) { + ret = false; + page_vma_mapped_walk_done(&pvmw); + break; } } /* Nuke the page table entry. */ - flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + if (pvmw.pte) + flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + else if (!pvmw.pte && pvmw.pmd) + flush_cache_page(vma, address, pmd_pfn(*pvmw.pmd)); + else if (!pvmw.pte && !pvmw.pmd && pvmw.pud) + flush_cache_page(vma, address, pud_pfn(*pvmw.pud)); if (should_defer_flush(mm, flags)) { /* * We clear the PTE but do not flush so potentially @@ -1636,16 +1665,34 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, * transition on a cached TLB entry is written through * and traps if the PTE is unmapped. 
*/ - pteval = ptep_get_and_clear(mm, address, pvmw.pte); + if (pvmw.pte) { + pteval = ptep_get_and_clear(mm, address, pvmw.pte); + + set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); + } else if (!pvmw.pte && pvmw.pmd) { + pmdval = pmdp_huge_get_and_clear(mm, address, pvmw.pmd); - set_tlb_ubc_flush_pending(mm, pte_dirty(pteval)); + set_tlb_ubc_flush_pending(mm, pmd_dirty(pmdval)); + } else if (!pvmw.pte && !pvmw.pmd && pvmw.pud) { + pudval = pudp_huge_get_and_clear(mm, address, pvmw.pud); + + set_tlb_ubc_flush_pending(mm, pud_dirty(pudval)); + } } else { - pteval = ptep_clear_flush(vma, address, pvmw.pte); + if (pvmw.pte) + pteval = ptep_clear_flush(vma, address, pvmw.pte); + else if (!pvmw.pte && pvmw.pmd) + pmdval = pmdp_huge_clear_flush(vma, address, pvmw.pmd); + else if (!pvmw.pte && !pvmw.pmd && pvmw.pud) + pudval = pudp_huge_clear_flush(vma, address, pvmw.pud); } /* Move the dirty bit to the page. Now the pte is gone. */ - if (pte_dirty(pteval)) - set_page_dirty(page); + if ((pvmw.pte && pte_dirty(pteval)) || + ((!pvmw.pte && pvmw.pmd) && pmd_dirty(pmdval)) || + ((!pvmw.pte && !pvmw.pmd && pvmw.pud) && pud_dirty(pudval)) + ) + set_page_dirty(page); /* Update high watermark before we lower rss */ update_hiwater_rss(mm); @@ -1680,35 +1727,59 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, } else if (IS_ENABLED(CONFIG_MIGRATION) && (flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))) { swp_entry_t entry; - pte_t swp_pte; - if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); - ret = false; - page_vma_mapped_walk_done(&pvmw); - break; - } + if (pvmw.pte) { + pte_t swp_pte; - /* - * Store the pfn of the page in a special migration - * pte. do_swap_page() will wait until the migration - * pte is removed and then restart fault handling. - */ - entry = make_migration_entry(subpage, - pte_write(pteval)); - swp_pte = swp_entry_to_pte(entry); - if (pte_soft_dirty(pteval)) - swp_pte = pte_swp_mksoft_dirty(swp_pte); - if (pte_uffd_wp(pteval)) - swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, address, pvmw.pte, swp_pte); - /* - * No need to invalidate here it will synchronize on - * against the special swap migration pte. - */ + if (arch_unmap_one(mm, vma, address, pteval) < 0) { + set_pte_at(mm, address, pvmw.pte, pteval); + ret = false; + page_vma_mapped_walk_done(&pvmw); + break; + } + + /* + * Store the pfn of the page in a special migration + * pte. do_swap_page() will wait until the migration + * pte is removed and then restart fault handling. + */ + entry = make_migration_entry(subpage, + pte_write(pteval)); + swp_pte = swp_entry_to_pte(entry); + if (pte_soft_dirty(pteval)) + swp_pte = pte_swp_mksoft_dirty(swp_pte); + if (pte_uffd_wp(pteval)) + swp_pte = pte_swp_mkuffd_wp(swp_pte); + set_pte_at(mm, address, pvmw.pte, swp_pte); + /* + * No need to invalidate here it will synchronize on + * against the special swap migration pte. + */ + } else if (!pvmw.pte && pvmw.pmd) { + pmd_t swp_pmd; + /* + * Store the pfn of the page in a special migration + * pte. do_swap_page() will wait until the migration + * pte is removed and then restart fault handling. + */ + entry = make_migration_entry(subpage, + pmd_write(pmdval)); + swp_pmd = swp_entry_to_pmd(entry); + if (pmd_soft_dirty(pmdval)) + swp_pmd = pmd_swp_mksoft_dirty(swp_pmd); + set_pmd_at(mm, address, pvmw.pmd, swp_pmd); + /* + * No need to invalidate here it will synchronize on + * against the special swap migration pte. 
+ */ + } else if (!pvmw.pte && !pvmw.pmd && pvmw.pud) { + VM_BUG_ON(1); + } } else if (PageAnon(page)) { swp_entry_t entry = { .val = page_private(subpage) }; pte_t swp_pte; + + VM_BUG_ON(!pvmw.pte); /* * Store the swap location in the pte. * See handle_pte_fault() ... @@ -1794,7 +1865,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, * * See Documentation/vm/mmu_notifier.rst */ - page_remove_rmap(subpage, PageHuge(page), 0); + page_remove_rmap(subpage, PageHuge(page) || order >= HPAGE_PMD_ORDER, order); put_page(page); } From patchwork Wed Sep 2 18:06:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751493 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D3AA0109B for ; Wed, 2 Sep 2020 18:06:57 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 90AD1206EB for ; Wed, 2 Sep 2020 18:06:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="W993RT4Z"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="JYCdfcFh" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 90AD1206EB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4544390001B; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id ED9AF900012; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6E4D4900021; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0149.hostedemail.com [216.40.44.149]) by kanga.kvack.org (Postfix) with ESMTP id 3A48890001B for ; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id F2835180AD817 for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-FDA: 77218901508.25.pest79_0312a50270a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id AE9DA1804E3A8 for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) X-Spam-Summary: 1,0,0,75cd1201c6869196,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:41:69:355:379:541:800:960:966:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1535:1543:1711:1730:1747:1777:1792:2196:2198:2199:2200:2393:2559:2562:3138:3139:3140:3141:3142:3355:3865:3866:3867:3872:4118:4250:4321:4385:5007:6117:6119:6120:6261:6630:6653:7576:7875:7901:8957:9592:10004:11026:11473:11658:11914:12043:12291:12296:12438:12555:12679:12683:12895:12986:13894:14096:14110:14181:14721:21080:21627:21990:30054:30064,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-66.100.201.100 
62.18.0.100;04yr873xo85npc3pbibjp5xm6qrsfop79n11pqzqqxfz7ozec6poox7oiibthb4.5ba6zwn95348aosxnxm8fadzxzkgqi49jqbsyumnbiex7i1yujhotdz1wfjyzdq.s-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: pest79_0312a50270a2 X-Filterd-Recvd-Size: 7781 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf15.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id F26395C021E; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=x0au+BqqFQ547 Y2Nkd3x3yYveCHO8htqJqOYoJY8xgY=; b=W993RT4Z+UfTDB8hiyR98Nncw3Zum Nwo2R/KiaS/dSOvMsrxY2lD6MfMTqZ5uuTobCfqclpiM6V2Rg3D0VraBdyzWhlgI E4C1btPcssm+aEnqm1fNTk7kT3k/WSMZUoetoDZf+7qVOq34zGcttnjQvT5mPcOe BBi3WuXoaa3y5b91zYnCzdN1+aDZmocGL3VD6wYRPGP4hJy+vSqLZAZEI9HiET9M S90/ryHmIX1RENjtB12gYJJfpbTqdrrchEriHYuAl7nXgwdCuRy2b5UuCzGntGmH v2XjOeQpbNZ7wDSBKQ1mKh03kNRiIX4bjAXpy1nQqNo7B0HJNtS4PVqDQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=x0au+BqqFQ547Y2Nkd3x3yYveCHO8htqJqOYoJY8xgY=; b=JYCdfcFh kC1U+EmfPS6SePapwBIkljtrpgHC/x7tgGdsQpelqsvYbZn1pgZ+y+iypQvA+jz+ QfTvci54nM2T7V90I/ETySUO5Iv6+ThkUhmSuO6sIvSU/VNK72LxozhGdB32TKrs JcEAnHPlI4Pn9bVoG3EXpvhA4MafmZMttJGhmi2947QtkiNXPxdgnFVrK75K9h2f yc1kpnsEgk1KzweTIyt+1m2ph8z5tLDmZoe7Kd1I+5NnXRPrEqPtlBsOi19gEnH+ 7j1VH0DbrAPTlZVSKQWWrWPYJfvjObRxthL/iZ6lO9evrSCuPnmuqNwmkbbYDpTZ SouX2dHQsi4CXQ== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpeiiihdrhigrnhesshgvnhhtrdgtohhm X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id 23F1430600B4; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 10/16] mm: thp: split 1GB THPs at page reclaim. 
Date: Wed, 2 Sep 2020 14:06:22 -0400 Message-Id: <20200902180628.4052244-11-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: AE9DA1804E3A8 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan We cannot swap 1GB THPs, so split them before swapping them out. Signed-off-by: Zi Yan --- mm/swap_slots.c | 2 ++ mm/vmscan.c | 58 +++++++++++++++++++++++++++++++++++++------------ 2 files changed, 46 insertions(+), 14 deletions(-) diff --git a/mm/swap_slots.c b/mm/swap_slots.c index 3e6453573a89..65b8742a0446 100644 --- a/mm/swap_slots.c +++ b/mm/swap_slots.c @@ -312,6 +312,8 @@ swp_entry_t get_swap_page(struct page *page) entry.val = 0; if (PageTransHuge(page)) { + if (compound_order(page) == HPAGE_PUD_ORDER) + return entry; if (IS_ENABLED(CONFIG_THP_SWAP)) get_swap_pages(1, &entry, HPAGE_PMD_NR); goto out; diff --git a/mm/vmscan.c b/mm/vmscan.c index 99e1796eb833..617d15a041f8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1240,23 +1240,49 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (!(sc->gfp_mask & __GFP_IO)) goto keep_locked; if (PageTransHuge(page)) { - /* cannot split THP, skip it */ - if (!can_split_huge_page(page, NULL)) - goto activate_locked; - /* - * Split pages without a PMD map right - * away. Chances are some or all of the - * tail pages can be freed without IO. - */ - if (!compound_mapcount(page) && - split_huge_page_to_list(page, - page_list)) + if (compound_order(page) == HPAGE_PUD_ORDER) { + /* cannot split THP, skip it */ + if (!can_split_huge_pud_page(page, NULL)) + goto activate_locked; + /* + * Split pages without a PUD map right + * away. Chances are some or all of the + * tail pages can be freed without IO. + */ + if (!compound_mapcount(page) && + split_huge_pud_page_to_list(page, + page_list)) + goto activate_locked; + } + if (compound_order(page) == HPAGE_PMD_ORDER) { + /* cannot split THP, skip it */ + if (!can_split_huge_page(page, NULL)) + goto activate_locked; + /* + * Split pages without a PMD map right + * away. Chances are some or all of the + * tail pages can be freed without IO.
+ */ + if (!compound_mapcount(page) && + split_huge_page_to_list(page, + page_list)) + goto activate_locked; + } + } + /* Split PUD THPs before swapping */ + if (compound_order(page) == HPAGE_PUD_ORDER) { + if (split_huge_pud_page_to_list(page, page_list)) goto activate_locked; + else { + sc->nr_scanned -= (nr_pages - HPAGE_PMD_NR); + nr_pages = HPAGE_PMD_NR; + } } if (!add_to_swap(page)) { if (!PageTransHuge(page)) goto activate_locked_split; /* Fallback to swap normal pages */ + VM_BUG_ON_PAGE(compound_order(page) != HPAGE_PMD_ORDER, page); if (split_huge_page_to_list(page, page_list)) goto activate_locked; @@ -1273,6 +1299,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, mapping = page_mapping(page); } } else if (unlikely(PageTransHuge(page))) { + VM_BUG_ON_PAGE(compound_order(page) != HPAGE_PMD_ORDER, page); /* Split file THP */ if (split_huge_page_to_list(page, page_list)) goto keep_locked; @@ -1298,9 +1325,12 @@ static unsigned int shrink_page_list(struct list_head *page_list, enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH; bool was_swapbacked = PageSwapBacked(page); - if (unlikely(PageTransHuge(page))) - flags |= TTU_SPLIT_HUGE_PMD; - + if (unlikely(PageTransHuge(page))) { + if (compound_order(page) == HPAGE_PMD_ORDER) + flags |= TTU_SPLIT_HUGE_PMD; + else if (compound_order(page) == HPAGE_PUD_ORDER) + flags |= TTU_SPLIT_HUGE_PUD; + } if (!try_to_unmap(page, flags)) { stat->nr_unmap_fail += nr_pages; if (!was_swapbacked && PageSwapBacked(page)) From patchwork Wed Sep 2 18:06:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751497 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C935B109A for ; Wed, 2 Sep 2020 18:07:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 80743208C7 for ; Wed, 2 Sep 2020 18:07:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="K2KV2DVW"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="YPWKC0BX" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 80743208C7 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B458F900022; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 90434900025; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 13D36900021; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0102.hostedemail.com [216.40.44.102]) by kanga.kvack.org (Postfix) with ESMTP id 7FFA090001D for ; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 446C3824805A for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-FDA: 77218901550.26.sink23_0d12587270a2 Received: 
from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 15C971804B654 for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-Spam-Summary: 1,0,0,64f30d7e1618764b,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:2:41:355:379:541:800:960:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1535:1605:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3865:3867:3868:3870:4049:4120:4250:4321:4605:5007:6119:6120:6261:6630:6653:7576:7901:8957:9036:9592:10004:10226:11026:11232:11473:11658:11914:12043:12291:12296:12438:12555:12679:12683:12895:13894:14110:21080:21627:21990:30003:30054:30064,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-62.18.0.100 66.100.201.100;04yfgqq86mmz13bdybo7wnugzfahiycfyzm9o7ourdiqm4hedisud4yzcr97zm6.m3xc9xpwfg58e66mygtpu6r4g3fmnb55we4ppmozh5e5z8kd5zaz6yk5yem3fnk.c-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: sink23_0d12587270a2 X-Filterd-Recvd-Size: 9967 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf35.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 277195C0144; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:34 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=9wI5eSYIGtV+C JbXh5SLtYPlAilzT8nVnNMnj6skQGo=; b=K2KV2DVWtRcdX3QiWB3+l4EPkOlif 7hjYtzHbMVjwAb6fUCN65GC1CLfMKL2VCq5F88xbuX4BVVQd4qsWZ/aGyjp6jpPf EKk/7dk+++Htiu9n1t7TUXcntIGgQEUIqTj3qD1ZQqKgsDJVORxcGKCuwnx0b3Ir Y0K7kJjqoOwC9XlMZccZVFqQkWVay+9k9xGaMfIOAp2m3hONQtSHGifldo+NETEO If3C4rsQeW4V9GdTp4OVhRgXLKBlKR4PnMxA/4lG40KuWx5x1TAWtOgct9dwr46v 8havT3dmN9Nd567Yo9Ex2dUOLyRft4Wji4GWs0PU8JWTz5Rby8DpfM5fg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=9wI5eSYIGtV+CJbXh5SLtYPlAilzT8nVnNMnj6skQGo=; b=YPWKC0BX l4CP15MFklKrY6Uf2JRlHyi/6ltWioVeafteVm7XVZu43P+jd222Cns+lSDQgV7L Uq7Vd+G/DzmS2lZi4+5UIjbCnuS6FyP7HmcdRTkObHxjlMxcPU2R4X80vn10WUbM MmBQXVnDdTbK3CCOEwD+vFfYoYkF9n34cH9Psm1xIq0loXqBoTuysISFfh09URN4 /hA/U8RR7M0sil9DVccoSq86cF0sDKJdbkdMKIvG3rXgFEIS9b5ZOJLag6Se338N gd5ZDzNbFGP5Xg6MS692bceOgFvlRpUuVAuVuofVXfVXp63Y2BVh6XGBJ8sACqmq oNdImj6sgGn/ew== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpeiiihdrhigrnhesshgvnhhtrdgtohhm X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id 5C90030600B7; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) 
From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 11/16] mm: thp: 1GB THP follow_p*d_page() support. Date: Wed, 2 Sep 2020 14:06:23 -0400 Message-Id: <20200902180628.4052244-12-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: 15C971804B654 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan Add follow_page support for 1GB THPs. Signed-off-by: Zi Yan --- include/linux/huge_mm.h | 11 +++++++ mm/gup.c | 60 ++++++++++++++++++++++++++++++++- mm/huge_memory.c | 73 ++++++++++++++++++++++++++++++++++++++++- 3 files changed, 142 insertions(+), 2 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 589e5af5a1c2..c7bc40c4a5e2 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -20,6 +20,10 @@ extern int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, extern void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud); extern int do_huge_pud_anonymous_page(struct vm_fault *vmf); extern vm_fault_t do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud); +extern struct page *follow_trans_huge_pud(struct vm_area_struct *vma, + unsigned long addr, + pud_t *pud, + unsigned int flags); #else static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud) { @@ -32,6 +36,13 @@ extern vm_fault_t do_huge_pud_wp_page(struct vm_fault *vmf, pud_t orig_pud) { return VM_FAULT_FALLBACK; } +struct page *follow_trans_huge_pud(struct vm_area_struct *vma, + unsigned long addr, + pud_t *pud, + unsigned int flags) +{ + return NULL; +} #endif extern vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd); diff --git a/mm/gup.c b/mm/gup.c index bd883a112724..4b32ae3c5fa2 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -698,10 +698,68 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma, if (page) return page; } + +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD + if (likely(!pud_trans_huge(*pud))) { + if (unlikely(pud_bad(*pud))) + return no_page_table(vma, flags); + return follow_pmd_mask(vma, address, pud, flags, ctx); + } + + ptl = pud_lock(mm, pud); + + if (unlikely(!pud_trans_huge(*pud))) { + spin_unlock(ptl); + if (unlikely(pud_bad(*pud))) + return no_page_table(vma, flags); + return follow_pmd_mask(vma, address, pud, flags, ctx); + } + + if (flags & FOLL_SPLIT) { + int ret; + pmd_t *pmd = NULL; + + page = pud_page(*pud); + if (is_huge_zero_page(page)) { + + spin_unlock(ptl); + ret = 0; + split_huge_pud(vma, pud, address); + pmd = pmd_offset(pud, address); + split_huge_pmd(vma, pmd, address); + if (pmd_trans_unstable(pmd)) + ret = -EBUSY; + } else { + get_page(page); + spin_unlock(ptl); + lock_page(page); + ret = split_huge_pud_page(page); + if (!ret) + ret = split_huge_page(page); + else { + unlock_page(page); + put_page(page); + goto out; + } + unlock_page(page); + put_page(page); + if (pud_none(*pud)) + return no_page_table(vma, flags); + pmd = pmd_offset(pud, address); + } +out: + return ret ? 
ERR_PTR(ret) : + follow_page_pte(vma, address, pmd, flags, &ctx->pgmap); + } + page = follow_trans_huge_pud(vma, address, pud, flags); + spin_unlock(ptl); + ctx->page_mask = HPAGE_PUD_NR - 1; + return page; +#else if (unlikely(pud_bad(*pud))) return no_page_table(vma, flags); - return follow_pmd_mask(vma, address, pud, flags, ctx); +#endif } static struct page *follow_p4d_mask(struct vm_area_struct *vma, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 398f1b52f789..e209c2dfc5b7 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1259,6 +1259,77 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, return page; } +/* + * FOLL_FORCE can write to even unwritable pmd's, but only + * after we've gone through a COW cycle and they are dirty. + */ +static inline bool can_follow_write_pud(pud_t pud, unsigned int flags) +{ + return pud_write(pud) || + ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pud_dirty(pud)); +} + +struct page *follow_trans_huge_pud(struct vm_area_struct *vma, + unsigned long addr, + pud_t *pud, + unsigned int flags) +{ + struct mm_struct *mm = vma->vm_mm; + struct page *page = NULL; + + assert_spin_locked(pud_lockptr(mm, pud)); + + if (flags & FOLL_WRITE && !can_follow_write_pud(*pud, flags)) + goto out; + + /* Avoid dumping huge zero page */ + if ((flags & FOLL_DUMP) && is_huge_zero_pud(*pud)) + return ERR_PTR(-EFAULT); + + /* Full NUMA hinting faults to serialise migration in fault paths */ + /*&& pud_protnone(*pmd)*/ + if ((flags & FOLL_NUMA)) + goto out; + + page = pud_page(*pud); + VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page); + if (flags & FOLL_TOUCH) + touch_pud(vma, addr, pud, flags); + if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) { + /* + * We don't mlock() pte-mapped THPs. This way we can avoid + * leaking mlocked pages into non-VM_LOCKED VMAs. + * + * For anon THP: + * + * We do the same thing as PMD-level THP. + * + * For file THP: + * + * No support yet. 
+ * + */ + + if (PageAnon(page) && compound_mapcount(page) != 1) + goto skip_mlock; + if (PagePUDDoubleMap(page) || !page->mapping) + goto skip_mlock; + if (!trylock_page(page)) + goto skip_mlock; + lru_add_drain(); + if (page->mapping && !PagePUDDoubleMap(page)) + mlock_vma_page(page); + unlock_page(page); + } +skip_mlock: + page += (addr & ~HPAGE_PUD_MASK) >> PAGE_SHIFT; + VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page); + if (flags & FOLL_GET) + get_page(page); + +out: + return page; +} int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm, pud_t *dst_pud, pud_t *src_pud, unsigned long addr, struct vm_area_struct *vma) @@ -1501,7 +1572,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, goto out; page = pmd_page(*pmd); - VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page); + VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page) && !PMDPageInPUD(page), page); if (!try_grab_page(page, flags)) return ERR_PTR(-ENOMEM); From patchwork Wed Sep 2 18:06:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751499 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 85492109B for ; Wed, 2 Sep 2020 18:07:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 43E1520FC3 for ; Wed, 2 Sep 2020 18:07:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="UGx5kynZ"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="qhcVowDL" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 43E1520FC3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2C6C9900026; Wed, 2 Sep 2020 14:06:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id F1E64900012; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 44AC790001F; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0045.hostedemail.com [216.40.44.45]) by kanga.kvack.org (Postfix) with ESMTP id BE23590001B for ; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 87D061DE6 for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-FDA: 77218901550.27.nest81_110f099270a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin27.hostedemail.com (Postfix) with ESMTP id 583A93D669 for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-Spam-Summary: 
1,0,0,e7988f236f165880,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1535:1542:1711:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3354:3865:3870:4117:4321:5007:6119:6120:6261:6653:7576:7901:8660:9036:10004:11026:11473:11658:11914:12043:12291:12296:12438:12555:12679:12683:12895:12986:13148:13230:13894:14096:14110:14181:14721:21080:21627:21939:21990:30054:30064,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-66.100.201.100 62.18.0.100;04yrtkgow97hs633utjiosxmkr7exypen1gsrwj57cjmw6n63upag7jmxgbi5nm.di54b8tz8by4u8yc84q5guewa671k5csucr9a5x1uyca1xz9rug7k3xot1xzmh5.n-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:42,LUA_SUMMARY:none X-HE-Tag: nest81_110f099270a2 X-Filterd-Recvd-Size: 6198 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf07.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 7E8605C0174; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:34 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=W8y/hMwf2uJ9y W3fuBaKgbgy0fOnMgqFI5aw7zRR3JY=; b=UGx5kynZxCJQaVCAk5YbyXr6jMzyI 4yhj762Ke3rbG4bzrn5KuQ5amp/1xkbvvfTFG8eKebolmOQ2bpvi0aJ/0PT9mDSA K5EyGXxNpKeonlfMUyKsVKMTbDNI2GUHeZYyQS6f3oHp1q+wWo0TgqOb3xwdJZ74 fX77Vgnc1ByQHDVbQtq0pCiV0OLvhSHR+7MrsnIN9oSOjN2mkbpBZYAvTqT8oP/9 tm17a0OvzT+RUwEbylaK71lHNKiDgmxrzFI0hyXVm/L6J0Cc1WbC/W69AcWAsxTf hhyY1dqSvDrlSK+lNiy0SHm0bJw/MmKa4a4PA7io1GKYm7dne1+9/KMrQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=W8y/hMwf2uJ9yW3fuBaKgbgy0fOnMgqFI5aw7zRR3JY=; b=qhcVowDL sxs+uk520A2zHqb0PStyXqZIp38AiPOP6/dzF4P43weETxkehYN7tHB8lu71R2rF EvhIugSn3FSKS5TNc4eJRb0n1BhcqbD0hluwVcdbpetzmUdfENLR5vge31DGNrHY SCb33fWL1zO1oxnnI4ygJnvnDjbhA+caSmCcXjrNZ+mnV0Wq4peRERF0Ifhx70o/ sssnn4n5JxqcwQINk+Y56kiPHxXWn8GED4tkyqOTuWDcSOQ4UBOjMhn2CwbOb6Cr uQVq+aDjwZRG+Ob1K0VZDOB3bTm98y20LSS1kYM8wmL2GvavUJqb6mzNu7qmqRfn 4HZyQmz+Gx/2DQ== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmh epmhgrihhlfhhrohhmpeiiihdrhigrnhesshgvnhhtrdgtohhm X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id 9646C30600A9; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . 
Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 12/16] mm: support 1GB THP pagemap support. Date: Wed, 2 Sep 2020 14:06:24 -0400 Message-Id: <20200902180628.4052244-13-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: 583A93D669 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan Print page flags properly. Signed-off-by: Zi Yan --- fs/proc/task_mmu.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 2ff80a9c8b57..7254c7ecf659 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1557,6 +1557,64 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, return err; } +static int pagemap_pud_range(pud_t *pudp, unsigned long addr, unsigned long end, + struct mm_walk *walk) +{ + struct vm_area_struct *vma = walk->vma; + struct pagemapread *pm = walk->private; + spinlock_t *ptl; + int err = 0; + +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD + ptl = pud_trans_huge_lock(pudp, vma); + if (ptl) { + u64 flags = 0, frame = 0; + pud_t pud = *pudp; + struct page *page = NULL; + + if (vma->vm_flags & VM_SOFTDIRTY) + flags |= PM_SOFT_DIRTY; + + if (pud_present(pud)) { + page = pud_page(pud); + + flags |= PM_PRESENT; + if (pud_soft_dirty(pud)) + flags |= PM_SOFT_DIRTY; + if (pm->show_pfn) + frame = pud_pfn(pud) + + ((addr & ~PUD_MASK) >> PAGE_SHIFT); + } + + if (page && page_mapcount(page) == 1) + flags |= PM_MMAP_EXCLUSIVE; + + for (; addr != end; addr += PAGE_SIZE) { + pagemap_entry_t pme = make_pme(frame, flags); + + err = add_to_pagemap(addr, &pme, pm); + if (err) + break; + if (pm->show_pfn) { + if (flags & PM_PRESENT) + frame++; + else if (flags & PM_SWAP) + frame += (1 << MAX_SWAPFILES_SHIFT); + } + } + spin_unlock(ptl); + walk->action = ACTION_CONTINUE; + return err; + } + + if (pud_trans_unstable(pudp)) { + walk->action = ACTION_AGAIN; + return 0; + } +#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ + return err; +} + #ifdef CONFIG_HUGETLB_PAGE /* This function walks within one hugetlb entry in the single call */ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask, @@ -1607,6 +1665,7 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask, #endif /* HUGETLB_PAGE */ static const struct mm_walk_ops pagemap_ops = { + .pud_entry = pagemap_pud_range, .pmd_entry = pagemap_pmd_range, .pte_hole = pagemap_pte_hole, .hugetlb_entry = pagemap_hugetlb_range, From patchwork Wed Sep 2 18:06:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751501 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 64802109B for ; Wed, 2 Sep 2020 18:07:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1D9832083B for ; Wed, 2 Sep 2020 18:07:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) 
header.d=sent.com header.i=@sent.com header.b="XVaqdSTy"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="pCQuiEYb" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1D9832083B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5FDEC900012; Wed, 2 Sep 2020 14:06:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1163D900019; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 81CAE900019; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0027.hostedemail.com [216.40.44.27]) by kanga.kvack.org (Postfix) with ESMTP id F1471900019 for ; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id AF419181AEF10 for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-FDA: 77218901550.19.money93_630c5ef270a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin19.hostedemail.com (Postfix) with ESMTP id 8411E1AD1B9 for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-Spam-Summary: 1,0,0,0e896e1e8e94b7d4,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:41:355:379:541:800:960:973:988:989:1260:1261:1311:1314:1345:1359:1434:1437:1515:1535:1544:1711:1730:1747:1777:1792:2393:2559:2562:3138:3139:3140:3141:3142:3354:3865:3867:3871:3872:4119:4321:4605:5007:6120:6261:6630:6653:7576:8784:10004:11026:11473:11658:11914:12043:12291:12296:12438:12555:12679:12683:12895:13894:14110:14181:14721:21080:21451:21627:21990:30054:30064,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-66.100.201.100 62.18.0.100;04y8bpws6zhc3wbn3bpkry1gyss7mypknzzses1ii5hg4qtkj4f9igffqzkr11d.nmt1c3ws15paxi6kzc467jhueofgf9dsufuaz19gh1an3ehxijyzscyhrnreyin.q-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:25,LUA_SUMMARY:none X-HE-Tag: money93_630c5ef270a2 X-Filterd-Recvd-Size: 8357 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:34 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 8D3115C01A5; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:34 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=uOcSd2/dn38W3 Qs9ZzkXcxJ7lzZXPjXumgXGRk3+suA=; b=XVaqdSTyUUvwRM3CSZfzfwBaRmW72 50JZ5nA6SaruM2KVMvTK5XALc8GyKZmhjIXtgoMvme35mqwxPDZ+l3DGzRED5Klw hI/4FKLazf/c1W/pJy3SH3micGMvaFOYx97Kpf0UmY5TwERH1E3m37Q2sG1YQ05q pz6U6zcqaNjv5sVx88Oc0ey4zxLbFohFJA60Uwt5ehEkstCLEgVOPPpupFxn86DJ U5MBw+/9lCgE+Pd+RFC2bbtHNQARW3d+F4Ia40h8y7OkCd/qMNuzZEuPmnIdJvxu 5dqUrBj3Pz0CxYvEvEHoOfQBc53NZtV9VdKFAlNgJ1Ray+X9wivQlBlbg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=uOcSd2/dn38W3Qs9ZzkXcxJ7lzZXPjXumgXGRk3+suA=; b=pCQuiEYb GFhSIDEUlJCq8HAvwoXlbk/J/mROrisfgoalZO1s4ZZ8WjAa57Ax8LP2BiWefrru BDECL6N/1qyHb5My1GiTmtEneGFGC6LdvjkHjOw/jGuuQAPXHLu52GTPjsesTnAi VKcyJIrckGNPQl91h/KI5XDycDMqZD5n4+0jJLD0CJqA2OlWBeDMN4ukfaAd2I3V yrGHxXNDPcWC2AUZGuXhQ+LAI6/+71xH6ZA+XQVsv3EZgPOS1BcSEHEYlQsP9bFw WLxM7ngw8JJ8jszIy094WcNlshGbiKrMMfEZ3FHrMto5skGBi/FqPwQjloj562up XifL5G8cJlst9A== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpeduvdenucfrrghrrg hmpehmrghilhhfrhhomhepiihirdihrghnsehsvghnthdrtghomh X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id CFCD73060067; Wed, 2 Sep 2020 14:06:33 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 13/16] mm: thp: add a knob to enable/disable 1GB THPs. Date: Wed, 2 Sep 2020 14:06:25 -0400 Message-Id: <20200902180628.4052244-14-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: 8411E1AD1B9 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan It does not affect existing 1GB THPs. It is similar to the knob for 2MB THPs. 
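The knob only gates whether new 1GB THP mappings may be created; mappings that already hold a 1GB THP keep it. A minimal sketch of how a fault path could consult the new policy helper follows. It is not part of the patch: create_pud_thp_if_enabled() is a hypothetical name, and only transparent_pud_hugepage_enabled() and do_huge_pud_anonymous_page() come from this series (the mm/memory.c hunk below wires up the real call site).

/*
 * Illustrative sketch, not from the patch: gate a PUD-sized (1GB) THP
 * fault on the new policy helper and fall back to the PMD/PTE paths
 * when the knob (or madvise) does not allow it.
 */
static vm_fault_t create_pud_thp_if_enabled(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	/* "always", or "madvise" plus VM_HUGEPAGE on this VMA */
	if (!transparent_pud_hugepage_enabled(vma))
		return VM_FAULT_FALLBACK;

	/* A real caller also checks PUD alignment and VMA size first. */
	return do_huge_pud_anonymous_page(vmf);
}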
Signed-off-by: Zi Yan --- include/linux/huge_mm.h | 14 ++++++++++++++ mm/huge_memory.c | 40 ++++++++++++++++++++++++++++++++++++++++ mm/memory.c | 2 +- 3 files changed, 55 insertions(+), 1 deletion(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index c7bc40c4a5e2..3bf8d8a09f08 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -119,6 +119,8 @@ enum transparent_hugepage_flag { #ifdef CONFIG_DEBUG_VM TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG, #endif + TRANSPARENT_PUD_HUGEPAGE_FLAG, + TRANSPARENT_PUD_HUGEPAGE_REQ_MADV_FLAG, }; struct kobject; @@ -184,6 +186,18 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) } bool transparent_hugepage_enabled(struct vm_area_struct *vma); +static inline bool transparent_pud_hugepage_enabled(struct vm_area_struct *vma) +{ + if (transparent_hugepage_enabled(vma)) { + if (transparent_hugepage_flags & (1 << TRANSPARENT_PUD_HUGEPAGE_FLAG)) + return true; + if (transparent_hugepage_flags & + (1 << TRANSPARENT_PUD_HUGEPAGE_REQ_MADV_FLAG)) + return !!(vma->vm_flags & VM_HUGEPAGE); + } + + return false; +} #define HPAGE_CACHE_INDEX_MASK (HPAGE_PMD_NR - 1) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e209c2dfc5b7..e1440a13da63 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -49,9 +49,11 @@ unsigned long transparent_hugepage_flags __read_mostly = #ifdef CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS (1< X-Patchwork-Id: 11751503 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F417C109A for ; Wed, 2 Sep 2020 18:07:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id ACE2B2083B for ; Wed, 2 Sep 2020 18:07:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="d0fu8CF1"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="D+VDVEja" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ACE2B2083B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 97BD8900019; Wed, 2 Sep 2020 14:06:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 2C3FE900025; Wed, 2 Sep 2020 14:06:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A913290001F; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0078.hostedemail.com [216.40.44.78]) by kanga.kvack.org (Postfix) with ESMTP id 1A6AA900023 for ; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id D1FEF8248047 for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-FDA: 77218901550.13.girls97_0114837270a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id A656718140B60 for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-Spam-Summary: 
1,0,0,0d7882436ded0b65,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:41:69:355:379:541:560:800:960:966:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1535:1544:1711:1730:1747:1777:1792:2196:2198:2199:2200:2393:2559:2562:2731:2895:3138:3139:3140:3141:3142:3352:3867:3868:3871:4119:4250:4321:4385:5007:6119:6120:6261:6630:6653:7576:7903:7904:8957:9592:10004:11026:11473:11658:11914:12043:12291:12296:12438:12555:12679:12683:12895:12986:13894:14096:14110:14181:14721:21080:21627:21966:21990:30054:30064:30070,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-66.100.201.100 62.18.0.100;04yrc3f66hx1agz7q66fxooouxqrrocr9d7k8miq5yty48aw661ychix3monajd.q8mjbizexzm3our1kg39ogajbcmo1jttu9zxwfx8cqhhaamjjeewhmym8ufmhzu.r-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: girls97_0114837270a2 X-Filterd-Recvd-Size: 8657 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id C83AF5C0228; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 Sep 2020 14:06:34 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=s9cLKQkvqUgXX 9qXewNv9AkRjEs8xeCBdkOQExd8Kic=; b=d0fu8CF1/88vb3bsZ//Ho24Y0suIX KQI0dXvhwHw6lSXXK7Ssxhq8nomgdsyFgR63v8tb4xjpetozezYfOHCe3yqAhe9A cYfpbx/ETEuYd/8hWLjP6GD/YHxEggMXs14iCEW1MKR2+To1GaMYxOTSkuxijhwt EJJp9HrWfU/MphR0SME/Wa8eMYuM7b+XasdN83zg6zSNQIv1AopRE2/IR6SLbwve Dmm6+TE1oskLFZAA9AcZyxqwytCQDcV1n12kESu6xGbLch4Tsb4XgQGODsWTsGBs lKVyENQ+EdxAkKBlqzvtbYbq02eBWevmzOuKac+VoIj49z4KpNP7CpIgg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=s9cLKQkvqUgXX9qXewNv9AkRjEs8xeCBdkOQExd8Kic=; b=D+VDVEja tEWQh/q4xvCmWQ0PreIazjh+j4MqjMIrUNRr3K2yx0KtC/6EOTtdJm8YrGIlbNj1 saYcp8FpDifWHUxM/AKiHxxrUknO7J+MW8WLawZJfMJOaZhvV51jdI0An7xgeBQt UFw+3s4Cy3Owm1e6EhqKu0zzw2oXyVBF2egbD31IiiI/USetF9iUqIlnt/Rw3OaP Vaj8AO01ZnSddKw8xORhiFSA9XeOSRR7G08/UwEhLna0ECpd7pcE6vA+94FgnQ44 zzJ3uYxq+poFQ2jBUzjNkTJOH3QkoEKdFQeykpggx2OzbIEVLyZRxrGOf0y1s8cK XO8WYhIsaegQIw== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpeduvdenucfrrghrrg hmpehmrghilhhfrhhomhepiihirdihrghnsehsvghnthdrtghomh X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id 1633630600A6; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . 
Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 14/16] mm: page_alloc: >=MAX_ORDER pages allocation an deallocation. Date: Wed, 2 Sep 2020 14:06:26 -0400 Message-Id: <20200902180628.4052244-15-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: A656718140B60 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan Use alloc_contig_pages for allocation and destroy_compound_gigantic_page for deallocation, so 1GB THP can be created and destroyed without changing MAX_ORDER. Signed-off-by: Zi Yan --- mm/hugetlb.c | 22 ---------------------- mm/internal.h | 2 ++ mm/mempolicy.c | 15 ++++++++++++++- mm/page_alloc.c | 33 ++++++++++++++++++++++++++++----- 4 files changed, 44 insertions(+), 28 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4113d7b66fee..d5357778b026 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1211,26 +1211,6 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed) nr_nodes--) #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE -static void destroy_compound_gigantic_page(struct page *page, - unsigned int order) -{ - int i; - int nr_pages = 1 << order; - struct page *p = page + 1; - - atomic_set(compound_mapcount_ptr(page), 0); - if (hpage_pincount_available(page)) - atomic_set(compound_pincount_ptr(page), 0); - - for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) { - clear_compound_head(p); - set_page_refcounted(p); - } - - set_compound_order(page, 0); - __ClearPageHead(page); -} - static void free_gigantic_page(struct page *page, unsigned int order) { /* @@ -1288,8 +1268,6 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask, return NULL; } static inline void free_gigantic_page(struct page *page, unsigned int order) { } -static inline void destroy_compound_gigantic_page(struct page *page, - unsigned int order) { } #endif static void update_and_free_page(struct hstate *h, struct page *page) diff --git a/mm/internal.h b/mm/internal.h index 10c677655912..520fd9b5e18a 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -620,4 +620,6 @@ struct migration_target_control { gfp_t gfp_mask; }; +void destroy_compound_gigantic_page(struct page *page, + unsigned int order); #endif /* __MM_INTERNAL_H */ diff --git a/mm/mempolicy.c b/mm/mempolicy.c index eddbe4e56c73..4bae089e7a89 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2138,7 +2138,12 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, { struct page *page; - page = __alloc_pages(gfp, order, nid); + if (order > MAX_ORDER) { + page = alloc_contig_pages(1UL< MAX_ORDER) { + page = alloc_contig_pages(1UL<= MAX_ORDER) { + destroy_compound_gigantic_page(page, order); + free_contig_range(page_to_pfn(page), 1 << order); + } else { + migratetype = get_pfnblock_migratetype(page, pfn); + local_irq_save(flags); + __count_vm_events(PGFREE, 1 << order); + free_one_page(page_zone(page), page, pfn, order, migratetype); + local_irq_restore(flags); + } } void __free_pages_core(struct page *page, unsigned int order) From patchwork Wed Sep 2 18:06:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Zi Yan X-Patchwork-Id: 11751505 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F0985109A for ; Wed, 2 Sep 2020 18:07:14 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AC50F2100A for ; Wed, 2 Sep 2020 18:07:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=sent.com header.i=@sent.com header.b="UPUzl0b8"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="idZ69wOS" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AC50F2100A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=sent.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D156B900024; Wed, 2 Sep 2020 14:06:37 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 5353090001F; Wed, 2 Sep 2020 14:06:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D4E0E900023; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0193.hostedemail.com [216.40.44.193]) by kanga.kvack.org (Postfix) with ESMTP id 28F06900024 for ; Wed, 2 Sep 2020 14:06:36 -0400 (EDT) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id E4AE71F0A for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-FDA: 77218901550.24.wish75_3908a3c270a2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin24.hostedemail.com (Postfix) with ESMTP id BC3A81A4A5 for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) X-Spam-Summary: 1,0,0,2d2d9fe2210ee669,d41d8cd98f00b204,zi.yan@sent.com,,RULES_HIT:4:41:69:355:379:421:541:800:960:966:973:988:989:1260:1261:1311:1314:1345:1359:1437:1515:1730:1747:1777:1792:1801:2194:2196:2198:2199:2200:2201:2393:2559:2562:2637:2689:2693:2731:2892:2898:3138:3139:3140:3141:3142:3355:3865:3866:3867:3868:3870:3871:3872:3874:4250:4321:4385:4605:5007:6119:6120:6261:6630:6653:7576:7875:7903:8603:9592:10004:11026:11473:11657:11658:11914:12043:12296:12438:12555:12679:12895:12986:13894:13972:21080:21451:21627:21987:21990:30029:30045:30054:30056:30064:30070:30075,0,RBL:66.111.4.25:@sent.com:.lbl8.mailshell.net-66.100.201.100 62.18.0.100;04y8mhmx3rbwm857n5u38nsfisp5oycc7qw69nujf8x5bqhot6hh5do5xcqyzo5.bbhmxkmk81fsi9qfw4jpbdxnbqsi4o5gizpayy8rmndi58xxdj9oty94d9giye5.q-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:fp,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: wish75_3908a3c270a2 X-Filterd-Recvd-Size: 15457 Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by imf10.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 18:06:35 +0000 (UTC) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 0E0A85C016C; Wed, 2 Sep 2020 14:06:35 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 02 
Sep 2020 14:06:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sent.com; h=from :to:cc:subject:date:message-id:in-reply-to:references:reply-to :mime-version:content-transfer-encoding; s=fm1; bh=z/QjPKfyJw+jN wp1dt2Sd4gqN+oJ4ARlb6FmyazN+3o=; b=UPUzl0b8ylt7A/Q4qNv5AkVQikCYz R+8ftNfuoYYpCrWnblBYhzHz/NSaqTQvfae5fqjUFarDKmKwMMhYhzbSO1iBLbve C3OruCDsj4T+Q99RMKcFZuplhyzJ5n2vlvSyZGeqqKGErM5MQIrS3TVwSSXhHJm/ 5KgypYQecNOLDL51BcenM+EkIFXVatutWGrLiXICH0IQy+5Zt71rvsU7BRt4KfjG gNNPzdhQJBPLF9tXhkjtLx77JoZfWSP8+NNkkNlltBjkMWhJzJGxnnm8vA6E64sl LOTcDo5U49aFdwkLmwx7iHevALTCAXwslY+5nv6uI8AmhnBAlQX6vbhZw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:reply-to:subject :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=z/QjPKfyJw+jNwp1dt2Sd4gqN+oJ4ARlb6FmyazN+3o=; b=idZ69wOS 5rl1TfFPc8QE7dt/ilcmplHQEq6ztHpUDf2l1bojIFlNoECIWDSJYuiu0lOP00XZ 0RHy+3xBSoo2eQrAPxHosyMUcYtwOeU3zmShZqzqEQ48uibRpltnsIRumjucquWr ltlRa1bdue7XI3CCfHHGHejudDeDcEMke0jpx+i7kcr6hK0qf1Vdh82CNMeb34o9 SCbIcPbqw3OAxSSK7qB4SPTSiT6pn/iPzJaD/toZwIGlgA9UxYHRH9YwNdAZcIqY PMCPE/fXXrGD7uVs5a0oyQ66FPvBsT8K38pbPWTCQkpwncybqoBzBkYp6iWiwmjE +5TQuBBbHsB06w== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduiedrudefledguddvudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvffufffkofgjfhhrggfgsedtkeertdertddtnecuhfhrohhmpegkihcu jggrnhcuoeiiihdrhigrnhesshgvnhhtrdgtohhmqeenucggtffrrghtthgvrhhnpeduhf ffveektdduhfdutdfgtdekkedvhfetuedufedtgffgvdevleehheevjefgtdenucfkphep uddvrdegiedruddtiedrudeigeenucevlhhushhtvghrufhiiigvpeduvdenucfrrghrrg hmpehmrghilhhfrhhomhepiihirdihrghnsehsvghnthdrtghomh X-ME-Proxy: Received: from nvrsysarch6.NVidia.COM (unknown [12.46.106.164]) by mail.messagingengine.com (Postfix) with ESMTPA id 5017E30600B7; Wed, 2 Sep 2020 14:06:34 -0400 (EDT) From: Zi Yan To: linux-mm@kvack.org, Roman Gushchin Cc: Rik van Riel , "Kirill A . Shutemov" , Matthew Wilcox , Shakeel Butt , Yang Shi , David Nellans , linux-kernel@vger.kernel.org, Zi Yan Subject: [RFC PATCH 15/16] hugetlb: cma: move cma reserve function to cma.c. Date: Wed, 2 Sep 2020 14:06:27 -0400 Message-Id: <20200902180628.4052244-16-zi.yan@sent.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com> References: <20200902180628.4052244-1-zi.yan@sent.com> Reply-To: Zi Yan MIME-Version: 1.0 X-Rspamd-Queue-Id: BC3A81A4A5 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Zi Yan It will be used by other allocations, like 1GB THP allocation in the upcoming commit. 
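The split this patch enables is: boot-time reservation of per-node "hugepage" CMA areas, then run-time consumers drawing gigantic pages from them. The sketch below illustrates both halves; example_arch_reserve() and alloc_gigantic_from_cma() are hypothetical, while hugepage_cma_reserve(), hugepage_cma[] and cma_alloc() are the interfaces the patch provides or already exist (the x86 hunk below makes the equivalent boot-time call).

/*
 * Sketch only: boot-time CMA reservation and a run-time consumer
 * pulling a gigantic page out of the per-node "hugepage" CMA areas.
 */
void __init example_arch_reserve(void)
{
	/* Reserve "hugepage_cma=" areas sized for PUD (1GB) pages. */
	hugepage_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
}

static struct page *alloc_gigantic_from_cma(unsigned int order,
					    nodemask_t *nodemask)
{
	unsigned long nr_pages = 1UL << order;
	int node;

	for_each_node_mask(node, *nodemask) {
		struct page *page;

		if (!hugepage_cma[node])
			continue;
		page = cma_alloc(hugepage_cma[node], nr_pages, order, true);
		if (page)
			return page;
	}
	return NULL;
}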
Signed-off-by: Zi Yan --- .../admin-guide/kernel-parameters.txt | 2 +- arch/arm64/mm/hugetlbpage.c | 2 +- arch/powerpc/mm/hugetlbpage.c | 2 +- arch/x86/kernel/setup.c | 8 +- include/linux/cma.h | 15 ++++ include/linux/hugetlb.h | 12 --- mm/cma.c | 88 +++++++++++++++++++ mm/hugetlb.c | 88 ++----------------- 8 files changed, 118 insertions(+), 99 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 68fee5e034ca..600668ee0ac7 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1507,7 +1507,7 @@ hpet_mmap= [X86, HPET_MMAP] Allow userspace to mmap HPET registers. Default set by CONFIG_HPET_MMAP_DEFAULT. - hugetlb_cma= [HW] The size of a cma area used for allocation + hugepage_cma= [HW] The size of a cma area used for allocation of gigantic hugepages. Format: nn[KMGTPE] diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 55ecf6de9ff7..8a3ad7eaae49 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -52,7 +52,7 @@ void __init arm64_hugetlb_cma_reserve(void) * breaking this assumption. */ WARN_ON(order <= MAX_ORDER); - hugetlb_cma_reserve(order); + hugepage_cma_reserve(order); } #endif /* CONFIG_CMA */ diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 26292544630f..d608e58cb69b 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -699,6 +699,6 @@ void __init gigantic_hugetlb_cma_reserve(void) if (order) { VM_WARN_ON(order < MAX_ORDER); - hugetlb_cma_reserve(order); + hugepage_cma_reserve(order); } } diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 52e83ba607b3..93c8fbdff972 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -16,7 +16,7 @@ #include #include #include -#include +#include #include #include @@ -640,7 +640,7 @@ static void __init trim_snb_memory(void) * already been reserved. */ memblock_reserve(0, 1<<20); - + for (i = 0; i < ARRAY_SIZE(bad_pages); i++) { if (memblock_reserve(bad_pages[i], PAGE_SIZE)) printk(KERN_WARNING "failed to reserve 0x%08lx\n", @@ -732,7 +732,7 @@ static void __init trim_low_memory_range(void) { memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE)); } - + /* * Dump out kernel offset information on panic. 
*/ @@ -1142,7 +1142,7 @@ void __init setup_arch(char **cmdline_p) dma_contiguous_reserve(max_pfn_mapped << PAGE_SHIFT); if (boot_cpu_has(X86_FEATURE_GBPAGES)) - hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT); + hugepage_cma_reserve(PUD_SHIFT - PAGE_SHIFT); /* * Reserve memory for crash kernel after SRAT is parsed so that it diff --git a/include/linux/cma.h b/include/linux/cma.h index 6ff79fefd01f..abcf7ab712f9 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -47,4 +47,19 @@ extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count); extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data); + +extern void cma_reserve(int min_order, unsigned long requested_size, + const char *name, struct cma *cma_struct[N_MEMORY]); +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) +extern void __init hugepage_cma_reserve(int order); +extern void __init hugepage_cma_check(void); +#else +static inline void __init hugepage_cma_check(void) +{ +} +static inline void __init hugepage_cma_reserve(int order) +{ +} +#endif + #endif diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d5cc5f802dd4..087d13a1dc24 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -935,16 +935,4 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h, return ptl; } -#if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA) -extern void __init hugetlb_cma_reserve(int order); -extern void __init hugetlb_cma_check(void); -#else -static inline __init void hugetlb_cma_reserve(int order) -{ -} -static inline __init void hugetlb_cma_check(void) -{ -} -#endif - #endif /* _LINUX_HUGETLB_H */ diff --git a/mm/cma.c b/mm/cma.c index 7f415d7cda9f..aa3a17d8a191 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -37,6 +37,10 @@ #include "cma.h" struct cma cma_areas[MAX_CMA_AREAS]; +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) +struct cma *hugepage_cma[MAX_NUMNODES]; +#endif +unsigned long hugepage_cma_size __initdata; unsigned cma_area_count; static DEFINE_MUTEX(cma_mutex); @@ -541,3 +545,87 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data) return 0; } + +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) +/* + * cma_reserve() - reserve CMA for gigantic pages on nodes with memory + * + * must be called after free_area_init() that updates N_MEMORY via node_set_state(). + * cma_reserve() scans over N_MEMORY nodemask and hence expects the platforms + * to have initialized N_MEMORY state. + */ +void __init cma_reserve(int min_order, unsigned long requested_size, const char *name, + struct cma *cma_struct[MAX_NUMNODES]) +{ + unsigned long size, reserved, per_node; + int nid; + + if (!requested_size) + return; + + if (requested_size < (PAGE_SIZE << min_order)) { + pr_warn("%s_cma: cma area should be at least %lu MiB\n", + name, (PAGE_SIZE << min_order) / SZ_1M); + return; + } + + /* + * If 3 GB area is requested on a machine with 4 numa nodes, + * let's allocate 1 GB on first three nodes and ignore the last one. 
+ */ + per_node = DIV_ROUND_UP(requested_size, nr_online_nodes); + pr_info("%s_cma: reserve %lu MiB, up to %lu MiB per node\n", + name, requested_size / SZ_1M, per_node / SZ_1M); + + reserved = 0; + for_each_node_state(nid, N_ONLINE) { + int res; + char node_name[20]; + + size = min(per_node, requested_size - reserved); + size = round_up(size, PAGE_SIZE << min_order); + + snprintf(node_name, 20, "%s%d", name, nid); + res = cma_declare_contiguous_nid(0, size, 0, + PAGE_SIZE << min_order, + 0, false, node_name, + &cma_struct[nid], nid); + if (res) { + pr_warn("%s_cma: reservation failed: err %d, node %d", + name, res, nid); + continue; + } + + reserved += size; + pr_info("%s_cma: reserved %lu MiB on node %d\n", + name, size / SZ_1M, nid); + + if (reserved >= requested_size) + break; + } +} + +static bool hugepage_cma_reserve_called __initdata; + +static int __init cmdline_parse_hugepage_cma(char *p) +{ + hugepage_cma_size = memparse(p, &p); + return 0; +} + +early_param("hugepage_cma", cmdline_parse_hugepage_cma); + +void __init hugepage_cma_reserve(int order) +{ + hugepage_cma_reserve_called = true; + cma_reserve(order, hugepage_cma_size, "hugepage", hugepage_cma); +} + +void __init hugepage_cma_check(void) +{ + if (!hugepage_cma_size || hugepage_cma_reserve_called) + return; + + pr_warn("hugepage_cma: the option isn't supported by current arch\n"); +} +#endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d5357778b026..6685cad879d0 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -48,9 +48,9 @@ unsigned int default_hstate_idx; struct hstate hstates[HUGE_MAX_HSTATE]; #ifdef CONFIG_CMA -static struct cma *hugetlb_cma[MAX_NUMNODES]; +extern struct cma *hugepage_cma[MAX_NUMNODES]; #endif -static unsigned long hugetlb_cma_size __initdata; +extern unsigned long hugepage_cma_size __initdata; /* * Minimum page order among possible hugepage sizes, set to a proper value @@ -1218,7 +1218,7 @@ static void free_gigantic_page(struct page *page, unsigned int order) * cma_release() returns false. 
*/ #ifdef CONFIG_CMA - if (cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)) + if (cma_release(hugepage_cma[page_to_nid(page)], page, 1 << order)) return; #endif @@ -1237,10 +1237,10 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask, int node; for_each_node_mask(node, *nodemask) { - if (!hugetlb_cma[node]) + if (!hugepage_cma[node]) continue; - page = cma_alloc(hugetlb_cma[node], nr_pages, + page = cma_alloc(hugepage_cma[node], nr_pages, huge_page_order(h), true); if (page) return page; @@ -2532,8 +2532,8 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h) for (i = 0; i < h->max_huge_pages; ++i) { if (hstate_is_gigantic(h)) { - if (hugetlb_cma_size) { - pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n"); + if (hugepage_cma_size) { + pr_warn_once("HugeTLB: hugepage_cma is enabled, skip boot time allocation\n"); break; } if (!alloc_bootmem_huge_page(h)) @@ -3209,7 +3209,7 @@ static int __init hugetlb_init(void) } } - hugetlb_cma_check(); + hugepage_cma_check(); hugetlb_init_hstates(); gather_bootmem_prealloc(); report_hugepages(); @@ -5622,75 +5622,3 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason) spin_unlock(&hugetlb_lock); } } - -#ifdef CONFIG_CMA -static bool cma_reserve_called __initdata; - -static int __init cmdline_parse_hugetlb_cma(char *p) -{ - hugetlb_cma_size = memparse(p, &p); - return 0; -} - -early_param("hugetlb_cma", cmdline_parse_hugetlb_cma); - -void __init hugetlb_cma_reserve(int order) -{ - unsigned long size, reserved, per_node; - int nid; - - cma_reserve_called = true; - - if (!hugetlb_cma_size) - return; - - if (hugetlb_cma_size < (PAGE_SIZE << order)) { - pr_warn("hugetlb_cma: cma area should be at least %lu MiB\n", - (PAGE_SIZE << order) / SZ_1M); - return; - } - - /* - * If 3 GB area is requested on a machine with 4 numa nodes, - * let's allocate 1 GB on first three nodes and ignore the last one. 
-	 */
-	per_node = DIV_ROUND_UP(hugetlb_cma_size, nr_online_nodes);
-	pr_info("hugetlb_cma: reserve %lu MiB, up to %lu MiB per node\n",
-		hugetlb_cma_size / SZ_1M, per_node / SZ_1M);
-
-	reserved = 0;
-	for_each_node_state(nid, N_ONLINE) {
-		int res;
-		char name[20];
-
-		size = min(per_node, hugetlb_cma_size - reserved);
-		size = round_up(size, PAGE_SIZE << order);
-
-		snprintf(name, 20, "hugetlb%d", nid);
-		res = cma_declare_contiguous_nid(0, size, 0, PAGE_SIZE << order,
-						 0, false, name,
-						 &hugetlb_cma[nid], nid);
-		if (res) {
-			pr_warn("hugetlb_cma: reservation failed: err %d, node %d",
-				res, nid);
-			continue;
-		}
-
-		reserved += size;
-		pr_info("hugetlb_cma: reserved %lu MiB on node %d\n",
-			size / SZ_1M, nid);
-
-		if (reserved >= hugetlb_cma_size)
-			break;
-	}
-}
-
-void __init hugetlb_cma_check(void)
-{
-	if (!hugetlb_cma_size || cma_reserve_called)
-		return;
-
-	pr_warn("hugetlb_cma: the option isn't supported by current arch\n");
-}
-
-#endif /* CONFIG_CMA */
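For readers following the reservation math above, here is a small standalone userspace sketch (not kernel code) of the per-node split performed by cma_reserve(): it assumes a 4-node machine, 1 GiB gigantic pages, and a hugepage_cma=3G request, all of which are illustrative values rather than anything taken from the patch.

/*
 * Illustrative only: userspace model of the DIV_ROUND_UP()/round_up() math
 * cma_reserve() uses to spread one reservation across the online nodes.
 */
#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define ROUND_UP(x, y)		(DIV_ROUND_UP(x, y) * (y))

int main(void)
{
	unsigned long requested = 3UL << 30;	/* hugepage_cma=3G (assumed)        */
	unsigned long min_align = 1UL << 30;	/* PAGE_SIZE << min_order, 1GiB page */
	int nr_online_nodes = 4;		/* assumed topology                  */
	unsigned long per_node = DIV_ROUND_UP(requested, nr_online_nodes);
	unsigned long reserved = 0;
	int nid;

	for (nid = 0; nid < nr_online_nodes && reserved < requested; nid++) {
		unsigned long size = requested - reserved;

		if (size > per_node)
			size = per_node;
		size = ROUND_UP(size, min_align);	/* round to one gigantic page */
		reserved += size;
		printf("node %d: reserve %lu MiB\n", nid, size >> 20);
	}
	return 0;
}

With these inputs the first three nodes each end up with a 1 GiB area and the fourth gets nothing, which is exactly the behaviour described by the "3 GB on 4 nodes" comment in the code.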
From patchwork Wed Sep 2 18:06:28 2020
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 11751507
From: Zi Yan
To: linux-mm@kvack.org, Roman Gushchin
Cc: Rik van Riel, "Kirill A. Shutemov", Matthew Wilcox, Shakeel Butt,
    Yang Shi, David Nellans, linux-kernel@vger.kernel.org, Zi Yan
Subject: [RFC PATCH 16/16] mm: thp: use cma reservation for pud thp allocation.
Date: Wed, 2 Sep 2020 14:06:28 -0400
Message-Id: <20200902180628.4052244-17-zi.yan@sent.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200902180628.4052244-1-zi.yan@sent.com>
References: <20200902180628.4052244-1-zi.yan@sent.com>
Reply-To: Zi Yan
MIME-Version: 1.0

From: Zi Yan

Share the hugepage_cma reservation with hugetlb for PUD THP allocation. The
reserved CMA regions can still be used for movable page allocations. When a
1GB page is split, all of its subpages are cleared from the CMA bitmap, since
they are no longer 1GB pages and will be freed through the normal path instead
of cma_release().

Signed-off-by: Zi Yan
---
 include/linux/cma.h     |  3 +++
 include/linux/huge_mm.h | 10 ++++++++++
 mm/cma.c                | 31 +++++++++++++++++++++++++++++++
 mm/huge_memory.c        | 30 ++++++++++++++++++++++++++++++
 mm/mempolicy.c          | 12 +++++++++---
 mm/page_alloc.c         |  3 ++-
 6 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index abcf7ab712f9..b765d19e4052 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -46,6 +46,9 @@ extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
 			      bool no_warn);
 extern bool cma_release(struct cma *cma, const struct page *pages,
 			unsigned int count);
+extern bool cma_clear_bitmap_if_in_range(struct cma *cma, const struct page *page,
+					 unsigned int count);
+
 extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data);
 
 extern void cma_reserve(int min_order, unsigned long requested_size,
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 3bf8d8a09f08..5a45877055bb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -24,6 +24,8 @@ extern struct page *follow_trans_huge_pud(struct vm_area_struct *vma,
 					  unsigned long addr,
 					  pud_t *pud,
 					  unsigned int flags);
+extern struct page *alloc_thp_pud_page(int nid);
+extern bool free_thp_pud_page(struct page *page, int order);
 #else
 static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
 {
@@ -43,6 +45,14 @@ struct page *follow_trans_huge_pud(struct vm_area_struct *vma,
 {
 	return NULL;
 }
+static inline struct page *alloc_thp_pud_page(int nid)
+{
+	return NULL;
+}
+static inline bool free_thp_pud_page(struct page *page, int order)
+{
+	return false;
+}
 #endif
 
 extern vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd);
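To make the bitmap bookkeeping from the commit message concrete before the mm/cma.c hunk below, here is a toy userspace model (sizes shrunk, names invented, not kernel code): a gigantic allocation marks its range in the CMA bitmap, and a later split has to clear that range explicitly, because the subpages are then freed through the normal page allocator rather than cma_release().

/*
 * Toy model of the CMA bitmap bookkeeping around a PUD THP split.
 * One bool per page; a real CMA area tracks this in struct cma's bitmap.
 */
#include <stdbool.h>
#include <stdio.h>

#define AREA_PAGES	32	/* pretend CMA area: 32 pages        */
#define PUD_PAGES	8	/* pretend gigantic page: 8 subpages */

static bool cma_bitmap[AREA_PAGES];	/* true = page belongs to a CMA allocation */

static void mark(int start, int count, bool val)
{
	for (int i = 0; i < count; i++)
		cma_bitmap[start + i] = val;
}

static int pages_reserved(void)
{
	int n = 0;

	for (int i = 0; i < AREA_PAGES; i++)
		n += cma_bitmap[i];
	return n;
}

int main(void)
{
	int pud_page = 0;

	mark(pud_page, PUD_PAGES, true);	/* cma_alloc() of one gigantic page */
	printf("after alloc: %d pages reserved\n", pages_reserved());

	/*
	 * Split path: the gigantic page stops existing and its subpages go
	 * back through the normal allocator, so the range must be cleared
	 * here or the CMA space stays reserved forever. This is the job
	 * cma_clear_bitmap_if_in_range() performs in the hunk below.
	 */
	mark(pud_page, PUD_PAGES, false);
	printf("after split: %d pages reserved\n", pages_reserved());
	return 0;
}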
diff --git a/mm/cma.c b/mm/cma.c
index aa3a17d8a191..3f721b8f7ccd 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -532,6 +532,37 @@ bool cma_release(struct cma *cma, const struct page *pages, unsigned int count)
 	return true;
 }
 
+/**
+ * cma_clear_bitmap_if_in_range() - clear bitmap for a given page
+ * @cma: Contiguous memory region for which the allocation is performed.
+ * @pages: Allocated pages.
+ * @count: Number of allocated pages.
+ *
+ * This function clears the bitmap of memory allocated by cma_alloc().
+ * It returns false when the provided pages do not belong to the contiguous
+ * area and true otherwise.
+ */
+bool cma_clear_bitmap_if_in_range(struct cma *cma, const struct page *pages,
+				  unsigned int count)
+{
+	unsigned long pfn;
+
+	if (!cma || !pages)
+		return false;
+
+	pfn = page_to_pfn(pages);
+
+	if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count)
+		return false;
+
+	if (pfn + count > cma->base_pfn + cma->count)
+		return false;
+
+	cma_clear_bitmap(cma, pfn, count);
+
+	return true;
+}
+
 int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
 {
 	int i;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e1440a13da63..2020b843fd97 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include <linux/cma.h>
 #include 
 #include 
 
@@ -64,6 +65,10 @@ static struct shrinker deferred_split_shrinker;
 static atomic_t huge_zero_refcount;
 struct page *huge_zero_page __read_mostly;
 
+#ifdef CONFIG_CMA
+extern struct cma *hugepage_cma[MAX_NUMNODES];
+#endif
+
 bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
 	/* The addr is used to check if the vma size fits */
@@ -2526,6 +2531,13 @@ static void __split_huge_pud_page(struct page *page, struct list_head *list,
 	/* no file-back page support yet */
 	VM_BUG_ON(!PageAnon(page));
 
+	/* Clear the CMA bitmap; the subpages are freed via the normal path. */
+	if (IS_ENABLED(CONFIG_CMA)) {
+		struct cma *cma = hugepage_cma[page_to_nid(head)];
+		VM_BUG_ON(!cma_clear_bitmap_if_in_range(cma, head,
+							thp_nr_pages(head)));
+	}
+
 	for (i = HPAGE_PUD_NR - HPAGE_PMD_NR; i >= 1; i -= HPAGE_PMD_NR) {
 		__split_huge_pud_page_tail(head, i, lruvec, list);
 	}
@@ -3753,3 +3765,21 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 		update_mmu_cache_pmd(vma, address, pvmw->pmd);
 	}
 }
 #endif
+
+struct page *alloc_thp_pud_page(int nid)
+{
+	struct page *page = NULL;
+#ifdef CONFIG_CMA
+	page = cma_alloc(hugepage_cma[nid], HPAGE_PUD_NR, HPAGE_PUD_ORDER, true);
+#endif
+	return page;
+}
+
+bool free_thp_pud_page(struct page *page, int order)
+{
+	bool ret = false;
+#ifdef CONFIG_CMA
+	ret = cma_release(hugepage_cma[page_to_nid(page)], page, 1 << order);
+#endif
+	return ret;
+}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
[the mm/mempolicy.c hunks are truncated in the archive; the surviving fragments
 show two "page = alloc_contig_pages(1UL << ..." call sites, each guarded by a
 "... > MAX_ORDER) {" check, being removed]
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
[hunk header truncated in the archive]
 	if (order >= MAX_ORDER) {
 		destroy_compound_gigantic_page(page, order);
-		free_contig_range(page_to_pfn(page), 1 << order);
+		if (!free_thp_pud_page(page, order))
+			free_contig_range(page_to_pfn(page), 1 << order);
 	} else {
 		migratetype = get_pfnblock_migratetype(page, pfn);
 		local_irq_save(flags);