From patchwork Tue Jan 12 02:12:08 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bin Wang
X-Patchwork-Id: 12012225
From: wangbin
Subject: [PATCH] mm: thp: introduce NR_PARTIAL_THPS
Date: Tue, 12 Jan 2021 10:12:08 +0800
Message-ID: <20210112021208.1875-1-wangbin224@huawei.com>
X-Mailer: git-send-email 2.29.2.windows.3

From: Bin Wang

Currently we do not split transhuge pages immediately on partial
unmap. Deferring the split with deferred_split_huge_page() avoids the
overhead of splitting eagerly, but it leaves us with an accounting
problem: we have no idea how much partially unmapped memory there is,
because that memory stays hidden inside transhuge pages until the
pages are actually split. Why should we care about this?
Just imagine a process which does the following:

1) mmap() 1GB of memory, all of which the kernel backs with
   transhuge pages.
   What happens: system free memory decreases by 1GB, and
   AnonHugePages increases by 1GB.

2) Call madvise(MADV_DONTNEED) on 1MB of each transhuge page.
   What happens: the RSS of the process decreases by 512MB, and
   AnonHugePages decreases by 1GB, but system free memory does not
   increase.

It is confusing that system free memory is less than expected, and
the reason is that we only call split_huge_pmd() on partial unmap,
so the underlying compound pages stay allocated.

I think we should not roll back to calling split_huge_page() here.
Instead, we can add NR_PARTIAL_THPS to node_stat_item to expose the
count of partially unmapped pages. We can follow the
deferred_split_huge_page() codepath to record partially unmapped
pages, and decrease the count when the transhuge pages are
eventually split.
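For illustration, a minimal userspace sketch of the scenario above.
It is not part of this patch, and it assumes x86_64 with 2MB
transhuge pages, THP enabled system-wide, and a 2MB-aligned mapping:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define MB	(1024UL * 1024)
#define GB	(1024 * MB)
#define THP	(2 * MB)	/* assumed transhuge page size */

int main(void)
{
	unsigned long off;
	char *buf;

	/* 1) Map and touch 1GB so the kernel backs it with transhuge
	 *    pages: free memory drops 1GB, AnonHugePages grows 1GB.
	 */
	buf = mmap(NULL, GB, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 0x5a, GB);

	/* 2) Discard 1MB out of every transhuge page: RSS drops
	 *    512MB, AnonHugePages drops 1GB, but free memory does not
	 *    grow, because only split_huge_pmd() runs and the real
	 *    split of the compound pages is deferred.
	 */
	for (off = 0; off < GB; off += THP)
		madvise(buf + off, MB, MADV_DONTNEED);

	getchar();	/* hold the mapping; inspect /proc/meminfo now */
	return 0;
}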
Signed-off-by: Bin Wang
---
 fs/proc/meminfo.c      | 2 ++
 include/linux/mmzone.h | 1 +
 mm/huge_memory.c       | 7 ++++++-
 mm/rmap.c              | 9 +++++++--
 mm/vmstat.c            | 1 +
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index d6fc74619625..f6f02469dd9e 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -138,6 +138,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		    global_node_page_state(NR_FILE_THPS) * HPAGE_PMD_NR);
 	show_val_kb(m, "FilePmdMapped: ",
 		    global_node_page_state(NR_FILE_PMDMAPPED) * HPAGE_PMD_NR);
+	show_val_kb(m, "PartFreePages: ",
+		    global_node_page_state(NR_PARTIAL_THPS));
 #endif
 
 #ifdef CONFIG_CMA
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b593316bff3d..cc417c9870ad 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -194,6 +194,7 @@ enum node_stat_item {
 	NR_FILE_THPS,
 	NR_FILE_PMDMAPPED,
 	NR_ANON_THPS,
+	NR_PARTIAL_THPS,	/* partial free pages of transhuge pages */
 	NR_VMSCAN_WRITE,
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
 	NR_DIRTIED,		/* page dirtyings since bootup */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9237976abe72..2f2856cf1ed0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2788,6 +2788,8 @@ void free_transhuge_page(struct page *page)
 	if (!list_empty(page_deferred_list(page))) {
 		ds_queue->split_queue_len--;
 		list_del(page_deferred_list(page));
+		__mod_node_page_state(page_pgdat(page), NR_PARTIAL_THPS,
+				      -HPAGE_PMD_NR);
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);
@@ -2880,8 +2882,11 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		if (!trylock_page(page))
 			goto next;
 		/* split_huge_page() removes page from list on success */
-		if (!split_huge_page(page))
+		if (!split_huge_page(page)) {
 			split++;
+			__mod_node_page_state(page_pgdat(page),
+					      NR_PARTIAL_THPS, -HPAGE_PMD_NR);
+		}
 		unlock_page(page);
 next:
 		put_page(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index 08c56aaf72eb..269edf41ccd7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1309,8 +1309,11 @@ static void page_remove_anon_compound_rmap(struct page *page)
 		 * page of the compound page is unmapped, but at least one
 		 * small page is still mapped.
 		 */
-		if (nr && nr < thp_nr_pages(page))
+		if (nr && nr < thp_nr_pages(page)) {
+			__mod_node_page_state(page_pgdat(page),
+					      NR_PARTIAL_THPS, nr);
 			deferred_split_huge_page(page);
+		}
 	} else {
 		nr = thp_nr_pages(page);
 	}
@@ -1357,8 +1360,10 @@ void page_remove_rmap(struct page *page, bool compound)
 	if (unlikely(PageMlocked(page)))
 		clear_page_mlock(page);
 
-	if (PageTransCompound(page))
+	if (PageTransCompound(page)) {
+		__inc_node_page_state(page, NR_PARTIAL_THPS);
 		deferred_split_huge_page(compound_head(page));
+	}
 
 	/*
 	 * It would be tidy to reset the PageAnon mapping here,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f8942160fc95..93459dde0dcd 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1203,6 +1203,7 @@ const char * const vmstat_text[] = {
 	"nr_file_hugepages",
 	"nr_file_pmdmapped",
 	"nr_anon_transparent_hugepages",
+	"nr_partial_free_pages",
 	"nr_vmscan_write",
 	"nr_vmscan_immediate_reclaim",
 	"nr_dirtied",
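
With the patch applied, the new counter can be checked with a trivial
illustrative reader of the PartFreePages line this patch adds to
/proc/meminfo (per-node values appear as nr_partial_free_pages in the
node vmstat files):

#include <stdio.h>
#include <string.h>

/* Print the PartFreePages line added to /proc/meminfo. */
int main(void)
{
	char line[128];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "PartFreePages:", 14))
			fputs(line, stdout);
	fclose(f);
	return 0;
}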