From patchwork Wed Jun 9 12:13:06 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12309975
From: Muchun Song <songmuchun@bytedance.com>
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH 1/5] mm: hugetlb: introduce helpers to preallocate/free page tables
Date: Wed, 9 Jun 2021 20:13:06 +0800
Message-Id: <20210609121310.62229-2-songmuchun@bytedance.com>
In-Reply-To: <20210609121310.62229-1-songmuchun@bytedance.com>
References: <20210609121310.62229-1-songmuchun@bytedance.com>

On some architectures (e.g. x86_64 and arm64), vmemmap pages are usually
mapped with a huge PMD. We will disable the huge PMD mapping of vmemmap
pages when the "Free vmemmap pages of HugeTLB page" feature is enabled,
but that also affects non-HugeTLB pages. What we actually want is to map
only the vmemmap pages associated with HugeTLB pages with base pages. We
can split the huge PMD mapping of vmemmap pages when freeing the vmemmap
pages of a HugeTLB page, but that requires preallocated page tables. This
patch introduces helpers to allocate and free those page tables.
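For illustration only, a rough sketch of how a caller is expected to use the
new helpers (the function alloc_huge_page_sketch and the bare alloc_pages()
call are hypothetical; the real call sites are wired up later in this series):

	/*
	 * Hypothetical caller (sketch): preallocate the pte pages needed to
	 * split the huge PMD mapping of one HugeTLB page's vmemmap, then
	 * stash them on the page's ->lru list, or give them back on failure.
	 */
	static struct page *alloc_huge_page_sketch(struct hstate *h, gfp_t gfp_mask)
	{
		LIST_HEAD(pgtables);
		struct page *page;

		/* May allocate 0..N pte pages depending on the huge page size. */
		if (vmemmap_pgtable_prealloc(h, &pgtables))
			return NULL;				/* -ENOMEM inside the helper */

		page = alloc_pages(gfp_mask, huge_page_order(h));
		if (!page) {
			vmemmap_pgtable_free(&pgtables);	/* release unused pte pages */
			return NULL;
		}

		/* Hand the preallocated pte pages over with the new huge page. */
		INIT_LIST_HEAD(&page->lru);
		list_splice(&pgtables, &page->lru);

		return page;
	}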
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb_vmemmap.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb_vmemmap.h | 12 ++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index f9f9bb212319..628e2752714f 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -170,6 +170,9 @@
  */
 #define pr_fmt(fmt)	"HugeTLB: " fmt

+#include
+#include
+
 #include "hugetlb_vmemmap.h"

 /*
@@ -209,6 +212,57 @@ static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
 	return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
 }

+static inline unsigned int vmemmap_pages_per_hpage(struct hstate *h)
+{
+	return free_vmemmap_pages_per_hpage(h) + RESERVE_VMEMMAP_NR;
+}
+
+static inline unsigned long vmemmap_pages_size_per_hpage(struct hstate *h)
+{
+	return (unsigned long)vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
+}
+
+static inline unsigned int pgtable_pages_to_prealloc_per_hpage(struct hstate *h)
+{
+	unsigned long vmemmap_size = vmemmap_pages_size_per_hpage(h);
+
+	/*
+	 * No need to pre-allocate page tables when there is no vmemmap pages
+	 * to be freed.
+	 */
+	if (!free_vmemmap_pages_per_hpage(h))
+		return 0;
+
+	return ALIGN(vmemmap_size, PMD_SIZE) >> PMD_SHIFT;
+}
+
+void vmemmap_pgtable_free(struct list_head *pgtables)
+{
+	struct page *pte_page, *t_page;
+
+	list_for_each_entry_safe(pte_page, t_page, pgtables, lru)
+		pte_free_kernel(&init_mm, page_to_virt(pte_page));
+}
+
+int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables)
+{
+	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
+
+	while (nr--) {
+		pte_t *pte_p;
+
+		pte_p = pte_alloc_one_kernel(&init_mm);
+		if (!pte_p)
+			goto out;
+		list_add(&virt_to_page(pte_p)->lru, pgtables);
+	}
+
+	return 0;
+out:
+	vmemmap_pgtable_free(pgtables);
+	return -ENOMEM;
+}
+
 /*
  * Previously discarded vmemmap pages will be allocated and remapping
  * after this function returns zero.
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index cb2bef8f9e73..306e15519da1 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -14,6 +14,8 @@
 int alloc_huge_page_vmemmap(struct hstate *h, struct page *head);
 void free_huge_page_vmemmap(struct hstate *h, struct page *head);
 void hugetlb_vmemmap_init(struct hstate *h);
+int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables);
+void vmemmap_pgtable_free(struct list_head *pgtables);

 /*
  * How many vmemmap pages associated with a HugeTLB page that can be freed
@@ -33,6 +35,16 @@ static inline void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 }

+static inline int vmemmap_pgtable_prealloc(struct hstate *h,
+					   struct list_head *pgtables)
+{
+	return 0;
+}
+
+static inline void vmemmap_pgtable_free(struct list_head *pgtables)
+{
+}
+
 static inline void hugetlb_vmemmap_init(struct hstate *h)
 {
 }

From patchwork Wed Jun 9 12:13:07 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12309977
From: Muchun Song <songmuchun@bytedance.com>
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH 2/5] mm: hugetlb: introduce helpers to preallocate page tables from bootmem allocator
Date: Wed, 9 Jun 2021 20:13:07 +0800
Message-Id: <20210609121310.62229-3-songmuchun@bytedance.com>
In-Reply-To: <20210609121310.62229-1-songmuchun@bytedance.com>
References: <20210609121310.62229-1-songmuchun@bytedance.com>

If we want to split the huge PMD mapping of the vmemmap pages associated
with each gigantic page allocated from the bootmem allocator, we must
pre-allocate the page tables from the bootmem allocator as well. This
patch introduces helpers to preallocate those page tables for gigantic
pages.
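To make the cost concrete, a back-of-the-envelope example, assuming typical
x86_64 defaults (4 KB base pages, 2 MB PMDs, 64-byte struct page):

	/*
	 * Worked example (assumed x86_64 defaults) for one 1 GB gigantic page:
	 *
	 *   vmemmap size = (1 GB / 4 KB) * 64 B = 16 MB   (4096 vmemmap pages)
	 *   pte pages    = ALIGN(16 MB, PMD_SIZE) >> PMD_SHIFT = 8
	 *
	 * so gigantic_vmemmap_pgtable_prealloc() reserves 8 * 4 KB = 32 KB of
	 * memblock memory per boot-time 1 GB page.
	 */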
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/hugetlb.h |  3 +++
 mm/hugetlb_vmemmap.c    | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb_vmemmap.h    | 13 ++++++++++
 3 files changed, 79 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 03ca83db0a3e..c27a299c4211 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -622,6 +622,9 @@ struct hstate {
 struct huge_bootmem_page {
 	struct list_head list;
 	struct hstate *hstate;
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+	pte_t *vmemmap_pte;
+#endif
 };

 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 628e2752714f..6f3a47b4ebd3 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -171,6 +171,7 @@
 #define pr_fmt(fmt)	"HugeTLB: " fmt

 #include
+#include
 #include

 #include "hugetlb_vmemmap.h"
@@ -263,6 +264,68 @@ int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables)
 	return -ENOMEM;
 }

+unsigned long __init gigantic_vmemmap_pgtable_prealloc(void)
+{
+	struct huge_bootmem_page *m, *tmp;
+	unsigned long nr_free = 0;
+
+	list_for_each_entry_safe(m, tmp, &huge_boot_pages, list) {
+		struct hstate *h = m->hstate;
+		unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
+		unsigned long size;
+
+		if (!nr)
+			continue;
+
+		size = nr << PAGE_SHIFT;
+		m->vmemmap_pte = memblock_alloc_try_nid(size, PAGE_SIZE, 0,
+							MEMBLOCK_ALLOC_ACCESSIBLE,
+							NUMA_NO_NODE);
+		if (!m->vmemmap_pte) {
+			nr_free++;
+			list_del(&m->list);
+			memblock_free_early(__pa(m), huge_page_size(h));
+		}
+	}
+
+	return nr_free;
+}
+
+void __init gigantic_vmemmap_pgtable_init(struct huge_bootmem_page *m,
+					  struct page *head)
+{
+	struct hstate *h = m->hstate;
+	unsigned long pte = (unsigned long)m->vmemmap_pte;
+	unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
+
+	if (!nr)
+		return;
+
+	/*
+	 * If we had gigantic hugepages allocated at boot time, we need
+	 * to restore the 'stolen' pages to totalram_pages in order to
+	 * fix confusing memory reports from free(1) and another
+	 * side-effects, like CommitLimit going negative.
+	 */
+	adjust_managed_page_count(head, nr);
+
+	/*
+	 * Use the huge page lru list to temporarily store the preallocated
+	 * pages. The preallocated pages are used and the list is emptied
+	 * before the huge page is put into use. When the huge page is put
+	 * into use by prep_new_huge_page() the list will be reinitialized.
+	 */
+	INIT_LIST_HEAD(&head->lru);
+
+	while (nr--) {
+		struct page *pte_page = virt_to_page(pte);
+
+		__ClearPageReserved(pte_page);
+		list_add(&pte_page->lru, &head->lru);
+		pte += PAGE_SIZE;
+	}
+}
+
 /*
  * Previously discarded vmemmap pages will be allocated and remapping
  * after this function returns zero.
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 306e15519da1..f6170720f183 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -16,6 +16,9 @@ void free_huge_page_vmemmap(struct hstate *h, struct page *head);
 void hugetlb_vmemmap_init(struct hstate *h);
 int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables);
 void vmemmap_pgtable_free(struct list_head *pgtables);
+unsigned long gigantic_vmemmap_pgtable_prealloc(void);
+void gigantic_vmemmap_pgtable_init(struct huge_bootmem_page *m,
+				   struct page *head);

 /*
  * How many vmemmap pages associated with a HugeTLB page that can be freed
@@ -45,6 +48,16 @@ static inline void vmemmap_pgtable_free(struct list_head *pgtables)
 {
 }

+static inline unsigned long gigantic_vmemmap_pgtable_prealloc(void)
+{
+	return 0;
+}
+
+static inline void gigantic_vmemmap_pgtable_init(struct huge_bootmem_page *m,
+						 struct page *head)
+{
+}
+
 static inline void hugetlb_vmemmap_init(struct hstate *h)
 {
 }

From patchwork Wed Jun 9 12:13:08 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12309979
From: Muchun Song <songmuchun@bytedance.com>
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH 3/5] mm: sparsemem: split the huge PMD mapping of vmemmap pages
Date: Wed, 9 Jun 2021 20:13:08 +0800
Message-Id: <20210609121310.62229-4-songmuchun@bytedance.com>
In-Reply-To: <20210609121310.62229-1-songmuchun@bytedance.com>
References: <20210609121310.62229-1-songmuchun@bytedance.com>

If the vmemmap is mapped with huge PMDs, we must split the huge PMD first
before the PTE page table entries can be changed. This patch adds the
ability to split the huge PMD mapping of vmemmap pages.
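For reference, on x86_64 (assumed here: PMD_SHIFT = 21, PAGE_SHIFT = 12) the
constants introduced below work out to:

	/*
	 * VMEMMAP_HPMD_ORDER = PMD_SHIFT - PAGE_SHIFT = 21 - 12 = 9
	 * VMEMMAP_HPMD_NR    = 1 << 9 = 512
	 *
	 * Splitting one huge PMD therefore installs 512 base-page PTEs that
	 * map the same 2 MB of vmemmap, consuming exactly one preallocated
	 * pte page per split.
	 */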
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/mm.h   |  2 +-
 mm/hugetlb.c         | 42 ++++++++++++++++++++++++++++++++++--
 mm/hugetlb_vmemmap.c |  3 ++-
 mm/sparse-vmemmap.c  | 61 +++++++++++++++++++++++++++++++++++++++++++++-------
 4 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cadc8cc2c715..b97e1486c5c1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3056,7 +3056,7 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
 #endif

 void vmemmap_remap_free(unsigned long start, unsigned long end,
-			unsigned long reuse);
+			unsigned long reuse, struct list_head *pgtables);
 int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 			unsigned long reuse, gfp_t gfp_mask);

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c3b2a8a494d6..3137c72d9cc7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1609,6 +1609,13 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid)
 static void __prep_new_huge_page(struct hstate *h, struct page *page)
 {
 	free_huge_page_vmemmap(h, page);
+	/*
+	 * Because we store preallocated pages on @page->lru,
+	 * vmemmap_pgtable_free() must be called before the
+	 * initialization of @page->lru in INIT_LIST_HEAD().
+	 */
+	vmemmap_pgtable_free(&page->lru);
+
 	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
 	hugetlb_set_page_subpool(page, NULL);
@@ -1775,14 +1782,29 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
 		nodemask_t *node_alloc_noretry)
 {
 	struct page *page;
+	LIST_HEAD(pgtables);
+
+	if (vmemmap_pgtable_prealloc(h, &pgtables))
+		return NULL;

 	if (hstate_is_gigantic(h))
 		page = alloc_gigantic_page(h, gfp_mask, nid, nmask);
 	else
 		page = alloc_buddy_huge_page(h, gfp_mask, nid, nmask,
 				node_alloc_noretry);
-	if (!page)
+	if (!page) {
+		vmemmap_pgtable_free(&pgtables);
 		return NULL;
+	}
+
+	/*
+	 * Use the huge page lru list to temporarily store the preallocated
+	 * pages. The preallocated pages are used and the list is emptied
+	 * before the huge page is put into use. When the huge page is put
+	 * into use by __prep_new_huge_page() the list will be reinitialized.
+	 */
+	INIT_LIST_HEAD(&page->lru);
+	list_splice(&pgtables, &page->lru);

 	if (hstate_is_gigantic(h))
 		prep_compound_gigantic_page(page, huge_page_order(h));
@@ -2417,6 +2439,10 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 	int nid = page_to_nid(old_page);
 	struct page *new_page;
 	int ret = 0;
+	LIST_HEAD(pgtables);
+
+	if (vmemmap_pgtable_prealloc(h, &pgtables))
+		return -ENOMEM;

 	/*
 	 * Before dissolving the page, we need to allocate a new one for the
@@ -2426,8 +2452,15 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 	 * under the lock.
 	 */
 	new_page = alloc_buddy_huge_page(h, gfp_mask, nid, NULL, NULL);
-	if (!new_page)
+	if (!new_page) {
+		vmemmap_pgtable_free(&pgtables);
 		return -ENOMEM;
+	}
+
+	/* See the comments in alloc_fresh_huge_page(). */
+	INIT_LIST_HEAD(&new_page->lru);
+	list_splice(&pgtables, &new_page->lru);
+
 	__prep_new_huge_page(h, new_page);

 retry:
@@ -2711,6 +2744,7 @@ static void __init gather_bootmem_prealloc(void)
 		WARN_ON(page_count(page) != 1);
 		prep_compound_huge_page(page, huge_page_order(h));
 		WARN_ON(PageReserved(page));
+		gigantic_vmemmap_pgtable_init(m, page);
 		prep_new_huge_page(h, page, page_to_nid(page));
 		put_page(page); /* free it into the hugepage allocator */

@@ -2763,6 +2797,10 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 			break;
 		cond_resched();
 	}
+
+	if (hstate_is_gigantic(h))
+		i -= gigantic_vmemmap_pgtable_prealloc();
+
 	if (i < h->max_huge_pages) {
 		char buf[32];

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 6f3a47b4ebd3..01f3652fa359 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -375,7 +375,8 @@ void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 	 * to the page which @vmemmap_reuse is mapped to, then free the pages
 	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
 	 */
-	vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
+	vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse,
+			   &head->lru);

 	SetHPageVmemmapOptimized(head);
 }
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 693de0aec7a8..fedb3f56110c 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -42,6 +42,8 @@
  * @reuse_addr:		the virtual address of the @reuse_page page.
  * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
  *			or is mapped from.
+ * @pgtables:		the list of page tables which is used for splitting huge
+ *			PMD page tables.
  */
 struct vmemmap_remap_walk {
 	void (*remap_pte)(pte_t *pte, unsigned long addr,
@@ -49,8 +51,49 @@ struct vmemmap_remap_walk {
 	struct page *reuse_page;
 	unsigned long reuse_addr;
 	struct list_head *vmemmap_pages;
+	struct list_head *pgtables;
 };

+#define VMEMMAP_HPMD_ORDER	(PMD_SHIFT - PAGE_SHIFT)
+#define VMEMMAP_HPMD_NR		(1 << VMEMMAP_HPMD_ORDER)
+
+static inline pte_t *pte_withdraw(struct vmemmap_remap_walk *walk)
+{
+	pgtable_t pgtable;
+
+	pgtable = list_first_entry(walk->pgtables, struct page, lru);
+	list_del(&pgtable->lru);
+
+	return page_to_virt(pgtable);
+}
+
+static void split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
+				   struct vmemmap_remap_walk *walk)
+{
+	int i;
+	pmd_t tmp;
+	pte_t *new = pte_withdraw(walk);
+	struct page *page = pmd_page(*pmd);
+	unsigned long addr = start;
+
+	pmd_populate_kernel(&init_mm, &tmp, new);
+
+	for (i = 0; i < VMEMMAP_HPMD_NR; i++, addr += PAGE_SIZE) {
+		pte_t entry, *pte;
+		pgprot_t pgprot = PAGE_KERNEL;
+
+		entry = mk_pte(page + i, pgprot);
+		pte = pte_offset_kernel(&tmp, addr);
+		set_pte_at(&init_mm, addr, pte, entry);
+	}
+
+	/* Make pte visible before pmd. See comment in __pte_alloc(). */
+	smp_wmb();
+	pmd_populate_kernel(&init_mm, pmd, new);
+
+	flush_tlb_kernel_range(start, start + PMD_SIZE);
+}
+
 static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
 			      unsigned long end,
 			      struct vmemmap_remap_walk *walk)
@@ -84,8 +127,8 @@ static void vmemmap_pmd_range(pud_t *pud, unsigned long addr,

 	pmd = pmd_offset(pud, addr);
 	do {
-		BUG_ON(pmd_leaf(*pmd));
-
+		if (pmd_leaf(*pmd))
+			split_vmemmap_huge_pmd(pmd, addr & PMD_MASK, walk);
 		next = pmd_addr_end(addr, end);
 		vmemmap_pte_range(pmd, addr, next, walk);
 	} while (pmd++, addr = next, addr != end);
@@ -192,18 +235,17 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
  * @end:	end address of the vmemmap virtual address range that we want to
  *		remap.
  * @reuse:	reuse address.
- *
- * Note: This function depends on vmemmap being base page mapped. Please make
- * sure that we disable PMD mapping of vmemmap pages when calling this function.
+ * @pgtables:	the list of page tables used for splitting huge PMD.
  */
 void vmemmap_remap_free(unsigned long start, unsigned long end,
-			unsigned long reuse)
+			unsigned long reuse, struct list_head *pgtables)
 {
 	LIST_HEAD(vmemmap_pages);
 	struct vmemmap_remap_walk walk = {
 		.remap_pte	= vmemmap_remap_pte,
 		.reuse_addr	= reuse,
 		.vmemmap_pages	= &vmemmap_pages,
+		.pgtables	= pgtables,
 	};

 	/*
@@ -221,7 +263,10 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
 	 */
 	BUG_ON(start - reuse != PAGE_SIZE);

+	mmap_write_lock(&init_mm);
 	vmemmap_remap_range(reuse, end, &walk);
+	mmap_write_unlock(&init_mm);
+
 	free_vmemmap_page_list(&vmemmap_pages);
 }

@@ -287,12 +332,12 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end,
 	/* See the comment in the vmemmap_remap_free(). */
 	BUG_ON(start - reuse != PAGE_SIZE);

-	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
-
 	if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
 		return -ENOMEM;

+	mmap_read_lock(&init_mm);
 	vmemmap_remap_range(reuse, end, &walk);
+	mmap_read_unlock(&init_mm);

 	return 0;
 }

From patchwork Wed Jun 9 12:13:09 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12309981
From: Muchun Song <songmuchun@bytedance.com>
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH 4/5] mm: sparsemem: use huge PMD mapping for vmemmap pages
Date: Wed, 9 Jun 2021 20:13:09 +0800
Message-Id: <20210609121310.62229-5-songmuchun@bytedance.com>
In-Reply-To: <20210609121310.62229-1-songmuchun@bytedance.com>
References: <20210609121310.62229-1-songmuchun@bytedance.com>

The preparation for splitting the huge PMD mapping of vmemmap pages is
ready, so switch the vmemmap mapping from PTE back to PMD.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 -------
 arch/x86/mm/init_64.c                           |  8 ++------
 include/linux/hugetlb.h                         | 25 ++++++-------------------
 mm/memory_hotplug.c                             |  2 +-
 4 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index db1ef6739613..a01aadafee38 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1599,13 +1599,6 @@
 			enabled.
 			Allows heavy hugetlb users to free up some more
 			memory (6 * PAGE_SIZE for each 2MB hugetlb page).
-			This feauture is not free though. Large page
-			tables are not used to back vmemmap pages which
-			can lead to a performance degradation for some
-			workloads. Also there will be memory allocation
-			required when hugetlb pages are freed from the
-			pool which can lead to corner cases under heavy
-			memory pressure.
 			Format: { on | off (default) }

 			on: enable the feature
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 9d9d18d0c2a1..65ea58527176 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -34,7 +34,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
@@ -1610,8 +1609,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
 	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));

-	if ((is_hugetlb_free_vmemmap_enabled() && !altmap) ||
-	    end - start < PAGES_PER_SECTION * sizeof(struct page))
+	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
 		err = vmemmap_populate_basepages(start, end, node, NULL);
 	else if (boot_cpu_has(X86_FEATURE_PSE))
 		err = vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1639,8 +1637,6 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 	pmd_t *pmd;
 	unsigned int nr_pmd_pages;
 	struct page *page;
-	bool base_mapping = !boot_cpu_has(X86_FEATURE_PSE) ||
-			    is_hugetlb_free_vmemmap_enabled();

 	for (; addr < end; addr = next) {
 		pte_t *pte = NULL;
@@ -1666,7 +1662,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 		}
 		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);

-		if (base_mapping) {
+		if (!boot_cpu_has(X86_FEATURE_PSE)) {
 			next = (addr + PAGE_SIZE) & PAGE_MASK;
 			pmd = pmd_offset(pud, addr);
 			if (pmd_none(*pmd))
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c27a299c4211..2b46e6494114 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -907,20 +907,6 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 }
 #endif

-#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
-extern bool hugetlb_free_vmemmap_enabled;
-
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return hugetlb_free_vmemmap_enabled;
-}
-#else
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return false;
-}
-#endif
-
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};

@@ -1080,13 +1066,14 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
-
-static inline bool is_hugetlb_free_vmemmap_enabled(void)
-{
-	return false;
-}
 #endif	/* CONFIG_HUGETLB_PAGE */

+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+#else
+#define hugetlb_free_vmemmap_enabled	false
+#endif
+
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d96a3c7551c8..9d8a551c08d5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1056,7 +1056,7 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	 *       populate a single PMD.
 	 */
 	return memmap_on_memory &&
-	       !is_hugetlb_free_vmemmap_enabled() &&
+	       !hugetlb_free_vmemmap_enabled &&
 	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
 	       size == memory_block_size_bytes() &&
 	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&

From patchwork Wed Jun 9 12:13:10 2021
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12309983
From: Muchun Song <songmuchun@bytedance.com>
To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de,
    mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
    chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
    zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH 5/5] mm: hugetlb: introduce CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
Date: Wed, 9 Jun 2021 20:13:10 +0800
Message-Id: <20210609121310.62229-6-songmuchun@bytedance.com>
In-Reply-To: <20210609121310.62229-1-songmuchun@bytedance.com>
References: <20210609121310.62229-1-songmuchun@bytedance.com>

When HUGETLB_PAGE_FREE_VMEMMAP is built in, freeing the unused vmemmap
pages associated with each HugeTLB page is off by default. Now that the
vmemmap is PMD mapped, enabling the feature has no side effect when there
are no HugeTLB pages in the system. Some users may want to enable this
feature at compile time instead of via the boot command line, so add a
config option that makes it default to on; it can still be disabled on
the command line.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 +++
 fs/Kconfig                                      | 10 ++++++++++
 mm/hugetlb_vmemmap.c                            |  6 ++++--
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a01aadafee38..8eee439d943c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1604,6 +1604,9 @@
 			on: enable the feature
 			off: disable the feature

+			Built with CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y,
+			the default is on.
+
 			This is not compatible with memory_hotplug.memmap_on_memory.
 			If both parameters are enabled, hugetlb_free_vmemmap takes
 			precedence over memory_hotplug.memmap_on_memory.

diff --git a/fs/Kconfig b/fs/Kconfig
index f40b5b98f7ba..e78bc5daf7b0 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -245,6 +245,16 @@ config HUGETLB_PAGE_FREE_VMEMMAP
 	depends on X86_64
 	depends on SPARSEMEM_VMEMMAP

+config HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON
+	bool "Default freeing vmemmap pages of HugeTLB to on"
+	default n
+	depends on HUGETLB_PAGE_FREE_VMEMMAP
+	help
+	  When using HUGETLB_PAGE_FREE_VMEMMAP, the freeing unused vmemmap
+	  pages associated with each HugeTLB page is default off. Say Y here
+	  to enable freeing vmemmap pages of HugeTLB by default. It can then
+	  be disabled on the command line via hugetlb_free_vmemmap=off.
+
 config MEMFD_CREATE
 	def_bool TMPFS || HUGETLBFS

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 01f3652fa359..b5f4f29e042a 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -186,7 +186,7 @@
 #define RESERVE_VMEMMAP_NR		2U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)

-bool hugetlb_free_vmemmap_enabled;
+bool hugetlb_free_vmemmap_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);

 static int __init early_hugetlb_free_vmemmap_param(char *buf)
 {
@@ -201,7 +201,9 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf)

 	if (!strcmp(buf, "on"))
 		hugetlb_free_vmemmap_enabled = true;
-	else if (strcmp(buf, "off"))
+	else if (!strcmp(buf, "off"))
+		hugetlb_free_vmemmap_enabled = false;
+	else
 		return -EINVAL;

 	return 0;
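For clarity, the behaviour implied by the change above combines the Kconfig
default and the boot parameter roughly as follows:

	CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y, no parameter             -> feature enabled
	CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=y, hugetlb_free_vmemmap=off -> feature disabled
	CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=n, hugetlb_free_vmemmap=on  -> feature enabled
	CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON=n, no parameter             -> feature disabled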