From patchwork Wed Mar 30 15:37:42 2022
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12796032
From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song
Subject: [PATCH v6 1/4] mm: hugetlb_vmemmap: introduce STRUCT_PAGE_SIZE_IS_POWER_OF_2
Date: Wed, 30 Mar 2022 23:37:42 +0800
Message-Id:
<20220330153745.20465-2-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
In-Reply-To: <20220330153745.20465-1-songmuchun@bytedance.com>
References: <20220330153745.20465-1-songmuchun@bytedance.com>

If the size of "struct page" is not a power of two and this feature is enabled, then the vmemmap pages of HugeTLB will be corrupted after remapping (a panic is about to happen in theory). This combination only exists with !CONFIG_MEMCG && !CONFIG_SLUB on x86_64, which is not a conventional configuration nowadays, so it is not a real-world issue, just the result of a code review. But we have to prevent anyone from building that combined configuration.

In order to avoid scattering checks like "is_power_of_2(sizeof(struct page))" throughout mm/hugetlb_vmemmap.c, introduce STRUCT_PAGE_SIZE_IS_POWER_OF_2 to detect whether the size of struct page is a power of 2 and make this feature depend on the new macro. Then no one can build an unexpected configuration.
Signed-off-by: Muchun Song Suggested-by: Luis Chamberlain Reported-by: kernel test robot --- Kbuild | 15 ++++++++++++++- include/linux/mm_types.h | 2 ++ include/linux/page-flags.h | 3 ++- mm/hugetlb_vmemmap.c | 8 ++------ mm/hugetlb_vmemmap.h | 4 ++-- mm/struct_page_size.c | 20 ++++++++++++++++++++ 6 files changed, 42 insertions(+), 10 deletions(-) create mode 100644 mm/struct_page_size.c diff --git a/Kbuild b/Kbuild index fa441b98c9f6..7f90ba21dd51 100644 --- a/Kbuild +++ b/Kbuild @@ -24,6 +24,19 @@ $(timeconst-file): kernel/time/timeconst.bc FORCE $(call filechk,gentimeconst) ##### +# Generate struct_page_size.h. + +struct_page_size-file := include/generated/struct_page_size.h + +always-y += $(struct_page_size-file) +targets += mm/struct_page_size.s + +mm/struct_page_size.s: $(timeconst-file) $(bounds-file) + +$(struct_page_size-file): mm/struct_page_size.s FORCE + $(call filechk,offsets,__LINUX_STRUCT_PAGE_SIZE_H__) + +##### # Generate asm-offsets.h offsets-file := include/generated/asm-offsets.h @@ -31,7 +44,7 @@ offsets-file := include/generated/asm-offsets.h always-y += $(offsets-file) targets += arch/$(SRCARCH)/kernel/asm-offsets.s -arch/$(SRCARCH)/kernel/asm-offsets.s: $(timeconst-file) $(bounds-file) +arch/$(SRCARCH)/kernel/asm-offsets.s: $(timeconst-file) $(bounds-file) $(struct_page_size-file) $(offsets-file): arch/$(SRCARCH)/kernel/asm-offsets.s FORCE $(call filechk,offsets,__ASM_OFFSETS_H__) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8834e38c06a4..5fbff44a4310 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -223,6 +223,7 @@ struct page { #endif } _struct_page_alignment; +#ifndef __GENERATING_STRUCT_PAGE_SIZE_IS_POWER_OF_2_H /** * struct folio - Represents a contiguous set of bytes. * @flags: Identical to the page flags. 
@@ -844,5 +845,6 @@ enum fault_flag { FAULT_FLAG_INSTRUCTION = 1 << 8, FAULT_FLAG_INTERRUPTIBLE = 1 << 9, }; +#endif /* !__GENERATING_STRUCT_PAGE_SIZE_IS_POWER_OF_2_H */ #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index fc4f294cc8d7..15fcdff0e7ee 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -12,6 +12,7 @@ #ifndef __GENERATING_BOUNDS_H #include #include +#include #endif /* !__GENERATING_BOUNDS_H */ /* @@ -190,7 +191,7 @@ enum pageflags { #ifndef __GENERATING_BOUNDS_H -#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP +#if defined(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP) && defined(STRUCT_PAGE_SIZE_IS_POWER_OF_2) DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, hugetlb_free_vmemmap_enabled_key); diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index 791626983c2e..951cf83010c7 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -178,6 +178,7 @@ #include "hugetlb_vmemmap.h" +#ifdef STRUCT_PAGE_SIZE_IS_POWER_OF_2 /* * There are a lot of struct page structures associated with each HugeTLB page. * For tail pages, the value of compound_head is the same. So we can reuse first @@ -194,12 +195,6 @@ EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key); static int __init early_hugetlb_free_vmemmap_param(char *buf) { - /* We cannot optimize if a "struct page" crosses page boundaries. 
*/ - if (!is_power_of_2(sizeof(struct page))) { - pr_warn("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n"); - return 0; - } - if (!buf) return -EINVAL; @@ -302,3 +297,4 @@ void __init hugetlb_vmemmap_init(struct hstate *h) pr_info("can free %d vmemmap pages for %s\n", h->nr_free_vmemmap_pages, h->name); } +#endif /* STRUCT_PAGE_SIZE_IS_POWER_OF_2 */ diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h index cb2bef8f9e73..b137fd8b6ba4 100644 --- a/mm/hugetlb_vmemmap.h +++ b/mm/hugetlb_vmemmap.h @@ -10,7 +10,7 @@ #define _LINUX_HUGETLB_VMEMMAP_H #include -#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP +#if defined(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP) && defined(STRUCT_PAGE_SIZE_IS_POWER_OF_2) int alloc_huge_page_vmemmap(struct hstate *h, struct page *head); void free_huge_page_vmemmap(struct hstate *h, struct page *head); void hugetlb_vmemmap_init(struct hstate *h); @@ -41,5 +41,5 @@ static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h) { return 0; } -#endif /* CONFIG_HUGETLB_PAGE_FREE_VMEMMAP */ +#endif /* CONFIG_HUGETLB_PAGE_FREE_VMEMMAP && STRUCT_PAGE_SIZE_IS_POWER_OF_2 */ #endif /* _LINUX_HUGETLB_VMEMMAP_H */ diff --git a/mm/struct_page_size.c b/mm/struct_page_size.c new file mode 100644 index 000000000000..6fc29c1227a0 --- /dev/null +++ b/mm/struct_page_size.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Generate definitions needed by the preprocessor. + * This code generates raw asm output which is post-processed + * to extract and format the required data. 
+ */ + +#define __GENERATING_STRUCT_PAGE_SIZE_IS_POWER_OF_2_H +/* Include headers that define the enum constants of interest */ +#include +#include +#include + +int main(void) +{ + if (is_power_of_2(sizeof(struct page))) + DEFINE(STRUCT_PAGE_SIZE_IS_POWER_OF_2, is_power_of_2(sizeof(struct page))); + + return 0; +}

From patchwork Wed Mar 30 15:37:43 2022
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12796033
From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc:
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song
Subject: [PATCH v6 2/4] mm: memory_hotplug: override memmap_on_memory when hugetlb_free_vmemmap=on
Date: Wed, 30 Mar 2022 23:37:43 +0800
Message-Id: <20220330153745.20465-3-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
In-Reply-To: <20220330153745.20465-1-songmuchun@bytedance.com>
References: <20220330153745.20465-1-songmuchun@bytedance.com>

When "hugetlb_free_vmemmap=on" and "memory_hotplug.memmap_on_memory" are both passed on the boot cmdline, the variable "memmap_on_memory" is set to 1 even though the vmemmap pages will not be allocated from the hotplugged memory, since the former takes precedence over the latter. The next patch wants to enable or disable the feature of freeing vmemmap pages of HugeTLB via sysctl, and it needs a way to know whether memory_hotplug.memmap_on_memory is enabled when enabling that feature, because the two features are incompatible; today the variable "memmap_on_memory" cannot indicate this. So do not set "memmap_on_memory" to 1 when both parameters are passed on the cmdline; then "memmap_on_memory" indicates whether the feature was actually enabled by the user.
Also introduce mhp_memmap_on_memory() helper to move the definition of "memmap_on_memory" to the scope of CONFIG_MHP_MEMMAP_ON_MEMORY. In the next patch, mhp_memmap_on_memory() will also be exported to be used in hugetlb_vmemmap.c. Signed-off-by: Muchun Song --- mm/memory_hotplug.c | 32 ++++++++++++++++++++++++++------ 1 file changed, 26 insertions(+), 6 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 416b38ca8def..da594b382829 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -42,14 +42,36 @@ #include "internal.h" #include "shuffle.h" +#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY +static int memmap_on_memory_set(const char *val, const struct kernel_param *kp) +{ + if (hugetlb_free_vmemmap_enabled()) + return 0; + return param_set_bool(val, kp); +} + +static const struct kernel_param_ops memmap_on_memory_ops = { + .flags = KERNEL_PARAM_OPS_FL_NOARG, + .set = memmap_on_memory_set, + .get = param_get_bool, +}; /* * memory_hotplug.memmap_on_memory parameter */ static bool memmap_on_memory __ro_after_init; -#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY -module_param(memmap_on_memory, bool, 0444); +module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444); MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug"); + +static inline bool mhp_memmap_on_memory(void) +{ + return memmap_on_memory; +} +#else +static inline bool mhp_memmap_on_memory(void) +{ + return false; +} #endif enum { @@ -1288,9 +1310,7 @@ bool mhp_supports_memmap_on_memory(unsigned long size) * altmap as an alternative source of memory, and we do not exactly * populate a single PMD. 
*/ - return memmap_on_memory && - !hugetlb_free_vmemmap_enabled() && - IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) && + return mhp_memmap_on_memory() && size == memory_block_size_bytes() && IS_ALIGNED(vmemmap_size, PMD_SIZE) && IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)); @@ -2074,7 +2094,7 @@ static int __ref try_remove_memory(u64 start, u64 size) * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in * the same granularity it was added - a single memory block. */ - if (memmap_on_memory) { + if (mhp_memmap_on_memory()) { nr_vmemmap_pages = walk_memory_blocks(start, size, NULL, get_nr_vmemmap_pages_cb); if (nr_vmemmap_pages) {

From patchwork Wed Mar 30 15:37:44 2022
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12796034
From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song
Subject: [PATCH v6 3/4] sysctl: allow to set extra1 to SYSCTL_ONE
Date: Wed, 30 Mar 2022 23:37:44 +0800
Message-Id: <20220330153745.20465-4-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
In-Reply-To: <20220330153745.20465-1-songmuchun@bytedance.com>
References: <20220330153745.20465-1-songmuchun@bytedance.com>

proc_do_static_key() does not consider the situation where a sysctl is only allowed to be enabled and cannot be disabled under certain circumstances, since it sets "->extra1" to SYSCTL_ZERO unconditionally. This patch adds the ability to set "->extra1" accordingly.
Signed-off-by: Muchun Song --- kernel/sysctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 770d5f7c7ae4..1e89c3e428ad 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1638,7 +1638,7 @@ int proc_do_static_key(struct ctl_table *table, int write, .data = &val, .maxlen = sizeof(val), .mode = table->mode, - .extra1 = SYSCTL_ZERO, + .extra1 = table->extra1 == SYSCTL_ONE ? SYSCTL_ONE : SYSCTL_ZERO, .extra2 = SYSCTL_ONE, };

From patchwork Wed Mar 30 15:37:45 2022
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12796035
From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song
Subject: [PATCH v6 4/4] mm: hugetlb_vmemmap: add hugetlb_free_vmemmap sysctl
Date: Wed, 30 Mar 2022 23:37:45 +0800
Message-Id: <20220330153745.20465-5-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.0 (Apple Git-132)
In-Reply-To: <20220330153745.20465-1-songmuchun@bytedance.com>
References: <20220330153745.20465-1-songmuchun@bytedance.com>

We must add "hugetlb_free_vmemmap=on" to the boot cmdline and reboot the server to enable the feature of freeing vmemmap pages of HugeTLB pages, and rebooting usually takes a long time. Add a sysctl to enable or disable the feature at runtime without rebooting. Disabling requires that there are no optimized HugeTLB pages in the system; if disabling fails, you can set "nr_hugepages" to 0 and then retry.
Signed-off-by: Muchun Song --- Documentation/admin-guide/sysctl/vm.rst | 14 +++++ include/linux/memory_hotplug.h | 9 +++ mm/hugetlb_vmemmap.c | 101 +++++++++++++++++++++++++------- mm/hugetlb_vmemmap.h | 4 +- mm/memory_hotplug.c | 7 +-- 5 files changed, 108 insertions(+), 27 deletions(-) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index f4804ce37c58..9e0e153ed935 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -561,6 +561,20 @@ Change the minimum size of the hugepage pool. See Documentation/admin-guide/mm/hugetlbpage.rst +hugetlb_free_vmemmap +==================== + +Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap +pages associated with each HugeTLB page. Once enabled, the vmemmap pages of +HugeTLB pages subsequently allocated from the buddy allocator will be +optimized, whereas already allocated HugeTLB pages will not be. If disabling +this feature fails, you can set "nr_hugepages" to 0 and then retry, since it +can only be disabled once there are no optimized HugeTLB pages left in the +system.
+ +See Documentation/admin-guide/mm/hugetlbpage.rst + + nr_hugepages_mempolicy ====================== diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 1ce6f8044f1e..9b015b254e86 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -348,4 +348,13 @@ void arch_remove_linear_mapping(u64 start, u64 size); extern bool mhp_supports_memmap_on_memory(unsigned long size); #endif /* CONFIG_MEMORY_HOTPLUG */ +#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY +bool mhp_memmap_on_memory(void); +#else +static inline bool mhp_memmap_on_memory(void) +{ + return false; +} +#endif + #endif /* __LINUX_MEMORY_HOTPLUG_H */ diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c index 951cf83010c7..5df41b148e79 100644 --- a/mm/hugetlb_vmemmap.c +++ b/mm/hugetlb_vmemmap.c @@ -176,6 +176,7 @@ */ #define pr_fmt(fmt) "HugeTLB: " fmt +#include #include "hugetlb_vmemmap.h" #ifdef STRUCT_PAGE_SIZE_IS_POWER_OF_2 @@ -193,6 +194,10 @@ DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, hugetlb_free_vmemmap_enabled_key); EXPORT_SYMBOL(hugetlb_free_vmemmap_enabled_key); +/* How many HugeTLB pages with vmemmap pages optimized. */ +static atomic_long_t optimized_pages = ATOMIC_LONG_INIT(0); +static DECLARE_RWSEM(sysctl_rwsem); + static int __init early_hugetlb_free_vmemmap_param(char *buf) { if (!buf) @@ -209,11 +214,6 @@ static int __init early_hugetlb_free_vmemmap_param(char *buf) } early_param("hugetlb_free_vmemmap", early_hugetlb_free_vmemmap_param); -static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h) -{ - return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT; -} - /* * Previously discarded vmemmap pages will be allocated and remapping * after this function returns zero. 
@@ -222,14 +222,18 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 	int ret;
 	unsigned long vmemmap_addr = (unsigned long)head;
-	unsigned long vmemmap_end, vmemmap_reuse;
+	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
 	if (!HPageVmemmapOptimized(head))
 		return 0;
 
-	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
-	vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
-	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
+	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
+	vmemmap_pages = free_vmemmap_pages_per_hpage(h);
+	vmemmap_end = vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
+	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
+
+	VM_BUG_ON_PAGE(!vmemmap_pages, head);
+
 	/*
 	 * The pages which the vmemmap virtual address range [@vmemmap_addr,
 	 * @vmemmap_end) are mapped to are freed to the buddy allocator, and
@@ -239,8 +243,14 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-	if (!ret)
+	if (!ret) {
 		ClearHPageVmemmapOptimized(head);
+		/*
+		 * Paired with the acquire semantics in
+		 * hugetlb_free_vmemmap_handler().
+		 */
+		atomic_long_dec_return_release(&optimized_pages);
+	}
 
 	return ret;
 }
@@ -248,22 +258,28 @@ int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
 void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
-	unsigned long vmemmap_end, vmemmap_reuse;
+	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
-	if (!free_vmemmap_pages_per_hpage(h))
-		return;
+	down_read(&sysctl_rwsem);
+	vmemmap_pages = free_vmemmap_pages_per_hpage(h);
+	if (!vmemmap_pages)
+		goto out;
 
-	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
-	vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
-	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
+	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
+	vmemmap_end = vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
+	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
 
 	/*
 	 * Remap the vmemmap virtual address range [@vmemmap_addr, @vmemmap_end)
 	 * to the page which @vmemmap_reuse is mapped to, then free the pages
 	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
 	 */
-	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
+	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse)) {
 		SetHPageVmemmapOptimized(head);
+		atomic_long_inc(&optimized_pages);
+	}
+out:
+	up_read(&sysctl_rwsem);
 }
 
 void __init hugetlb_vmemmap_init(struct hstate *h)
@@ -279,9 +295,6 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	BUILD_BUG_ON(__NR_USED_SUBPAGE >=
 		     RESERVE_VMEMMAP_SIZE / sizeof(struct page));
 
-	if (!hugetlb_free_vmemmap_enabled())
-		return;
-
 	vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
 	/*
 	 * The head page is not to be freed to buddy allocator, the other tail
@@ -297,4 +310,52 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	pr_info("can free %d vmemmap pages for %s\n", h->nr_free_vmemmap_pages,
 		h->name);
 }
+
+static int hugetlb_free_vmemmap_handler(struct ctl_table *table, int write,
+					void *buffer, size_t *length,
+					loff_t *ppos)
+{
+	int ret;
+
+	down_write(&sysctl_rwsem);
+	/*
+	 * Cannot be disabled when there is at least one optimized
+	 * HugeTLB page in the system.
+	 *
+	 * The acquire semantics are paired with the release semantics in
+	 * alloc_huge_page_vmemmap(). If we see @optimized_pages as 0,
+	 * all the vmemmap page remapping operations from
+	 * alloc_huge_page_vmemmap() are visible too, so we can safely
+	 * disable the static key.
+	 */
+	table->extra1 = atomic_long_read_acquire(&optimized_pages) ?
+			SYSCTL_ONE : SYSCTL_ZERO;
+	ret = proc_do_static_key(table, write, buffer, length, ppos);
+	up_write(&sysctl_rwsem);
+
+	return ret;
+}
+
+static struct ctl_table hugetlb_vmemmap_sysctls[] = {
+	{
+		.procname	= "hugetlb_free_vmemmap",
+		.data		= &hugetlb_free_vmemmap_enabled_key.key,
+		.mode		= 0644,
+		.proc_handler	= hugetlb_free_vmemmap_handler,
+	},
+	{ }
+};
+
+static __init int hugetlb_vmemmap_sysctls_init(void)
+{
+	/*
+	 * The vmemmap pages cannot be optimized if
+	 * "memory_hotplug.memmap_on_memory" is enabled.
+	 */
+	if (!mhp_memmap_on_memory())
+		register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
+
+	return 0;
+}
+late_initcall(hugetlb_vmemmap_sysctls_init);
 #endif /* STRUCT_PAGE_SIZE_IS_POWER_OF_2 */
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index b137fd8b6ba4..89cd85fb0d26 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -21,7 +21,9 @@ void hugetlb_vmemmap_init(struct hstate *h);
  */
 static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
 {
-	return h->nr_free_vmemmap_pages;
+	if (hugetlb_free_vmemmap_enabled())
+		return h->nr_free_vmemmap_pages;
+	return 0;
 }
 #else
 static inline int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index da594b382829..793c04cfe46f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -63,15 +63,10 @@ static bool memmap_on_memory __ro_after_init;
 module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
 
-static inline bool mhp_memmap_on_memory(void)
+bool mhp_memmap_on_memory(void)
 {
 	return memmap_on_memory;
 }
-#else
-static inline bool mhp_memmap_on_memory(void)
-{
-	return false;
-}
#endif

enum {