From patchwork Sun Jun 19 13:38:50 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12886664
From: Muchun Song
To: akpm@linux-foundation.org, corbet@lwn.net, david@redhat.com, mike.kravetz@oracle.com, osalvador@suse.de, paulmck@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song
Subject: [PATCH v4 1/2] mm: memory_hotplug: enumerate all supported section flags
Date: Sun, 19 Jun 2022 21:38:50 +0800
Message-Id: <20220619133851.68184-2-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.1 (Apple Git-133)
In-Reply-To: <20220619133851.68184-1-songmuchun@bytedance.com>
References: <20220619133851.68184-1-songmuchun@bytedance.com>

We are almost
running out of section flags: only one bit is available in the worst
case (powerpc with 256k pages). However, there are still some free bits
(in ->section_mem_map) on other architectures (e.g. x86_64 has 10 bits
available, arm64 has 8 bits available in the worst case of 64K pages).
Because those numbers are hard-coded, it is inconvenient to use the extra
bits on architectures other than powerpc. So convert the section flags to
an enumeration to make it easy to add new section flags in the future.
Also, move SECTION_TAINT_ZONE_DEVICE into the scope of CONFIG_ZONE_DEVICE
to save a bit in the non-zone-device case.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/mmzone.h | 44 +++++++++++++++++++++++++++++++++++---------
 mm/memory_hotplug.c    |  6 ++++++
 mm/sparse.c            |  2 +-
 3 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aab70355d64f..932843c6459b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1418,16 +1418,35 @@ extern size_t mem_section_usage_size(void);
  *      (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
  *      worst combination is powerpc with 256k pages,
  *      which results in PFN_SECTION_SHIFT equal 6.
- * To sum it up, at least 6 bits are available.
+ * To sum it up, at least 6 bits are available on all architectures.
+ * However, we can exceed 6 bits on some other architectures except
+ * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
+ * with the worst case of 64K pages on arm64) if we make sure the
+ * exceeded bit is not applicable to powerpc.
  */
-#define	SECTION_MARKED_PRESENT		(1UL<<0)
-#define SECTION_HAS_MEM_MAP		(1UL<<1)
-#define SECTION_IS_ONLINE		(1UL<<2)
-#define SECTION_IS_EARLY		(1UL<<3)
-#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
-#define SECTION_MAP_LAST_BIT		(1UL<<5)
-#define SECTION_MAP_MASK		(~(SECTION_MAP_LAST_BIT-1))
-#define SECTION_NID_SHIFT		6
+enum {
+	SECTION_MARKED_PRESENT_BIT,
+	SECTION_HAS_MEM_MAP_BIT,
+	SECTION_IS_ONLINE_BIT,
+	SECTION_IS_EARLY_BIT,
+#ifdef CONFIG_ZONE_DEVICE
+	SECTION_TAINT_ZONE_DEVICE_BIT,
+#endif
+	SECTION_MAP_LAST_BIT,
+};
+
+enum {
+	SECTION_MARKED_PRESENT		= BIT(SECTION_MARKED_PRESENT_BIT),
+	SECTION_HAS_MEM_MAP		= BIT(SECTION_HAS_MEM_MAP_BIT),
+	SECTION_IS_ONLINE		= BIT(SECTION_IS_ONLINE_BIT),
+	SECTION_IS_EARLY		= BIT(SECTION_IS_EARLY_BIT),
+#ifdef CONFIG_ZONE_DEVICE
+	SECTION_TAINT_ZONE_DEVICE	= BIT(SECTION_TAINT_ZONE_DEVICE_BIT),
+#endif
+};
+
+#define SECTION_MAP_MASK		(~(BIT(SECTION_MAP_LAST_BIT) - 1))
+#define SECTION_NID_SHIFT		SECTION_MAP_LAST_BIT
 
 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
@@ -1466,12 +1485,19 @@ static inline int online_section(struct mem_section *section)
 	return (section && (section->section_mem_map & SECTION_IS_ONLINE));
 }
 
+#ifdef CONFIG_ZONE_DEVICE
 static inline int online_device_section(struct mem_section *section)
 {
 	unsigned long flags = SECTION_IS_ONLINE | SECTION_TAINT_ZONE_DEVICE;
 
 	return section && ((section->section_mem_map & flags) == flags);
 }
+#else
+static inline int online_device_section(struct mem_section *section)
+{
+	return 0;
+}
+#endif
 
 static inline int online_section_nr(unsigned long nr)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1f1a730c4499..6662b86e9e64 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -670,12 +670,18 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
 
 }
 
+#ifdef CONFIG_ZONE_DEVICE
 static void section_taint_zone_device(unsigned long pfn)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 
 	ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE;
 }
+#else
+static inline void section_taint_zone_device(unsigned long pfn)
+{
+}
+#endif
 
 /*
  * Associate the pfn range with the given zone, initializing the memmaps
diff --git a/mm/sparse.c b/mm/sparse.c
index cb3bfae64036..e5a8a3a0edd7 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -281,7 +281,7 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
 {
 	unsigned long coded_mem_map =
 		(unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
-	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > (1UL<<PFN_SECTION_SHIFT));
+	BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
 	BUG_ON(coded_mem_map & ~SECTION_MAP_MASK);
 	return coded_mem_map;
 }

From patchwork Sun Jun 19 13:38:51 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Muchun Song
X-Patchwork-Id: 12886665
From: Muchun Song
To: akpm@linux-foundation.org, corbet@lwn.net, david@redhat.com, mike.kravetz@oracle.com, osalvador@suse.de, paulmck@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song
Subject: [PATCH v4 2/2] mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory
Date: Sun, 19 Jun 2022 21:38:51 +0800
Message-Id: <20220619133851.68184-3-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.32.1 (Apple Git-133)
In-Reply-To: <20220619133851.68184-1-songmuchun@bytedance.com>
References: <20220619133851.68184-1-songmuchun@bytedance.com>

For now, the feature of hugetlb_free_vmemmap is not compatible with the
feature of memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap
takes precedence over memory_hotplug.memmap_on_memory. However, someone
wants memory_hotplug.memmap_on_memory to take precedence over
hugetlb_free_vmemmap, since memmap_on_memory makes memory hotplug more
likely to succeed in close-to-OOM situations. So the decision to make
hugetlb_free_vmemmap take precedence is neither wise nor elegant. The
proper approach is to have hugetlb_vmemmap.c check whether the section
that the HugeTLB pages belong to can be optimized. If the section's
vmemmap pages are allocated from the added memory block itself,
hugetlb_free_vmemmap should refuse to optimize the vmemmap; otherwise, it
does the optimization. Then both kernel parameters are compatible. So this
patch introduces VmemmapSelfHosted to mark any non-optimizable vmemmap
pages. hugetlb_vmemmap can use this flag to detect whether a vmemmap page
can be optimized.
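As an illustration only (not part of this patch), the check described
above can be modelled in user space. The sketch below assumes a
hypothetical 8-page memory block and a per-page boolean array standing in
for the VmemmapSelfHosted page flag; every model_* name is invented for
the example and is not a kernel API. It rounds the head pfn down to the
start of its memory block and refuses optimization when that first page is
marked self-hosted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_BLOCK_PAGES 8UL	/* assumed pages per memory block (power of two) */
#define MODEL_NR_PAGES    32UL	/* assumed total number of page frames */

/* One flag per page frame, standing in for PageVmemmapSelfHosted(). */
static bool model_self_hosted[MODEL_NR_PAGES];

/* Round pfn down to the first pfn of its memory block. */
unsigned long model_block_start(unsigned long pfn)
{
	return pfn & ~(MODEL_BLOCK_PAGES - 1);
}

/* Mark the first nr_vmemmap pages of a hot-added block as self-hosted. */
void model_hotadd_block(unsigned long start_pfn, unsigned long nr_vmemmap)
{
	unsigned long i;

	for (i = 0; i < nr_vmemmap && start_pfn + i < MODEL_NR_PAGES; i++)
		model_self_hosted[start_pfn + i] = true;
}

/*
 * A huge page may be optimized unless the first page of its memory block
 * exists in the model (the stand-in for pfn_valid()) and is self-hosted.
 */
bool model_can_optimize(unsigned long head_pfn)
{
	unsigned long first = model_block_start(head_pfn);

	if (first < MODEL_NR_PAGES && model_self_hosted[first])
		return false;
	return true;
}
```

For instance, after model_hotadd_block(8, 2), model_can_optimize(13) is
false because pfn 13 lies in the block that starts at pfn 8, while
model_can_optimize(5) stays true.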

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Co-developed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 +++++------
 Documentation/admin-guide/sysctl/vm.rst         |  5 +--
 include/linux/memory_hotplug.h                  |  9 -----
 include/linux/page-flags.h                      | 11 ++++++
 mm/hugetlb_vmemmap.c                            | 52 +++++++++++++++++++++----
 mm/memory_hotplug.c                             | 27 ++++++-------
 6 files changed, 79 insertions(+), 47 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8090130b544b..d740e2ed0e61 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1722,9 +1722,11 @@
 			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
 			the default is on.
 
-			This is not compatible with memory_hotplug.memmap_on_memory.
-			If both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
+			Note that the vmemmap pages may be allocated from the added
+			memory block itself when memory_hotplug.memmap_on_memory is
+			enabled; those vmemmap pages cannot be optimized even if this
+			feature is enabled.  Other vmemmap pages not allocated from
+			the added memory block itself are not affected.
 
 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
@@ -3069,10 +3071,12 @@
 			[KNL,X86,ARM] Boolean flag to enable this feature.
 			Format: {on | off (default)}
 			When enabled, runtime hotplugged memory will
-			allocate its internal metadata (struct pages)
-			from the hotadded memory which will allow to
-			hotadd a lot of memory without requiring
-			additional memory to do so.
+			allocate its internal metadata (struct pages,
+			those vmemmap pages cannot be optimized even
+			if hugetlb_free_vmemmap is enabled) from the
+			hotadded memory which will allow to hotadd a
+			lot of memory without requiring additional
+			memory to do so.
 			This feature is disabled by default because it
 			has some implication on large (e.g. GB)
 			allocations in some configurations (e.g. small
@@ -3082,10 +3086,6 @@
 			Note that even when enabled, there are a few cases where
 			the feature is not effective.
 
-			This is not compatible with hugetlb_free_vmemmap. If
-			both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
-
 	memtest=	[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
 			Format: <integer>
 			default : 0 <disable>
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 5c9aa171a0d3..d7374a1e8ac9 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -565,9 +565,8 @@ See Documentation/admin-guide/mm/hugetlbpage.rst
 hugetlb_optimize_vmemmap
 ========================
 
-This knob is not available when memory_hotplug.memmap_on_memory (kernel parameter)
-is configured or the size of 'struct page' (a structure defined in
-include/linux/mm_types.h) is not power of two (an unusual system config could
+This knob is not available when the size of 'struct page' (a structure defined
+in include/linux/mm_types.h) is not power of two (an unusual system config could
 result in this).
 
 Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 20d7edf62a6a..e0b2209ab71c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -351,13 +351,4 @@ void arch_remove_linear_mapping(u64 start, u64 size);
 extern bool mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
-#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-bool mhp_memmap_on_memory(void);
-#else
-static inline bool mhp_memmap_on_memory(void)
-{
-	return false;
-}
-#endif
-
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e66f7aa3191d..2aa5dcbfe468 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -193,6 +193,11 @@ enum pageflags {
 
 	/* Only valid for buddy pages. Used to track pages that are reported */
 	PG_reported = PG_uptodate,
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+	/* For self-hosted memmap pages */
+	PG_vmemmap_self_hosted = PG_owner_priv_1,
+#endif
 };
 
 #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
@@ -628,6 +633,12 @@ PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison)
  */
 __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+PAGEFLAG(VmemmapSelfHosted, vmemmap_self_hosted, PF_ANY)
+#else
+PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
+#endif
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 1089ea8a9c98..73bfbb47f6a4 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -10,7 +10,7 @@
  */
 #define pr_fmt(fmt)	"HugeTLB: " fmt
 
-#include <linux/memory_hotplug.h>
+#include <linux/memory.h>
 #include "hugetlb_vmemmap.h"
 
 /*
@@ -97,18 +97,54 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
 	return ret;
 }
 
+static unsigned int vmemmap_optimizable_pages(struct hstate *h,
+					      struct page *head)
+{
+	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
+		return 0;
+
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
+		unsigned long pfn = page_to_pfn(head);
+
+		/*
+		 * Due to HugeTLB alignment requirements and the vmemmap pages
+		 * being at the start of the hotplugged memory region in the
+		 * memory_hotplug.memmap_on_memory case, checking whether the
+		 * first vmemmap page is marked as VmemmapSelfHosted is
+		 * sufficient.
+		 *
+		 * [                  hotplugged memory                  ]
+		 * [        section        ][...][        section        ]
+		 * [ vmemmap ][              usable memory               ]
+		 *   ^   |     |                                        |
+		 *   +---+     |                                        |
+		 *     ^       |                                        |
+		 *     +-------+                                        |
+		 *          ^                                           |
+		 *          +-------------------------------------------+
+		 *
+		 * Hotplugged memory block never has non-present sections, while
+		 * boot memory block can have one or more. So pfn_valid() is
+		 * used to filter out the non-present section which also cannot
+		 * be memmap_on_memory.
+		 */
+		pfn = ALIGN_DOWN(pfn, PHYS_PFN(memory_block_size_bytes()));
+		if (pfn_valid(pfn) && PageVmemmapSelfHosted(pfn_to_page(pfn)))
+			return 0;
+	}
+
+	return hugetlb_optimize_vmemmap_pages(h);
+}
+
 void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
 	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
-	vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
+	vmemmap_pages = vmemmap_optimizable_pages(h, head);
 	if (!vmemmap_pages)
 		return;
 
-	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
-		return;
-
 	static_branch_inc(&hugetlb_optimize_vmemmap_key);
 
 	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
@@ -199,10 +235,10 @@ static struct ctl_table hugetlb_vmemmap_sysctls[] = {
 static __init int hugetlb_vmemmap_sysctls_init(void)
 {
 	/*
-	 * If "memory_hotplug.memmap_on_memory" is enabled or "struct page"
-	 * crosses page boundaries, the vmemmap pages cannot be optimized.
+	 * If "struct page" crosses page boundaries, the vmemmap pages cannot
+	 * be optimized.
 	 */
-	if (!mhp_memmap_on_memory() && is_power_of_2(sizeof(struct page)))
+	if (is_power_of_2(sizeof(struct page)))
 		register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
 
 	return 0;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6662b86e9e64..3a59d4e97c03 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -43,30 +43,22 @@
 #include "shuffle.h"
 
 #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-static int memmap_on_memory_set(const char *val, const struct kernel_param *kp)
-{
-	if (hugetlb_optimize_vmemmap_enabled())
-		return 0;
-	return param_set_bool(val, kp);
-}
-
-static const struct kernel_param_ops memmap_on_memory_ops = {
-	.flags	= KERNEL_PARAM_OPS_FL_NOARG,
-	.set	= memmap_on_memory_set,
-	.get	= param_get_bool,
-};
-
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
 static bool memmap_on_memory __ro_after_init;
-module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
+module_param(memmap_on_memory, bool, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
 
-bool mhp_memmap_on_memory(void)
+static inline bool mhp_memmap_on_memory(void)
 {
 	return memmap_on_memory;
 }
+#else
+static inline bool mhp_memmap_on_memory(void)
+{
+	return false;
+}
 #endif
 
 enum {
@@ -1035,7 +1027,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 			      struct zone *zone)
 {
 	unsigned long end_pfn = pfn + nr_pages;
-	int ret;
+	int ret, i;
 
 	ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
 	if (ret)
@@ -1043,6 +1035,9 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
 
+	for (i = 0; i < nr_pages; i++)
+		SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
+
 	/*
 	 * It might be that the vmemmap_pages fully span sections. If that is
 	 * the case, mark those sections online here as otherwise they will be