From patchwork Mon May 16 10:22:05 2022
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org,
    keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v12 1/7] mm: hugetlb_vmemmap: disable hugetlb_optimize_vmemmap when struct page crosses page boundaries
Date: Mon, 16 May 2022 18:22:05 +0800
Message-Id: <20220516102211.41557-2-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>
References: <20220516102211.41557-1-songmuchun@bytedance.com>

If the size of "struct page" is not a power of two while the feature of
minimizing the overhead of struct page associated with each HugeTLB page
is enabled, then the vmemmap pages of HugeTLB will be corrupted after
remapping (in theory a panic is about to happen).  This can only occur
with !CONFIG_MEMCG && !CONFIG_SLUB on x86_64, which is not a conventional
configuration nowadays, so it is not a real-world issue, just the result
of a code review.  But we cannot prevent anyone from building that
combined configuration, so hugetlb_optimize_vmemmap should be disabled
in this case to fix the issue.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
---
 mm/hugetlb_vmemmap.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 29554c6ef2ae..6254bb2d4ae5 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -28,12 +28,6 @@ EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
 
 static int __init hugetlb_vmemmap_early_param(char *buf)
 {
-	/* We cannot optimize if a "struct page" crosses page boundaries. */
-	if (!is_power_of_2(sizeof(struct page))) {
-		pr_warn("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
-		return 0;
-	}
-
 	if (!buf)
 		return -EINVAL;
 
@@ -119,6 +113,12 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	if (!hugetlb_optimize_vmemmap_enabled())
 		return;
 
+	if (!is_power_of_2(sizeof(struct page))) {
+		pr_warn_once("cannot optimize vmemmap pages because \"struct page\" crosses page boundaries\n");
+		static_branch_disable(&hugetlb_optimize_vmemmap_key);
+		return;
+	}
+
 	vmemmap_pages = (nr_pages * sizeof(struct page)) >> PAGE_SHIFT;
 	/*
 	 * The head page is not to be freed to buddy allocator, the other tail
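
To see why a power-of-two sizeof(struct page) matters here, the following
user-space sketch (a hypothetical 4 KiB page size and made-up record sizes,
not tied to any particular kernel config) checks whether fixed-size records
can tile memory without any record straddling a page boundary:

    #include <stdio.h>

    #define PAGE_SIZE 4096UL

    /* Objects of the given size tile pages without crossing a page
     * boundary only when PAGE_SIZE is an exact multiple of the size,
     * i.e. (for sizes below 4 KiB) when the size is a power of two. */
    static int crosses_page_boundary(unsigned long size)
    {
        unsigned long offset;

        for (offset = 0; ; offset += size) {
            unsigned long first = offset / PAGE_SIZE;
            unsigned long last = (offset + size - 1) / PAGE_SIZE;

            if (first != last)
                return 1;    /* this object straddles two pages */
            if (offset / PAGE_SIZE > 0 && offset % PAGE_SIZE == 0)
                return 0;    /* layout repeats page-aligned: safe */
        }
    }

    int main(void)
    {
        /* 64 bytes: the usual sizeof(struct page); 56 bytes: a size a
         * !CONFIG_MEMCG && !CONFIG_SLUB build might plausibly produce
         * (both values hypothetical here). */
        printf("64-byte records cross boundary: %d\n", crosses_page_boundary(64));
        printf("56-byte records cross boundary: %d\n", crosses_page_boundary(56));
        return 0;
    }

With a 64-byte struct page, whole vmemmap pages contain nothing but tail
struct pages and can be remapped; at 56 bytes some struct pages inevitably
span two physical pages, which is the case the check above now refuses.
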
From patchwork Mon May 16 10:22:06 2022
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v12 2/7] mm: hugetlb_vmemmap: use kstrtobool for hugetlb_vmemmap param parsing
Date: Mon, 16 May 2022 18:22:06 +0800
Message-Id: <20220516102211.41557-3-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>
References: <20220516102211.41557-1-songmuchun@bytedance.com>

Use kstrtobool() rather than open coding "on" and "off" parsing in
mm/hugetlb_vmemmap.c.  It is more flexible, accepting all the usual
boolean spellings: 'y'/'Y'/'1' and 'n'/'N'/'0' as well as any case of
"on" and "off".
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
---
 mm/hugetlb_vmemmap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 6254bb2d4ae5..cc4ec752ec16 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -28,15 +28,15 @@ EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
 
 static int __init hugetlb_vmemmap_early_param(char *buf)
 {
-	if (!buf)
+	bool enable;
+
+	if (kstrtobool(buf, &enable))
 		return -EINVAL;
 
-	if (!strcmp(buf, "on"))
+	if (enable)
 		static_branch_enable(&hugetlb_optimize_vmemmap_key);
-	else if (!strcmp(buf, "off"))
-		static_branch_disable(&hugetlb_optimize_vmemmap_key);
 	else
-		return -EINVAL;
+		static_branch_disable(&hugetlb_optimize_vmemmap_key);
 
 	return 0;
 }
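
For reference, the strings accepted after this change are exactly those
understood by kstrtobool().  A user-space sketch of its first-character
dispatch (modeled on lib/kstrtox.c; simplified, with error paths reduced
to a plain -1):

    #include <stdio.h>

    /* Simplified model of the kernel's kstrtobool(): only the first
     * one or two characters decide the result. */
    static int my_strtobool(const char *s, int *res)
    {
        if (!s)
            return -1;

        switch (s[0]) {
        case 'y': case 'Y': case '1':
            *res = 1;
            return 0;
        case 'n': case 'N': case '0':
            *res = 0;
            return 0;
        case 'o': case 'O':
            switch (s[1]) {
            case 'n': case 'N':    /* "on" */
                *res = 1;
                return 0;
            case 'f': case 'F':    /* "off" */
                *res = 0;
                return 0;
            }
            break;
        }
        return -1;    /* -EINVAL in the kernel */
    }

    int main(void)
    {
        const char *tests[] = { "on", "off", "Y", "no", "1", "bogus" };
        int val, i;

        for (i = 0; i < 6; i++)
            printf("%-6s -> %s\n", tests[i],
                   my_strtobool(tests[i], &val) ? "-EINVAL" :
                   (val ? "true" : "false"));
        return 0;
    }
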
From patchwork Mon May 16 10:22:07 2022
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v12 3/7] mm: memory_hotplug: enumerate all supported section flags
Date: Mon, 16 May 2022 18:22:07 +0800
Message-Id: <20220516102211.41557-4-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>
References: <20220516102211.41557-1-songmuchun@bytedance.com>

We are almost running out of free section-flag slots; only one bit is
available in the worst case (powerpc with 256k pages).  However, there
are still some free slots on other architectures (e.g. x86_64 has 10
bits available, arm64 has 8 bits available in its worst case of 64K
pages).  Those numbers are hard-coded, which makes it inconvenient to
use the spare bits on architectures other than powerpc.  So convert the
section flags to an enumeration to make it easy to add new section flags
in the future.  Also, move SECTION_TAINT_ZONE_DEVICE into the scope of
CONFIG_ZONE_DEVICE to save a bit in the non-zone-device case.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/kconfig.h |  1 +
 include/linux/mmzone.h  | 37 +++++++++++++++++++++++++++++--------
 mm/memory_hotplug.c     |  6 ++++++
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/include/linux/kconfig.h b/include/linux/kconfig.h
index 20d1079e92b4..7044032b9f42 100644
--- a/include/linux/kconfig.h
+++ b/include/linux/kconfig.h
@@ -10,6 +10,7 @@
 #define __LITTLE_ENDIAN 1234
 #endif
 
+#define __ARG_PLACEHOLDER_ 0,
 #define __ARG_PLACEHOLDER_1 0,
 #define __take_second_arg(__ignored, val, ...) val
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aab70355d64f..af057e20b9d7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1418,16 +1418,37 @@ extern size_t mem_section_usage_size(void);
  *      (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
  *      worst combination is powerpc with 256k pages,
  *      which results in PFN_SECTION_SHIFT equal 6.
- * To sum it up, at least 6 bits are available.
+ * To sum it up, at least 6 bits are available on all architectures.
+ * However, we can exceed 6 bits on some other architectures except
+ * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
+ * with the worst case of 64K pages on arm64) if we make sure the
+ * exceeded bit is not applicable to powerpc.
  */
-#define SECTION_MARKED_PRESENT		(1UL<<0)
-#define SECTION_HAS_MEM_MAP		(1UL<<1)
-#define SECTION_IS_ONLINE		(1UL<<2)
-#define SECTION_IS_EARLY		(1UL<<3)
-#define SECTION_TAINT_ZONE_DEVICE	(1UL<<4)
-#define SECTION_MAP_LAST_BIT		(1UL<<5)
+#define ENUM_SECTION_FLAG(MAPPER)				\
+	MAPPER(MARKED_PRESENT)					\
+	MAPPER(HAS_MEM_MAP)					\
+	MAPPER(IS_ONLINE)					\
+	MAPPER(IS_EARLY)					\
+	MAPPER(TAINT_ZONE_DEVICE, CONFIG_ZONE_DEVICE)		\
+	MAPPER(MAP_LAST_BIT)
+
+#define __SECTION_SHIFT_FLAG_MAPPER_0(x)
+#define __SECTION_SHIFT_FLAG_MAPPER_1(x)	SECTION_##x##_SHIFT,
+#define __SECTION_SHIFT_FLAG_MAPPER(x, ...)	\
+	__PASTE(__SECTION_SHIFT_FLAG_MAPPER_, IS_ENABLED(__VA_ARGS__))(x)
+
+#define __SECTION_FLAG_MAPPER_0(x)
+#define __SECTION_FLAG_MAPPER_1(x)	SECTION_##x = BIT(SECTION_##x##_SHIFT),
+#define __SECTION_FLAG_MAPPER(x, ...)	\
+	__PASTE(__SECTION_FLAG_MAPPER_, IS_ENABLED(__VA_ARGS__))(x)
+
+enum {
+	ENUM_SECTION_FLAG(__SECTION_SHIFT_FLAG_MAPPER)
+	ENUM_SECTION_FLAG(__SECTION_FLAG_MAPPER)
+};
+
 #define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT-1))
-#define SECTION_NID_SHIFT	6
+#define SECTION_NID_SHIFT	SECTION_MAP_LAST_BIT_SHIFT
 
 static inline struct page *__section_mem_map_addr(struct mem_section *section)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 111684878fd9..aef3f041dec7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -655,12 +655,18 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
 }
 
+#ifdef CONFIG_ZONE_DEVICE
 static void section_taint_zone_device(unsigned long pfn)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 
 	ms->section_mem_map |= SECTION_TAINT_ZONE_DEVICE;
 }
+#else
+static inline void section_taint_zone_device(unsigned long pfn)
+{
+}
+#endif
 
 /*
  * Associate the pfn range with the given zone, initializing the memmaps
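
The macro machinery above can be hard to follow, so here is a stand-alone
user-space sketch of the same trick (all the FLAG_/FEATURE_ names are
invented for illustration): IS_ENABLED() collapses to 1 or 0 during
preprocessing, that digit is pasted onto a mapper-name prefix to select
an emitting or an empty variant, and the newly added __ARG_PLACEHOLDER_
entry makes an omitted config argument mean "always enabled":

    #include <stdio.h>

    /* Minimal re-creation of the kernel's IS_ENABLED() machinery. */
    #define __ARG_PLACEHOLDER_ 0,
    #define __ARG_PLACEHOLDER_1 0,
    #define __take_second_arg(__ignored, val, ...) val
    #define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
    #define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
    #define __is_defined(x) ___is_defined(x)
    #define IS_ENABLED(option) __is_defined(option)
    #define ___PASTE(a, b) a##b
    #define __PASTE(a, b) ___PASTE(a, b)

    #define CONFIG_FEATURE_A 1      /* pretend this option is set */
    /* CONFIG_FEATURE_B is deliberately not defined */

    /* X-macro list: each flag names the config option that gates it. */
    #define ENUM_FLAGS(MAPPER)                  \
        MAPPER(ALWAYS)                          \
        MAPPER(ONLY_A, CONFIG_FEATURE_A)        \
        MAPPER(ONLY_B, CONFIG_FEATURE_B)        \
        MAPPER(LAST_BIT)

    #define __SHIFT_MAPPER_0(x)                    /* disabled: emit nothing */
    #define __SHIFT_MAPPER_1(x) FLAG_##x##_SHIFT,  /* enabled: emit a shift */
    #define __SHIFT_MAPPER(x, ...) \
        __PASTE(__SHIFT_MAPPER_, IS_ENABLED(__VA_ARGS__))(x)

    enum {
        ENUM_FLAGS(__SHIFT_MAPPER)
    };

    int main(void)
    {
        /* ONLY_B is compiled out, so LAST_BIT takes its bit position. */
        printf("ALWAYS=%d ONLY_A=%d LAST_BIT=%d\n",
               FLAG_ALWAYS_SHIFT, FLAG_ONLY_A_SHIFT, FLAG_LAST_BIT_SHIFT);
        return 0;
    }

SECTION_TAINT_ZONE_DEVICE works the same way in the patch: with
CONFIG_ZONE_DEVICE disabled its shift is never emitted, and
SECTION_MAP_LAST_BIT_SHIFT drops by one, which is how a bit is saved.
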
From patchwork Mon May 16 10:22:08 2022
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v12 4/7] mm: hotplug: introduce SECTION_CANNOT_OPTIMIZE_VMEMMAP
Date: Mon, 16 May 2022 18:22:08 +0800
Message-Id: <20220516102211.41557-5-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>
References: <20220516102211.41557-1-songmuchun@bytedance.com>

For now, the hugetlb_free_vmemmap feature is not compatible with
memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes
precedence over memory_hotplug.memmap_on_memory.  However, some users
want memory_hotplug.memmap_on_memory to take precedence instead, since
memmap_on_memory makes memory hotplug more likely to succeed in
close-to-OOM situations.  So hard-wiring hugetlb_free_vmemmap to take
precedence is neither wise nor elegant.  The proper approach is to have
hugetlb_vmemmap.c check whether the sections the HugeTLB pages belong to
can be optimized: if a section's vmemmap pages are allocated from the
added memory block itself, hugetlb_free_vmemmap should refuse to
optimize the vmemmap; otherwise, do the optimization.  Then both kernel
parameters are compatible.  So this patch introduces
SECTION_CANNOT_OPTIMIZE_VMEMMAP to indicate whether a section's vmemmap
can be optimized.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 22 +++++++++++-----------
 include/linux/mmzone.h                          | 17 +++++++++++++++++
 mm/hugetlb_vmemmap.c                            | 16 +++++++++++++++-
 mm/memory_hotplug.c                             |  1 -
 mm/sparse.c                                     |  7 +++++++
 5 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 308da668bbb1..a0a014f2104c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1711,9 +1711,11 @@
 			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
 			the default is on.
 
-			This is not compatible with memory_hotplug.memmap_on_memory.
-			If both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
+			Note that the vmemmap pages may be allocated from the added
+			memory block itself when memory_hotplug.memmap_on_memory is
+			enabled; those vmemmap pages cannot be optimized even if this
+			feature is enabled.  Other vmemmap pages not allocated from
+			the added memory block itself are not affected.
 
 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
@@ -3038,10 +3040,12 @@
 			[KNL,X86,ARM] Boolean flag to enable this feature.
 			Format: {on | off (default)}
 			When enabled, runtime hotplugged memory will
-			allocate its internal metadata (struct pages)
-			from the hotadded memory which will allow to
-			hotadd a lot of memory without requiring
-			additional memory to do so.
+			allocate its internal metadata (struct pages,
+			those vmemmap pages cannot be optimized even
+			if hugetlb_free_vmemmap is enabled) from the
+			hotadded memory which will allow to hotadd a
+			lot of memory without requiring additional
+			memory to do so.
 			This feature is disabled by default because it
 			has some implication on large (e.g. GB)
 			allocations in some configurations (e.g. small
@@ -3051,10 +3055,6 @@
 			Note that even when enabled, there are a few cases where
 			the feature is not effective.
 
-			This is not compatible with hugetlb_free_vmemmap. If
-			both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
-
 	memtest=	[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
 			Format:
 			default : 0
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index af057e20b9d7..7b69acc5c2a9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1430,6 +1430,7 @@ extern size_t mem_section_usage_size(void);
 	MAPPER(IS_ONLINE)					\
 	MAPPER(IS_EARLY)					\
 	MAPPER(TAINT_ZONE_DEVICE, CONFIG_ZONE_DEVICE)		\
+	MAPPER(CANNOT_OPTIMIZE_VMEMMAP, CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP)	\
 	MAPPER(MAP_LAST_BIT)
 
 #define __SECTION_SHIFT_FLAG_MAPPER_0(x)
@@ -1457,6 +1458,22 @@ static inline struct page *__section_mem_map_addr(struct mem_section *section)
 	return (struct page *)map;
 }
 
+#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+static inline void section_mark_cannot_optimize_vmemmap(struct mem_section *ms)
+{
+	ms->section_mem_map |= SECTION_CANNOT_OPTIMIZE_VMEMMAP;
+}
+
+static inline int section_cannot_optimize_vmemmap(struct mem_section *ms)
+{
+	return (ms && (ms->section_mem_map & SECTION_CANNOT_OPTIMIZE_VMEMMAP));
+}
+#else
+static inline void section_mark_cannot_optimize_vmemmap(struct mem_section *ms)
+{
+}
+#endif
+
 static inline int present_section(struct mem_section *section)
 {
 	return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index cc4ec752ec16..970c36b8935f 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -75,12 +75,26 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
 	return ret;
 }
 
+static unsigned int optimizable_vmemmap_pages(struct hstate *h,
+					      struct page *head)
+{
+	unsigned long pfn = page_to_pfn(head);
+	unsigned long end = pfn + pages_per_huge_page(h);
+
+	for (; pfn < end; pfn += PAGES_PER_SECTION) {
+		if (section_cannot_optimize_vmemmap(__pfn_to_section(pfn)))
+			return 0;
+	}
+
+	return hugetlb_optimize_vmemmap_pages(h);
+}
+
 void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
 	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;
 
-	vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
+	vmemmap_pages = optimizable_vmemmap_pages(h, head);
 	if (!vmemmap_pages)
 		return;
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index aef3f041dec7..1d0225d57166 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1270,7 +1270,6 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 	 *       populate a single PMD.
 	 */
 	return memmap_on_memory &&
-	       !hugetlb_optimize_vmemmap_enabled() &&
 	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
 	       size == memory_block_size_bytes() &&
 	       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
diff --git a/mm/sparse.c b/mm/sparse.c
index d2d76d158b39..8197ef9b7c4c 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -913,6 +913,13 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	ms = __nr_to_section(section_nr);
 	set_section_nid(section_nr, nid);
 	__section_mark_present(ms, section_nr);
+	/*
+	 * Mark whole section as non-optimizable once there is a subsection
+	 * whose vmemmap pages are allocated from alternative allocator. The
+	 * early section is always optimizable.
+	 */
+	if (!early_section(ms) && altmap)
+		section_mark_cannot_optimize_vmemmap(ms);
 
 	/* Align memmap to section boundary in the subsection case */
 	if (section_nr_to_pfn(section_nr) != start_pfn)
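
To make the per-section check concrete, here is a user-space sketch
(constants and the choice of tainted section are hypothetical) of why
optimizable_vmemmap_pages() walks the page in PAGES_PER_SECTION steps:
a gigantic page can span several sections, and a single section whose
memmap came from an altmap must veto optimization of the whole page:

    #include <stdio.h>
    #include <stdbool.h>

    #define SECTION_SIZE      (128UL << 20)    /* 128 MiB, typical for x86_64 */
    #define PAGE_SIZE         4096UL
    #define PAGES_PER_SECTION (SECTION_SIZE / PAGE_SIZE)

    /* Stand-in for the flag that sparse_add_section() sets when the
     * section's memmap was allocated from an altmap. */
    static bool section_cannot_optimize(unsigned long section_nr)
    {
        return section_nr == 5;    /* pretend section 5 is memmap_on_memory */
    }

    static bool optimizable(unsigned long start_pfn, unsigned long nr_pages)
    {
        unsigned long pfn, end = start_pfn + nr_pages;

        for (pfn = start_pfn; pfn < end; pfn += PAGES_PER_SECTION)
            if (section_cannot_optimize(pfn / PAGES_PER_SECTION))
                return false;
        return true;
    }

    int main(void)
    {
        unsigned long gb_pages = (1UL << 30) / PAGE_SIZE;    /* 1 GiB page */

        /* A 1 GiB page at pfn 0 spans sections 0..7, so the tainted
         * section 5 makes the whole page non-optimizable. */
        printf("1 GiB page at pfn 0: %s\n",
               optimizable(0, gb_pages) ? "optimize" : "skip");
        /* A 2 MiB page entirely inside section 3 is unaffected. */
        printf("2 MiB page in section 3: %s\n",
               optimizable(3 * PAGES_PER_SECTION, 512) ? "optimize" : "skip");
        return 0;
    }
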
From patchwork Mon May 16 10:22:09 2022
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song <songmuchun@bytedance.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Anshuman Khandual <anshuman.khandual@arm.com>
Subject: [PATCH v12 5/7] mm: hugetlb_vmemmap: remove hugetlb_optimize_vmemmap_enabled()
Date: Mon, 16 May 2022 18:22:09 +0800
Message-Id: <20220516102211.41557-6-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>
References: <20220516102211.41557-1-songmuchun@bytedance.com>

There is only one user of hugetlb_optimize_vmemmap_enabled() outside of
hugetlb_vmemmap, namely flush_dcache_page() in arch/arm64/mm/flush.c.
It does not need to call hugetlb_optimize_vmemmap_enabled() there since
HugeTLB pages are always fully mapped and only the head page is set
PG_dcache_clean, meaning only the head page's flag may need to be
cleared (see commit cf5a501d985b).  After this change, there are no
users of hugetlb_optimize_vmemmap_enabled() outside of hugetlb_vmemmap,
so remove it to simplify the code.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/flush.c      | 13 +++----------
 include/linux/page-flags.h | 14 ++------------
 mm/hugetlb_vmemmap.c       |  3 ++-
 3 files changed, 7 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index fc4f710e9820..5f9379b3c8c8 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -76,17 +76,10 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
 void flush_dcache_page(struct page *page)
 {
 	/*
-	 * Only the head page's flags of HugeTLB can be cleared since the tail
-	 * vmemmap pages associated with each HugeTLB page are mapped with
-	 * read-only when CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is enabled (more
-	 * details can refer to vmemmap_remap_pte()). Although
-	 * __sync_icache_dcache() only set PG_dcache_clean flag on the head
-	 * page struct, there is more than one page struct with PG_dcache_clean
-	 * associated with the HugeTLB page since the head vmemmap page frame
-	 * is reused (more details can refer to the comments above
-	 * page_fixed_fake_head()).
+	 * HugeTLB pages are always fully mapped and only head page will be
+	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
	 */
-	if (hugetlb_optimize_vmemmap_enabled() && PageHuge(page))
+	if (PageHuge(page))
 		page = compound_head(page);
 
 	if (test_bit(PG_dcache_clean, &page->flags))
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b70124b9c7c1..404f4ede17f5 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -203,12 +203,6 @@ enum pageflags {
 DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
 			 hugetlb_optimize_vmemmap_key);
 
-static __always_inline bool hugetlb_optimize_vmemmap_enabled(void)
-{
-	return static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-				   &hugetlb_optimize_vmemmap_key);
-}
-
 /*
  * If the feature of optimizing vmemmap pages associated with each HugeTLB
  * page is enabled, the head vmemmap page frame is reused and all of the tail
@@ -227,7 +221,8 @@ static __always_inline bool hugetlb_optimize_vmemmap_enabled(void)
  */
 static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
 {
-	if (!hugetlb_optimize_vmemmap_enabled())
+	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
+				 &hugetlb_optimize_vmemmap_key))
 		return page;
 
 	/*
@@ -255,11 +250,6 @@ static inline const struct page *page_fixed_fake_head(const struct page *page)
 {
 	return page;
 }
-
-static inline bool hugetlb_optimize_vmemmap_enabled(void)
-{
-	return false;
-}
 #endif
 
 static __always_inline int page_is_fake_head(struct page *page)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 970c36b8935f..d1fea65fec98 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -124,7 +124,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	BUILD_BUG_ON(__NR_USED_SUBPAGE >=
 		     RESERVE_VMEMMAP_SIZE / sizeof(struct page));
 
-	if (!hugetlb_optimize_vmemmap_enabled())
+	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
+				 &hugetlb_optimize_vmemmap_key))
 		return;
 
 	if (!is_power_of_2(sizeof(struct page))) {
From patchwork Mon May 16 10:22:10 2022
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v12 6/7] sysctl: handle table->maxlen properly for proc_dobool
Date: Mon, 16 May 2022 18:22:10 +0800
Message-Id: <20220516102211.41557-7-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>
References: <20220516102211.41557-1-songmuchun@bytedance.com>

Setting ->proc_handler to proc_dobool while setting ->maxlen to
sizeof(int) is counter-intuitive and easy to get wrong.  For robustness,
fix it by handling table->maxlen properly for proc_dobool in
__do_proc_dointvec().  The next patch will use proc_dobool and depends
on this change.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
---
 fs/lockd/svc.c  |  2 +-
 kernel/sysctl.c | 22 ++++++++++++----------
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 59ef8a1f843f..6e48ee787f49 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -496,7 +496,7 @@ static struct ctl_table nlm_sysctls[] = {
 	{
 		.procname	= "nsm_use_hostnames",
 		.data		= &nsm_use_hostnames,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(nsm_use_hostnames),
 		.mode		= 0644,
 		.proc_handler	= proc_dobool,
 	},
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e52b6e372c60..353fb9093012 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -428,6 +428,8 @@ static int do_proc_dobool_conv(bool *negp, unsigned long *lvalp,
 			       int write, void *data)
 {
 	if (write) {
+		if (*negp || (*lvalp != 0 && *lvalp != 1))
+			return -EINVAL;
 		*(bool *)valp = *lvalp;
 	} else {
 		int val = *(bool *)valp;
@@ -489,17 +491,17 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 			  int write, void *data),
 		  void *data)
 {
-	int *i, vleft, first = 1, err = 0;
+	int vleft, first = 1, err = 0, size;
 	size_t left;
 	char *p;
 
 	if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) {
 		*lenp = 0;
 		return 0;
 	}
 
-	i = (int *) tbl_data;
-	vleft = table->maxlen / sizeof(*i);
+	size = conv == do_proc_dobool_conv ? sizeof(bool) : sizeof(int);
+	vleft = table->maxlen / size;
 	left = *lenp;
 
 	if (!conv)
@@ -514,7 +516,7 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 		p = buffer;
 	}
 
-	for (; left && vleft--; i++, first=0) {
+	for (; left && vleft--; tbl_data = (char *)tbl_data + size, first=0) {
 		unsigned long lval;
 		bool neg;
 
@@ -528,12 +530,12 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 					     sizeof(proc_wspace_sep), NULL);
 			if (err)
 				break;
-			if (conv(&neg, &lval, i, 1, data)) {
+			if (conv(&neg, &lval, tbl_data, 1, data)) {
 				err = -EINVAL;
 				break;
 			}
 		} else {
-			if (conv(&neg, &lval, i, 0, data)) {
+			if (conv(&neg, &lval, tbl_data, 0, data)) {
 				err = -EINVAL;
 				break;
 			}
@@ -708,8 +710,8 @@ int do_proc_douintvec(struct ctl_table *table, int write,
  * @lenp: the size of the user buffer
  * @ppos: file position
  *
- * Reads/writes up to table->maxlen/sizeof(unsigned int) integer
- * values from/to the user buffer, treated as an ASCII string.
+ * Reads/writes up to table->maxlen/sizeof(bool) bool values from/to
+ * the user buffer, treated as an ASCII string.
 *
 * Returns 0 on success.
 */
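
With table->maxlen handled properly, a bool-backed sysctl can now be
declared without the sizeof(int) lie.  A sketch of such an entry (the
knob name and variable here are hypothetical, not part of this series;
registration via register_sysctl() is omitted):

    #include <linux/sysctl.h>

    static bool my_feature_enabled;    /* hypothetical bool knob */

    static struct ctl_table my_sysctls[] = {
        {
            .procname     = "my_feature_enabled",
            .data         = &my_feature_enabled,
            .maxlen       = sizeof(my_feature_enabled), /* really sizeof(bool) */
            .mode         = 0644,
            .proc_handler = proc_dobool,
        },
        { }
    };

Before this patch, .maxlen had to be sizeof(int) just so that
__do_proc_dointvec() would compute vleft = 1; with the bool-aware element
size it can match the data it describes, as the nsm_use_hostnames hunk
above shows.
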
From patchwork Mon May 16 10:22:11 2022
From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org,
    keescook@chromium.org, yzaikin@google.com, osalvador@suse.de, david@redhat.com, masahiroy@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v12 7/7] mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl
Date: Mon, 16 May 2022 18:22:11 +0800
Message-Id: <20220516102211.41557-8-songmuchun@bytedance.com>
In-Reply-To: <20220516102211.41557-1-songmuchun@bytedance.com>
References: <20220516102211.41557-1-songmuchun@bytedance.com>

We must add hugetlb_free_vmemmap=on (or "off") to the boot cmdline and
reboot the server to enable or disable the feature of optimizing vmemmap
pages associated with HugeTLB pages.  However, rebooting usually takes a
long time.  So add a sysctl to enable or disable the feature at runtime
without rebooting.  Why do we need this?  There are three use cases.

1) The feature of minimizing the overhead of struct page associated with
   each HugeTLB page is disabled by default unless
   "hugetlb_free_vmemmap=on" is passed on the boot cmdline.  When we
   (ByteDance) deliver servers to users who want to enable this feature,
   they have to configure grub (change the boot cmdline) and reboot the
   servers, and rebooting usually takes a long time (we have thousands of
   servers).  It is a very bad experience for the users.  So we need an
   approach to enable this feature after booting.  This is a use case in
   our practical environment.

2) Some workloads allocate HugeTLB pages 'on the fly' instead of pulling
   them from the HugeTLB pool; those workloads would be affected with
   this feature enabled.  They can be identified by the characteristic
   that they never explicitly allocate huge pages via 'nr_hugepages' but
   only set 'nr_overcommit_hugepages' and then let the pages be allocated
   from the buddy allocator at fault time.  We can confirm this is a real
   use case from commit 099730d67417.  For those workloads, the page
   fault time could be ~2x slower than before.  We suspect those users
   want to disable this feature if the system has enabled it before, and
   they do not think the memory savings are enough to make up for the
   performance drop.

3) A workload which wants vmemmap pages to be optimized and a workload
   which sets 'nr_overcommit_hugepages' and does not want the extra
   overhead at fault time when the overcommitted pages are allocated from
   the buddy allocator may be deployed on the same server.  The user
   could enable the feature, set 'nr_hugepages' and
   'nr_overcommit_hugepages', and then disable the feature.  In this
   case, the overcommitted HugeTLB pages will not encounter the extra
   overhead at fault time.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 38 ++++++++++++++++++++
 include/linux/page-flags.h              |  6 ++--
 mm/hugetlb_vmemmap.c                    | 61 ++++++++++++++++++++++-----------
 3 files changed, 81 insertions(+), 24 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 747e325ebcd0..d7374a1e8ac9 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -562,6 +562,44 @@ Change the minimum size of the hugepage pool.
 
 See Documentation/admin-guide/mm/hugetlbpage.rst
 
+hugetlb_optimize_vmemmap
+========================
+
+This knob is not available when the size of 'struct page' (a structure defined
+in include/linux/mm_types.h) is not a power of two (an unusual system config
+could result in this).
+
+Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages
+associated with each HugeTLB page.
+
+Once enabled, the vmemmap pages of subsequent allocations of HugeTLB pages from
+the buddy allocator will be optimized (7 pages per 2MB HugeTLB page and 4095
+pages per 1GB HugeTLB page), whereas already allocated HugeTLB pages will not
+be optimized.  When those optimized HugeTLB pages are freed from the HugeTLB
+pool to the buddy allocator, the vmemmap pages representing that range need to
+be remapped again and the vmemmap pages discarded earlier need to be
+reallocated.  If your use case is that HugeTLB pages are allocated 'on the fly'
+(e.g. never explicitly allocating HugeTLB pages with 'nr_hugepages' but only
+setting 'nr_overcommit_hugepages', so those overcommitted HugeTLB pages are
+allocated 'on the fly') instead of being pulled from the HugeTLB pool, you
+should weigh the benefits of memory savings against the extra overhead (~2x
+slower than before) of allocating or freeing HugeTLB pages between the HugeTLB
+pool and the buddy allocator.  Another behavior to note is that if the system
+is under heavy memory pressure, this feature could prevent the user from
+freeing HugeTLB pages from the HugeTLB pool to the buddy allocator, since the
+allocation of vmemmap pages could fail; you have to retry later if your system
+encounters this situation.
+
+Once disabled, the vmemmap pages of subsequent allocations of HugeTLB pages
+from the buddy allocator will not be optimized, meaning the extra overhead at
+allocation time from the buddy allocator disappears, whereas already optimized
+HugeTLB pages will not be affected.  If you want to make sure there are no
+optimized HugeTLB pages, you can set "nr_hugepages" to 0 first and then disable
+this.  Note that writing 0 to nr_hugepages will make any "in use" HugeTLB pages
+become surplus pages, and those surplus pages remain optimized until they are
+no longer in use.  You would need to wait for those surplus pages to be
+released before there are no optimized pages in the system.
+
+
 nr_hugepages_mempolicy
 ======================
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 404f4ede17f5..07d8d444d9f1 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -200,8 +200,7 @@ enum pageflags {
 #ifndef __GENERATING_BOUNDS_H
 
 #ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
-DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-			 hugetlb_optimize_vmemmap_key);
+DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
 
 /*
  * If the feature of optimizing vmemmap pages associated with each HugeTLB
@@ -221,8 +220,7 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
  */
 static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
 {
-	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-				 &hugetlb_optimize_vmemmap_key))
+	if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
 		return page;
 
 	/*
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index d1fea65fec98..02862f117c2b 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -22,23 +22,15 @@
 #define RESERVE_VMEMMAP_NR		1U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)
 
-DEFINE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-			hugetlb_optimize_vmemmap_key);
+DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
 EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);
 
+static bool optimize_vmemmap_enabled =
+	IS_ENABLED(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON);
+
 static int __init hugetlb_vmemmap_early_param(char *buf)
 {
-	bool enable;
-
-	if (kstrtobool(buf, &enable))
-		return -EINVAL;
-
-	if (enable)
-		static_branch_enable(&hugetlb_optimize_vmemmap_key);
-	else
-		static_branch_disable(&hugetlb_optimize_vmemmap_key);
-
-	return 0;
+	return kstrtobool(buf, &optimize_vmemmap_enabled);
 }
 early_param("hugetlb_free_vmemmap", hugetlb_vmemmap_early_param);
 
@@ -69,8 +61,10 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
 	 */
 	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
 				  GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE);
-	if (!ret)
+	if (!ret) {
 		ClearHPageVmemmapOptimized(head);
+		static_branch_dec(&hugetlb_optimize_vmemmap_key);
+	}
 
 	return ret;
 }
@@ -81,6 +75,9 @@ static unsigned int optimizable_vmemmap_pages(struct hstate *h,
 	unsigned long pfn = page_to_pfn(head);
 	unsigned long end = pfn + pages_per_huge_page(h);
 
+	if (!READ_ONCE(optimize_vmemmap_enabled))
+		return 0;
+
 	for (; pfn < end; pfn += PAGES_PER_SECTION) {
 		if (section_cannot_optimize_vmemmap(__pfn_to_section(pfn)))
 			return 0;
@@ -98,6 +95,8 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 	if (!vmemmap_pages)
 		return;
 
+	static_branch_inc(&hugetlb_optimize_vmemmap_key);
+
 	vmemmap_addr	+= RESERVE_VMEMMAP_SIZE;
 	vmemmap_end	= vmemmap_addr + (vmemmap_pages << PAGE_SHIFT);
 	vmemmap_reuse	= vmemmap_addr - PAGE_SIZE;
 
@@ -107,7 +106,9 @@ void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 	 * to the page which @vmemmap_reuse is mapped to, then free the pages
 	 * which the range [@vmemmap_addr, @vmemmap_end] is mapped to.
	 */
-	if (!vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
+	if (vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse))
+		static_branch_dec(&hugetlb_optimize_vmemmap_key);
+	else
 		SetHPageVmemmapOptimized(head);
 }
 
@@ -124,13 +125,8 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	BUILD_BUG_ON(__NR_USED_SUBPAGE >=
 		     RESERVE_VMEMMAP_SIZE / sizeof(struct page));
 
-	if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON,
-				 &hugetlb_optimize_vmemmap_key))
-		return;
-
 	if (!is_power_of_2(sizeof(struct page))) {
 		pr_warn_once("cannot optimize vmemmap pages because \"struct page\" crosses page boundaries\n");
-		static_branch_disable(&hugetlb_optimize_vmemmap_key);
 		return;
 	}
 
@@ -149,3 +145,28 @@ void __init hugetlb_vmemmap_init(struct hstate *h)
 	pr_info("can optimize %d vmemmap pages for %s\n",
 		h->optimize_vmemmap_pages, h->name);
 }
+
+#ifdef CONFIG_PROC_SYSCTL
+static struct ctl_table hugetlb_vmemmap_sysctls[] = {
+	{
+		.procname	= "hugetlb_optimize_vmemmap",
+		.data		= &optimize_vmemmap_enabled,
+		.maxlen		= sizeof(optimize_vmemmap_enabled),
+		.mode		= 0644,
+		.proc_handler	= proc_dobool,
+	},
+	{ }
+};
+
+static int __init hugetlb_vmemmap_sysctls_init(void)
+{
+	/*
+	 * If "struct page" crosses page boundaries, the vmemmap pages cannot
+	 * be optimized.
+	 */
+	if (is_power_of_2(sizeof(struct page)))
+		register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
+
+	return 0;
+}
+late_initcall(hugetlb_vmemmap_sysctls_init);
+#endif /* CONFIG_PROC_SYSCTL */
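
Finally, a user-space sketch of exercising the new knob once this series
is applied (equivalent to `sysctl vm.hugetlb_optimize_vmemmap=1`; the
knob is absent when sizeof(struct page) is not a power of two, and the
path follows from registering procname "hugetlb_optimize_vmemmap" under
the "vm" sysctl directory):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *knob = "/proc/sys/vm/hugetlb_optimize_vmemmap";
        char buf[4];
        int fd;

        /* Enable optimization for subsequently allocated HugeTLB pages. */
        fd = open(knob, O_WRONLY);
        if (fd < 0) {
            perror(knob);    /* e.g. knob not registered on this kernel */
            return 1;
        }
        if (write(fd, "1", 1) != 1)
            perror("write");
        close(fd);

        /* Read back the current state ("0\n" or "1\n"). */
        fd = open(knob, O_RDONLY);
        if (fd >= 0) {
            ssize_t n = read(fd, buf, sizeof(buf) - 1);
            if (n > 0) {
                buf[n] = '\0';
                printf("hugetlb_optimize_vmemmap = %s", buf);
            }
            close(fd);
        }
        return 0;
    }

Note that, per the vm.rst text above, flipping the knob only affects
HugeTLB pages allocated afterwards; already allocated (or already
optimized) pages keep their current state.
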