From patchwork Thu Aug 5 19:02:38 2021
X-Patchwork-Submitter: Zi Yan
X-Patchwork-Id: 12421855
From: Zi Yan
To: David Hildenbrand, linux-mm@kvack.org
Cc: Matthew Wilcox, Vlastimil Babka, "Kirill A . Shutemov",
    Mike Kravetz, Michal Hocko, John Hubbard,
    linux-kernel@vger.kernel.org, Zi Yan
Subject: [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter.
Date: Thu, 5 Aug 2021 15:02:38 -0400
Message-Id: <20210805190253.2795604-1-zi.yan@sent.com>
Reply-To: Zi Yan
List-ID: linux-mm@kvack.org

From: Zi Yan

Hi all,

This patchset adds support for a kernel boot time adjustable MAX_ORDER,
so that users can change the largest size of pages obtained from the
buddy allocator. It also removes the restriction on MAX_ORDER based on
SECTION_SIZE_BITS, so that the buddy allocator can merge PFNs across
memory sections when SPARSEMEM_VMEMMAP is set. It is on top of
v5.14-rc4-mmotm-2021-08-02-18-51.

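To make the sizes involved concrete, here is a small userspace sketch of
the order arithmetic behind the section-size restriction. It is not code
from the patchset. The names order_for_size(), fits_in_section(),
PAGE_SHIFT_EX and SECTION_SIZE_BITS_EX are hypothetical, and the values
12 and 27 are assumptions chosen to resemble an x86_64 configuration
with 4KB base pages and 128MB sections.

```c
/* Assumed example values, not taken from any particular kernel config. */
#define PAGE_SHIFT_EX        12   /* 4KB base pages (assumption) */
#define SECTION_SIZE_BITS_EX 27   /* 128MB memory sections (assumption) */

/* Smallest buddy order whose block covers at least `bytes` bytes. */
static unsigned int order_for_size(unsigned long long bytes)
{
	unsigned int order = 0;

	while ((1ULL << (order + PAGE_SHIFT_EX)) < bytes)
		order++;
	return order;
}

/*
 * An order-N block stays within one memory section when
 * N + PAGE_SHIFT <= SECTION_SIZE_BITS. With the largest order being
 * MAX_ORDER - 1, this is the restriction the patchset lifts.
 */
static int fits_in_section(unsigned int order)
{
	return order + PAGE_SHIFT_EX <= SECTION_SIZE_BITS_EX;
}
```

Under these assumptions a 2MB page is order 9 and fits within a section,
while a 1GB page is order 18 (so MAX_ORDER would need to be at least 19)
and spans several sections, which is why the section-based restriction
has to go before the buddy allocator can hand out 1GB pages.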
Motivation
===

This enables the kernel to allocate 1GB pages and is necessary for my
ongoing work on adding support for 1GB PUD THP[1]. It is also the
conclusion I reached after some discussion with David Hildenbrand on
which methods should be used for allocating gigantic pages[2], since
other approaches, like using the CMA allocator or alloc_contig_pages(),
are regarded as suboptimal.

This also avoids having to increase SECTION_SIZE_BITS when increasing
MAX_ORDER. Increasing SECTION_SIZE_BITS is not desirable, because the
memory hotadd/hotremove chunk size would increase as well, making memory
management harder for VMs.

In addition, making MAX_ORDER a kernel boot time parameter lets users
adjust the buddy allocator to their own needs without recompiling the
kernel, so that one can still have a small MAX_ORDER if one does not
need to allocate gigantic pages like 1GB PUD THPs.

Background
===

At the moment, the kernel imposes the restriction
MAX_ORDER - 1 + PAGE_SHIFT <= SECTION_SIZE_BITS. This prevents the buddy
allocator from merging pages across memory sections, as the struct pages
of different sections might not be contiguous and code like page++ would
fail. But this is not an issue when SPARSEMEM_VMEMMAP is set, since all
struct pages are virtually contiguous. In addition, as long as the buddy
allocator checks PFN validity during buddy page merging (done in
Patch 3), pages allocated from the buddy allocator can be manipulated by
code like page++.

Description
===

I tested the patchset on both x86_64 and ARM64 with 4KB, 16KB, and 64KB
base pages. The systems boot and the ltp mm test suite finished without
issue. Memory hotplug also worked on x86_64 when I tested it. The
patchset definitely needs more tests and reviews for other
architectures.

Regarding the concern about performance degradation when MAX_ORDER is
increased, I did some initial performance tests comparing MAX_ORDER=11
and MAX_ORDER=20 on x86_64 machines and saw no performance
difference[3].

Patch 1 excludes the MAX_ORDER check from 32bit vdso compilation.
The check uses an irrelevant 32bit SECTION_SIZE_BITS during 64bit kernel
compilation. The exclusion does not break the check in 32bit kernels,
since the check is still performed when other kernel components are
compiled.

Patch 2 gives FORCE_MAX_ZONEORDER a better name.

Patch 3 restores the pfn_valid_within() check when the buddy allocator
can merge pages across memory sections. The check was removed when ARM64
got rid of holes in zones, but holes can appear in zones again after
this patchset.

Patch 4-11 convert uses of MAX_ORDER to SECTION_SIZE_BITS or its
derivative constants, since these places use MAX_ORDER as a boundary
check for physically contiguous pages, where SECTION_SIZE_BITS should be
used. After this patchset, MAX_ORDER can go beyond SECTION_SIZE_BITS and
such code can break. I split the changes into separate patches for
easier review and can merge them into a single one if that works better.

Patch 12 adds a new Kconfig option SET_MAX_ORDER to allow specifying
MAX_ORDER when ARCH_FORCE_MAX_ORDER is not used by the arch, like
x86_64.

Patch 13 converts statically allocated arrays of MAX_ORDER length to
dynamic ones where possible and prepares for making MAX_ORDER a boot
time parameter.

Patch 14 adds a new MIN_MAX_ORDER constant to replace the
soon-to-be-dynamic MAX_ORDER in places where converting a static array
to a dynamic one is a hassle and not necessary, i.e., ARM64 hypervisor
page allocation and SLAB.

Patch 15 finally changes MAX_ORDER to be a kernel boot time parameter.

Any suggestion and/or comment is welcome. Thanks.

TODO
===

1. Redo the performance comparison tests using this patchset to
   understand the performance implication of changing MAX_ORDER.


Zi Yan (15):
  arch: x86: remove MAX_ORDER exceeding SECTION_SIZE check for 32bit vdso.
  arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER
  mm: check pfn validity when buddy allocator can merge pages across mem sections.
  mm: prevent pageblock size being larger than section size.
  mm/memory_hotplug: online pages at section size.
  mm: use PAGES_PER_SECTION instead for mem_map_offset/next().
  mm: hugetlb: use PAGES_PER_SECTION to check mem_map discontiguity
  fs: proc: use PAGES_PER_SECTION for page offline checking period.
  virtio: virtio_mem: use PAGES_PER_SECTION instead of MAX_ORDER_NR_PAGES
  virtio: virtio_balloon: use PAGES_PER_SECTION instead of MAX_ORDER_NR_PAGES.
  mm/page_reporting: report pages at section size instead of MAX_ORDER.
  mm: Make MAX_ORDER of buddy allocator configurable via Kconfig SET_MAX_ORDER.
  mm: convert MAX_ORDER sized static arrays to dynamic ones.
  mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time constant.
  mm: make MAX_ORDER a kernel boot time parameter.

 .../admin-guide/kdump/vmcoreinfo.rst         |   2 +-
 .../admin-guide/kernel-parameters.txt        |   5 +
 arch/Kconfig                                 |   4 +
 arch/arc/Kconfig                             |   2 +-
 arch/arm/Kconfig                             |   2 +-
 arch/arm/configs/imx_v6_v7_defconfig         |   2 +-
 arch/arm/configs/milbeaut_m10v_defconfig     |   2 +-
 arch/arm/configs/oxnas_v6_defconfig          |   2 +-
 arch/arm/configs/sama7_defconfig             |   2 +-
 arch/arm64/Kconfig                           |   2 +-
 arch/arm64/kvm/hyp/include/nvhe/gfp.h        |   2 +-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c         |   3 +-
 arch/csky/Kconfig                            |   2 +-
 arch/ia64/Kconfig                            |   2 +-
 arch/ia64/include/asm/sparsemem.h            |   6 +-
 arch/m68k/Kconfig.cpu                        |   2 +-
 arch/mips/Kconfig                            |   2 +-
 arch/nios2/Kconfig                           |   2 +-
 arch/powerpc/Kconfig                         |   2 +-
 arch/powerpc/configs/85xx/ge_imp3a_defconfig |   2 +-
 arch/powerpc/configs/fsl-emb-nonhw.config    |   2 +-
 arch/sh/configs/ecovec24_defconfig           |   2 +-
 arch/sh/mm/Kconfig                           |   2 +-
 arch/sparc/Kconfig                           |   2 +-
 arch/x86/entry/vdso/Makefile                 |   1 +
 arch/xtensa/Kconfig                          |   2 +-
 drivers/gpu/drm/ttm/ttm_device.c             |   7 +-
 drivers/gpu/drm/ttm/ttm_pool.c               |  58 +++++++++-
 drivers/virtio/virtio_balloon.c              |   2 +-
 drivers/virtio/virtio_mem.c                  |  12 +-
 fs/proc/kcore.c                              |   2 +-
 include/drm/ttm/ttm_pool.h                   |   4 +-
 include/linux/memory_hotplug.h               |   1 +
 include/linux/mmzone.h                       |  56 ++++++++-
 include/linux/pageblock-flags.h              |   7 +-
 include/linux/slab.h                         |   8 +-
 mm/Kconfig                                   |  16 +++
 mm/compaction.c                              |  20 ++--
 mm/hugetlb.c                                 |   2 +-
 mm/internal.h                                |   4 +-
 mm/memory_hotplug.c                          |  18 ++-
 mm/page_alloc.c                              | 108 ++++++++++++++++--
 mm/page_isolation.c                          |   7 +-
 mm/page_owner.c                              |  14 ++-
 mm/page_reporting.c                          |   3 +-
 mm/slab.c                                    |   2 +-
 mm/slub.c                                    |   7 +-
 mm/vmscan.c                                  |   1 -
 48 files changed, 334 insertions(+), 86 deletions(-)
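
For readers who want a picture of the kind of check Patch 3 is about,
below is a minimal userspace sketch of testing PFN validity before a
buddy merge. It is not the code from the patch. The memory layout
(HOLE_START, HOLE_END, NR_PFNS) and the example_*() helpers are
hypothetical, and example_pfn_valid() is only a stand-in for the
kernel's PFN validity test. A real merge also has to confirm that the
buddy is free and of the same order, which is omitted here.

```c
#include <stdbool.h>

/* Hypothetical layout: PFNs in [HOLE_START, HOLE_END) have no memory. */
#define HOLE_START 0x8000UL
#define HOLE_END   0x10000UL
#define NR_PFNS    0x20000UL

/* Stand-in for the kernel's PFN validity test (assumption). */
static bool example_pfn_valid(unsigned long pfn)
{
	return pfn < NR_PFNS && !(pfn >= HOLE_START && pfn < HOLE_END);
}

/* The buddy of a 2^order-page block differs only in bit `order`. */
static unsigned long example_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/*
 * Once blocks may merge across memory sections, the buddy can land in
 * a hole, so its PFN has to be checked before it is touched.
 */
static bool example_can_merge(unsigned long pfn, unsigned int order)
{
	return example_pfn_valid(example_buddy_pfn(pfn, order));
}
```

With this layout, the order-15 block at PFN 0 has its buddy at PFN
0x8000, which sits in the hole, so example_can_merge(0, 15) is false
and the merge must stop at that order.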