mbox series

[v1,00/12] Make MAX_ORDER adjustable as a kernel boot time parameter.

Message ID 20220922011252.2266780-1-zi.yan@sent.com (mailing list archive)
Headers show
Series Make MAX_ORDER adjustable as a kernel boot time parameter. | expand

Message

Zi Yan Sept. 22, 2022, 1:12 a.m. UTC
From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset adds support for kernel boot time adjustable MAX_ORDER, so that
user can change the largest size of pages buddy allocator allocates.
It is on top of mm-everything-2022-09-19-00-45.

Changelog
===
From RFCv2

1. Dropped RFC, collected reviewed-by.
2. Added back page validation check in find_buddy_page_pfn() since it is
needed when zone is not contiguous.
3. Converted MAX_ORDER sized static array used in recently added kmsan code to
a dynamic one.

Motivation
===

This enables kernel to allocate 1GB pages and is necessary for my ongoing work
on adding support for 1GB PUD THP[1]. This is also the conclusion I came up with
after some discussion with David Hildenbrand on what methods should be used for
allocating gigantic pages[2], since other approaches like using CMA allocator or
alloc_contig_pages() are regarded as suboptimal.

In addition, make MAX_ORDER a kernel boot time parameter can enable user to
adjust buddy allocator without recompiling the kernel for their own needs, so
that one can still have a small MAX_ORDER if he/she does not need to allocate
gigantic pages like 1GB PUD THPs.

Background
===

At the moment, kernel imposes MAX_ORDER - 1 + PAGE_SHFIT < SECTION_SIZE_BITS
restriction. This prevents buddy allocator merging pages across memory sections,
as PFNs might not be contiguous and code like page++ would fail. But this would
not be an issue when SPARSEMEM_VMEMMAP is set, since all struct page are
virtually contiguous. So boot time adjustable MAX_ORDER depends on
SPARSEMEM_VMEMMAP.

Description
===

I tested the patchset on both x86_64 and ARM64 at 4KB base pages. The systems
boot and run.

In terms of the concerns on performance degradation if MAX_ORDER is increased,
I run vm-scalability from lkp comparing current system, my patchset with
MAX_ORDER=11 and my patchset with MAX_ORDER=20 on a x86_64 VM and saw
almost no performance difference, please vm-scalability reports in the
RFCv2: https://lore.kernel.org/linux-mm/20220811231643.1012912-1-zi.yan@sent.com/

Patch 1 changes MAX_ORDER to represent the max order of pages allocated
by buddy allocator. right now MAX_ORDER - 1 represents that and it is
confusing. Suggested by Vlastimil Babka. checkpatch.pl is updated to
warn future use of MAX_ORDER, since its semantics is changed.

Patch 2 adds a page validation in find_buddy_page_pfn() when zone is not
contiguous, since some pages in the middle of a zone can be invalid.

Patch 3 make deferred struct page initialization work when MAX_ORDER is
bigger than a memory section size.

Patch 4-7 convert the use of MAX_ORDER to pageblock_order. Since
pageblock_order is a constant when MAX_ORDER can be changed at boot time
and close to current MAX_ORDER value. I separate changes to different patches
for easy review and can merge them into a single one if that works better.

Patch 8 replaces MAX_ORDER with MAX_PHYS_CONTIG_ORDER when it is used to
indicate the maximum number of physically contiguous pages.

Patch 9 adds a new Kconfig option SET_MAX_ORDER to allow specifying MAX_ORDER
when ARCH_FORCE_MAX_ORDER is not used by the arch, like x86_64.

Patch 10 converts statically allocated arrays with MAX_ORDER length to dynamic
ones if possible and prepares for making MAX_ORDER a boot time parameter.

Patch 11 adds a new MIN_MAX_ORDER constant to replace soon-to-be-dynamic
MAX_ORDER for places where converting static array to dynamic one is causing
hassle and not necessary, i.e., ARM64 hypervisor page allocation and SLAB.

Patch 12 changes MAX_ORDER to be a kernel boot time parameter and it is
opt-in as an mm/Kconfig option.


Any suggestion and/or comment is welcome. Thanks.


[1] https://lore.kernel.org/linux-mm/20200928175428.4110504-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/e132fdd9-65af-1cad-8a6e-71844ebfe6a2@redhat.com/

Zi Yan (12):
  mm: rectify MAX_ORDER semantics to be the largest page order from
    buddy allocator
  mm: check page validity when find a buddy page in a non-contiguous
    zone
  mm: adapt deferred struct page init to new MAX_ORDER.
  mm: prevent pageblock size being larger than section size.
  fs: proc: use pageblock_nr_pages for reschedule period in read_kcore()
  virtio: virtio_balloon: use pageblock_order instead of MAX_ORDER
  mm/page_reporting: set page_reporting_order to -1 to prevent it
    running
  mm: replace MAX_ORDER when it is used to indicate max physical
    contiguity.
  mm: Make MAX_ORDER of buddy allocator configurable via Kconfig
    SET_MAX_ORDER.
  mm: convert MAX_ORDER sized static arrays to dynamic ones.
  mm: introduce MIN_MAX_ORDER to replace MAX_ORDER as compile time
    constant.
  mm: make MAX_ORDER a kernel boot time parameter.

 .../admin-guide/kdump/vmcoreinfo.rst          |   4 +-
 .../admin-guide/kernel-parameters.txt         |   9 +-
 arch/Kconfig                                  |   4 +
 arch/arc/Kconfig                              |   4 +-
 arch/arm/Kconfig                              |  12 +-
 arch/arm/configs/imx_v6_v7_defconfig          |   2 +-
 arch/arm/configs/milbeaut_m10v_defconfig      |   2 +-
 arch/arm/configs/oxnas_v6_defconfig           |   2 +-
 arch/arm/configs/pxa_defconfig                |   2 +-
 arch/arm/configs/sama7_defconfig              |   2 +-
 arch/arm/configs/sp7021_defconfig             |   2 +-
 arch/arm64/Kconfig                            |  16 +--
 arch/arm64/include/asm/sparsemem.h            |   2 +-
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |   2 +-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          |   2 +-
 arch/csky/Kconfig                             |   2 +-
 arch/ia64/Kconfig                             |   8 +-
 arch/ia64/include/asm/sparsemem.h             |   4 +-
 arch/ia64/mm/hugetlbpage.c                    |   2 +-
 arch/loongarch/Kconfig                        |  16 +--
 arch/m68k/Kconfig.cpu                         |   8 +-
 arch/mips/Kconfig                             |  22 ++-
 arch/nios2/Kconfig                            |  10 +-
 arch/powerpc/Kconfig                          |  30 ++---
 arch/powerpc/configs/85xx/ge_imp3a_defconfig  |   2 +-
 arch/powerpc/configs/fsl-emb-nonhw.config     |   2 +-
 arch/powerpc/mm/book3s64/iommu_api.c          |   2 +-
 arch/powerpc/mm/hugetlbpage.c                 |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c     |   2 +-
 arch/sh/configs/ecovec24_defconfig            |   2 +-
 arch/sh/mm/Kconfig                            |  20 ++-
 arch/sparc/Kconfig                            |   8 +-
 arch/sparc/kernel/pci_sun4v.c                 |   2 +-
 arch/sparc/kernel/traps_64.c                  |   2 +-
 arch/sparc/mm/tsb.c                           |   4 +-
 arch/um/kernel/um_arch.c                      |   4 +-
 arch/xtensa/Kconfig                           |   8 +-
 drivers/base/regmap/regmap-debugfs.c          |   8 +-
 drivers/crypto/hisilicon/sgl.c                |   6 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |   2 +-
 drivers/gpu/drm/ttm/ttm_device.c              |   7 +-
 drivers/gpu/drm/ttm/ttm_pool.c                |  72 ++++++++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   2 +-
 drivers/irqchip/irq-gic-v3-its.c              |   4 +-
 drivers/md/dm-bufio.c                         |   2 +-
 drivers/misc/genwqe/card_utils.c              |   2 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |   2 +-
 drivers/net/ethernet/ibm/ibmvnic.h            |   2 +-
 drivers/video/fbdev/hyperv_fb.c               |   6 +-
 drivers/virtio/virtio_balloon.c               |   2 +-
 drivers/virtio/virtio_mem.c                   |   8 +-
 fs/proc/kcore.c                               |   2 +-
 fs/ramfs/file-nommu.c                         |   2 +-
 include/drm/ttm/ttm_pool.h                    |   4 +-
 include/linux/hugetlb.h                       |   2 +-
 include/linux/mmzone.h                        |  36 ++++-
 include/linux/pageblock-flags.h               |  21 ++-
 include/linux/slab.h                          |   8 +-
 kernel/crash_core.c                           |   2 +-
 kernel/dma/pool.c                             |   8 +-
 kernel/events/ring_buffer.c                   |   2 +-
 mm/Kconfig                                    |  33 ++++-
 mm/compaction.c                               |   8 +-
 mm/debug_vm_pgtable.c                         |   4 +-
 mm/huge_memory.c                              |   2 +-
 mm/hugetlb.c                                  |   4 +-
 mm/internal.h                                 |   8 +-
 mm/kmsan/init.c                               |  18 ++-
 mm/memblock.c                                 |   8 +-
 mm/memory.c                                   |   4 +-
 mm/memory_hotplug.c                           |   6 +-
 mm/page_alloc.c                               | 127 +++++++++++++-----
 mm/page_isolation.c                           |  14 +-
 mm/page_owner.c                               |   6 +-
 mm/page_reporting.c                           |   8 +-
 mm/shuffle.h                                  |   2 +-
 mm/slab.c                                     |   2 +-
 mm/slub.c                                     |   6 +-
 mm/vmscan.c                                   |   1 -
 mm/vmstat.c                                   |  14 +-
 net/smc/smc_ib.c                              |   2 +-
 scripts/checkpatch.pl                         |   8 ++
 security/integrity/ima/ima_crypto.c           |   2 +-
 tools/testing/memblock/linux/mmzone.h         |   6 +-
 84 files changed, 462 insertions(+), 272 deletions(-)