Message ID | 20250213075703.1270713-1-quic_zhenhuah@quicinc.com
---|---
State | New
Series | [v6] arm64: mm: Populate vmemmap/linear at the page level for hotplugged sections
On 13.02.25 08:57, Zhenhua Huang wrote:
> On the arm64 platform with the 4K base page config, SECTION_SIZE_BITS is
> set to 27, making one section 128M. The struct page area that vmemmap
> maps for one section is then 2M.
> Commit c1cc1552616d ("arm64: MMU initialisation") optimized vmemmap to
> populate at the PMD section level, which was suitable initially since
> the hotplug granule was always one section (128M). However, commit
> ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") introduced a
> 2M (SUBSECTION_SIZE) hotplug granule, which broke the existing arm64
> assumptions.
>
> Consider the vmemmap_free -> unmap_hotplug_pmd_range path: when
> pmd_sect() is true, the entire PMD section is cleared, even if other
> subsections are still in use. For example, say page_struct_map1 and
> page_struct_map2 are part of a single PMD entry and are hot-added
> sequentially. When page_struct_map1 is then removed, vmemmap_free()
> clears the entire PMD entry, freeing the struct page map for the whole
> section, even though page_struct_map2 is still active. A similar
> problem exists with the linear mapping: for 16K base pages (PMD size =
> 32M) or 64K base pages (PMD size = 512M), the block mappings exceed
> SUBSECTION_SIZE, and tearing down the entire PMD mapping likewise
> leaves other subsections unmapped in the linear map.
>
> To address the issue, prevent PMD/PUD/CONT mappings, for both the
> linear map and vmemmap, for non-boot sections whenever the
> corresponding mapping size on the given base page config exceeds
> SUBSECTION_SIZE (currently 2MB).
>
> Cc: <stable@vger.kernel.org> # v5.4+
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>

Just so I understand correctly: for ordinary memory-section-sized hotplug
(NVDIMM, virtio-mem), we still get a large mapping where possible?
On Thu, Feb 13, 2025 at 01:59:25PM +0100, David Hildenbrand wrote:
> On 13.02.25 08:57, Zhenhua Huang wrote:
> [...]
>
> Just so I understand correctly: for ordinary memory-section-sized hotplug
> (NVDIMM, virtio-mem), we still get a large mapping where possible?

Up to 2MB blocks only, since that's the SUBSECTION_SIZE value. The
vmemmap mapping is also limited to PAGE_SIZE mappings (we could use
contiguous mappings for vmemmap, but it's not wired up; I don't think
it's worth the hassle).
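A quick standalone sketch of the arithmetic behind these limits (assuming
a typical 64-byte struct page, which is not stated in the thread):

	#include <stdio.h>

	int main(void)
	{
		unsigned long page_size   = 1UL << 12;	/* 4K base pages */
		unsigned long section     = 1UL << 27;	/* SECTION_SIZE_BITS = 27 -> 128M */
		unsigned long subsection  = 1UL << 21;	/* SUBSECTION_SIZE -> 2M */
		unsigned long struct_page = 64;		/* typical sizeof(struct page) */

		/* One full section needs 32768 * 64B = 2M of vmemmap: exactly
		   one PMD block on a 4K config, shared by all 64 subsections. */
		printf("vmemmap per section:    %lu KiB\n",
		       section / page_size * struct_page >> 10);

		/* One 2M subsection needs only 512 * 64B = 32K of that PMD. */
		printf("vmemmap per subsection: %lu KiB\n",
		       subsection / page_size * struct_page >> 10);
		return 0;
	}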
On 13.02.25 16:49, Catalin Marinas wrote:
> [...]
>
> Up to 2MB blocks only, since that's the SUBSECTION_SIZE value. The
> vmemmap mapping is also limited to PAGE_SIZE mappings (we could use
> contiguous mappings for vmemmap, but it's not wired up; I don't think
> it's worth the hassle).

But that's messed up, no?

If someone hotplugs a memory section, they have to hotunplug a memory
section, not parts of it.

That's why x86 does this in vmemmap_populate():

	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
		err = vmemmap_populate_basepages(start, end, node, NULL);
	else if (boot_cpu_has(X86_FEATURE_PSE))
		err = vmemmap_populate_hugepages(start, end, node, altmap);
	...

Maybe I'm missing something. Most importantly: why should the weird
subsection stuff degrade ordinary hotplug of DIMMs/virtio-mem etc.?
On Thu, Feb 13, 2025 at 05:16:37PM +0100, David Hildenbrand wrote:
> [...]
>
> Maybe I'm missing something. Most importantly: why should the weird
> subsection stuff degrade ordinary hotplug of DIMMs/virtio-mem etc.?

I think that's based on the discussion for a previous version, which
assumed that the hotplug/unplug sizes are not guaranteed to be
symmetric:

https://lore.kernel.org/lkml/a720aaa5-a75e-481e-b396-a5f2b50ed362@quicinc.com/

If that's not the case, we can indeed ignore SUBSECTION_SIZE altogether
and just rely on the start/end of the hotplugged region.
On 13.02.25 18:56, Catalin Marinas wrote:
> [...]
>
> I think that's based on the discussion for a previous version, which
> assumed that the hotplug/unplug sizes are not guaranteed to be
> symmetric:
>
> https://lore.kernel.org/lkml/a720aaa5-a75e-481e-b396-a5f2b50ed362@quicinc.com/
>
> If that's not the case, we can indeed ignore SUBSECTION_SIZE altogether
> and just rely on the start/end of the hotplugged region.

All cases I know of hotunplug system RAM at the same granularity at
which they hotplugged it (virtio-mem, dax/kmem, dimm, dlpar), and if
they didn't, they wouldn't be operating on sub-section sizes either way.

Regarding dax/pmem, I also recall that it always happens at the same
granularity. If not, it should be fixed: this weird subsection hotplug
should not make all other hotplug users suffer (e.g., no vmemmap PMD).

What can likely happen (dax/pmem) is that we hotplug something that
spans part of a 128 MiB section (subsections), and then hotplug
something that spans another part of that 128 MiB section (subsections).
Hotunplugging either should not unplug parts of the other device (e.g.,
rip out the vmemmap PMD).

I think this was expressed with:

"However, if start or end is not aligned to a section boundary, such as
when a subsection is hot added, populating the entire section is
wasteful." -- which is what we should focus on.

I thought x86-64 would handle that case; it would surprise me if the
handling had to differ between the two arches in that regard: with 4K
pages, arm64 has the same section/subsection sizes as x86-64.
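A concrete instance of the dax/pmem scenario described above (a
hypothetical layout; the addresses are made up for illustration):

	/*
	 * One 128 MiB section, 4K base pages; addresses illustrative only.
	 *
	 *   section [0x200000000, 0x208000000)  -- all of its struct pages
	 *                                          sit under one 2M vmemmap PMD
	 *   pmem A  [0x200000000, 0x202000000)  -- subsection 0 (2M)
	 *   pmem B  [0x202000000, 0x204000000)  -- subsection 1 (2M)
	 *
	 * A's and B's struct pages (32K of vmemmap each) share that single
	 * PMD, so tearing down the PMD block mapping when A is unplugged
	 * would also free B's still-live struct pages -- the bug at issue.
	 */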
On 2025/2/14 2:20, David Hildenbrand wrote:
> [...]
>
> I thought x86-64 would handle that case; it would surprise me if the
> handling had to differ between the two arches in that regard: with 4K
> pages, arm64 has the same section/subsection sizes as x86-64.

Thanks David and Catalin. From your discussion, I understand that
hotplug/unplug sizes are guaranteed to be symmetric? Therefore, it
should be straightforward to populate with base pages if
(end - start < PAGES_PER_SECTION * sizeof(struct page))?

I will write the patch and verify it. Please correct me if my
understanding is incorrect.
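For reference, a minimal sketch of that idea, mirroring the x86 gate in
arm64's vmemmap_populate() (an illustration of the proposal only, not the
posted patch; the v6 diff under discussion follows below):

	int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
				       struct vmem_altmap *altmap)
	{
		WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));

		/*
		 * Anything smaller than a full section's worth of struct
		 * pages is mapped at page granularity, so subsection unplug
		 * never tears down a mapping shared with another subsection.
		 */
		if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
		    end - start < PAGES_PER_SECTION * sizeof(struct page))
			return vmemmap_populate_basepages(start, end, node, altmap);

		return vmemmap_populate_hugepages(start, end, node, altmap);
	}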
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..b1089365f3e7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -42,9 +42,13 @@
 #include <asm/pgalloc.h>
 #include <asm/kfence.h>
 
-#define NO_BLOCK_MAPPINGS	BIT(0)
-#define NO_CONT_MAPPINGS	BIT(1)
-#define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
+#define NO_PMD_BLOCK_MAPPINGS	BIT(0)
+#define NO_PUD_BLOCK_MAPPINGS	BIT(1)	/* Hotplug case: do not want block mapping for PUD */
+#define NO_BLOCK_MAPPINGS	(NO_PMD_BLOCK_MAPPINGS | NO_PUD_BLOCK_MAPPINGS)
+#define NO_PTE_CONT_MAPPINGS	BIT(2)
+#define NO_PMD_CONT_MAPPINGS	BIT(3)	/* Hotplug case: do not want cont mapping for PMD */
+#define NO_CONT_MAPPINGS	(NO_PTE_CONT_MAPPINGS | NO_PMD_CONT_MAPPINGS)
+#define NO_EXEC_MAPPINGS	BIT(4)	/* assumes FEAT_HPDS is not used */
 
 u64 kimage_voffset __ro_after_init;
 EXPORT_SYMBOL(kimage_voffset);
@@ -224,7 +228,7 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
 
 		/* use a contiguous mapping if the range is suitably aligned */
 		if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
-		    (flags & NO_CONT_MAPPINGS) == 0)
+		    (flags & NO_PTE_CONT_MAPPINGS) == 0)
 			__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
 
 		init_pte(ptep, addr, next, phys, __prot);
@@ -254,7 +258,7 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
 
 		/* try section mapping first */
 		if (((addr | next | phys) & ~PMD_MASK) == 0 &&
-		    (flags & NO_BLOCK_MAPPINGS) == 0) {
+		    (flags & NO_PMD_BLOCK_MAPPINGS) == 0) {
 			pmd_set_huge(pmdp, phys, prot);
 
 			/*
@@ -311,7 +315,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
 
 		/* use a contiguous mapping if the range is suitably aligned */
 		if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
-		    (flags & NO_CONT_MAPPINGS) == 0)
+		    (flags & NO_PMD_CONT_MAPPINGS) == 0)
 			__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
 
 		init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc, flags);
@@ -358,8 +362,8 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
 		 * For 4K granule only, attempt to put down a 1GB block
 		 */
 		if (pud_sect_supported() &&
-		   ((addr | next | phys) & ~PUD_MASK) == 0 &&
-		    (flags & NO_BLOCK_MAPPINGS) == 0) {
+		    ((addr | next | phys) & ~PUD_MASK) == 0 &&
+		    (flags & NO_PUD_BLOCK_MAPPINGS) == 0) {
 			pud_set_huge(pudp, phys, prot);
 
 			/*
@@ -1178,7 +1182,13 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 {
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
 
-	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+	/*
+	 * Hotplugged section does not support hugepages as
+	 * PMD_SIZE (hence PUD_SIZE) section mapping covers
+	 * struct page range that exceeds a SUBSECTION_SIZE
+	 * i.e 2MB - for all available base page sizes.
+	 */
+	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) || system_state != SYSTEM_BOOTING)
 		return vmemmap_populate_basepages(start, end, node, altmap);
 	else
 		return vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1340,9 +1350,27 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
 {
 	int ret, flags = NO_EXEC_MAPPINGS;
+	unsigned long start_pfn = PFN_DOWN(start);
+	struct mem_section *ms = __pfn_to_section(start_pfn);
 
 	VM_BUG_ON(!mhp_range_allowed(start, size, true));
 
+	/* should not be invoked by early section */
+	WARN_ON(early_section(ms));
+
+	/*
+	 * Disallow BLOCK/CONT mappings if the corresponding size exceeds
+	 * SUBSECTION_SIZE which now is 2MB.
+	 *
+	 * PUD_BLOCK or PMD_CONT should consistently exceed SUBSECTION_SIZE
+	 * across all variable page size configurations, so add them directly
+	 */
+	flags |= NO_PUD_BLOCK_MAPPINGS | NO_PMD_CONT_MAPPINGS;
+	if (SUBSECTION_SHIFT < PMD_SHIFT)
+		flags |= NO_PMD_BLOCK_MAPPINGS;
+	if (SUBSECTION_SHIFT < CONT_PTE_SHIFT)
+		flags |= NO_PTE_CONT_MAPPINGS;
+
 	if (can_set_direct_map())
 		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
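As a sanity check of the flag selection in arch_add_memory() above, the
shift arithmetic per base-page config (values assumed from the usual arm64
headers; SUBSECTION_SHIFT = 21, i.e. 2M):

	/*
	 *   base page   PMD block       CONT_PTE run       extra hotplug flags
	 *   4K          2M   (shift 21) 16  * 4K  = 64K    none beyond the two below
	 *   16K         32M  (shift 25) 128 * 16K = 2M     NO_PMD_BLOCK_MAPPINGS
	 *   64K         512M (shift 29) 32  * 64K = 2M     NO_PMD_BLOCK_MAPPINGS
	 *
	 * NO_PUD_BLOCK_MAPPINGS and NO_PMD_CONT_MAPPINGS are always set for
	 * hotplug; NO_PTE_CONT_MAPPINGS never is, because a CONT_PTE run
	 * never exceeds SUBSECTION_SIZE. Only the 4K config keeps 2M PMD
	 * blocks in the hotplugged linear map.
	 */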