
[v3,05/28] mm/hugetlb: fix round-robin bootmem allocation

Message ID 20250206185109.1210657-6-fvdl@google.com (mailing list archive)
State New
Headers show
Series hugetlb/CMA improvements for large systems

Commit Message

Frank van der Linden Feb. 6, 2025, 6:50 p.m. UTC
Commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
changed the NUMA_NO_NODE round-robin allocation behavior in case of a
failure to allocate from one NUMA node. The code originally moved on to
the next node to try again, but now it immediately breaks out of the loop.

Restore the original behavior.

Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Signed-off-by: Frank van der Linden <fvdl@google.com>
---
 mm/hugetlb.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

Comments

Oscar Salvador Feb. 10, 2025, 12:57 p.m. UTC | #1
On Thu, Feb 06, 2025 at 06:50:45PM +0000, Frank van der Linden wrote:
> Commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> changed the NUMA_NO_NODE round-robin allocation behavior in case of a
> failure to allocate from one NUMA node. The code originally moved on to
> the next node to try again, but now it immediately breaks out of the loop.
> 
> Restore the original behavior.

Did you stumble upon this?

AFAICS, memblock_alloc_range_nid() will call memblock_find_in_range_node() with
NUMA_NO_NODE for exact_nid = false, which would be our case.
Meaning that if memblock_alloc_try_nid_raw() returns NULL here, it is because
we could not allocate the page in any node, right?
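
For reference, the fallback Oscar is describing, condensed from memblock_alloc_range_nid() in mm/memblock.c (a simplified sketch; flag selection, mirrored-memory retries, and error handling are omitted):

/* Simplified sketch of memblock_alloc_range_nid() (mm/memblock.c). */
phys_addr_t memblock_alloc_range_nid(phys_addr_t size, phys_addr_t align,
				     phys_addr_t start, phys_addr_t end,
				     int nid, bool exact_nid)
{
	phys_addr_t found;

	/* With nid == NUMA_NO_NODE this already searches every node. */
	found = memblock_find_in_range_node(size, align, start, end,
					    nid, MEMBLOCK_NONE);
	if (found && !memblock_reserve(found, size))
		return found;

	/*
	 * Node-specific request that failed: unless the caller asked
	 * for an exact node, silently retry on any node.
	 */
	if (nid != NUMA_NO_NODE && !exact_nid) {
		found = memblock_find_in_range_node(size, align, start, end,
						    NUMA_NO_NODE,
						    MEMBLOCK_NONE);
		if (found && !memblock_reserve(found, size))
			return found;
	}

	return 0;
}
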
Frank van der Linden Feb. 10, 2025, 6:30 p.m. UTC | #2
On Mon, Feb 10, 2025 at 4:57 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Thu, Feb 06, 2025 at 06:50:45PM +0000, Frank van der Linden wrote:
> > Commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> > changed the NUMA_NO_NODE round-robin allocation behavior in case of a
> > failure to allocate from one NUMA node. The code originally moved on to
> > the next node to try again, but now it immediately breaks out of the loop.
> >
> > Restore the original behavior.
>
> Did you stumble upon this?
>
> AFAICS, memblock_alloc_range_nid() will call memblock_find_in_range_node() with
> NUMA_NO_NODE for exact_nid = false, which would be our case.
> Meaning that if memblock_alloc_try_nid_raw() returns NULL here, it is because
> we could not allocate the page in any node, right?

Hmm.. this patch is from earlier in the development, when I stumbled on
a situation that looked like what I describe above. However, you're right,
there is really no point to this patch. I mean, it's harmless in the
sense that it doesn't cause bugs, but it just wastes time. The problem
I saw might just have been caused by my own changes at the time, which
have been re-arranged since. I was probably also confused by the
for_each macro, which isn't really a for loop the way it's used in this
function.
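
For anyone tripped up by the same macro: a standalone toy model (plain userspace C; the names are made up and this is not the kernel definition) of the shape for_each_node_mask_to_alloc() gives the loop: at most one pass over the nodes, resuming from a persistent cursor, so a goto or return on the first iteration means only a single node ever gets tried per call.

#include <stdio.h>

#define NR_NODES 4

/* Persistent cursor, playing the role of h->next_nid_to_alloc. */
static int next_nid;

/* Return the node to try next and advance the cursor, wrapping around. */
static int next_node_to_alloc(void)
{
	int nid = next_nid;

	next_nid = (next_nid + 1) % NR_NODES;
	return nid;
}

int main(void)
{
	int nr_nodes, node;

	/*
	 * One bounded pass over all nodes, starting where the cursor
	 * left off on the previous call.
	 */
	for (nr_nodes = NR_NODES; nr_nodes > 0; nr_nodes--) {
		node = next_node_to_alloc();
		printf("trying node %d\n", node);
	}
	return 0;
}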

Looking at it further, it's actually the node-specific case that is
wrong. That case also uses memblock_alloc_try_nid_raw(), but it should
be using memblock_alloc_exact_nid_raw(). Right now, if you do e.g.
hugepages=0:X,1:Y, and there aren't X pages available on node 0, it
will still fall back to node 1, which is unexpected/unwanted behavior.
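
A minimal sketch of what that node-specific branch could look like with the exact-nid allocator (simplified; the eventual v4 change may differ):

	/*
	 * Node-specific request (hugepages=<node>:<count>): the page
	 * must come from this node, so don't allow fallback.
	 */
	if (nid != NUMA_NO_NODE) {
		m = memblock_alloc_exact_nid_raw(huge_page_size(h),
				huge_page_size(h), 0,
				MEMBLOCK_ALLOC_ACCESSIBLE, nid);
		if (!m)
			return 0;
		goto found;
	}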

So, I'll change this patch to fix the node-specific case instead.
Looking at it, I also need to avoid the same fallback for the CMA
case, which will affect those patches a bit.

So, there'll be a v4 coming up. Fortunately, it won't affect the
patches with your Reviewed-by, so thanks again for those.

- Frank

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 828ae0080ab5..1d8ec21dc2c2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3156,16 +3156,13 @@  int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 		m = memblock_alloc_try_nid_raw(
 				huge_page_size(h), huge_page_size(h),
 				0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
-		/*
-		 * Use the beginning of the huge page to store the
-		 * huge_bootmem_page struct (until gather_bootmem
-		 * puts them into the mem_map).
-		 */
-		if (!m)
-			return 0;
-		goto found;
+		if (m)
+			break;
 	}
 
+	if (!m)
+		return 0;
+
 found:
 
 	/*
@@ -3177,7 +3174,14 @@  int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	 */
 	memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
 		huge_page_size(h) - PAGE_SIZE);
-	/* Put them into a private list first because mem_map is not up yet */
+	/*
+	 * Use the beginning of the huge page to store the
+	 * huge_bootmem_page struct (until gather_bootmem
+	 * puts them into the mem_map).
+	 *
+	 * Put them into a private list first because mem_map
+	 * is not up yet.
+	 */
 	INIT_LIST_HEAD(&m->list);
 	list_add(&m->list, &huge_boot_pages[node]);
 	m->hstate = h;