
[v3,05/28] mm/hugetlb: fix round-robin bootmem allocation

Message ID 20250206185109.1210657-6-fvdl@google.com (mailing list archive)
State New
Headers show
Series hugetlb/CMA improvements for large systems

Commit Message

Frank van der Linden Feb. 6, 2025, 6:50 p.m. UTC
Commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
changed the NUMA_NO_NODE round-robin allocation behavior in case of a
failure to allocate from one NUMA node. The code originally moved on to
the next node to try again, but now it immediately breaks out of the loop.

Restore the original behavior.

Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Signed-off-by: Frank van der Linden <fvdl@google.com>
---
 mm/hugetlb.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

Comments

Oscar Salvador Feb. 10, 2025, 12:57 p.m. UTC | #1
On Thu, Feb 06, 2025 at 06:50:45PM +0000, Frank van der Linden wrote:
> Commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> changed the NUMA_NO_NODE round-robin allocation behavior in case of a
> failure to allocate from one NUMA node. The code originally moved on to
> the next node to try again, but now it immediately breaks out of the loop.
> 
> Restore the original behavior.

Did you stumble upon this?

AFAICS, memblock_alloc_range_nid() will call memblock_find_in_range_node() with
NUMA_NO_NODE for exact_nid = false, which would be our case.
Meaning that if memblock_alloc_try_nid_raw() returns NULL here, it is because
we could not allocate the page in any node, right?
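
For reference, the fallback Oscar is describing, condensed from memblock_alloc_range_nid() in mm/memblock.c (a simplified sketch; flag selection, mirrored-memory retries, and error handling are omitted):

/* Simplified sketch of memblock_alloc_range_nid() (mm/memblock.c). */
phys_addr_t memblock_alloc_range_nid(phys_addr_t size, phys_addr_t align,
				     phys_addr_t start, phys_addr_t end,
				     int nid, bool exact_nid)
{
	phys_addr_t found;

	/* With nid == NUMA_NO_NODE this already searches every node. */
	found = memblock_find_in_range_node(size, align, start, end,
					    nid, MEMBLOCK_NONE);
	if (found && !memblock_reserve(found, size))
		return found;

	/*
	 * Node-specific request that failed: unless the caller asked
	 * for an exact node, silently retry on any node.
	 */
	if (nid != NUMA_NO_NODE && !exact_nid) {
		found = memblock_find_in_range_node(size, align, start, end,
						    NUMA_NO_NODE,
						    MEMBLOCK_NONE);
		if (found && !memblock_reserve(found, size))
			return found;
	}

	return 0;
}
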
Frank van der Linden Feb. 10, 2025, 6:30 p.m. UTC | #2
On Mon, Feb 10, 2025 at 4:57 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Thu, Feb 06, 2025 at 06:50:45PM +0000, Frank van der Linden wrote:
> > Commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> > changed the NUMA_NO_NODE round-robin allocation behavior in case of a
> > failure to allocate from one NUMA node. The code originally moved on to
> > the next node to try again, but now it immediately breaks out of the loop.
> >
> > Restore the original behavior.
>
> Did you stumble upon this?
>
> AFAICS, memblock_alloc_range_nid() will call memblock_find_in_range_node() with
> NUMA_NO_NODE for exact_nid = false, which would be our case.
> Meaning that if memblock_alloc_try_nid_raw() returns NULL here, it is because
> we could not allocate the page in any node, right?

Hmm.. this patch is from earlier in the development, when I stumbled on
a situation that looked like what I describe above. However, you're right,
there is really no point to this patch. I mean, it's harmless in the
sense that it doesn't cause bugs, but it just wastes time. The problem
I saw might just have been caused by my own changes at the time, which
have been re-arranged since. I was probably also confused by the
for_each macro, which isn't really a for loop the way it's used in this
function.
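
For anyone tripped up by the same macro: a standalone toy model (plain userspace C; the names are made up and this is not the kernel definition) of the shape for_each_node_mask_to_alloc() gives the loop: at most one pass over the nodes, resuming from a persistent cursor, so a goto or return on the first iteration means only a single node ever gets tried per call.

#include <stdio.h>

#define NR_NODES 4

/* Persistent cursor, playing the role of h->next_nid_to_alloc. */
static int next_nid;

/* Return the node to try next and advance the cursor, wrapping around. */
static int next_node_to_alloc(void)
{
	int nid = next_nid;

	next_nid = (next_nid + 1) % NR_NODES;
	return nid;
}

int main(void)
{
	int nr_nodes, node;

	/*
	 * One bounded pass over all nodes, starting where the cursor
	 * left off on the previous call.
	 */
	for (nr_nodes = NR_NODES; nr_nodes > 0; nr_nodes--) {
		node = next_node_to_alloc();
		printf("trying node %d\n", node);
	}
	return 0;
}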

Looking at it further, it's actually the node-specific case that is
wrong. That case also uses memblock_alloc_try_nid_raw(), but it should
be using memblock_alloc_exact_nid_raw(). Right now, if you do e.g.
hugepages=0:X,1:Y, and there aren't X pages available on node 0, it
will still fall back to node 1, which is unexpected/unwanted behavior.
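
A minimal sketch of what that node-specific branch could look like with the exact-nid allocator (simplified; the eventual v4 change may differ):

	/*
	 * Node-specific request (hugepages=<node>:<count>): the page
	 * must come from this node, so don't allow fallback.
	 */
	if (nid != NUMA_NO_NODE) {
		m = memblock_alloc_exact_nid_raw(huge_page_size(h),
				huge_page_size(h), 0,
				MEMBLOCK_ALLOC_ACCESSIBLE, nid);
		if (!m)
			return 0;
		goto found;
	}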

So, I'll change this patch to fix the node-specific case instead.
Looking at it, I also need to avoid the same fallback for the CMA
case, which will affect those patches a bit.

So, there'll be a v4 coming up. Fortunately, it won't affect the
patches with your Reviewed-by, so thanks again for those.

- Frank

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 828ae0080ab5..1d8ec21dc2c2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3156,16 +3156,13 @@  int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 		m = memblock_alloc_try_nid_raw(
 				huge_page_size(h), huge_page_size(h),
 				0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
-		/*
-		 * Use the beginning of the huge page to store the
-		 * huge_bootmem_page struct (until gather_bootmem
-		 * puts them into the mem_map).
-		 */
-		if (!m)
-			return 0;
-		goto found;
+		if (m)
+			break;
 	}
 
+	if (!m)
+		return 0;
+
 found:
 
 	/*
@@ -3177,7 +3174,14 @@  int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	 */
 	memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
 		huge_page_size(h) - PAGE_SIZE);
-	/* Put them into a private list first because mem_map is not up yet */
+	/*
+	 * Use the beginning of the huge page to store the
+	 * huge_bootmem_page struct (until gather_bootmem
+	 * puts them into the mem_map).
+	 *
+	 * Put them into a private list first because mem_map
+	 * is not up yet.
+	 */
 	INIT_LIST_HEAD(&m->list);
 	list_add(&m->list, &huge_boot_pages[node]);
 	m->hstate = h;