| Message ID | 20250206185109.1210657-6-fvdl@google.com |
|---|---|
| State | New |
| Series | hugetlb/CMA improvements for large systems |
On Thu, Feb 06, 2025 at 06:50:45PM +0000, Frank van der Linden wrote:
> Commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> changed the NUMA_NO_NODE round-robin allocation behavior in case of a
> failure to allocate from one NUMA node. The code originally moved on to
> the next node to try again, but now it immediately breaks out of the loop.
>
> Restore the original behavior.

Did you stumble upon this?

AFAICS, memblock_alloc_range_nid() will call memblock_find_in_range_node() with
NUMA_NO_NODE for exact_nid = false, which would be our case.
Meaning that if memblock_alloc_try_nid_raw() returns NULL here, it is because
we could not allocate the page in any node, right?
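For reference, here is a condensed sketch of the fallback logic Oscar is describing, paraphrased from memblock_alloc_range_nid() in mm/memblock.c. It is not a verbatim copy: mirrored-memory retries, kmemleak hooks, and the deprecated MAX_NUMNODES handling are omitted, and the _sketch suffix marks it as such.

```c
/* Condensed sketch of mm/memblock.c:memblock_alloc_range_nid(). */
static phys_addr_t memblock_alloc_range_nid_sketch(phys_addr_t size,
		phys_addr_t align, phys_addr_t start, phys_addr_t end,
		int nid, bool exact_nid)
{
	phys_addr_t found;

	/* First try the requested node. */
	found = memblock_find_in_range_node(size, align, start, end,
					    nid, choose_memblock_flags());
	if (found && !memblock_reserve(found, size))
		return found;

	/*
	 * Unless the caller insisted on an exact node, retry the search
	 * with NUMA_NO_NODE, i.e. any node. This is why a NULL return
	 * from memblock_alloc_try_nid_raw() (which passes
	 * exact_nid == false) already means "no node had room".
	 */
	if (nid != NUMA_NO_NODE && !exact_nid) {
		found = memblock_find_in_range_node(size, align, start, end,
						    NUMA_NO_NODE,
						    choose_memblock_flags());
		if (found && !memblock_reserve(found, size))
			return found;
	}

	return 0;
}
```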
On Mon, Feb 10, 2025 at 4:57 AM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Thu, Feb 06, 2025 at 06:50:45PM +0000, Frank van der Linden wrote:
> > Commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> > changed the NUMA_NO_NODE round-robin allocation behavior in case of a
> > failure to allocate from one NUMA node. The code originally moved on to
> > the next node to try again, but now it immediately breaks out of the loop.
> >
> > Restore the original behavior.
>
> Did you stumble upon this?
>
> AFAICS, memblock_alloc_range_nid() will call memblock_find_in_range_node() with
> NUMA_NO_NODE for exact_nid = false, which would be our case.
> Meaning that if memblock_alloc_try_nid_raw() returns NULL here, it is because
> we could not allocate the page in any node, right?

Hmm... this patch is from earlier in the development, when I stumbled on
a situation that looked like the one I describe. However, you're right:
there is really no point to this patch. It's harmless in the sense that
it doesn't cause bugs, but it just wastes time. The problem I saw might
have been caused by my own changes at the time, which have been
rearranged since. I was probably also confused by the for_each, which
isn't really a for loop the way it's used in this function.

Looking at it further, it's actually the node-specific case that is
wrong. That case also uses memblock_alloc_try_nid_raw(), but it should
be using memblock_alloc_exact_nid_raw(). Right now, if you do e.g.
hugepages=0:X,1:Y, and there aren't X pages available on node 0, it
will still fall back to node 1, which is unexpected/unwanted behavior.

So, I'll change this patch to fix the node-specific case instead.
Looking at it, I also need to avoid the same fallback for the CMA case,
which will affect those patches a bit. So, there'll be a v4 coming up.
Fortunately, it won't affect the patches with your Reviewed-by, so
thanks again for those.

- Frank
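To illustrate the direction Frank describes, a sketch of the node-specific branch of __alloc_bootmem_huge_page() switched over to memblock_alloc_exact_nid_raw(). This is only a sketch of the stated intent, not the actual v4 patch:

```c
	/*
	 * Sketch only -- not the actual v4 patch. Node-specific
	 * boot-time allocation: request memory from exactly @nid,
	 * with no cross-node fallback.
	 */
	if (nid != NUMA_NO_NODE) {
		m = memblock_alloc_exact_nid_raw(huge_page_size(h),
				huge_page_size(h), 0,
				MEMBLOCK_ALLOC_ACCESSIBLE, nid);
		if (!m)
			return 0;	/* fail instead of falling back */
		goto found;
	}
```

The difference from memblock_alloc_try_nid_raw() is exactly the exact_nid flag shown in the memblock sketch above: the exact_nid variant never retries with NUMA_NO_NODE, so a per-node hugepages= request fails outright rather than silently landing on another node.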
```diff
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 828ae0080ab5..1d8ec21dc2c2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3156,16 +3156,13 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 		m = memblock_alloc_try_nid_raw(
 				huge_page_size(h), huge_page_size(h),
 				0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
-		/*
-		 * Use the beginning of the huge page to store the
-		 * huge_bootmem_page struct (until gather_bootmem
-		 * puts them into the mem_map).
-		 */
-		if (!m)
-			return 0;
-		goto found;
+		if (m)
+			break;
 	}
 
+	if (!m)
+		return 0;
+
 found:
 
 	/*
@@ -3177,7 +3174,14 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	 */
 	memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
 			huge_page_size(h) - PAGE_SIZE);
-	/* Put them into a private list first because mem_map is not up yet */
+	/*
+	 * Use the beginning of the huge page to store the
+	 * huge_bootmem_page struct (until gather_bootmem
+	 * puts them into the mem_map).
+	 *
+	 * Put them into a private list first because mem_map
+	 * is not up yet.
+	 */
 	INIT_LIST_HEAD(&m->list);
 	list_add(&m->list, &huge_boot_pages[node]);
 	m->hstate = h;
```
Commit b5389086ad7b ("hugetlbfs: extend the definition of hugepages
parameter to support node allocation") changed the NUMA_NO_NODE
round-robin allocation behavior in case of a failure to allocate from
one NUMA node. The code originally moved on to the next node to try
again, but now it immediately breaks out of the loop.

Restore the original behavior.

Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Signed-off-by: Frank van der Linden <fvdl@google.com>
---
 mm/hugetlb.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)
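For context, these are the two boot-parameter forms that reach the code paths discussed above. The syntax follows Documentation/admin-guide/kernel-parameters.txt; the page counts are arbitrary illustrative values:

```
# NUMA_NO_NODE round-robin path (what this patch touches):
# distribute 512 2M huge pages across the online nodes.
hugepagesz=2M hugepages=512

# Node-specific path (the case Frank identifies as actually broken):
# 256 pages on node 0 and 256 on node 1; falling back to another
# node when one runs short is the unwanted behavior.
hugepagesz=2M hugepages=0:256,1:256
```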