
mm, hugetlb: Avoid passing a null nodemask when there is mbind policy

Message ID 20250415121503.376811-1-osalvador@suse.de (mailing list archive)
State New
Series mm, hugetlb: Avoid passing a null nodemask when there is mbind policy

Commit Message

Oscar Salvador April 15, 2025, 12:15 p.m. UTC
Before trying to allocate a page, gather_surplus_pages() sets up a nodemask
of the nodes we can allocate from, but instead of passing that nodemask
down to the page allocator, it iterates over its nodes right there. As a
result, every call into the page allocator receives a preferred_nid and a
NULL nodemask.

This is a problem when a memory policy is in use, because the page
allocator may fall back to a node that is not represented in the policy.

Avoid that by passing the nodemask directly to the page allocator, so it can
filter out fallback nodes that are not part of the nodemask.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/hugetlb.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

Comments

Vlastimil Babka April 16, 2025, 7:08 a.m. UTC | #1
On 4/15/25 14:15, Oscar Salvador wrote:
> Before trying to allocate a page, gather_surplus_pages() sets up a nodemask
> of the nodes we can allocate from, but instead of passing that nodemask
> down to the page allocator, it iterates over its nodes right there. As a
> result, every call into the page allocator receives a preferred_nid and a
> NULL nodemask.
> 
> This is a problem when a memory policy is in use, because the page
> allocator may fall back to a node that is not represented in the policy.
> 
> Avoid that by passing the nodemask directly to the page allocator, so it can
> filter out fallback nodes that are not part of the nodemask.

It will also try the fallbacks in NUMA distance order rather than by incrementing nid.

> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/hugetlb.c | 22 ++++++----------------
>  1 file changed, 6 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ccc4f08f8481..5e1cba0f835f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2419,7 +2419,6 @@ static int gather_surplus_pages(struct hstate *h, long delta)
>  	long i;
>  	long needed, allocated;
>  	bool alloc_ok = true;
> -	int node;
>  	nodemask_t *mbind_nodemask, alloc_nodemask;
>  
>  	mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
> @@ -2443,21 +2442,12 @@ static int gather_surplus_pages(struct hstate *h, long delta)
>  	for (i = 0; i < needed; i++) {
>  		folio = NULL;
>  
> -		/* Prioritize current node */
> -		if (node_isset(numa_mem_id(), alloc_nodemask))
> -			folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
> -					numa_mem_id(), NULL);
> -
> -		if (!folio) {
> -			for_each_node_mask(node, alloc_nodemask) {
> -				if (node == numa_mem_id())
> -					continue;
> -				folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
> -						node, NULL);
> -				if (folio)
> -					break;
> -			}
> -		}
> +		/*
> +		 * It is okay to use NUMA_NO_NODE because we use numa_mem_id()
> +		 * down the road to pick the current node if that is the case.
> +		 */
> +		folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
> +						    NUMA_NO_NODE, &alloc_nodemask);
>  		if (!folio) {
>  			alloc_ok = false;
>  			break;

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ccc4f08f8481..5e1cba0f835f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2419,7 +2419,6 @@  static int gather_surplus_pages(struct hstate *h, long delta)
 	long i;
 	long needed, allocated;
 	bool alloc_ok = true;
-	int node;
 	nodemask_t *mbind_nodemask, alloc_nodemask;
 
 	mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
@@ -2443,21 +2442,12 @@  static int gather_surplus_pages(struct hstate *h, long delta)
 	for (i = 0; i < needed; i++) {
 		folio = NULL;
 
-		/* Prioritize current node */
-		if (node_isset(numa_mem_id(), alloc_nodemask))
-			folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
-					numa_mem_id(), NULL);
-
-		if (!folio) {
-			for_each_node_mask(node, alloc_nodemask) {
-				if (node == numa_mem_id())
-					continue;
-				folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
-						node, NULL);
-				if (folio)
-					break;
-			}
-		}
+		/*
+		 * It is okay to use NUMA_NO_NODE because we use numa_mem_id()
+		 * down the road to pick the current node if that is the case.
+		 */
+		folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+						    NUMA_NO_NODE, &alloc_nodemask);
 		if (!folio) {
 			alloc_ok = false;
 			break;