diff mbox series

mm/hugeltb: fix nodes huge page allocation when there are surplus pages

Message ID 20230826035748.891697-1-xueshi.hu@smartx.com (mailing list archive)
State New
Headers show
Series mm/hugeltb: fix nodes huge page allocation when there are surplus pages | expand

Commit Message

Xueshi Hu Aug. 26, 2023, 3:57 a.m. UTC
In set_nr_huge_pages(), local variable "count" is used to record
persistent_huge_pages(), but when it cames to nodes huge page allocation,
the semantics changes to nr_huge_pages. When there exists surplus huge
pages and using the interface under
/sys/devices/system/node/node*/hugepages to change huge page pool size,
this difference can result in the allocation of an unexpected number of
huge pages.

Steps to reproduce the bug:

Starting with:

				  Node 0          Node 1    Total
	HugePages_Total             0.00            0.00     0.00
	HugePages_Free              0.00            0.00     0.00
	HugePages_Surp              0.00            0.00     0.00

create 100 huge pages in Node 0 and consume it, then set Node 0 's
nr_hugepages to 0.

yields:

				  Node 0          Node 1    Total
	HugePages_Total           200.00            0.00   200.00
	HugePages_Free              0.00            0.00     0.00
	HugePages_Surp            200.00            0.00   200.00

write 100 to Node 1's nr_hugepages

		echo 100 > /sys/devices/system/node/node1/\
	hugepages/hugepages-2048kB/nr_hugepages

gets:

				  Node 0          Node 1    Total
	HugePages_Total           200.00          400.00   600.00
	HugePages_Free              0.00          400.00   400.00
	HugePages_Surp            200.00            0.00   200.00

Kernel is expected to create only 100 huge pages and it gives 200.

Fixes: fd875dca7c71 ("hugetlbfs: fix potential over/underflow setting node specific nr_hugepages")
Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
---
 mm/hugetlb.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Mike Kravetz Aug. 28, 2023, 11:34 p.m. UTC | #1
On 08/26/23 11:57, Xueshi Hu wrote:
> In set_nr_huge_pages(), local variable "count" is used to record
> persistent_huge_pages(), but when it cames to nodes huge page allocation,
> the semantics changes to nr_huge_pages. When there exists surplus huge
> pages and using the interface under
> /sys/devices/system/node/node*/hugepages to change huge page pool size,
> this difference can result in the allocation of an unexpected number of
> huge pages.
> 
> Steps to reproduce the bug:
> 
> Starting with:
> 
> 				  Node 0          Node 1    Total
> 	HugePages_Total             0.00            0.00     0.00
> 	HugePages_Free              0.00            0.00     0.00
> 	HugePages_Surp              0.00            0.00     0.00
> 
> create 100 huge pages in Node 0 and consume it, then set Node 0 's
> nr_hugepages to 0.
> 
> yields:
> 
> 				  Node 0          Node 1    Total
> 	HugePages_Total           200.00            0.00   200.00
> 	HugePages_Free              0.00            0.00     0.00
> 	HugePages_Surp            200.00            0.00   200.00
> 
> write 100 to Node 1's nr_hugepages
> 
> 		echo 100 > /sys/devices/system/node/node1/\
> 	hugepages/hugepages-2048kB/nr_hugepages
> 
> gets:
> 
> 				  Node 0          Node 1    Total
> 	HugePages_Total           200.00          400.00   600.00
> 	HugePages_Free              0.00          400.00   400.00
> 	HugePages_Surp            200.00            0.00   200.00
> 
> Kernel is expected to create only 100 huge pages and it gives 200.
> 
> Fixes: fd875dca7c71 ("hugetlbfs: fix potential over/underflow setting node specific nr_hugepages")
> Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
> ---
>  mm/hugetlb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Thank you!

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

A note about the Fixes: tag.  Commit fd875dca7c71 moved the root cause of this
issue (the line changed below) from the routine __nr_hugepages_store_common
to the routine set_max_huge_pages.  I believe the actual issue has existed
since  commit 9a30523066cde that added hugetlb node specific support way back
in 2009 (2.6.32 timeframe).

If 9a30523066cde is used in the fixes tag, there will be some non-automatic
backports for releases where fd875dca7c71 does not exist.  I would suggest
changing the fixes tag.  We can ignore the broken backports for older releases,
as I really doubt this is a real issue for many/any people.
diff mbox series

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6da626bfb52e..54e2e2e12aa9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3494,7 +3494,9 @@  static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	if (nid != NUMA_NO_NODE) {
 		unsigned long old_count = count;
 
-		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
+		count += persistent_huge_pages(h) -
+			 (h->nr_huge_pages_node[nid] -
+			  h->surplus_huge_pages_node[nid]);
 		/*
 		 * User may have specified a large count value which caused the
 		 * above calculation to overflow.  In this case, they wanted