diff mbox series

hugetlb: Fix hugepages_setup when deal with pernode

Message ID 20220324074037.2024608-1-liupeng256@huawei.com (mailing list archive)
State New
Headers show
Series hugetlb: Fix hugepages_setup when deal with pernode | expand

Commit Message

Peng Liu March 24, 2022, 7:40 a.m. UTC
Hugepages can be specified to pernode since "hugetlbfs: extend
the definition of hugepages parameter to support node allocation",
but the following two problems are observed.

1) Confusing behavior is observed when both 1G and 2M hugepage
is set after "numa=off".
 cmdline hugepage settings:
  hugepagesz=1G hugepages=0:3,1:3
  hugepagesz=2M hugepages=0:1024,1:1024
 results:
  HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
  HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages

2) Using invalid option values causes the entire kernel boot option
string to be reported as Unknown.
 Unknown kernel command line parameters "hugepages=0:1024,1:1024"

To fix "1)", hugetlb_hstate_alloc_pages should be called even when
hugepages_setup going to invalid. To fix "2)", return 1 so that
init's environment is not polluted with kernel boot options.

Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
---
 mm/hugetlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Mike Kravetz March 24, 2022, 9:57 p.m. UTC | #1
On 3/24/22 00:40, Peng Liu wrote:
> Hugepages can be specified to pernode since "hugetlbfs: extend
> the definition of hugepages parameter to support node allocation",
> but the following two problems are observed.
> 
> 1) Confusing behavior is observed when both 1G and 2M hugepage
> is set after "numa=off".
>  cmdline hugepage settings:
>   hugepagesz=1G hugepages=0:3,1:3
>   hugepagesz=2M hugepages=0:1024,1:1024
>  results:
>   HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>   HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
> 
> 2) Using invalid option values causes the entire kernel boot option
> string to be reported as Unknown.
>  Unknown kernel command line parameters "hugepages=0:1024,1:1024"

Thank you for debugging and sending the patch!

My first thought was "If someone is specifying 'numa=off' as well as
numa node specific allocations on the same command line, we should just
fail the allocation request".  However, this same situation could exist
without the 'numa=off' option as long as an invalid node is included in
the list.

With your patch, the node specific allocations are parsed (and processed)
until there is an error.  So, in the example above 3 1G pages and 1024 2M
pages are allocated on node 0.  That seems correct.

Now suppose the node specific allocations are specified as:
hugepagesz=1G hugepages=1:3,0:3
hugepagesz=2M hugepages=1:1024,0:1024

Since node 1 is invalid, we experience an error here and do not allocate
any pages on node 0.

I am wondering if we should just error and ignore the entire string if
ANY of the specified nodes are invalid?  Thoughts?
Peng Liu March 25, 2022, 3:15 a.m. UTC | #2
On 2022/3/25 5:57, Mike Kravetz wrote:
> On 3/24/22 00:40, Peng Liu wrote:
>> Hugepages can be specified to pernode since "hugetlbfs: extend
>> the definition of hugepages parameter to support node allocation",
>> but the following two problems are observed.
>>
>> 1) Confusing behavior is observed when both 1G and 2M hugepage
>> is set after "numa=off".
>>   cmdline hugepage settings:
>>    hugepagesz=1G hugepages=0:3,1:3
>>    hugepagesz=2M hugepages=0:1024,1:1024
>>   results:
>>    HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>>    HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>
>> 2) Using invalid option values causes the entire kernel boot option
>> string to be reported as Unknown.
>>   Unknown kernel command line parameters "hugepages=0:1024,1:1024"
> Thank you for debugging and sending the patch!
>
> My first thought was "If someone is specifying 'numa=off' as well as
> numa node specific allocations on the same command line, we should just
> fail the allocation request".  However, this same situation could exist
> without the 'numa=off' option as long as an invalid node is included in
> the list.
We will "specifying 'numa=off' as well as numa node specific allocations"
for some debugging and test cases. If the original command line can be
partly effective, this will be convenient. Yet, we also test "an invalid
node is included in the list", the behavior is the same with "numa=off".

> With your patch, the node specific allocations are parsed (and processed)
> until there is an error.  So, in the example above 3 1G pages and 1024 2M
> pages are allocated on node 0.  That seems correct.
>
> Now suppose the node specific allocations are specified as:
> hugepagesz=1G hugepages=1:3,0:3
> hugepagesz=2M hugepages=1:1024,0:1024
For this case, with/without this patch, huge page will be not allocated
on any node.
> Since node 1 is invalid, we experience an error here and do not allocate
> any pages on node 0.
>
> I am wondering if we should just error and ignore the entire string if
> ANY of the specified nodes are invalid?  Thoughts?

Thank you for your response.

This patch only to be consistent between 2M/1G behavior, and repair 
"return 0"
as 1d02b444b8d1 ("tracing: Fix return value of __setup handlers").
With this patch, a node could allocate huge pages until there is an 
error, and it
will print the invalid parameter from the first parse error. So, I think 
this
is acceptable.

The following tests is added with this patch.

*Case1*:
settings:
hugepagesz=1G hugepages=0:3,1:3,2:3
hugepagesz=2M hugepages=0:1024,1:1024,3:1024

results:
HugeTLB: Invalid hugepages parameter 1:3,2:3
HugeTLB: Invalid hugepages parameter 1:1024,3:1024
HugeTLB registered 1.00 GiB page size, pre-allocated 3 pages
HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages

*Case2*:

settings:
hugepagesz=1G hugepages=1:3,0:3
hugepagesz=2M hugepages=1:1024,0:1024

results:
HugeTLB: Invalid hugepages parameter 1:3,0:3
HugeTLB: Invalid hugepages parameter 1:1024,0:1024
HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages

Peng Liu.
Mike Kravetz March 29, 2022, 2:46 a.m. UTC | #3
On 3/24/22 20:15, liupeng (DM) wrote:
> 
> On 2022/3/25 5:57, Mike Kravetz wrote:
>> On 3/24/22 00:40, Peng Liu wrote:
>>> Hugepages can be specified to pernode since "hugetlbfs: extend
>>> the definition of hugepages parameter to support node allocation",
>>> but the following two problems are observed.
>>>
>>> 1) Confusing behavior is observed when both 1G and 2M hugepage
>>> is set after "numa=off".
>>>   cmdline hugepage settings:
>>>    hugepagesz=1G hugepages=0:3,1:3
>>>    hugepagesz=2M hugepages=0:1024,1:1024
>>>   results:
>>>    HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>>>    HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>>
>>> 2) Using invalid option values causes the entire kernel boot option
>>> string to be reported as Unknown.
>>>   Unknown kernel command line parameters "hugepages=0:1024,1:1024"
>> Thank you for debugging and sending the patch!
>>
>> My first thought was "If someone is specifying 'numa=off' as well as
>> numa node specific allocations on the same command line, we should just
>> fail the allocation request".  However, this same situation could exist
>> without the 'numa=off' option as long as an invalid node is included in
>> the list.
> We will "specifying 'numa=off' as well as numa node specific allocations"
> for some debugging and test cases. If the original command line can be
> partly effective, this will be convenient. Yet, we also test "an invalid
> node is included in the list", the behavior is the same with "numa=off".
> 
>> With your patch, the node specific allocations are parsed (and processed)
>> until there is an error.  So, in the example above 3 1G pages and 1024 2M
>> pages are allocated on node 0.  That seems correct.
>>
>> Now suppose the node specific allocations are specified as:
>> hugepagesz=1G hugepages=1:3,0:3
>> hugepagesz=2M hugepages=1:1024,0:1024
> For this case, with/without this patch, huge page will be not allocated
> on any node.
>> Since node 1 is invalid, we experience an error here and do not allocate
>> any pages on node 0.
>>
>> I am wondering if we should just error and ignore the entire string if
>> ANY of the specified nodes are invalid?  Thoughts?
> 
> Thank you for your response.
> 
> This patch only to be consistent between 2M/1G behavior, and repair "return 0"
> as 1d02b444b8d1 ("tracing: Fix return value of __setup handlers").
> With this patch, a node could allocate huge pages until there is an error, and it
> will print the invalid parameter from the first parse error. So, I think this
> is acceptable.

Yes, I agree that the change is needed and the current behavior is
unacceptable.

One remaining question is the change from returning '0' to '1' in the case
of error.  I do understand this is to prevent the invalid parameter string
from being passed to init.  It may not be correct/right, but in every other
case where an invalid parameter in encountered in hugetlb command line
processing we return "0".  Should we perhaps change all these other places
to be consistent?  I honestly do not know what is the appropriate behavior
in these situations.
Peng Liu March 29, 2022, 3:59 a.m. UTC | #4
On 2022/3/29 10:46, Mike Kravetz wrote:
> Yes, I agree that the change is needed and the current behavior is
> unacceptable.
>
> One remaining question is the change from returning '0' to '1' in the case
> of error.  I do understand this is to prevent the invalid parameter string
> from being passed to init.  It may not be correct/right, but in every other
> case where an invalid parameter in encountered in hugetlb command line
> processing we return "0".  Should we perhaps change all these other places
> to be consistent?  I honestly do not know what is the appropriate behavior
> in these situations.

Thank you for your carefulness and question.

I have checked default_hugepagesz_setup and hugepages_setup will both print
some information before return '0', so there is no need to print again in
"Unknown kernel command line parameters".

Should I send another patch to repair the rest "return 0" in hugetlb?

Some other tests for current linux-master:

cmdline:
hugepagesz=1G hugepages=1 hugepagesz=1G hugepages=2
dmesg:
HugeTLB: hugepagesz=1G specified twice, ignoring
Unknown kernel command line parameters " hugepagesz=1G hugepages=2"

cmdline:
hugepagesz=1Y hugepages=1
dmesg:
HugeTLB: unsupported hugepagesz=1Y
HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
Unknown kernel command line parameters "hugepagesz=1Y hugepages=1"

--
Peng Liu
.
Mike Kravetz March 29, 2022, 5:43 p.m. UTC | #5
On 3/28/22 20:59, liupeng (DM) wrote:
> 
> On 2022/3/29 10:46, Mike Kravetz wrote:
>> Yes, I agree that the change is needed and the current behavior is
>> unacceptable.
>>
>> One remaining question is the change from returning '0' to '1' in the case
>> of error.  I do understand this is to prevent the invalid parameter string
>> from being passed to init.  It may not be correct/right, but in every other
>> case where an invalid parameter in encountered in hugetlb command line
>> processing we return "0".  Should we perhaps change all these other places
>> to be consistent?  I honestly do not know what is the appropriate behavior
>> in these situations.
> 
> Thank you for your carefulness and question.
> 
> I have checked default_hugepagesz_setup and hugepages_setup will both print
> some information before return '0', so there is no need to print again in
> "Unknown kernel command line parameters".
> 
> Should I send another patch to repair the rest "return 0" in hugetlb?

I would suggest two patches:

1) Fix the issue with invalid nodes specified.  However, leave the "return 0"
   behavior in hugepages_setup to be consistent with the rest of the code.
   This patch can be sent to stable with "Fixes: b5389086ad7b" tag as it
   addresses an existing issue.
2) Clean up the places where we return 0 and it would be better to return 1.
   No cc stable here and just let the changes target future releases.
Peng Liu March 30, 2022, 1:01 a.m. UTC | #6
On 2022/3/30 1:43, Mike Kravetz wrote:
> On 3/28/22 20:59, liupeng (DM) wrote:
>> On 2022/3/29 10:46, Mike Kravetz wrote:
>>> Yes, I agree that the change is needed and the current behavior is
>>> unacceptable.
>>>
>>> One remaining question is the change from returning '0' to '1' in the case
>>> of error.  I do understand this is to prevent the invalid parameter string
>>> from being passed to init.  It may not be correct/right, but in every other
>>> case where an invalid parameter in encountered in hugetlb command line
>>> processing we return "0".  Should we perhaps change all these other places
>>> to be consistent?  I honestly do not know what is the appropriate behavior
>>> in these situations.
>> Thank you for your carefulness and question.
>>
>> I have checked default_hugepagesz_setup and hugepages_setup will both print
>> some information before return '0', so there is no need to print again in
>> "Unknown kernel command line parameters".
>>
>> Should I send another patch to repair the rest "return 0" in hugetlb?
> I would suggest two patches:
>
> 1) Fix the issue with invalid nodes specified.  However, leave the "return 0"
>     behavior in hugepages_setup to be consistent with the rest of the code.
>     This patch can be sent to stable with "Fixes: b5389086ad7b" tag as it
>     addresses an existing issue.
> 2) Clean up the places where we return 0 and it would be better to return 1.
>     No cc stable here and just let the changes target future releases.

Thanks, I will do it as your suggestions.
Peng Liu March 31, 2022, 11:23 a.m. UTC | #7
On 2022/3/30 1:43, Mike Kravetz wrote:
> On 3/28/22 20:59, liupeng (DM) wrote:
>> On 2022/3/29 10:46, Mike Kravetz wrote:
>>> Yes, I agree that the change is needed and the current behavior is
>>> unacceptable.
>>>
>>> One remaining question is the change from returning '0' to '1' in the case
>>> of error.  I do understand this is to prevent the invalid parameter string
>>> from being passed to init.  It may not be correct/right, but in every other
>>> case where an invalid parameter in encountered in hugetlb command line
>>> processing we return "0".  Should we perhaps change all these other places
>>> to be consistent?  I honestly do not know what is the appropriate behavior
>>> in these situations.
>> Thank you for your carefulness and question.
>>
>> I have checked default_hugepagesz_setup and hugepages_setup will both print
>> some information before return '0', so there is no need to print again in
>> "Unknown kernel command line parameters".
>>
>> Should I send another patch to repair the rest "return 0" in hugetlb?
> I would suggest two patches:
>
> 1) Fix the issue with invalid nodes specified.  However, leave the "return 0"
>     behavior in hugepages_setup to be consistent with the rest of the code.
>     This patch can be sent to stable with "Fixes: b5389086ad7b" tag as it
>     addresses an existing issue.
> 2) Clean up the places where we return 0 and it would be better to return 1.
>     No cc stable here and just let the changes target future releases.
I have tried to write a patch as your suggestion, but the best way I can 
carry it
out is the original patch. To meet "Fix invalid nodes issue and leave 
thereturn
0 behavior", I have to add the following redundant code:

  invalid:
         pr_warn("HugeTLB: Invalid hugepages parameter %s\n", p);
+
+       /* Allocate gigantic hstates for successfully parsed parameters*/
+       if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
+ hugetlb_hstate_alloc_pages(parsed_hstate);
+       last_mhp = mhp;
return 0;

Any ideas?
Mike Kravetz March 31, 2022, 9:11 p.m. UTC | #8
On 3/31/22 04:23, liupeng (DM) wrote:
> On 2022/3/30 1:43, Mike Kravetz wrote:
>> On 3/28/22 20:59, liupeng (DM) wrote:
>>> On 2022/3/29 10:46, Mike Kravetz wrote:
>>>> Yes, I agree that the change is needed and the current behavior is
>>>> unacceptable.
>>>>
>>>> One remaining question is the change from returning '0' to '1' in the case
>>>> of error.  I do understand this is to prevent the invalid parameter string
>>>> from being passed to init.  It may not be correct/right, but in every other
>>>> case where an invalid parameter in encountered in hugetlb command line
>>>> processing we return "0".  Should we perhaps change all these other places
>>>> to be consistent?  I honestly do not know what is the appropriate behavior
>>>> in these situations.
>>> Thank you for your carefulness and question.
>>>
>>> I have checked default_hugepagesz_setup and hugepages_setup will both print
>>> some information before return '0', so there is no need to print again in
>>> "Unknown kernel command line parameters".
>>>
>>> Should I send another patch to repair the rest "return 0" in hugetlb?
>> I would suggest two patches:
>>
>> 1) Fix the issue with invalid nodes specified.  However, leave the "return 0"
>>     behavior in hugepages_setup to be consistent with the rest of the code.
>>     This patch can be sent to stable with "Fixes: b5389086ad7b" tag as it
>>     addresses an existing issue.
>> 2) Clean up the places where we return 0 and it would be better to return 1.
>>     No cc stable here and just let the changes target future releases.
> I have tried to write a patch as your suggestion, but the best way I can carry it
> out is the original patch. To meet "Fix invalid nodes issue and leave thereturn
> 0 behavior", I have to add the following redundant code:
> 
>  invalid:
>         pr_warn("HugeTLB: Invalid hugepages parameter %s\n", p);
> +
> +       /* Allocate gigantic hstates for successfully parsed parameters*/
> +       if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
> + hugetlb_hstate_alloc_pages(parsed_hstate);
> +       last_mhp = mhp;
> return 0;
> 

I was thinking something like the attached (untested).  It is very similar to
your original code.
Peng Liu April 1, 2022, 9:56 a.m. UTC | #9
On 2022/4/1 5:11, Mike Kravetz wrote:
>
> I was thinking something like the attached (untested).  It is very similar to
> your original code.
Thanks, patch-v2 is sent.
diff mbox series

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b34f50156f7e..de8e1d8a2c66 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4189,6 +4189,7 @@  static int __init hugepages_setup(char *s)
 		}
 	}
 
+out:
 	/*
 	 * Global state is always initialized later in hugetlb_init.
 	 * But we need to allocate gigantic hstates here early to still
@@ -4203,7 +4204,7 @@  static int __init hugepages_setup(char *s)
 
 invalid:
 	pr_warn("HugeTLB: Invalid hugepages parameter %s\n", p);
-	return 0;
+	goto out;
 }
 __setup("hugepages=", hugepages_setup);