
mm: set hugepage to false when anon mthp allocation

Message ID 20240910140625.175700-1-wangkefeng.wang@huawei.com (mailing list archive)
State New
Series mm: set hugepage to false when anon mthp allocation

Commit Message

Kefeng Wang Sept. 10, 2024, 2:06 p.m. UTC
When the hugepage parameter is true in vma_alloc_folio(), it indicates
that we only try allocation on the preferred node if possible for PMD_ORDER,
but this can lead to many failures for large folio allocation. Luckily,
the hugepage parameter has been deprecated since commit ddc1a5cbc05d
("mempolicy: alloc_pages_mpol() for NUMA policy without vma"), so there is
no effect on runtime behavior.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---

Found the issue when backporting mTHP to an internal kernel without
ddc1a5cbc05d; for mainline there is no issue. No clue why the hugepage
parameter was retained, so maybe just kill the parameter for mainline?

 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Kefeng Wang Sept. 10, 2024, 2:18 p.m. UTC | #1
On 2024/9/10 22:06, Kefeng Wang wrote:
> When the hugepage parameter is true in vma_alloc_folio(), it indicates
> that we only try allocation on preferred node if possible for PMD_ORDER,

Should remove "for PMD_ORDER": I mean that it was used for PMD_ORDER,
but for other high orders it will reduce the success rate of allocation
when ddc1a5cbc05d is not present.


> but it could lead to lots of failures for large folio allocation,
> luckily the hugepage parameter was deprecated since commit ddc1a5cbc05d
> ("mempolicy: alloc_pages_mpol() for NUMA policy without vma"), so no
> effect on runtime behavior.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> 
> Found the issue when backport mthp to inner kernel without ddc1a5cbc05d,
> but for mainline, there is no issue, no clue why hugepage parameter was
> retained, maybe just kill the parameter for mainline?
> 
>   mm/memory.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index b84443e689a8..89a15858348a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4479,7 +4479,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>   	gfp = vma_thp_gfp_mask(vma);
>   	while (orders) {
>   		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
> -		folio = vma_alloc_folio(gfp, order, vma, addr, true);
> +		folio = vma_alloc_folio(gfp, order, vma, addr, false);
>   		if (folio) {
>   			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
>   				count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
Kefeng Wang Sept. 13, 2024, 10:36 a.m. UTC | #2
Hi All,

On 2024/9/10 22:18, Kefeng Wang wrote:
> 
> 
> On 2024/9/10 22:06, Kefeng Wang wrote:
>> When the hugepage parameter is true in vma_alloc_folio(), it indicates
>> that we only try allocation on preferred node if possible for PMD_ORDER,
> 
> Should remove "for PMD_ORDER", I mean that it was used for PMD_ORDER, 
> but for other high-order, it will reduce the success rate of allocation 
> if without ddc1a5cbc05d.
> 
> 
>> but it could lead to lots of failures for large folio allocation,
>> luckily the hugepage parameter was deprecated since commit ddc1a5cbc05d
>> ("mempolicy: alloc_pages_mpol() for NUMA policy without vma"), so no
>> effect on runtime behavior.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>
>> Found the issue when backport mthp to inner kernel without ddc1a5cbc05d,
>> but for mainline, there is no issue, no clue why hugepage parameter was
>> retained, maybe just kill the parameter for mainline?


Any comments? Should we fix this in alloc_anon_folio(), or remove the
hugepage parameter from vma_alloc_folio()? Thanks.

>>
>>   mm/memory.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index b84443e689a8..89a15858348a 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4479,7 +4479,7 @@ static struct folio *alloc_anon_folio(struct 
>> vm_fault *vmf)
>>       gfp = vma_thp_gfp_mask(vma);
>>       while (orders) {
>>           addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
>> -        folio = vma_alloc_folio(gfp, order, vma, addr, true);
>> +        folio = vma_alloc_folio(gfp, order, vma, addr, false);
>>           if (folio) {
>>               if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
>>                   count_mthp_stat(order, 
>> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>
Kefeng Wang Oct. 9, 2024, 9:15 a.m. UTC | #3
On 2024/9/13 18:36, Kefeng Wang wrote:
> Hi All,
> 
> On 2024/9/10 22:18, Kefeng Wang wrote:
>>
>>
>> On 2024/9/10 22:06, Kefeng Wang wrote:
>>> When the hugepage parameter is true in vma_alloc_folio(), it indicates
>>> that we only try allocation on preferred node if possible for PMD_ORDER,
>>
>> Should remove "for PMD_ORDER", I mean that it was used for PMD_ORDER, 
>> but for other high-order, it will reduce the success rate of 
>> allocation if without ddc1a5cbc05d.
>>
>>
>>> but it could lead to lots of failures for large folio allocation,
>>> luckily the hugepage parameter was deprecated since commit ddc1a5cbc05d
>>> ("mempolicy: alloc_pages_mpol() for NUMA policy without vma"), so no
>>> effect on runtime behavior.
>>>
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>> ---
>>>
>>> Found the issue when backport mthp to inner kernel without ddc1a5cbc05d,
>>> but for mainline, there is no issue, no clue why hugepage parameter was
>>> retained, maybe just kill the parameter for mainline?
> 
> 
> Any comments, fix in alloc_anon_folio() or remove hugepage parameter in 
> vma_alloc_folio(), thanks.

* vma_alloc_folio - Allocate a folio for a VMA.
@hugepage: Unused (was: For hugepages try only preferred node if possible).

Since hugepage is no longer used in vma_alloc_folio(), maybe just delete
this parameter?

> 
>>>
>>>   mm/memory.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index b84443e689a8..89a15858348a 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4479,7 +4479,7 @@ static struct folio *alloc_anon_folio(struct 
>>> vm_fault *vmf)
>>>       gfp = vma_thp_gfp_mask(vma);
>>>       while (orders) {
>>>           addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
>>> -        folio = vma_alloc_folio(gfp, order, vma, addr, true);
>>> +        folio = vma_alloc_folio(gfp, order, vma, addr, false);
>>>           if (folio) {
>>>               if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
>>>                   count_mthp_stat(order, 
>>> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>>
Ryan Roberts Oct. 9, 2024, 10:44 a.m. UTC | #4
On 09/10/2024 10:15, Kefeng Wang wrote:
> 
> On 2024/9/13 18:36, Kefeng Wang wrote:
>> Hi All,
>>
>> On 2024/9/10 22:18, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/9/10 22:06, Kefeng Wang wrote:
>>>> When the hugepage parameter is true in vma_alloc_folio(), it indicates
>>>> that we only try allocation on preferred node if possible for PMD_ORDER,
>>>
>>> Should remove "for PMD_ORDER", I mean that it was used for PMD_ORDER, but for
>>> other high-order, it will reduce the success rate of allocation if without
>>> ddc1a5cbc05d.
>>>
>>>
>>>> but it could lead to lots of failures for large folio allocation,
>>>> luckily the hugepage parameter was deprecated since commit ddc1a5cbc05d
>>>> ("mempolicy: alloc_pages_mpol() for NUMA policy without vma"), so no
>>>> effect on runtime behavior.
>>>>
>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>> ---
>>>>
>>>> Found the issue when backport mthp to inner kernel without ddc1a5cbc05d,
>>>> but for mainline, there is no issue, no clue why hugepage parameter was
>>>> retained, maybe just kill the parameter for mainline?
>>
>>
>> Any comments, fix in alloc_anon_folio() or remove hugepage parameter in
>> vma_alloc_folio(), thanks.
> 
> * vma_alloc_folio - Allocate a folio for a VMA.
> @hugepage: Unused (was: For hugepages try only preferred node if possible).
> 
> Since hugepage won't be used in vma_alloc_folio(), maybe just delete this
> parameter?

Sorry for the radio silence. Given the param is no longer used, I think it would
be cleaner to just remove it.

It was set to true here on purpose though; the aim was to follow the pattern set
by PMD-sized THP, which also sets it to true. And the argument was that the
benefit of having a huge page would be outstripped by the cost of having to
access it on a remote node.

Now that the parameter is deprecated, do you know if the policy is still
enforced by other means?

Thanks,
Ryan


> 
>>
>>>>
>>>>   mm/memory.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index b84443e689a8..89a15858348a 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -4479,7 +4479,7 @@ static struct folio *alloc_anon_folio(struct vm_fault
>>>> *vmf)
>>>>       gfp = vma_thp_gfp_mask(vma);
>>>>       while (orders) {
>>>>           addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
>>>> -        folio = vma_alloc_folio(gfp, order, vma, addr, true);
>>>> +        folio = vma_alloc_folio(gfp, order, vma, addr, false);
>>>>           if (folio) {
>>>>               if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
>>>>                   count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>>>
>
David Hildenbrand Oct. 9, 2024, 2:28 p.m. UTC | #5
On 09.10.24 12:44, Ryan Roberts wrote:
> On 09/10/2024 10:15, Kefeng Wang wrote:
>>
>> On 2024/9/13 18:36, Kefeng Wang wrote:
>>> Hi All,
>>>
>>> On 2024/9/10 22:18, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2024/9/10 22:06, Kefeng Wang wrote:
>>>>> When the hugepage parameter is true in vma_alloc_folio(), it indicates
>>>>> that we only try allocation on preferred node if possible for PMD_ORDER,
>>>>
>>>> Should remove "for PMD_ORDER", I mean that it was used for PMD_ORDER, but for
>>>> other high-order, it will reduce the success rate of allocation if without
>>>> ddc1a5cbc05d.
>>>>
>>>>
>>>>> but it could lead to lots of failures for large folio allocation,
>>>>> luckily the hugepage parameter was deprecated since commit ddc1a5cbc05d
>>>>> ("mempolicy: alloc_pages_mpol() for NUMA policy without vma"), so no
>>>>> effect on runtime behavior.
>>>>>
>>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>> ---
>>>>>
>>>>> Found the issue when backport mthp to inner kernel without ddc1a5cbc05d,
>>>>> but for mainline, there is no issue, no clue why hugepage parameter was
>>>>> retained, maybe just kill the parameter for mainline?
>>>
>>>
>>> Any comments, fix in alloc_anon_folio() or remove hugepage parameter in
>>> vma_alloc_folio(), thanks.
>>
>> * vma_alloc_folio - Allocate a folio for a VMA.
>> @hugepage: Unused (was: For hugepages try only preferred node if possible).
>>
>> Since hugepage won't be used in vma_alloc_folio(), maybe just delete this
>> parameter?
> 
> Sorry for the radio silence. Given the param is no longer used, I think it would
> be cleaner to just remove it.

Agreed, no dead code.

> 
> It was set to true here on purpose though; the aim was to follow the pattern set
> by PMD-sized THP, which also sets it to true. And the aargument was that the
> benefit of having a huge page would be outstripped by having to access it on a
> remote node.
> 
> Now that the parameter is deprecated, do you know if the policy is still
> enforced by other means?

Right, it might indicate a bug. So figuring out why there are no users 
left would be interesting. Maybe it was all on purpose.
Kefeng Wang Oct. 10, 2024, 1:13 a.m. UTC | #6
On 2024/10/9 22:28, David Hildenbrand wrote:
> On 09.10.24 12:44, Ryan Roberts wrote:
>> On 09/10/2024 10:15, Kefeng Wang wrote:
>>>
>>> On 2024/9/13 18:36, Kefeng Wang wrote:
>>>> Hi All,
>>>>
>>>> On 2024/9/10 22:18, Kefeng Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/9/10 22:06, Kefeng Wang wrote:
>>>>>> When the hugepage parameter is true in vma_alloc_folio(), it 
>>>>>> indicates
>>>>>> that we only try allocation on preferred node if possible for 
>>>>>> PMD_ORDER,
>>>>>
>>>>> Should remove "for PMD_ORDER", I mean that it was used for 
>>>>> PMD_ORDER, but for
>>>>> other high-order, it will reduce the success rate of allocation if 
>>>>> without
>>>>> ddc1a5cbc05d.
>>>>>
>>>>>
>>>>>> but it could lead to lots of failures for large folio allocation,
>>>>>> luckily the hugepage parameter was deprecated since commit 
>>>>>> ddc1a5cbc05d
>>>>>> ("mempolicy: alloc_pages_mpol() for NUMA policy without vma"), so no
>>>>>> effect on runtime behavior.
>>>>>>
>>>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>> ---
>>>>>>
>>>>>> Found the issue when backport mthp to inner kernel without 
>>>>>> ddc1a5cbc05d,
>>>>>> but for mainline, there is no issue, no clue why hugepage 
>>>>>> parameter was
>>>>>> retained, maybe just kill the parameter for mainline?
>>>>
>>>>
>>>> Any comments, fix in alloc_anon_folio() or remove hugepage parameter in
>>>> vma_alloc_folio(), thanks.
>>>
>>> * vma_alloc_folio - Allocate a folio for a VMA.
>>> @hugepage: Unused (was: For hugepages try only preferred node if 
>>> possible).
>>>
>>> Since hugepage won't be used in vma_alloc_folio(), maybe just delete 
>>> this
>>> parameter?
>>
>> Sorry for the radio silence. Given the param is no longer used, I 
>> think it would
>> be cleaner to just remove it.
> 
> Agreed, no dead code.

Sure.

> 
>>
>> It was set to true here on purpose though; the aim was to follow the 
>> pattern set
>> by PMD-sized THP, which also sets it to true. And the aargument was 
>> that the
>> benefit of having a huge page would be outstripped by having to access 
>> it on a
>> remote node.
>>
>> Now that the parameter is deprecated, do you know if the policy is still
>> enforced by other means?
> 
> Right, it might indicate a bug. So figuring out why there are no users 
> left would be interesting. Maybe it was all on purpose.
> 

Before ddc1a5cbc05d (v6.7; mTHP is from v6.8), the hugepage parameter was
checked (only for PMD THP). After that commit, we check
order == HPAGE_PMD_ORDER instead, so there is no difference for PMD THP,
but other high-order allocations don't follow the pattern set by PMD THP.


   if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
       /* filter "hugepage" allocation, unless from alloc_pages() */
       order == HPAGE_PMD_ORDER && ilx != NO_INTERLEAVE_INDEX) {


"The interleave index is almost always irrelevant unless MPOL_INTERLEAVE:
with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX
passed down from vma-less alloc_pages() is also used as hint not to use
THP-style hugepage allocation - to avoid the overhead of a hugepage arg
(though I don't understand why we never just added a GFP bit for THP - if
it actually needs a different allocation strategy from other pages of the
same order).  vma_alloc_folio() still carries its hugepage arg here, but
it is not used, and should be removed when agreed.
"

From Hugh's changelog, it seems that we could just remove it; but since
mTHP did not exist when this change was made, Hugh didn't take other
high-order folio allocations into account.

Patch

diff --git a/mm/memory.c b/mm/memory.c
index b84443e689a8..89a15858348a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4479,7 +4479,7 @@  static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	gfp = vma_thp_gfp_mask(vma);
 	while (orders) {
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
-		folio = vma_alloc_folio(gfp, order, vma, addr, true);
+		folio = vma_alloc_folio(gfp, order, vma, addr, false);
 		if (folio) {
 			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
 				count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);