
[1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes

Message ID 20230905031312.91929-1-yuancan@huawei.com (mailing list archive)
State New
Series [1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes

Commit Message

Yuan Can Sept. 5, 2023, 3:13 a.m. UTC
Decreasing the number of hugetlb pages failed with the following message:

 sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
 CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
 Hardware name: linux,dummy-virt (DT)
 Call trace:
  dump_backtrace.part.6+0x84/0xe4
  show_stack+0x18/0x24
  dump_stack_lvl+0x48/0x60
  dump_stack+0x18/0x24
  warn_alloc+0x100/0x1bc
  __alloc_pages_slowpath.constprop.107+0xa40/0xad8
  __alloc_pages+0x244/0x2d0
  hugetlb_vmemmap_restore+0x104/0x1e4
  __update_and_free_hugetlb_folio+0x44/0x1f4
  update_and_free_hugetlb_folio+0x20/0x68
  update_and_free_pages_bulk+0x4c/0xac
  set_max_huge_pages+0x198/0x334
  nr_hugepages_store_common+0x118/0x178
  nr_hugepages_store+0x18/0x24
  kobj_attr_store+0x18/0x2c
  sysfs_kf_write+0x40/0x54
  kernfs_fop_write_iter+0x164/0x1dc
  vfs_write+0x3a8/0x460
  ksys_write+0x6c/0x100
  __arm64_sys_write+0x1c/0x28
  invoke_syscall+0x44/0x100
  el0_svc_common.constprop.1+0x6c/0xe4
  do_el0_svc+0x38/0x94
  el0_svc+0x28/0x74
  el0t_64_sync_handler+0xa0/0xc4
  el0t_64_sync+0x174/0x178
 Mem-Info:
  ...

The reason is that the hugetlb pages being released were allocated from
movable nodes, and with hugetlb_optimize_vmemmap enabled, the vmemmap pages
need to be allocated from the same node while the hugetlb pages are being
released. With GFP_KERNEL and __GFP_THISNODE set, allocating from a movable
node always fails. Fix this problem by removing __GFP_THISNODE.
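For reference, the mode value in the warning above can be reconstructed from the individual GFP bits. The sketch below uses the ___GFP_* bit values from the v6.5 include/linux/gfp_types.h; these bit positions are kernel-version specific and are shown only for illustration:

```python
# Decode mode:0x204cc0 from the allocation-failure warning.
# Bit values taken from v6.5 include/linux/gfp_types.h (illustrative;
# they can change between kernel versions).
GFP_IO             = 0x40
GFP_FS             = 0x80
GFP_DIRECT_RECLAIM = 0x400
GFP_KSWAPD_RECLAIM = 0x800
GFP_RETRY_MAYFAIL  = 0x4000
GFP_THISNODE       = 0x200000

# GFP_KERNEL = __GFP_RECLAIM | __GFP_IO | __GFP_FS
GFP_KERNEL = GFP_DIRECT_RECLAIM | GFP_KSWAPD_RECLAIM | GFP_IO | GFP_FS

mode = GFP_KERNEL | GFP_RETRY_MAYFAIL | GFP_THISNODE
print(hex(mode))  # 0x204cc0, matching the failure message
```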

Signed-off-by: Yuan Can <yuancan@huawei.com>
---
 mm/hugetlb_vmemmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Muchun Song Sept. 5, 2023, 9:06 a.m. UTC | #1
> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
> 
> The decreasing of hugetlb pages number failed with the following message
> given:
> 
> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
> [... call trace and Mem-Info elided; identical to the trace above ...]
> 
> The reason is that the hugetlb pages being released are allocated from
> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
> need to be allocated from the same node during the hugetlb pages

Thanks for your fix. I think it is a real-world issue, so it's better
to add a Fixes tag to indicate backporting. Thanks.

> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
> node is always failed. Fix this problem by removing __GFP_THISNODE.
> 
> Signed-off-by: Yuan Can <yuancan@huawei.com>
> ---
> mm/hugetlb_vmemmap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index c2007ef5e9b0..0485e471d224 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>   				   struct list_head *list)
> {
> - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
> + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;

There is a small behavioral change for the non-movable case: we first try
to allocate memory from the preferred node (same as before), but if that
fails, we now fall back to other nodes. That makes sense to me. At the least,
those huge pages can be freed once other nodes can satisfy the allocation of
the vmemmap pages.
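The policy change described above can be modeled with a toy sketch (plain Python, not kernel code): with __GFP_THISNODE the allocation fails outright when the preferred node is exhausted, while without it the allocator still tries the preferred node first but then falls back to other nodes. The node layout and free-page counts are made up for illustration:

```python
# Toy model of per-node page allocation; node 1 is exhausted.
free_pages = {0: 4, 1: 0, 2: 8}

def alloc(preferred_nid, thisnode):
    """Return the node a page was taken from, or None on failure."""
    if thisnode:
        candidates = [preferred_nid]          # __GFP_THISNODE: no fallback
    else:
        candidates = [preferred_nid] + \
            [n for n in free_pages if n != preferred_nid]
    for nid in candidates:                    # preferred node tried first
        if free_pages[nid] > 0:
            free_pages[nid] -= 1
            return nid
    return None

# With __GFP_THISNODE the request fails when node 1 is empty...
assert alloc(1, thisnode=True) is None
# ...without it, the allocator falls back to another node.
assert alloc(1, thisnode=False) in (0, 2)
```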

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

> 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
> 	int nid = page_to_nid((struct page *)start);
> 	struct page *page, *next;
> -- 
> 2.17.1
> 
>
Kefeng Wang Sept. 5, 2023, 10:43 a.m. UTC | #2
On 2023/9/5 17:06, Muchun Song wrote:
> 
> 
>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>>
>> The decreasing of hugetlb pages number failed with the following message
>> given:
>>
>> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
>> [... call trace and Mem-Info elided ...]
>>
>> The reason is that the hugetlb pages being released are allocated from
>> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
>> need to be allocated from the same node during the hugetlb pages
> 
> Thanks for your fix, I think it should be a real word issue, it's better
> to add a Fixes tag to indicate backporting. Thanks.
> 
>> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
>> node is always failed. Fix this problem by removing __GFP_THISNODE.

The Fixes tag should be ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap
pages associated with each HugeTLB page").

>>
>> Signed-off-by: Yuan Can <yuancan@huawei.com>
>> ---
>> mm/hugetlb_vmemmap.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index c2007ef5e9b0..0485e471d224 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>>    				   struct list_head *list)
>> {
>> - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>> + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
> 
> There is a little change for non-movable case after this change, we fist try
> to allocate memory from the preferred node (it is same as original), if it
> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
> huge pages could be freed once other nodes could satisfy the allocation of
> vmemmap pages.
> 
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> 
> Thanks.
> 
>> 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
>> 	int nid = page_to_nid((struct page *)start);
>> 	struct page *page, *next;
>> -- 
>> 2.17.1
>>
>>
>
Yuan Can Sept. 5, 2023, 12:41 p.m. UTC | #3
On 2023/9/5 17:06, Muchun Song wrote:
>
>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>>
>> The decreasing of hugetlb pages number failed with the following message
>> given:
>>
>> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
>> [... call trace and Mem-Info elided ...]
>>
>> The reason is that the hugetlb pages being released are allocated from
>> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
>> need to be allocated from the same node during the hugetlb pages
> Thanks for your fix, I think it should be a real word issue, it's better
> to add a Fixes tag to indicate backporting. Thanks.
>
>> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
>> node is always failed. Fix this problem by removing __GFP_THISNODE.
>>
>> Signed-off-by: Yuan Can <yuancan@huawei.com>
>> ---
>> mm/hugetlb_vmemmap.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index c2007ef5e9b0..0485e471d224 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>>    				   struct list_head *list)
>> {
>> - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>> + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
> There is a little change for non-movable case after this change, we fist try
> to allocate memory from the preferred node (it is same as original), if it
> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
> huge pages could be freed once other nodes could satisfy the allocation of
> vmemmap pages.
>
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>
> Thanks.
Thanks for the review, I will send the v2 patch with the Fixes tag and your
Reviewed-by soon.
Mike Kravetz Sept. 6, 2023, 12:28 a.m. UTC | #4
On 09/05/23 17:06, Muchun Song wrote:
> 
> 
> > On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
> > 
> > The decreasing of hugetlb pages number failed with the following message
> > given:
> > 
> > sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
> > [... call trace and Mem-Info elided ...]
> > 
> > The reason is that the hugetlb pages being released are allocated from
> > movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
> > need to be allocated from the same node during the hugetlb pages
> 
> Thanks for your fix, I think it should be a real word issue, it's better
> to add a Fixes tag to indicate backporting. Thanks.
> 

I thought we might get the same error (unable to allocate on a movable
node) when creating the hugetlb page.  Why?  Because we replace the head
vmemmap page.  However, I see that failure to allocate there is not a
fatal error and we fall back to the currently mapped page.  We also pass
__GFP_NOWARN to that allocation attempt so there will be no report of the
failure.

We might want to change this as well?

> > releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
> > node is always failed. Fix this problem by removing __GFP_THISNODE.
> > 
> > Signed-off-by: Yuan Can <yuancan@huawei.com>
> > ---
> > mm/hugetlb_vmemmap.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index c2007ef5e9b0..0485e471d224 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> > static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> >   				   struct list_head *list)
> > {
> > - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
> > + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
> 
> There is a little change for non-movable case after this change, we fist try
> to allocate memory from the preferred node (it is same as original), if it
> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
> huge pages could be freed once other nodes could satisfy the allocation of
> vmemmap pages.
> 
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>

This looks reasonable to me as well.

Cc'ing David and Michal as they are experts in hotplug.
Muchun Song Sept. 6, 2023, 2:32 a.m. UTC | #5
> On Sep 6, 2023, at 08:28, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
> On 09/05/23 17:06, Muchun Song wrote:
>> 
>> 
>>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>>> 
>>> The decreasing of hugetlb pages number failed with the following message
>>> given:
>>> 
>>> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
>>> [... call trace and Mem-Info elided ...]
>>> 
>>> The reason is that the hugetlb pages being released are allocated from
>>> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
>>> need to be allocated from the same node during the hugetlb pages
>> 
>> Thanks for your fix, I think it should be a real word issue, it's better
>> to add a Fixes tag to indicate backporting. Thanks.
>> 
> 
> I thought we might get get the same error (Unable to allocate on movable
> node) when creating the hugetlb page.  Why?  Because we replace the head
> vmemmap page.  However, I see that failure to allocate there is not a
> fatal error and we fallback to the currently mapped page.  We also pass
> __GFP_NOWARN to that allocation attempt so there will be no report of the
> failure.
> 
> We might want to change this as well?

I think yes. I also thought about this yesterday, but since this one is
not a fatal error, it should be an improvement patch. So it is better not
to fold that change into this patch (a bug fix).

Thanks.

> 
>>> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
>>> node is always failed. Fix this problem by removing __GFP_THISNODE.
>>> 
>>> Signed-off-by: Yuan Can <yuancan@huawei.com>
>>> ---
>>> mm/hugetlb_vmemmap.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index c2007ef5e9b0..0485e471d224 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>>> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>>>      struct list_head *list)
>>> {
>>> -  gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>>> +  gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
>> 
>> There is a little change for non-movable case after this change, we fist try
>> to allocate memory from the preferred node (it is same as original), if it
>> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
>> huge pages could be freed once other nodes could satisfy the allocation of
>> vmemmap pages.
>> 
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> 
> This looks reasonable to me as well.
> 
> Cc'ing David and Michal as they are expert in hotplug.
> -- 
> Mike Kravetz
> 
>> 
>> Thanks.
>> 
>>> unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
>>> int nid = page_to_nid((struct page *)start);
>>> struct page *page, *next;
>>> -- 
>>> 2.17.1
Yuan Can Sept. 6, 2023, 2:59 a.m. UTC | #6
On 2023/9/6 10:32, Muchun Song wrote:
>
>> On Sep 6, 2023, at 08:28, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> On 09/05/23 17:06, Muchun Song wrote:
>>>
>>>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>>>>
>>>> The decreasing of hugetlb pages number failed with the following message
>>>> given:
>>>>
>>>> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
>>>> [... call trace and Mem-Info elided ...]
>>>>
>>>> The reason is that the hugetlb pages being released are allocated from
>>>> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
>>>> need to be allocated from the same node during the hugetlb pages
>>> Thanks for your fix, I think it should be a real word issue, it's better
>>> to add a Fixes tag to indicate backporting. Thanks.
>>>
>> I thought we might get get the same error (Unable to allocate on movable
>> node) when creating the hugetlb page.  Why?  Because we replace the head
>> vmemmap page.  However, I see that failure to allocate there is not a
>> fatal error and we fallback to the currently mapped page.  We also pass
>> __GFP_NOWARN to that allocation attempt so there will be no report of the
>> failure.
>>
>> We might want to change this as well?
> I think yes. I also thought about this yesterday, but I think
> this one is not a fetal error, it should be an improvement patch.
> So it is better not to fold this change into this patch (a bug fix one).
>
> Thanks.
Sure, let me send another patch passing __GFP_NOWARN.
David Hildenbrand Sept. 6, 2023, 7:25 a.m. UTC | #7
>>> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
>>> node is always failed. Fix this problem by removing __GFP_THISNODE.
>>>
>>> Signed-off-by: Yuan Can <yuancan@huawei.com>
>>> ---
>>> mm/hugetlb_vmemmap.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index c2007ef5e9b0..0485e471d224 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>>> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>>>    				   struct list_head *list)
>>> {
>>> - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>>> + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
>>
>> There is a little change for non-movable case after this change, we fist try
>> to allocate memory from the preferred node (it is same as original), if it
>> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
>> huge pages could be freed once other nodes could satisfy the allocation of
>> vmemmap pages.
>>
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> 
> This looks reasonable to me as well.
> 
> Cc'ing David and Michal as they are expert in hotplug.

IIUC, we still won't allocate from ZONE_MOVABLE / MIGRATE_CMA (due to 
GFP_KERNEL), so it should be fine.
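David's point can be illustrated with a simplified model of the kernel's gfp_zone() selection: the zones an allocation may come from are decided by the low "zone modifier" bits of the gfp mask, ZONE_MOVABLE is only eligible when __GFP_MOVABLE is combined with __GFP_HIGHMEM, and GFP_KERNEL sets no zone modifier bits at all. The bit values below follow v6.5 and the model is deliberately reduced (the real kernel uses a packed GFP_ZONE_TABLE):

```python
# Reduced sketch of gfp_zone(): map the zone modifier bits of a gfp mask
# to the highest zone the allocation may use.  Bit values are from v6.5
# include/linux/gfp_types.h and are illustrative only.
GFP_DMA      = 0x01
GFP_HIGHMEM  = 0x02
GFP_DMA32    = 0x04
GFP_MOVABLE  = 0x08
GFP_ZONEMASK = 0x0F

def highest_zone(gfp_mask):
    zone_bits = gfp_mask & GFP_ZONEMASK
    if zone_bits == (GFP_HIGHMEM | GFP_MOVABLE):
        return "ZONE_MOVABLE"          # only this combination reaches it
    if zone_bits & GFP_DMA:
        return "ZONE_DMA"
    if zone_bits & GFP_DMA32:
        return "ZONE_DMA32"
    return "ZONE_NORMAL"

GFP_KERNEL = 0xCC0                     # no zone modifier bits set
assert highest_zone(GFP_KERNEL) == "ZONE_NORMAL"
assert highest_zone(GFP_HIGHMEM | GFP_MOVABLE) == "ZONE_MOVABLE"
```

So even after dropping __GFP_THISNODE, a GFP_KERNEL allocation still cannot be satisfied from ZONE_MOVABLE; it only gains the ability to fall back to kernel zones on other nodes.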

Patch

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c2007ef5e9b0..0485e471d224 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -386,7 +386,7 @@  static int vmemmap_remap_free(unsigned long start, unsigned long end,
 static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
 				   struct list_head *list)
 {
-	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
+	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
 	int nid = page_to_nid((struct page *)start);
 	struct page *page, *next;