Message ID | 20230905031312.91929-1-yuancan@huawei.com (mailing list archive)
---|---
State | New
Series | [1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>
> Decreasing the number of hugetlb pages failed with the following message:
>
> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
> CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.6+0x84/0xe4
>  show_stack+0x18/0x24
>  dump_stack_lvl+0x48/0x60
>  dump_stack+0x18/0x24
>  warn_alloc+0x100/0x1bc
>  __alloc_pages_slowpath.constprop.107+0xa40/0xad8
>  __alloc_pages+0x244/0x2d0
>  hugetlb_vmemmap_restore+0x104/0x1e4
>  __update_and_free_hugetlb_folio+0x44/0x1f4
>  update_and_free_hugetlb_folio+0x20/0x68
>  update_and_free_pages_bulk+0x4c/0xac
>  set_max_huge_pages+0x198/0x334
>  nr_hugepages_store_common+0x118/0x178
>  nr_hugepages_store+0x18/0x24
>  kobj_attr_store+0x18/0x2c
>  sysfs_kf_write+0x40/0x54
>  kernfs_fop_write_iter+0x164/0x1dc
>  vfs_write+0x3a8/0x460
>  ksys_write+0x6c/0x100
>  __arm64_sys_write+0x1c/0x28
>  invoke_syscall+0x44/0x100
>  el0_svc_common.constprop.1+0x6c/0xe4
>  do_el0_svc+0x38/0x94
>  el0_svc+0x28/0x74
>  el0t_64_sync_handler+0xa0/0xc4
>  el0t_64_sync+0x174/0x178
> Mem-Info:
> ...
>
> The reason is that the hugetlb pages being released are allocated from
> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
> need to be allocated from the same node while the hugetlb pages are being
> released.

Thanks for your fix. I think this is a real-world issue; it would be better
to add a Fixes tag so that the fix can be backported. Thanks.

> With GFP_KERNEL and __GFP_THISNODE set, allocating from a movable
> node always fails. Fix this problem by removing __GFP_THISNODE.
>
> Signed-off-by: Yuan Can <yuancan@huawei.com>
> ---
> mm/hugetlb_vmemmap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index c2007ef5e9b0..0485e471d224 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>  static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>  				   struct list_head *list)
>  {
> -	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
> +	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;

There is a small behavior change for the non-movable case after this patch:
we first try to allocate memory from the preferred node (same as before),
and if that fails, we now fall back to other nodes. That makes sense to me;
at least those huge pages can be freed once some other node can satisfy the
allocation of the vmemmap pages.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

>  	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
>  	int nid = page_to_nid((struct page *)start);
>  	struct page *page, *next;
> --
> 2.17.1
On 2023/9/5 17:06, Muchun Song wrote:
>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>> [...]
>> With GFP_KERNEL and __GFP_THISNODE set, allocating from a movable
>> node always fails. Fix this problem by removing __GFP_THISNODE.
>
> Thanks for your fix, I think this is a real-world issue, it's better
> to add a Fixes tag to indicate backporting. Thanks.

Should be ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated
with each HugeTLB page").

> [...]
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
On 2023/9/5 17:06, Muchun Song wrote:
> Thanks for your fix, I think this is a real-world issue, it's better
> to add a Fixes tag to indicate backporting. Thanks.
> [...]
> There is a small behavior change for the non-movable case after this patch:
> we first try to allocate memory from the preferred node (same as before),
> and if that fails, we now fall back to other nodes. For me, it makes sense.
> At least, those huge pages could be freed once other nodes could satisfy
> the allocation of vmemmap pages.
>
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks for the review, I will send the v2 patch with the Fixes tag and your
Reviewed-by soon.
On 09/05/23 17:06, Muchun Song wrote:
>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>> [...]
>> The reason is that the hugetlb pages being released are allocated from
>> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
>> need to be allocated from the same node while the hugetlb pages are being
>> released.
>
> Thanks for your fix, I think this is a real-world issue, it's better
> to add a Fixes tag to indicate backporting. Thanks.

I thought we might get the same error (unable to allocate on a movable node)
when creating the hugetlb page. Why? Because we replace the head vmemmap
page. However, I see that a failure to allocate there is not a fatal error
and we fall back to the currently mapped page. We also pass __GFP_NOWARN to
that allocation attempt, so there will be no report of the failure.

We might want to change this as well?

>> [...]
>> -	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>> +	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
>
> [...]
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>

This looks reasonable to me as well.

Cc'ing David and Michal as they are experts in hotplug.
> On Sep 6, 2023, at 08:28, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> [...]
> I thought we might get the same error (unable to allocate on a movable node)
> when creating the hugetlb page. Why? Because we replace the head vmemmap
> page. However, I see that a failure to allocate there is not a fatal error
> and we fall back to the currently mapped page. We also pass __GFP_NOWARN to
> that allocation attempt, so there will be no report of the failure.
>
> We might want to change this as well?

I think yes. I also thought about this yesterday, but this one is not a
fatal error, so it should be an improvement patch. It is better not to fold
that change into this patch (a bug-fix one).

Thanks.

> This looks reasonable to me as well.
>
> Cc'ing David and Michal as they are experts in hotplug.
> --
> Mike Kravetz
On 2023/9/6 10:32, Muchun Song wrote:
>> On Sep 6, 2023, at 08:28, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>> [...]
>> We also pass __GFP_NOWARN to that allocation attempt, so there will be no
>> report of the failure.
>>
>> We might want to change this as well?
>
> I think yes. I also thought about this yesterday, but this one is not a
> fatal error, so it should be an improvement patch. It is better not to fold
> that change into this patch (a bug-fix one).
>
> Thanks.

Sure, let me send another patch for the __GFP_NOWARN case.
>>> With GFP_KERNEL and __GFP_THISNODE set, allocating from a movable
>>> node always fails. Fix this problem by removing __GFP_THISNODE.
>>> [...]
>>> -	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>>> +	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
>>
>> [...]
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>
> This looks reasonable to me as well.
>
> Cc'ing David and Michal as they are experts in hotplug.

IIUC, we still won't allocate from ZONE_MOVABLE / MIGRATE_CMA (due to
GFP_KERNEL), so it should be fine.
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c2007ef5e9b0..0485e471d224 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
 				   struct list_head *list)
 {
-	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
+	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
 	int nid = page_to_nid((struct page *)start);
 	struct page *page, *next;
Decreasing the number of hugetlb pages failed with the following message:

sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.6+0x84/0xe4
 show_stack+0x18/0x24
 dump_stack_lvl+0x48/0x60
 dump_stack+0x18/0x24
 warn_alloc+0x100/0x1bc
 __alloc_pages_slowpath.constprop.107+0xa40/0xad8
 __alloc_pages+0x244/0x2d0
 hugetlb_vmemmap_restore+0x104/0x1e4
 __update_and_free_hugetlb_folio+0x44/0x1f4
 update_and_free_hugetlb_folio+0x20/0x68
 update_and_free_pages_bulk+0x4c/0xac
 set_max_huge_pages+0x198/0x334
 nr_hugepages_store_common+0x118/0x178
 nr_hugepages_store+0x18/0x24
 kobj_attr_store+0x18/0x2c
 sysfs_kf_write+0x40/0x54
 kernfs_fop_write_iter+0x164/0x1dc
 vfs_write+0x3a8/0x460
 ksys_write+0x6c/0x100
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x44/0x100
 el0_svc_common.constprop.1+0x6c/0xe4
 do_el0_svc+0x38/0x94
 el0_svc+0x28/0x74
 el0t_64_sync_handler+0xa0/0xc4
 el0t_64_sync+0x174/0x178
Mem-Info:
...

The reason is that the hugetlb pages being released are allocated from
movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
need to be allocated from the same node while the hugetlb pages are being
released. With GFP_KERNEL and __GFP_THISNODE set, allocating from a movable
node always fails. Fix this problem by removing __GFP_THISNODE.

Signed-off-by: Yuan Can <yuancan@huawei.com>
---
 mm/hugetlb_vmemmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)