hugetlb: fix uninitialized subpool pointer

Message ID: 20210223215544.313871-1-mike.kravetz@oracle.com

Commit Message

Mike Kravetz Feb. 23, 2021, 9:55 p.m. UTC
Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
with linux-next 5.12.0-20210222.
Call trace:
  hugepage_subpool_put_pages.part.0+0x2c/0x138
  __free_huge_page+0xce/0x310
  alloc_pool_huge_page+0x102/0x120
  set_max_huge_pages+0x13e/0x350
  hugetlb_sysctl_handler_common+0xd8/0x110
  hugetlb_sysctl_handler+0x48/0x58
  proc_sys_call_handler+0x138/0x238
  new_sync_write+0x10e/0x198
  vfs_write.part.0+0x12c/0x238
  ksys_write+0x68/0xf8
  do_syscall+0x82/0xd0
  __do_syscall+0xb4/0xc8
  system_call+0x72/0x98

This is a result of the change which moved the hugetlb page subpool
pointer from page->private to page[1]->private.  When new pages are
allocated from the buddy allocator, the private field of the head
page will be cleared, but the private field of subpages is not modified.
Therefore, old values may remain.
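
For reference, the accessors added by that change look roughly like this
(a simplified sketch of the linux-next code; the pointer now lives in the
first tail page):

static inline struct hugepage_subpool *hugetlb_page_subpool(struct page *hpage)
{
	/* subpool pointer is kept in page[1].private, not page->private */
	return (struct hugepage_subpool *)(hpage + 1)->private;
}

static inline void hugetlb_set_page_subpool(struct page *hpage,
					struct hugepage_subpool *subpool)
{
	set_page_private(hpage + 1, (unsigned long)subpool);
}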

Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().

Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Oscar Salvador Feb. 23, 2021, 10:45 p.m. UTC | #1
On Tue, Feb 23, 2021 at 01:55:44PM -0800, Mike Kravetz wrote:
> Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> with linux-next 5.12.0-20210222.
> Call trace:
>   hugepage_subpool_put_pages.part.0+0x2c/0x138
>   __free_huge_page+0xce/0x310
>   alloc_pool_huge_page+0x102/0x120
>   set_max_huge_pages+0x13e/0x350
>   hugetlb_sysctl_handler_common+0xd8/0x110
>   hugetlb_sysctl_handler+0x48/0x58
>   proc_sys_call_handler+0x138/0x238
>   new_sync_write+0x10e/0x198
>   vfs_write.part.0+0x12c/0x238
>   ksys_write+0x68/0xf8
>   do_syscall+0x82/0xd0
>   __do_syscall+0xb4/0xc8
>   system_call+0x72/0x98
> 
> This is a result of the change which moved the hugetlb page subpool
> pointer from page->private to page[1]->private.  When new pages are
> allocated from the buddy allocator, the private field of the head
> page will be cleared, but the private field of subpages is not modified.
> Therefore, old values may remain.
> 
> Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
> 
> Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
> Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Do we need the hugetlb_set_page_subpool() in __free_huge_page?

Reviewed-by: Oscar Salvador <osalvador@suse.de>
Mike Kravetz Feb. 23, 2021, 10:55 p.m. UTC | #2
On 2/23/21 2:45 PM, Oscar Salvador wrote:
> On Tue, Feb 23, 2021 at 01:55:44PM -0800, Mike Kravetz wrote:
>> Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
>> with linux-next 5.12.0-20210222.
>> Call trace:
>>   hugepage_subpool_put_pages.part.0+0x2c/0x138
>>   __free_huge_page+0xce/0x310
>>   alloc_pool_huge_page+0x102/0x120
>>   set_max_huge_pages+0x13e/0x350
>>   hugetlb_sysctl_handler_common+0xd8/0x110
>>   hugetlb_sysctl_handler+0x48/0x58
>>   proc_sys_call_handler+0x138/0x238
>>   new_sync_write+0x10e/0x198
>>   vfs_write.part.0+0x12c/0x238
>>   ksys_write+0x68/0xf8
>>   do_syscall+0x82/0xd0
>>   __do_syscall+0xb4/0xc8
>>   system_call+0x72/0x98
>>
>> This is a result of the change which moved the hugetlb page subpool
>> pointer from page->private to page[1]->private.  When new pages are
>> allocated from the buddy allocator, the private field of the head
>> page will be cleared, but the private field of subpages is not modified.
>> Therefore, old values may remain.
>>
>> Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
>>
>> Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
>> Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Do we need the hugetlb_set_page_subpool() in __free_huge_page?

Yes, that is the more common case where the once active hugetlb page
will be simply added to the free list via enqueue_huge_page().  This
path does not go through prep_new_huge_page.
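
For context, a minimal sketch of that path, assuming the v5.12-rc
linux-next code (reservation and cgroup handling elided):

static void __free_huge_page(struct page *page)
{
	struct hstate *h = page_hstate(page);
	struct hugepage_subpool *spool = hugetlb_page_subpool(page);

	/* Page is leaving active use: detach it from its subpool. */
	hugetlb_set_page_subpool(page, NULL);

	spin_lock(&hugetlb_lock);
	/*
	 * Return the page to the free list.  There is no
	 * prep_new_huge_page() on this path, so the subpool pointer
	 * must be reset here.
	 */
	enqueue_huge_page(h, page);
	spin_unlock(&hugetlb_lock);

	/* Accounting against the old subpool (simplified). */
	hugepage_subpool_put_pages(spool, 1);
}
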
Oscar Salvador Feb. 23, 2021, 10:58 p.m. UTC | #3
On 2021-02-23 23:55, Mike Kravetz wrote:
> Yes, that is the more common case where the once active hugetlb page
> will be simply added to the free list via enqueue_huge_page().  This
> path does not go through prep_new_huge_page.

Right, I see.

Thanks
Mike Kravetz Feb. 23, 2021, 11:21 p.m. UTC | #4
On 2/23/21 2:58 PM, Oscar Salvador wrote:
> On 2021-02-23 23:55, Mike Kravetz wrote:
>> Yes, that is the more common case where the once active hugetlb page
>> will be simply added to the free list via enqueue_huge_page().  This
>> path does not go through prep_new_huge_page.
> 
> Right, I see.
> 
> Thanks

You got me thinking ...
When we dynamically allocate gigantic pages via alloc_contig_pages, we
will not use the buddy allocator.  Therefore, the usual 'page prepping'
will not take place.  Specifically, I could not find anything in that
path which clears page->private of the head page.
Am I missing that somewhere?  If not, then we need to clear that as well
in prep_compound_gigantic_page.  Or, just clear it in prep_new_huge_page
to handle any change in assumptions about the buddy allocator.

This is not something introduced by the recent field shuffling; it
looks like something that has existed for some time.
Mike Kravetz Feb. 24, 2021, 12:27 a.m. UTC | #5
On 2/23/21 3:21 PM, Mike Kravetz wrote:
> On 2/23/21 2:58 PM, Oscar Salvador wrote:
>> On 2021-02-23 23:55, Mike Kravetz wrote:
>>> Yes, that is the more common case where the once active hugetlb page
>>> will be simply added to the free list via enqueue_huge_page().  This
>>> path does not go through prep_new_huge_page.
>>
>> Right, I see.
>>
>> Thanks
> 
> You got me thinking ...
> When we dynamically allocate gigantic pages via alloc_contig_pages, we
> will not use the buddy allocator.  Therefore, the usual 'page prepping'
> will not take place.  Specifically, I could not find anything in that
> path which clears page->private of the head page.
> Am I missing that somewhere?  If not, then we need to clear that as well
> in prep_compound_gigantic_page.  Or, just clear it in prep_new_huge_page
> to handle any change in assumptions about the buddy allocator.
> 
> This is not something introduced by the recent field shuffling; it
> looks like something that has existed for some time.

nm, we do end up calling the same page prepping code (post_alloc_hook)
from alloc_contig_range->isolate_freepages_range.

Just to make sure, I purposely dirtied page->private of every page as it
was being freed.  Gigantic page allocation was just fine, and I even ran
ltp mm tests with this dirtying in place.
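
For reference, the relevant bit of that common prep code, as a simplified
sketch of mm/page_alloc.c from that era (this also runs for ranges handed
out by alloc_contig_range()):

inline void post_alloc_hook(struct page *page, unsigned int order,
				gfp_t gfp_flags)
{
	set_page_private(page, 0);	/* clears any stale ->private */
	set_page_refcounted(page);

	arch_alloc_page(page, order);
	/* debug, kasan and page owner hooks elided */
}
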
Muchun Song Feb. 24, 2021, 3:12 a.m. UTC | #6
On Wed, Feb 24, 2021 at 5:56 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> with linux-next 5.12.0-20210222.
> Call trace:
>   hugepage_subpool_put_pages.part.0+0x2c/0x138
>   __free_huge_page+0xce/0x310
>   alloc_pool_huge_page+0x102/0x120
>   set_max_huge_pages+0x13e/0x350
>   hugetlb_sysctl_handler_common+0xd8/0x110
>   hugetlb_sysctl_handler+0x48/0x58
>   proc_sys_call_handler+0x138/0x238
>   new_sync_write+0x10e/0x198
>   vfs_write.part.0+0x12c/0x238
>   ksys_write+0x68/0xf8
>   do_syscall+0x82/0xd0
>   __do_syscall+0xb4/0xc8
>   system_call+0x72/0x98
>
> This is a result of the change which moved the hugetlb page subpool
> pointer from page->private to page[1]->private.  When new pages are
> allocated from the buddy allocator, the private field of the head
> page will be cleared, but the private field of subpages is not modified.
> Therefore, old values may remain.
>
> Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
>
> Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
> Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

> ---
>  mm/hugetlb.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c232cb67dda2..7ae5c18c98a7 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1465,6 +1465,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
>         INIT_LIST_HEAD(&page->lru);
>         set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> +       hugetlb_set_page_subpool(page, NULL);
>         set_hugetlb_cgroup(page, NULL);
>         set_hugetlb_cgroup_rsvd(page, NULL);
>         spin_lock(&hugetlb_lock);
> --
> 2.29.2
>
Michal Hocko Feb. 24, 2021, 9:43 a.m. UTC | #7
On Tue 23-02-21 13:55:44, Mike Kravetz wrote:
> Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> with linux-next 5.12.0-20210222.
> Call trace:
>   hugepage_subpool_put_pages.part.0+0x2c/0x138
>   __free_huge_page+0xce/0x310
>   alloc_pool_huge_page+0x102/0x120
>   set_max_huge_pages+0x13e/0x350
>   hugetlb_sysctl_handler_common+0xd8/0x110
>   hugetlb_sysctl_handler+0x48/0x58
>   proc_sys_call_handler+0x138/0x238
>   new_sync_write+0x10e/0x198
>   vfs_write.part.0+0x12c/0x238
>   ksys_write+0x68/0xf8
>   do_syscall+0x82/0xd0
>   __do_syscall+0xb4/0xc8
>   system_call+0x72/0x98
> 
> This is a result of the change which moved the hugetlb page subpool
> pointer from page->private to page[1]->private.  When new pages are
> allocated from the buddy allocator, the private field of the head
> page will be cleared, but the private field of subpages is not modified.
> Therefore, old values may remain.

Very interesting. I would have expected that page->private would be in a
reasonable state when allocated. On the other hand, hugetlb doesn't use
__GFP_COMP, so tail pages are not initialized by the allocator.
 
> Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
> 
> Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")

This is not a stable sha to refer to as it comes from linux next.

> Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Acked-by: Michal Hocko <mhocko@suse.com>

I think this would be worth a separate patch rather than having it
folded into the original patch. This is really subtle.

> ---
>  mm/hugetlb.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c232cb67dda2..7ae5c18c98a7 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1465,6 +1465,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
>  	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> +	hugetlb_set_page_subpool(page, NULL);
>  	set_hugetlb_cgroup(page, NULL);
>  	set_hugetlb_cgroup_rsvd(page, NULL);
>  	spin_lock(&hugetlb_lock);
> -- 
> 2.29.2
>
Muchun Song Feb. 24, 2021, 11:10 a.m. UTC | #8
On Wed, Feb 24, 2021 at 5:43 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 23-02-21 13:55:44, Mike Kravetz wrote:
> > Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> > with linux-next 5.12.0-20210222.
> > Call trace:
> >   hugepage_subpool_put_pages.part.0+0x2c/0x138
> >   __free_huge_page+0xce/0x310
> >   alloc_pool_huge_page+0x102/0x120
> >   set_max_huge_pages+0x13e/0x350
> >   hugetlb_sysctl_handler_common+0xd8/0x110
> >   hugetlb_sysctl_handler+0x48/0x58
> >   proc_sys_call_handler+0x138/0x238
> >   new_sync_write+0x10e/0x198
> >   vfs_write.part.0+0x12c/0x238
> >   ksys_write+0x68/0xf8
> >   do_syscall+0x82/0xd0
> >   __do_syscall+0xb4/0xc8
> >   system_call+0x72/0x98
> >
> > This is a result of the change which moved the hugetlb page subpool
> > pointer from page->private to page[1]->private.  When new pages are
> > allocated from the buddy allocator, the private field of the head
> > page will be cleared, but the private field of subpages is not modified.
> > Therefore, old values may remain.
>
> Very interesting. I would have expected that page->private would be in a
> reasonable state when allocated. On the other hand, hugetlb doesn't use
> __GFP_COMP, so tail pages are not initialized by the allocator.

It seems that the buddy allocator does not initialize the private field
of the tail page even when we specify __GFP_COMP.


>
> > Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
> >
> > Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
>
> This is not a stable sha to refer to as it comes from linux next.
>
> > Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> I think this would be worth a separate patch rather than having it
> folded into the original patch. This is really subtle.
>
> > ---
> >  mm/hugetlb.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index c232cb67dda2..7ae5c18c98a7 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1465,6 +1465,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
> >  {
> >       INIT_LIST_HEAD(&page->lru);
> >       set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> > +     hugetlb_set_page_subpool(page, NULL);
> >       set_hugetlb_cgroup(page, NULL);
> >       set_hugetlb_cgroup_rsvd(page, NULL);
> >       spin_lock(&hugetlb_lock);
> > --
> > 2.29.2
> >
>
> --
> Michal Hocko
> SUSE Labs
Michal Hocko Feb. 24, 2021, 11:36 a.m. UTC | #9
On Wed 24-02-21 19:10:42, Muchun Song wrote:
> On Wed, Feb 24, 2021 at 5:43 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 23-02-21 13:55:44, Mike Kravetz wrote:
> > > Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> > > with linux-next 5.12.0-20210222.
> > > Call trace:
> > >   hugepage_subpool_put_pages.part.0+0x2c/0x138
> > >   __free_huge_page+0xce/0x310
> > >   alloc_pool_huge_page+0x102/0x120
> > >   set_max_huge_pages+0x13e/0x350
> > >   hugetlb_sysctl_handler_common+0xd8/0x110
> > >   hugetlb_sysctl_handler+0x48/0x58
> > >   proc_sys_call_handler+0x138/0x238
> > >   new_sync_write+0x10e/0x198
> > >   vfs_write.part.0+0x12c/0x238
> > >   ksys_write+0x68/0xf8
> > >   do_syscall+0x82/0xd0
> > >   __do_syscall+0xb4/0xc8
> > >   system_call+0x72/0x98
> > >
> > > This is a result of the change which moved the hugetlb page subpool
> > > pointer from page->private to page[1]->private.  When new pages are
> > > allocated from the buddy allocator, the private field of the head
> > > page will be cleared, but the private field of subpages is not modified.
> > > Therefore, old values may remain.
> >
> > Very interesting. I would have expected that page->private would be in a
> > reasonable state when allocated. On the other hand, hugetlb doesn't use
> > __GFP_COMP, so tail pages are not initialized by the allocator.
> 
> It seems that the buddy allocator does not initialize the private field
> of the tail page even when we specify __GFP_COMP.

Yes, it doesn't. What I meant to say is that even if it did, a lack of
__GFP_COMP would result in not doing so. I do not remember why hugetlb
doesn't use __GFP_COMP, but I believe this has never been the case.
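
For reference, a simplified sketch of what __GFP_COMP initialization
(prep_compound_page() in mm/page_alloc.c of that era) actually does;
note that tail ->private is indeed never touched:

void prep_compound_page(struct page *page, unsigned int order)
{
	int i;
	int nr_pages = 1 << order;

	__SetPageHead(page);
	for (i = 1; i < nr_pages; i++) {
		struct page *p = page + i;

		set_page_count(p, 0);
		p->mapping = TAIL_MAPPING;
		set_compound_head(p, page);
		/* note: p->private is left untouched */
	}

	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
	set_compound_order(page, order);
	atomic_set(compound_mapcount_ptr(page), -1);
}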

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c232cb67dda2..7ae5c18c98a7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1465,6 +1465,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
 	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
+	hugetlb_set_page_subpool(page, NULL);
 	set_hugetlb_cgroup(page, NULL);
 	set_hugetlb_cgroup_rsvd(page, NULL);
 	spin_lock(&hugetlb_lock);