
[v2,2/6] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page

Message ID 20210106084739.63318-3-songmuchun@bytedance.com (mailing list archive)
State: New, archived
Series: Fix some bugs about HugeTLB code

Commit Message

Muchun Song Jan. 6, 2021, 8:47 a.m. UTC
Because we can only isolate active pages via isolate_huge_page(),
and hugetlbfs_fallocate() forgets to mark newly allocated pages as
active, those pages cannot be isolated and migrated.

Only export set_page_huge_active and leave clear_page_huge_active
static, because the latter has no external users.

Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 fs/hugetlbfs/inode.c    | 3 ++-
 include/linux/hugetlb.h | 2 ++
 mm/hugetlb.c            | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

Comments

Michal Hocko Jan. 6, 2021, 4:35 p.m. UTC | #1
On Wed 06-01-21 16:47:35, Muchun Song wrote:
> Because we only can isolate a active page via isolate_huge_page()
> and hugetlbfs_fallocate() forget to mark it as active, we cannot
> isolate and migrate those pages.

I had a little bit of a hard time understanding this initially and had
to dive into the code to make sense of it. I would consider the
following wording easier to grasp. Feel free to reuse it if you like.
"
If a new hugetlb page is allocated during fallocate it will not be
marked as active (set_page_huge_active) which will result in a later
isolate_huge_page failure when the page migration code would like to
move that page. Such a failure would be unexpected and wrong.
"

Now to the fix. I believe that this patch shows that the
set_page_huge_active is just too subtle. Is there any reason why we
cannot make all freshly allocated huge pages active by default?
 
> Only export set_page_huge_active, just leave clear_page_huge_active
> as static. Because there are no external users.
> 
> Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate())
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  fs/hugetlbfs/inode.c    | 3 ++-
>  include/linux/hugetlb.h | 2 ++
>  mm/hugetlb.c            | 2 +-
>  3 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b5c109703daa..21c20fd5f9ee 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>  
>  		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>  
> +		set_page_huge_active(page);
>  		/*
>  		 * unlock_page because locked by add_to_page_cache()
> -		 * page_put due to reference from alloc_huge_page()
> +		 * put_page() due to reference from alloc_huge_page()
>  		 */
>  		unlock_page(page);
>  		put_page(page);
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..b5807f23caf8 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
>  }
>  #endif
>  
> +void set_page_huge_active(struct page *page);
> +
>  #else	/* CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1f3bf1710b66..4741d60f8955 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1348,7 +1348,7 @@ bool page_huge_active(struct page *page)
>  }
>  
>  /* never called for tail page */
> -static void set_page_huge_active(struct page *page)
> +void set_page_huge_active(struct page *page)
>  {
>  	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
>  	SetPagePrivate(&page[1]);
> -- 
> 2.11.0
Mike Kravetz Jan. 6, 2021, 7:30 p.m. UTC | #2
On 1/6/21 8:35 AM, Michal Hocko wrote:
> On Wed 06-01-21 16:47:35, Muchun Song wrote:
>> Because we only can isolate a active page via isolate_huge_page()
>> and hugetlbfs_fallocate() forget to mark it as active, we cannot
>> isolate and migrate those pages.
> 
> I've little bit hard time to understand this initially and had to dive
> into the code to make sense of it. I would consider the following
> wording easier to grasp. Feel free to reuse if you like.
> "
> If a new hugetlb page is allocated during fallocate it will not be
> marked as active (set_page_huge_active) which will result in a later
> isolate_huge_page failure when the page migration code would like to
> move that page. Such a failure would be unexpected and wrong.
> "
> 
> Now to the fix. I believe that this patch shows that the
> set_page_huge_active is just too subtle. Is there any reason why we
> cannot make all freshly allocated huge pages active by default?

I looked into that yesterday.  The primary issue is in page fault code,
hugetlb_no_page is an example.  If page_huge_active is set, then it can
be isolated for migration.  So, migration could race with the page fault
and the page could be migrated before being added to the page table of
the faulting task.  This was an issue when hugetlb_no_page called
set_page_huge_active right after allocating and clearing the huge page.
Commit cb6acd01e2e4 moved the set_page_huge_active call to after the
page is added to the page table to address this issue.
Michal Hocko Jan. 6, 2021, 8:02 p.m. UTC | #3
On Wed 06-01-21 11:30:25, Mike Kravetz wrote:
> On 1/6/21 8:35 AM, Michal Hocko wrote:
> > On Wed 06-01-21 16:47:35, Muchun Song wrote:
> >> Because we only can isolate a active page via isolate_huge_page()
> >> and hugetlbfs_fallocate() forget to mark it as active, we cannot
> >> isolate and migrate those pages.
> > 
> > I've little bit hard time to understand this initially and had to dive
> > into the code to make sense of it. I would consider the following
> > wording easier to grasp. Feel free to reuse if you like.
> > "
> > If a new hugetlb page is allocated during fallocate it will not be
> > marked as active (set_page_huge_active) which will result in a later
> > isolate_huge_page failure when the page migration code would like to
> > move that page. Such a failure would be unexpected and wrong.
> > "
> > 
> > Now to the fix. I believe that this patch shows that the
> > set_page_huge_active is just too subtle. Is there any reason why we
> > cannot make all freshly allocated huge pages active by default?
> 
> I looked into that yesterday.  The primary issue is in page fault code,
> hugetlb_no_page is an example.  If page_huge_active is set, then it can
> be isolated for migration.  So, migration could race with the page fault
> and the page could be migrated before being added to the page table of
> the faulting task.  This was an issue when hugetlb_no_page set_page_huge_active
> right after allocating and clearing the huge page.  Commit cb6acd01e2e4
> moved the set_page_huge_active after adding the page to the page table
> to address this issue.

Thanks for the clarification. I was not aware of this subtlety. The
existing comment is not helping much TBH. I am still digesting the
suggested race. The page is new and exclusive and not visible via page
tables yet, so the only source of the migration would be pfn based
(hotplug, poisoning), right?

Btw. s@set_page_huge_active@set_page_huge_migratable@ would help
readability IMHO, with a comment explaining that this _has_ to be
called after the page is fully initialized.
Mike Kravetz Jan. 6, 2021, 9:07 p.m. UTC | #4
On 1/6/21 12:02 PM, Michal Hocko wrote:
> On Wed 06-01-21 11:30:25, Mike Kravetz wrote:
>> On 1/6/21 8:35 AM, Michal Hocko wrote:
>>> On Wed 06-01-21 16:47:35, Muchun Song wrote:
>>>> Because we only can isolate a active page via isolate_huge_page()
>>>> and hugetlbfs_fallocate() forget to mark it as active, we cannot
>>>> isolate and migrate those pages.
>>>
>>> I've little bit hard time to understand this initially and had to dive
>>> into the code to make sense of it. I would consider the following
>>> wording easier to grasp. Feel free to reuse if you like.
>>> "
>>> If a new hugetlb page is allocated during fallocate it will not be
>>> marked as active (set_page_huge_active) which will result in a later
>>> isolate_huge_page failure when the page migration code would like to
>>> move that page. Such a failure would be unexpected and wrong.
>>> "
>>>
>>> Now to the fix. I believe that this patch shows that the
>>> set_page_huge_active is just too subtle. Is there any reason why we
>>> cannot make all freshly allocated huge pages active by default?
>>
>> I looked into that yesterday.  The primary issue is in page fault code,
>> hugetlb_no_page is an example.  If page_huge_active is set, then it can
>> be isolated for migration.  So, migration could race with the page fault
>> and the page could be migrated before being added to the page table of
>> the faulting task.  This was an issue when hugetlb_no_page set_page_huge_active
>> right after allocating and clearing the huge page.  Commit cb6acd01e2e4
>> moved the set_page_huge_active after adding the page to the page table
>> to address this issue.
> 
> Thanks for the clarification. I was not aware of this subtlety. The
> existing comment is not helping much TBH. I am still digesting the
> suggested race. The page is new and exclusive and not visible via page
> tables yet, so the only source of the migration would be pfn based
> (hotplug, poisoning), right?

That is correct.


> Btw. s@set_page_huge_active@set_page_huge_migrateable@ would help
> readability IMHO. With a comment explaining that this _has_ to be called
> after the page is fully initialized.

Agree, I will add that as a future enhancement.
Muchun Song Jan. 7, 2021, 2:58 a.m. UTC | #5
On Thu, Jan 7, 2021 at 12:35 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 06-01-21 16:47:35, Muchun Song wrote:
> > Because we only can isolate a active page via isolate_huge_page()
> > and hugetlbfs_fallocate() forget to mark it as active, we cannot
> > isolate and migrate those pages.
>
> I've little bit hard time to understand this initially and had to dive
> into the code to make sense of it. I would consider the following
> wording easier to grasp. Feel free to reuse if you like.
> "
> If a new hugetlb page is allocated during fallocate it will not be
> marked as active (set_page_huge_active) which will result in a later
> isolate_huge_page failure when the page migration code would like to
> move that page. Such a failure would be unexpected and wrong.
> "

Thanks for your suggestion. I will reuse it.

>
> Now to the fix. I believe that this patch shows that the
> set_page_huge_active is just too subtle. Is there any reason why we
> cannot make all freshly allocated huge pages active by default?
Michal Hocko Jan. 7, 2021, 8:36 a.m. UTC | #6
On Wed 06-01-21 13:07:40, Mike Kravetz wrote:
> On 1/6/21 12:02 PM, Michal Hocko wrote:
> > On Wed 06-01-21 11:30:25, Mike Kravetz wrote:
> >> On 1/6/21 8:35 AM, Michal Hocko wrote:
> >>> On Wed 06-01-21 16:47:35, Muchun Song wrote:
> >>>> Because we only can isolate a active page via isolate_huge_page()
> >>>> and hugetlbfs_fallocate() forget to mark it as active, we cannot
> >>>> isolate and migrate those pages.
> >>>
> >>> I've little bit hard time to understand this initially and had to dive
> >>> into the code to make sense of it. I would consider the following
> >>> wording easier to grasp. Feel free to reuse if you like.
> >>> "
> >>> If a new hugetlb page is allocated during fallocate it will not be
> >>> marked as active (set_page_huge_active) which will result in a later
> >>> isolate_huge_page failure when the page migration code would like to
> >>> move that page. Such a failure would be unexpected and wrong.
> >>> "
> >>>
> >>> Now to the fix. I believe that this patch shows that the
> >>> set_page_huge_active is just too subtle. Is there any reason why we
> >>> cannot make all freshly allocated huge pages active by default?
> >>
> >> I looked into that yesterday.  The primary issue is in page fault code,
> >> hugetlb_no_page is an example.  If page_huge_active is set, then it can
> >> be isolated for migration.  So, migration could race with the page fault
> >> and the page could be migrated before being added to the page table of
> >> the faulting task.  This was an issue when hugetlb_no_page set_page_huge_active
> >> right after allocating and clearing the huge page.  Commit cb6acd01e2e4
> >> moved the set_page_huge_active after adding the page to the page table
> >> to address this issue.
> > 
> > Thanks for the clarification. I was not aware of this subtlety. The
> > existing comment is not helping much TBH. I am still digesting the
> > suggested race. The page is new and exclusive and not visible via page
> > tables yet, so the only source of the migration would be pfn based
> > (hotplug, poisoning), right?
> 
> That is correct.
> 
> 
> > Btw. s@set_page_huge_active@set_page_huge_migrateable@ would help
> > readability IMHO. With a comment explaining that this _has_ to be called
> > after the page is fully initialized.
> 
> Agree, I will add that as a future enhancement.

Thanks!

Patch

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index b5c109703daa..21c20fd5f9ee 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 
+		set_page_huge_active(page);
 		/*
 		 * unlock_page because locked by add_to_page_cache()
-		 * page_put due to reference from alloc_huge_page()
+		 * put_page() due to reference from alloc_huge_page()
 		 */
 		unlock_page(page);
 		put_page(page);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..b5807f23caf8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 }
 #endif
 
+void set_page_huge_active(struct page *page);
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1f3bf1710b66..4741d60f8955 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1348,7 +1348,7 @@ bool page_huge_active(struct page *page)
 }
 
 /* never called for tail page */
-static void set_page_huge_active(struct page *page)
+void set_page_huge_active(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
 	SetPagePrivate(&page[1]);