diff mbox series

[v2,2/5] hugetlb: convert page_huge_active() HPageMigratable flag

Message ID 20210120013049.311822-3-mike.kravetz@oracle.com (mailing list archive)
State New
Headers show
Series create hugetlb flags to consolidate state | expand

Commit Message

Mike Kravetz Jan. 20, 2021, 1:30 a.m. UTC
Use the new hugetlb page specific flag HPageMigratable to replace the
page_huge_active interfaces.  By it's name, page_huge_active implied
that a huge page was on the active list.  However, that is not really
what code checking the flag wanted to know.  It really wanted to determine
if the huge page could be migrated.  This happens when the page is actually
added the page cache and/or task page table.  This is the reasoning behind
the name change.

The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not
really necessary as we KNOW the page is a hugetlb page.  Therefore, they
are removed.

The routine page_huge_active checked for PageHeadHuge before testing the
active bit.  This is unnecessary in the case where we hold a reference or
lock and know it is a hugetlb head page.  page_huge_active is also called
without holding a reference or lock (scan_movable_pages), and can race with
code freeing the page.  The extra check in page_huge_active shortened the
race window, but did not prevent the race.  Offline code calling
scan_movable_pages already deals with these races, so removing the check
is acceptable.  Add comment to racy code.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/hugetlbfs/inode.c       |  2 +-
 include/linux/hugetlb.h    |  5 +++++
 include/linux/page-flags.h |  6 -----
 mm/hugetlb.c               | 45 ++++++++++----------------------------
 mm/memory_hotplug.c        |  8 ++++++-
 5 files changed, 24 insertions(+), 42 deletions(-)

Comments

Oscar Salvador Jan. 20, 2021, 9:59 a.m. UTC | #1
On Tue, Jan 19, 2021 at 05:30:46PM -0800, Mike Kravetz wrote:
> Use the new hugetlb page specific flag HPageMigratable to replace the
> page_huge_active interfaces.  By it's name, page_huge_active implied
> that a huge page was on the active list.  However, that is not really
> what code checking the flag wanted to know.  It really wanted to determine
> if the huge page could be migrated.  This happens when the page is actually
> added the page cache and/or task page table.  This is the reasoning behind
> the name change.
> 
> The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not
> really necessary as we KNOW the page is a hugetlb page.  Therefore, they
> are removed.
> 
> The routine page_huge_active checked for PageHeadHuge before testing the
> active bit.  This is unnecessary in the case where we hold a reference or
> lock and know it is a hugetlb head page.  page_huge_active is also called
> without holding a reference or lock (scan_movable_pages), and can race with
> code freeing the page.  The extra check in page_huge_active shortened the
> race window, but did not prevent the race.  Offline code calling
> scan_movable_pages already deals with these races, so removing the check
> is acceptable.  Add comment to racy code.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Hi Mike,

This comment addresses both this patch and the next one.

Instead of putting the SetHPageMigratable flag spread over the
allocation paths, would it make more sense to place it in
alloc_huge_page before returning the page?
Then we could opencode SetHPageMigratableIfSupported right there.

I might be missing something and this might not be possible, but if it
is, it would look cleaner and more logical to me.
Oscar Salvador Jan. 20, 2021, 10 a.m. UTC | #2
On Wed, Jan 20, 2021 at 10:59:05AM +0100, Oscar Salvador wrote:
> On Tue, Jan 19, 2021 at 05:30:46PM -0800, Mike Kravetz wrote:
> > Use the new hugetlb page specific flag HPageMigratable to replace the
> > page_huge_active interfaces.  By it's name, page_huge_active implied
> > that a huge page was on the active list.  However, that is not really
> > what code checking the flag wanted to know.  It really wanted to determine
> > if the huge page could be migrated.  This happens when the page is actually
> > added the page cache and/or task page table.  This is the reasoning behind
> > the name change.
> > 
> > The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not
> > really necessary as we KNOW the page is a hugetlb page.  Therefore, they
> > are removed.
> > 
> > The routine page_huge_active checked for PageHeadHuge before testing the
> > active bit.  This is unnecessary in the case where we hold a reference or
> > lock and know it is a hugetlb head page.  page_huge_active is also called
> > without holding a reference or lock (scan_movable_pages), and can race with
> > code freeing the page.  The extra check in page_huge_active shortened the
> > race window, but did not prevent the race.  Offline code calling
> > scan_movable_pages already deals with these races, so removing the check
> > is acceptable.  Add comment to racy code.
> > 
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Hi Mike,
> 
> This comment addresses both this patch and the next one.
> 
> Instead of putting the SetHPageMigratable flag spread over the
> allocation paths, would it make more sense to place it in
> alloc_huge_page before returning the page?
> Then we could opencode SetHPageMigratableIfSupported right there.

and in putback_active_hugepage.
Mike Kravetz Jan. 20, 2021, 9:48 p.m. UTC | #3
On 1/20/21 2:00 AM, Oscar Salvador wrote:
> On Wed, Jan 20, 2021 at 10:59:05AM +0100, Oscar Salvador wrote:
>> On Tue, Jan 19, 2021 at 05:30:46PM -0800, Mike Kravetz wrote:
>>> Use the new hugetlb page specific flag HPageMigratable to replace the
>>> page_huge_active interfaces.  By it's name, page_huge_active implied
>>> that a huge page was on the active list.  However, that is not really
>>> what code checking the flag wanted to know.  It really wanted to determine
>>> if the huge page could be migrated.  This happens when the page is actually
>>> added the page cache and/or task page table.  This is the reasoning behind
>>> the name change.
>>>
>>> The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not
>>> really necessary as we KNOW the page is a hugetlb page.  Therefore, they
>>> are removed.
>>>
>>> The routine page_huge_active checked for PageHeadHuge before testing the
>>> active bit.  This is unnecessary in the case where we hold a reference or
>>> lock and know it is a hugetlb head page.  page_huge_active is also called
>>> without holding a reference or lock (scan_movable_pages), and can race with
>>> code freeing the page.  The extra check in page_huge_active shortened the
>>> race window, but did not prevent the race.  Offline code calling
>>> scan_movable_pages already deals with these races, so removing the check
>>> is acceptable.  Add comment to racy code.
>>>
>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>>
>> Hi Mike,
>>
>> This comment addresses both this patch and the next one.
>>
>> Instead of putting the SetHPageMigratable flag spread over the
>> allocation paths, would it make more sense to place it in
>> alloc_huge_page before returning the page?
>> Then we could opencode SetHPageMigratableIfSupported right there.
> 
> and in putback_active_hugepage.


Hi Oscar,

In Muchun's series of hugetlb bug fixes, Michal asked the same question.

https://lore.kernel.org/linux-mm/7e69a55c-d501-6b42-8225-a677f09fb829@oracle.com/

The 'short answer' is that the this would allow a page to be migrated
after allocation but before the page fault code adds it to the page
cache or page tables.  This actually caused bugs in the past.
Oscar Salvador Jan. 21, 2021, 8:18 a.m. UTC | #4
On Wed, Jan 20, 2021 at 01:48:05PM -0800, Mike Kravetz wrote:
> >> This comment addresses both this patch and the next one.
> >>
> >> Instead of putting the SetHPageMigratable flag spread over the
> >> allocation paths, would it make more sense to place it in
> >> alloc_huge_page before returning the page?
> >> Then we could opencode SetHPageMigratableIfSupported right there.
> > 
> > and in putback_active_hugepage.
> 
> 
> Hi Oscar,
> 
> In Muchun's series of hugetlb bug fixes, Michal asked the same question.
> 
> https://lore.kernel.org/linux-mm/7e69a55c-d501-6b42-8225-a677f09fb829@oracle.com/
> 
> The 'short answer' is that the this would allow a page to be migrated
> after allocation but before the page fault code adds it to the page
> cache or page tables.  This actually caused bugs in the past.

Oh, I see. I jumped late into that patchset so I missed some early messages.
Thanks for explaining this again.
Oscar Salvador Jan. 21, 2021, 8:27 a.m. UTC | #5
On Tue, Jan 19, 2021 at 05:30:46PM -0800, Mike Kravetz wrote:
> Use the new hugetlb page specific flag HPageMigratable to replace the
> page_huge_active interfaces.  By it's name, page_huge_active implied
> that a huge page was on the active list.  However, that is not really
> what code checking the flag wanted to know.  It really wanted to determine
> if the huge page could be migrated.  This happens when the page is actually
> added the page cache and/or task page table.  This is the reasoning behind
> the name change.
> 
> The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not
> really necessary as we KNOW the page is a hugetlb page.  Therefore, they
> are removed.
> 
> The routine page_huge_active checked for PageHeadHuge before testing the
> active bit.  This is unnecessary in the case where we hold a reference or
> lock and know it is a hugetlb head page.  page_huge_active is also called
> without holding a reference or lock (scan_movable_pages), and can race with
> code freeing the page.  The extra check in page_huge_active shortened the
> race window, but did not prevent the race.  Offline code calling
> scan_movable_pages already deals with these races, so removing the check
> is acceptable.  Add comment to racy code.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> -/*
> - * Test to determine whether the hugepage is "active/in-use" (i.e. being linked
> - * to hstate->hugepage_activelist.)
> - *
> - * This function can be called for tail pages, but never returns true for them.
> - */
> -bool page_huge_active(struct page *page)
> -{
> -	return PageHeadHuge(page) && PagePrivate(&page[1]);

This made me think once again.
I wonder if we could ever see a scenario where page[0] is a rightful page while
page[1] is poisoned/unitialized (poison_pages()).
A lot of things would have to happen between the two checks, so I do not see it
possible and as you mentioned earlier, the race is already there.

Just wanted to speak up my mind.
Muchun Song Jan. 21, 2021, 12:01 p.m. UTC | #6
On Wed, Jan 20, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Use the new hugetlb page specific flag HPageMigratable to replace the
> page_huge_active interfaces.  By it's name, page_huge_active implied
> that a huge page was on the active list.  However, that is not really
> what code checking the flag wanted to know.  It really wanted to determine
> if the huge page could be migrated.  This happens when the page is actually
> added the page cache and/or task page table.  This is the reasoning behind
> the name change.
>
> The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not
> really necessary as we KNOW the page is a hugetlb page.  Therefore, they
> are removed.
>
> The routine page_huge_active checked for PageHeadHuge before testing the
> active bit.  This is unnecessary in the case where we hold a reference or
> lock and know it is a hugetlb head page.  page_huge_active is also called
> without holding a reference or lock (scan_movable_pages), and can race with
> code freeing the page.  The extra check in page_huge_active shortened the
> race window, but did not prevent the race.  Offline code calling
> scan_movable_pages already deals with these races, so removing the check
> is acceptable.  Add comment to racy code.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

> ---
>  fs/hugetlbfs/inode.c       |  2 +-
>  include/linux/hugetlb.h    |  5 +++++
>  include/linux/page-flags.h |  6 -----
>  mm/hugetlb.c               | 45 ++++++++++----------------------------
>  mm/memory_hotplug.c        |  8 ++++++-
>  5 files changed, 24 insertions(+), 42 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b8a661780c4a..ec9f03aa2738 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -735,7 +735,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>
>                 mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>
> -               set_page_huge_active(page);
> +               SetHPageMigratable(page);
>                 /*
>                  * unlock_page because locked by add_to_page_cache()
>                  * put_page() due to reference from alloc_huge_page()
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index be71a00ee2a0..ce3d03da0133 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -480,9 +480,13 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>   * HPG_restore_reserve - Set when a hugetlb page consumes a reservation at
>   *     allocation time.  Cleared when page is fully instantiated.  Free
>   *     routine checks flag to restore a reservation on error paths.
> + * HPG_migratable  - Set after a newly allocated page is added to the page
> + *     cache and/or page tables.  Indicates the page is a candidate for
> + *     migration.
>   */
>  enum hugetlb_page_flags {
>         HPG_restore_reserve = 0,
> +       HPG_migratable,
>         __NR_HPAGEFLAGS,
>  };
>
> @@ -529,6 +533,7 @@ static inline void ClearHPage##uname(struct page *page)             \
>   * Create functions associated with hugetlb page flags
>   */
>  HPAGEFLAG(RestoreReserve, restore_reserve)
> +HPAGEFLAG(Migratable, migratable)
>
>  #ifdef CONFIG_HUGETLB_PAGE
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index bc6fd1ee7dd6..04a34c08e0a6 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -592,15 +592,9 @@ static inline void ClearPageCompound(struct page *page)
>  #ifdef CONFIG_HUGETLB_PAGE
>  int PageHuge(struct page *page);
>  int PageHeadHuge(struct page *page);
> -bool page_huge_active(struct page *page);
>  #else
>  TESTPAGEFLAG_FALSE(Huge)
>  TESTPAGEFLAG_FALSE(HeadHuge)
> -
> -static inline bool page_huge_active(struct page *page)
> -{
> -       return 0;
> -}
>  #endif
>
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8bed6b5202d2..c24da40626d3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1353,30 +1353,6 @@ struct hstate *size_to_hstate(unsigned long size)
>         return NULL;
>  }
>
> -/*
> - * Test to determine whether the hugepage is "active/in-use" (i.e. being linked
> - * to hstate->hugepage_activelist.)
> - *
> - * This function can be called for tail pages, but never returns true for them.
> - */
> -bool page_huge_active(struct page *page)
> -{
> -       return PageHeadHuge(page) && PagePrivate(&page[1]);
> -}
> -
> -/* never called for tail page */
> -void set_page_huge_active(struct page *page)
> -{
> -       VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
> -       SetPagePrivate(&page[1]);
> -}
> -
> -static void clear_page_huge_active(struct page *page)
> -{
> -       VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
> -       ClearPagePrivate(&page[1]);
> -}
> -
>  /*
>   * Internal hugetlb specific page flag. Do not use outside of the hugetlb
>   * code
> @@ -1438,7 +1414,7 @@ static void __free_huge_page(struct page *page)
>         }
>
>         spin_lock(&hugetlb_lock);
> -       clear_page_huge_active(page);
> +       ClearHPageMigratable(page);
>         hugetlb_cgroup_uncharge_page(hstate_index(h),
>                                      pages_per_huge_page(h), page);
>         hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
> @@ -4220,7 +4196,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>                                 make_huge_pte(vma, new_page, 1));
>                 page_remove_rmap(old_page, true);
>                 hugepage_add_new_anon_rmap(new_page, vma, haddr);
> -               set_page_huge_active(new_page);
> +               SetHPageMigratable(new_page);
>                 /* Make the old page be freed below */
>                 new_page = old_page;
>         }
> @@ -4457,12 +4433,12 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>         spin_unlock(ptl);
>
>         /*
> -        * Only make newly allocated pages active.  Existing pages found
> -        * in the pagecache could be !page_huge_active() if they have been
> -        * isolated for migration.
> +        * Only set HPageMigratable in newly allocated pages.  Existing pages
> +        * found in the pagecache may not have HPageMigratableset if they have
> +        * been isolated for migration.
>          */
>         if (new_page)
> -               set_page_huge_active(page);
> +               SetHPageMigratable(page);
>
>         unlock_page(page);
>  out:
> @@ -4773,7 +4749,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>         update_mmu_cache(dst_vma, dst_addr, dst_pte);
>
>         spin_unlock(ptl);
> -       set_page_huge_active(page);
> +       SetHPageMigratable(page);
>         if (vm_shared)
>                 unlock_page(page);
>         ret = 0;
> @@ -5591,12 +5567,13 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
>         bool ret = true;
>
>         spin_lock(&hugetlb_lock);
> -       if (!PageHeadHuge(page) || !page_huge_active(page) ||
> +       if (!PageHeadHuge(page) ||
> +           !HPageMigratable(page) ||
>             !get_page_unless_zero(page)) {
>                 ret = false;
>                 goto unlock;
>         }
> -       clear_page_huge_active(page);
> +       ClearHPageMigratable(page);
>         list_move_tail(&page->lru, list);
>  unlock:
>         spin_unlock(&hugetlb_lock);
> @@ -5607,7 +5584,7 @@ void putback_active_hugepage(struct page *page)
>  {
>         VM_BUG_ON_PAGE(!PageHead(page), page);
>         spin_lock(&hugetlb_lock);
> -       set_page_huge_active(page);
> +       SetHPageMigratable(page);
>         list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
>         spin_unlock(&hugetlb_lock);
>         put_page(page);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index f9d57b9be8c7..563da803e0e0 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1260,7 +1260,13 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>                 if (!PageHuge(page))
>                         continue;
>                 head = compound_head(page);
> -               if (page_huge_active(head))
> +               /*
> +                * This test is racy as we hold no reference or lock.  The
> +                * hugetlb page could have been free'ed and head is no longer
> +                * a hugetlb page before the following check.  In such unlikely
> +                * cases false positives and negatives are possible.
> +                */
> +               if (HPageMigratable(head))
>                         goto found;
>                 skip = compound_nr(head) - (page - head);
>                 pfn += skip - 1;
> --
> 2.29.2
>
Miaohe Lin Jan. 22, 2021, 6:53 a.m. UTC | #7
Hi:
On 2021/1/20 9:30, Mike Kravetz wrote:
> Use the new hugetlb page specific flag HPageMigratable to replace the
> page_huge_active interfaces.  By it's name, page_huge_active implied
> that a huge page was on the active list.  However, that is not really
> what code checking the flag wanted to know.  It really wanted to determine
> if the huge page could be migrated.  This happens when the page is actually
> added the page cache and/or task page table.  This is the reasoning behind

s/added/added to/

> the name change.
> 
> The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not
> really necessary as we KNOW the page is a hugetlb page.  Therefore, they
> are removed.
> 
> The routine page_huge_active checked for PageHeadHuge before testing the
> active bit.  This is unnecessary in the case where we hold a reference or
> lock and know it is a hugetlb head page.  page_huge_active is also called
> without holding a reference or lock (scan_movable_pages), and can race with
> code freeing the page.  The extra check in page_huge_active shortened the
> race window, but did not prevent the race.  Offline code calling
> scan_movable_pages already deals with these races, so removing the check
> is acceptable.  Add comment to racy code.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  fs/hugetlbfs/inode.c       |  2 +-
>  include/linux/hugetlb.h    |  5 +++++
>  include/linux/page-flags.h |  6 -----
>  mm/hugetlb.c               | 45 ++++++++++----------------------------
>  mm/memory_hotplug.c        |  8 ++++++-
>  5 files changed, 24 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b8a661780c4a..ec9f03aa2738 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -735,7 +735,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>  
>  		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
>  
> -		set_page_huge_active(page);
> +		SetHPageMigratable(page);
>  		/*
>  		 * unlock_page because locked by add_to_page_cache()
>  		 * put_page() due to reference from alloc_huge_page()
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index be71a00ee2a0..ce3d03da0133 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -480,9 +480,13 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>   * HPG_restore_reserve - Set when a hugetlb page consumes a reservation at
>   *	allocation time.  Cleared when page is fully instantiated.  Free
>   *	routine checks flag to restore a reservation on error paths.
> + * HPG_migratable  - Set after a newly allocated page is added to the page
> + *	cache and/or page tables.  Indicates the page is a candidate for
> + *	migration.
>   */
>  enum hugetlb_page_flags {
>  	HPG_restore_reserve = 0,
> +	HPG_migratable,
>  	__NR_HPAGEFLAGS,
>  };
>  
> @@ -529,6 +533,7 @@ static inline void ClearHPage##uname(struct page *page)		\
>   * Create functions associated with hugetlb page flags
>   */
>  HPAGEFLAG(RestoreReserve, restore_reserve)
> +HPAGEFLAG(Migratable, migratable)
>  
>  #ifdef CONFIG_HUGETLB_PAGE
>  
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index bc6fd1ee7dd6..04a34c08e0a6 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -592,15 +592,9 @@ static inline void ClearPageCompound(struct page *page)
>  #ifdef CONFIG_HUGETLB_PAGE
>  int PageHuge(struct page *page);
>  int PageHeadHuge(struct page *page);
> -bool page_huge_active(struct page *page);
>  #else
>  TESTPAGEFLAG_FALSE(Huge)
>  TESTPAGEFLAG_FALSE(HeadHuge)
> -
> -static inline bool page_huge_active(struct page *page)
> -{
> -	return 0;
> -}
>  #endif
>  
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8bed6b5202d2..c24da40626d3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1353,30 +1353,6 @@ struct hstate *size_to_hstate(unsigned long size)
>  	return NULL;
>  }
>  
> -/*
> - * Test to determine whether the hugepage is "active/in-use" (i.e. being linked
> - * to hstate->hugepage_activelist.)
> - *
> - * This function can be called for tail pages, but never returns true for them.
> - */
> -bool page_huge_active(struct page *page)
> -{
> -	return PageHeadHuge(page) && PagePrivate(&page[1]);
> -}
> -
> -/* never called for tail page */
> -void set_page_huge_active(struct page *page)
> -{
> -	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
> -	SetPagePrivate(&page[1]);
> -}
> -
> -static void clear_page_huge_active(struct page *page)
> -{
> -	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
> -	ClearPagePrivate(&page[1]);
> -}
> -
>  /*
>   * Internal hugetlb specific page flag. Do not use outside of the hugetlb
>   * code
> @@ -1438,7 +1414,7 @@ static void __free_huge_page(struct page *page)
>  	}
>  
>  	spin_lock(&hugetlb_lock);
> -	clear_page_huge_active(page);
> +	ClearHPageMigratable(page);
>  	hugetlb_cgroup_uncharge_page(hstate_index(h),
>  				     pages_per_huge_page(h), page);
>  	hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
> @@ -4220,7 +4196,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  				make_huge_pte(vma, new_page, 1));
>  		page_remove_rmap(old_page, true);
>  		hugepage_add_new_anon_rmap(new_page, vma, haddr);
> -		set_page_huge_active(new_page);
> +		SetHPageMigratable(new_page);
>  		/* Make the old page be freed below */
>  		new_page = old_page;
>  	}
> @@ -4457,12 +4433,12 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>  	spin_unlock(ptl);
>  
>  	/*
> -	 * Only make newly allocated pages active.  Existing pages found
> -	 * in the pagecache could be !page_huge_active() if they have been
> -	 * isolated for migration.
> +	 * Only set HPageMigratable in newly allocated pages.  Existing pages
> +	 * found in the pagecache may not have HPageMigratableset if they have
> +	 * been isolated for migration.
>  	 */
>  	if (new_page)
> -		set_page_huge_active(page);
> +		SetHPageMigratable(page);
>  
>  	unlock_page(page);
>  out:
> @@ -4773,7 +4749,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  	update_mmu_cache(dst_vma, dst_addr, dst_pte);
>  
>  	spin_unlock(ptl);
> -	set_page_huge_active(page);
> +	SetHPageMigratable(page);
>  	if (vm_shared)
>  		unlock_page(page);
>  	ret = 0;
> @@ -5591,12 +5567,13 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
>  	bool ret = true;
>  
>  	spin_lock(&hugetlb_lock);
> -	if (!PageHeadHuge(page) || !page_huge_active(page) ||
> +	if (!PageHeadHuge(page) ||
> +	    !HPageMigratable(page) ||
>  	    !get_page_unless_zero(page)) {
>  		ret = false;
>  		goto unlock;
>  	}
> -	clear_page_huge_active(page);
> +	ClearHPageMigratable(page);
>  	list_move_tail(&page->lru, list);
>  unlock:
>  	spin_unlock(&hugetlb_lock);
> @@ -5607,7 +5584,7 @@ void putback_active_hugepage(struct page *page)
>  {
>  	VM_BUG_ON_PAGE(!PageHead(page), page);
>  	spin_lock(&hugetlb_lock);
> -	set_page_huge_active(page);
> +	SetHPageMigratable(page);
>  	list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
>  	spin_unlock(&hugetlb_lock);
>  	put_page(page);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index f9d57b9be8c7..563da803e0e0 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1260,7 +1260,13 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  		if (!PageHuge(page))
>  			continue;
>  		head = compound_head(page);
> -		if (page_huge_active(head))
> +		/*
> +		 * This test is racy as we hold no reference or lock.  The
> +		 * hugetlb page could have been free'ed and head is no longer
> +		 * a hugetlb page before the following check.  In such unlikely
> +		 * cases false positives and negatives are possible.
> +		 */

Is it necessary to mention that: "offline_pages() could handle these racy cases." in the comment ?

> +		if (HPageMigratable(head))
>  			goto found;
>  		skip = compound_nr(head) - (page - head);
>  		pfn += skip - 1;
> 

Looks good to me. Thanks.

Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Mike Kravetz Jan. 22, 2021, 11:12 p.m. UTC | #8
On 1/21/21 10:53 PM, Miaohe Lin wrote:
> Hi:
> On 2021/1/20 9:30, Mike Kravetz wrote:
>> Use the new hugetlb page specific flag HPageMigratable to replace the
>> page_huge_active interfaces.  By it's name, page_huge_active implied
>> that a huge page was on the active list.  However, that is not really
>> what code checking the flag wanted to know.  It really wanted to determine
>> if the huge page could be migrated.  This happens when the page is actually
>> added the page cache and/or task page table.  This is the reasoning behind
> 
> s/added/added to/

Thanks

> 
>> the name change.
>>
...
>> @@ -1260,7 +1260,13 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>>  		if (!PageHuge(page))
>>  			continue;
>>  		head = compound_head(page);
>> -		if (page_huge_active(head))
>> +		/*
>> +		 * This test is racy as we hold no reference or lock.  The
>> +		 * hugetlb page could have been free'ed and head is no longer
>> +		 * a hugetlb page before the following check.  In such unlikely
>> +		 * cases false positives and negatives are possible.
>> +		 */
> 
> Is it necessary to mention that: "offline_pages() could handle these racy cases." in the comment ?
> 

I will enhance the comment to say that calling code must deal with these
possible scenarios.
Muchun Song Feb. 3, 2021, 7:42 a.m. UTC | #9
On Wed, Jan 20, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Use the new hugetlb page specific flag HPageMigratable to replace the
> page_huge_active interfaces.  By it's name, page_huge_active implied
> that a huge page was on the active list.  However, that is not really
> what code checking the flag wanted to know.  It really wanted to determine
> if the huge page could be migrated.  This happens when the page is actually
> added the page cache and/or task page table.  This is the reasoning behind
> the name change.
>
> The VM_BUG_ON_PAGE() calls in the *_huge_active() interfaces are not
> really necessary as we KNOW the page is a hugetlb page.  Therefore, they
> are removed.
>
> The routine page_huge_active checked for PageHeadHuge before testing the
> active bit.  This is unnecessary in the case where we hold a reference or
> lock and know it is a hugetlb head page.  page_huge_active is also called
> without holding a reference or lock (scan_movable_pages), and can race with
> code freeing the page.  The extra check in page_huge_active shortened the
> race window, but did not prevent the race.  Offline code calling
> scan_movable_pages already deals with these races, so removing the check
> is acceptable.  Add comment to racy code.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Hi Mike,

I found that you may forget to remove set_page_huge_active()
from include/linux/hugetlb.h.

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 37fd248ce271..6f680cf0eee6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -852,7 +852,7 @@ static inline void
huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 }
 #endif

-void set_page_huge_active(struct page *page);
+

Thanks.
Mike Kravetz Feb. 4, 2021, 12:44 a.m. UTC | #10
On 2/2/21 11:42 PM, Muchun Song wrote:
> On Wed, Jan 20, 2021 at 9:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Hi Mike,
> 
> I found that you may forget to remove set_page_huge_active()
> from include/linux/hugetlb.h.
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 37fd248ce271..6f680cf0eee6 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -852,7 +852,7 @@ static inline void
> huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
>  }
>  #endif
> 
> -void set_page_huge_active(struct page *page);
> +
> 
> Thanks.

Thank you Muchun for finding this.

Thank you Andrew for fixing in your tree.
diff mbox series

Patch

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index b8a661780c4a..ec9f03aa2738 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -735,7 +735,7 @@  static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 
-		set_page_huge_active(page);
+		SetHPageMigratable(page);
 		/*
 		 * unlock_page because locked by add_to_page_cache()
 		 * put_page() due to reference from alloc_huge_page()
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index be71a00ee2a0..ce3d03da0133 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -480,9 +480,13 @@  unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
  * HPG_restore_reserve - Set when a hugetlb page consumes a reservation at
  *	allocation time.  Cleared when page is fully instantiated.  Free
  *	routine checks flag to restore a reservation on error paths.
+ * HPG_migratable  - Set after a newly allocated page is added to the page
+ *	cache and/or page tables.  Indicates the page is a candidate for
+ *	migration.
  */
 enum hugetlb_page_flags {
 	HPG_restore_reserve = 0,
+	HPG_migratable,
 	__NR_HPAGEFLAGS,
 };
 
@@ -529,6 +533,7 @@  static inline void ClearHPage##uname(struct page *page)		\
  * Create functions associated with hugetlb page flags
  */
 HPAGEFLAG(RestoreReserve, restore_reserve)
+HPAGEFLAG(Migratable, migratable)
 
 #ifdef CONFIG_HUGETLB_PAGE
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index bc6fd1ee7dd6..04a34c08e0a6 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -592,15 +592,9 @@  static inline void ClearPageCompound(struct page *page)
 #ifdef CONFIG_HUGETLB_PAGE
 int PageHuge(struct page *page);
 int PageHeadHuge(struct page *page);
-bool page_huge_active(struct page *page);
 #else
 TESTPAGEFLAG_FALSE(Huge)
 TESTPAGEFLAG_FALSE(HeadHuge)
-
-static inline bool page_huge_active(struct page *page)
-{
-	return 0;
-}
 #endif
 
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8bed6b5202d2..c24da40626d3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1353,30 +1353,6 @@  struct hstate *size_to_hstate(unsigned long size)
 	return NULL;
 }
 
-/*
- * Test to determine whether the hugepage is "active/in-use" (i.e. being linked
- * to hstate->hugepage_activelist.)
- *
- * This function can be called for tail pages, but never returns true for them.
- */
-bool page_huge_active(struct page *page)
-{
-	return PageHeadHuge(page) && PagePrivate(&page[1]);
-}
-
-/* never called for tail page */
-void set_page_huge_active(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
-	SetPagePrivate(&page[1]);
-}
-
-static void clear_page_huge_active(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
-	ClearPagePrivate(&page[1]);
-}
-
 /*
  * Internal hugetlb specific page flag. Do not use outside of the hugetlb
  * code
@@ -1438,7 +1414,7 @@  static void __free_huge_page(struct page *page)
 	}
 
 	spin_lock(&hugetlb_lock);
-	clear_page_huge_active(page);
+	ClearHPageMigratable(page);
 	hugetlb_cgroup_uncharge_page(hstate_index(h),
 				     pages_per_huge_page(h), page);
 	hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
@@ -4220,7 +4196,7 @@  static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 				make_huge_pte(vma, new_page, 1));
 		page_remove_rmap(old_page, true);
 		hugepage_add_new_anon_rmap(new_page, vma, haddr);
-		set_page_huge_active(new_page);
+		SetHPageMigratable(new_page);
 		/* Make the old page be freed below */
 		new_page = old_page;
 	}
@@ -4457,12 +4433,12 @@  static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	spin_unlock(ptl);
 
 	/*
-	 * Only make newly allocated pages active.  Existing pages found
-	 * in the pagecache could be !page_huge_active() if they have been
-	 * isolated for migration.
+	 * Only set HPageMigratable in newly allocated pages.  Existing pages
+	 * found in the pagecache may not have HPageMigratableset if they have
+	 * been isolated for migration.
 	 */
 	if (new_page)
-		set_page_huge_active(page);
+		SetHPageMigratable(page);
 
 	unlock_page(page);
 out:
@@ -4773,7 +4749,7 @@  int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	update_mmu_cache(dst_vma, dst_addr, dst_pte);
 
 	spin_unlock(ptl);
-	set_page_huge_active(page);
+	SetHPageMigratable(page);
 	if (vm_shared)
 		unlock_page(page);
 	ret = 0;
@@ -5591,12 +5567,13 @@  bool isolate_huge_page(struct page *page, struct list_head *list)
 	bool ret = true;
 
 	spin_lock(&hugetlb_lock);
-	if (!PageHeadHuge(page) || !page_huge_active(page) ||
+	if (!PageHeadHuge(page) ||
+	    !HPageMigratable(page) ||
 	    !get_page_unless_zero(page)) {
 		ret = false;
 		goto unlock;
 	}
-	clear_page_huge_active(page);
+	ClearHPageMigratable(page);
 	list_move_tail(&page->lru, list);
 unlock:
 	spin_unlock(&hugetlb_lock);
@@ -5607,7 +5584,7 @@  void putback_active_hugepage(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 	spin_lock(&hugetlb_lock);
-	set_page_huge_active(page);
+	SetHPageMigratable(page);
 	list_move_tail(&page->lru, &(page_hstate(page))->hugepage_activelist);
 	spin_unlock(&hugetlb_lock);
 	put_page(page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f9d57b9be8c7..563da803e0e0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1260,7 +1260,13 @@  static int scan_movable_pages(unsigned long start, unsigned long end,
 		if (!PageHuge(page))
 			continue;
 		head = compound_head(page);
-		if (page_huge_active(head))
+		/*
+		 * This test is racy as we hold no reference or lock.  The
+		 * hugetlb page could have been free'ed and head is no longer
+		 * a hugetlb page before the following check.  In such unlikely
+		 * cases false positives and negatives are possible.
+		 */
+		if (HPageMigratable(head))
 			goto found;
 		skip = compound_nr(head) - (page - head);
 		pfn += skip - 1;