mm/hugetlb: allow gigantic page allocation to migrate away smaller huge page

Message ID 1561350068-8966-1-git-send-email-kernelfans@gmail.com (mailing list archive)
State New, archived

Commit Message

Pingfan Liu June 24, 2019, 4:21 a.m. UTC
The current pfn_range_valid_gigantic() rejects the pud huge page allocation
if there is a pmd huge page inside the candidate range.

But pud huge pages are a scarcer resource, since each must be aligned on
1GB on x86. It is worth allowing pmd huge pages to be migrated away to make
room for a pud huge page.

The same logic is applied to pgd and pud huge pages.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
---
 mm/hugetlb.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Ira Weiny June 24, 2019, 5:03 a.m. UTC | #1
On Mon, Jun 24, 2019 at 12:21:08PM +0800, Pingfan Liu wrote:
> The current pfn_range_valid_gigantic() rejects the pud huge page allocation
> if there is a pmd huge page inside the candidate range.
> 
> But pud huge resource is more rare, which should align on 1GB on x86. It is
> worth to allow migrating away pmd huge page to make room for a pud huge
> page.
> 
> The same logic is applied to pgd and pud huge pages.

I'm sorry but I don't quite understand why we should do this.  Is this a bug or
an optimization?  It sounds like an optimization.

> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/hugetlb.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ac843d3..02d1978 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1081,7 +1081,11 @@ static bool pfn_range_valid_gigantic(struct zone *z,
>  			unsigned long start_pfn, unsigned long nr_pages)
>  {
>  	unsigned long i, end_pfn = start_pfn + nr_pages;
> -	struct page *page;
> +	struct page *page = pfn_to_page(start_pfn);
> +
> +	if (PageHuge(page))
> +		if (compound_order(compound_head(page)) >= nr_pages)

I don't think you want compound_order() here.

Ira

> +			return false;
>  
>  	for (i = start_pfn; i < end_pfn; i++) {
>  		if (!pfn_valid(i))
> @@ -1098,8 +1102,6 @@ static bool pfn_range_valid_gigantic(struct zone *z,
>  		if (page_count(page) > 0)
>  			return false;
>  
> -		if (PageHuge(page))
> -			return false;
>  	}
>  
>  	return true;
> -- 
> 2.7.5
>
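
The objection to compound_order() above can be read as a unit mismatch:
compound_order() returns a log2 order, while nr_pages is a count of base
pages. Below is a minimal user-space sketch, assuming 4 KiB base pages (so a
2 MiB page is order 9 and a 1 GiB page is order 18). The helper names are
hypothetical, and the second helper is only one possible unit-consistent form,
not a fix stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB base pages, as on x86-64. */
#define PMD_ORDER 9UL   /* 2 MiB huge page = 2^9 base pages  */
#define PUD_ORDER 18UL  /* 1 GiB huge page = 2^18 base pages */

/* The comparison as posted: an order (log2) against a page count. */
static bool blocks_as_posted(unsigned long order, unsigned long nr_pages)
{
	return order >= nr_pages;
}

/* Hypothetical alternative: convert the order to a page count first. */
static bool blocks_unit_consistent(unsigned long order, unsigned long nr_pages)
{
	assert(order < sizeof(unsigned long) * 8);
	return (1UL << order) >= nr_pages;
}
```

With nr_pages = 1UL << PUD_ORDER, blocks_as_posted() returns false even for an
existing order-18 page, so the early return in the posted hunk would not
trigger for realistic sizes. blocks_unit_consistent() returns true for the
order-18 page and false for an order-9 page.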
Anshuman Khandual June 24, 2019, 5:16 a.m. UTC | #2
On 06/24/2019 09:51 AM, Pingfan Liu wrote:
> The current pfn_range_valid_gigantic() rejects the pud huge page allocation
> if there is a pmd huge page inside the candidate range.
> 
> But pud huge resource is more rare, which should align on 1GB on x86. It is
> worth to allow migrating away pmd huge page to make room for a pud huge
> page.
> 
> The same logic is applied to pgd and pud huge pages.

The huge page in the range can be either a THP or a HugeTLB page, and migrating
them has different costs and chances of success. THP migration will involve
splitting if THP migration is not enabled, plus all the related TLB costs. Are
you sure that a PUD HugeTLB allocation really should go through these? Is there
any guarantee that after migrating multiple PMD-sized THP/HugeTLB pages out of
the given range, the allocation request for the PUD page will succeed?

> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/hugetlb.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ac843d3..02d1978 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1081,7 +1081,11 @@ static bool pfn_range_valid_gigantic(struct zone *z,
>  			unsigned long start_pfn, unsigned long nr_pages)
>  {
>  	unsigned long i, end_pfn = start_pfn + nr_pages;
> -	struct page *page;
> +	struct page *page = pfn_to_page(start_pfn);
> +
> +	if (PageHuge(page))
> +		if (compound_order(compound_head(page)) >= nr_pages)
> +			return false;
>  
>  	for (i = start_pfn; i < end_pfn; i++) {
>  		if (!pfn_valid(i))
> @@ -1098,8 +1102,6 @@ static bool pfn_range_valid_gigantic(struct zone *z,
>  		if (page_count(page) > 0)
>  			return false;
>  
> -		if (PageHuge(page))
> -			return false;
>  	}
>  
>  	return true;
> 

So except in the case where there is a bigger huge page in the range, this will
attempt to migrate everything in the way. As mentioned before, if this is a
good idea at all, it needs to differentiate between HugeTLB and THP, and also
take into account the cost of the migrations and the chance that the
subsequent allocation attempt succeeds.
Pingfan Liu June 24, 2019, 5:55 a.m. UTC | #3
On Mon, Jun 24, 2019 at 1:03 PM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Mon, Jun 24, 2019 at 12:21:08PM +0800, Pingfan Liu wrote:
> > The current pfn_range_valid_gigantic() rejects the pud huge page allocation
> > if there is a pmd huge page inside the candidate range.
> >
> > But pud huge resource is more rare, which should align on 1GB on x86. It is
> > worth to allow migrating away pmd huge page to make room for a pud huge
> > page.
> >
> > The same logic is applied to pgd and pud huge pages.
>
> I'm sorry but I don't quite understand why we should do this.  Is this a bug or
> an optimization?  It sounds like an optimization.
Yes, an optimization. It can help us succeed in allocating a 1GB
hugetlb page if some 2MB hugetlb pages sit in the candidate range.
Allocating a 1GB hugetlb page has tougher requirements: not only a
contiguous 1GB range, but also one aligned on 1GB. Allocating a 2MB
range is easier.
>
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > Cc: Oscar Salvador <osalvador@suse.de>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  mm/hugetlb.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ac843d3..02d1978 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1081,7 +1081,11 @@ static bool pfn_range_valid_gigantic(struct zone *z,
> >                       unsigned long start_pfn, unsigned long nr_pages)
> >  {
> >       unsigned long i, end_pfn = start_pfn + nr_pages;
> > -     struct page *page;
> > +     struct page *page = pfn_to_page(start_pfn);
> > +
> > +     if (PageHuge(page))
> > +             if (compound_order(compound_head(page)) >= nr_pages)
>
> I don't think you want compound_order() here.
Yes, you are right.

Thanks,
  Pingfan
>
> Ira
>
> > +                     return false;
> >
> >       for (i = start_pfn; i < end_pfn; i++) {
> >               if (!pfn_valid(i))
> > @@ -1098,8 +1102,6 @@ static bool pfn_range_valid_gigantic(struct zone *z,
> >               if (page_count(page) > 0)
> >                       return false;
> >
> > -             if (PageHuge(page))
> > -                     return false;
> >       }
> >
> >       return true;
> > --
> > 2.7.5
> >
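
The alignment argument in the reply above can be made concrete by counting
how many start positions a zone offers for each page size. This is a
user-space sketch only, assuming 4 KiB base pages and a 64-bit unsigned long.
The function names are hypothetical and do not exist in mm/hugetlb.c.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12UL  /* assumption: 4 KiB base pages */

/* Number of base pages covered by a region of the given size in bytes. */
static unsigned long pages_in(unsigned long bytes)
{
	return bytes >> PAGE_SHIFT;
}

/* True if start_pfn sits on a boundary of nr_pages (a power of two). */
static bool pfn_is_aligned(unsigned long start_pfn, unsigned long nr_pages)
{
	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);
	return (start_pfn & (nr_pages - 1)) == 0;
}

/* Count aligned, fully contained candidate ranges in [zone_start, zone_end). */
static unsigned long aligned_candidates(unsigned long zone_start,
					unsigned long zone_end,
					unsigned long nr_pages)
{
	/* Round zone_start up to the next nr_pages boundary. */
	unsigned long first = (zone_start + nr_pages - 1) & ~(nr_pages - 1);

	if (first >= zone_end || zone_end - first < nr_pages)
		return 0;
	return (zone_end - first) / nr_pages;
}
```

In a hypothetical 4 GiB zone that starts at pfn 0, this gives 4 candidate
ranges for a 1 GiB page and 2048 for a 2 MiB page. A single 2 MiB hugetlb
page sitting in one of the four 1 GiB candidates is therefore enough to rule
out a quarter of the possible placements under the current check.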
Pingfan Liu June 24, 2019, 6:10 a.m. UTC | #4
On Mon, Jun 24, 2019 at 1:16 PM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
>
>
> On 06/24/2019 09:51 AM, Pingfan Liu wrote:
> > The current pfn_range_valid_gigantic() rejects the pud huge page allocation
> > if there is a pmd huge page inside the candidate range.
> >
> > But pud huge resource is more rare, which should align on 1GB on x86. It is
> > worth to allow migrating away pmd huge page to make room for a pud huge
> > page.
> >
> > The same logic is applied to pgd and pud huge pages.
>
> The huge page in the range can either be a THP or HugeTLB and migrating them has
> different costs and chances of success. THP migration will involve splitting if
> THP migration is not enabled and all related TLB related costs. Are you sure
> that a PUD HugeTLB allocation really should go through these ? Is there any
In the current code, a PUD hugetlb allocation already drives out PMD THPs.
This patch just wants a PUD hugetlb allocation to survive PMD hugetlb pages
as well.

> guarantee that after migration of multiple PMD sized THP/HugeTLB pages on the
> given range, the allocation request for PUD will succeed ?
Migration is complicated, but as I understand it, if there is no
gup pin in the range and there is enough memory (including swap), then
it will succeed.
>
> >
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > Cc: Oscar Salvador <osalvador@suse.de>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  mm/hugetlb.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ac843d3..02d1978 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1081,7 +1081,11 @@ static bool pfn_range_valid_gigantic(struct zone *z,
> >                       unsigned long start_pfn, unsigned long nr_pages)
> >  {
> >       unsigned long i, end_pfn = start_pfn + nr_pages;
> > -     struct page *page;
> > +     struct page *page = pfn_to_page(start_pfn);
> > +
> > +     if (PageHuge(page))
> > +             if (compound_order(compound_head(page)) >= nr_pages)
> > +                     return false;
> >
> >       for (i = start_pfn; i < end_pfn; i++) {
> >               if (!pfn_valid(i))
> > @@ -1098,8 +1102,6 @@ static bool pfn_range_valid_gigantic(struct zone *z,
> >               if (page_count(page) > 0)
> >                       return false;
> >
> > -             if (PageHuge(page))
> > -                     return false;
> >       }
> >
> >       return true;
> >
>
> So except in the case where there is a bigger huge page in the range this will
> attempt migrating everything on the way. As mentioned before if it all this is
> a good idea, it needs to differentiate between HugeTLB and THP and also take
> into account costs of migrations and chance of subsequence allocation attempt
> into account.
Sorry, but I think this logic is only for hugetlb. The caller
alloc_gigantic_page() is only used inside mm/hugetlb.c, not by
huge_memory.c.

Thanks,
  Pingfan
Anshuman Khandual June 24, 2019, 8:26 a.m. UTC | #5
On 06/24/2019 11:40 AM, Pingfan Liu wrote:
> On Mon, Jun 24, 2019 at 1:16 PM Anshuman Khandual
> <anshuman.khandual@arm.com> wrote:
>>
>>
>>
>> On 06/24/2019 09:51 AM, Pingfan Liu wrote:
>>> The current pfn_range_valid_gigantic() rejects the pud huge page allocation
>>> if there is a pmd huge page inside the candidate range.
>>>
>>> But pud huge resource is more rare, which should align on 1GB on x86. It is
>>> worth to allow migrating away pmd huge page to make room for a pud huge
>>> page.
>>>
>>> The same logic is applied to pgd and pud huge pages.
>>
>> The huge page in the range can either be a THP or HugeTLB and migrating them has
>> different costs and chances of success. THP migration will involve splitting if
>> THP migration is not enabled and all related TLB related costs. Are you sure
>> that a PUD HugeTLB allocation really should go through these ? Is there any
> PUD hugetlb has already driven out PMD thp in current. This patch just
> want to make PUD hugetlb survives PMD hugetlb.

You are right. PageHuge() is true only for HugeTLB pages, unlike PageTransHuge(),
which is true for both HugeTLB and THP pages. So the current code does migrate
the THP out in order to allocate a gigantic HugeTLB page. While we are here, I
wonder whether we should exclude THP as well unless the architecture supports
ARCH_HAS_THP_MIGRATION.

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ac843d3..02d1978 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1081,7 +1081,11 @@  static bool pfn_range_valid_gigantic(struct zone *z,
 			unsigned long start_pfn, unsigned long nr_pages)
 {
 	unsigned long i, end_pfn = start_pfn + nr_pages;
-	struct page *page;
+	struct page *page = pfn_to_page(start_pfn);
+
+	if (PageHuge(page))
+		if (compound_order(compound_head(page)) >= nr_pages)
+			return false;
 
 	for (i = start_pfn; i < end_pfn; i++) {
 		if (!pfn_valid(i))
@@ -1098,8 +1102,6 @@  static bool pfn_range_valid_gigantic(struct zone *z,
 		if (page_count(page) > 0)
 			return false;
 
-		if (PageHuge(page))
-			return false;
 	}
 
 	return true;
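
To compare the behaviour before and after the patch, the range check can be
modelled in user space. This is a toy sketch, not the kernel function: struct
toy_page and both helpers are hypothetical, the pfn_valid() and zone checks
are left out, and the size comparison uses a page count (1 << order) on the
assumption that this is what the compound_order() test was meant to express.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; every field here is illustrative. */
struct toy_page {
	bool reserved;       /* models PageReserved() */
	int refcount;        /* models page_count() */
	bool hugetlb;        /* models PageHuge() */
	unsigned int order;  /* order of the containing hugetlb page, if any */
};

/* Before the patch: any hugetlb page inside the range vetoes it. */
static bool range_ok_old(const struct toy_page *map, size_t start, size_t nr)
{
	for (size_t i = start; i < start + nr; i++) {
		if (map[i].reserved || map[i].refcount > 0 || map[i].hugetlb)
			return false;
	}
	return true;
}

/*
 * After the patch (as modelled here): veto only when the range starts
 * inside a hugetlb page at least as large as the request.
 */
static bool range_ok_new(const struct toy_page *map, size_t start, size_t nr)
{
	if (map[start].hugetlb && ((size_t)1 << map[start].order) >= nr)
		return false;

	for (size_t i = start; i < start + nr; i++) {
		if (map[i].reserved || map[i].refcount > 0)
			return false;
	}
	return true;
}
```

For an eight-page range that contains one free order-1 hugetlb page,
range_ok_old() rejects the range and range_ok_new() accepts it. Both still
reject a range that contains a page with a non-zero reference count.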