
[1/4] mm/hugetlb: Enable PUD level huge page migration

Message ID 1538482531-26883-2-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Series arm64/mm: Enable HugeTLB migration

Commit Message

Anshuman Khandual Oct. 2, 2018, 12:15 p.m. UTC
Architectures like arm64 have PUD level HugeTLB pages for certain configs
(1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
enabled for migration. It can be achieved through checking for PUD_SHIFT
order based HugeTLB pages during migration.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 include/linux/hugetlb.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Suzuki K Poulose Oct. 2, 2018, 12:38 p.m. UTC | #1
Hi Anshuman

On 02/10/18 13:15, Anshuman Khandual wrote:
> Architectures like arm64 have PUD level HugeTLB pages for certain configs
> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
> enabled for migration. It can be achieved through checking for PUD_SHIFT
> order based HugeTLB pages during migration.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>   include/linux/hugetlb.h | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6b68e34..9c1b77f 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -483,7 +483,8 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>   {
>   #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>   	if ((huge_page_shift(h) == PMD_SHIFT) ||
> -		(huge_page_shift(h) == PGDIR_SHIFT))
> +		(huge_page_shift(h) == PUD_SHIFT) ||


> +			(huge_page_shift(h) == PGDIR_SHIFT))

nit: Extra Tab ^^.
Also, if only arm64 supports PUD_SHIFT, should this be added only in the 
arm64 specific backend, which we introduce later ?

Suzuki
Michal Hocko Oct. 2, 2018, 12:39 p.m. UTC | #2
On Tue 02-10-18 17:45:28, Anshuman Khandual wrote:
> Architectures like arm64 have PUD level HugeTLB pages for certain configs
> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
> enabled for migration. It can be achieved through checking for PUD_SHIFT
> order based HugeTLB pages during migration.

Well, a long-term problem with hugepage_migration_supported is that it is
used in two different contexts: 1) to bail out of the migration early
because the arch doesn't support migration at all, and 2) to use the
movable zone for hugetlb page allocation. I am especially concerned about
the latter, because mere support for migration is not really good enough.
Are you really able to find a different giga page at runtime to move an
existing giga page out of the movable zone?

So I guess we want to split this into two functions,
arch_hugepage_migration_supported and hugepage_movable. The latter would
be a reasonably migratable subset of the former. Without that, this
patch might introduce subtle regressions when somebody relies on the
movable zone being really movable.

> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  include/linux/hugetlb.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6b68e34..9c1b77f 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -483,7 +483,8 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>  {
>  #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>  	if ((huge_page_shift(h) == PMD_SHIFT) ||
> -		(huge_page_shift(h) == PGDIR_SHIFT))
> +		(huge_page_shift(h) == PUD_SHIFT) ||
> +			(huge_page_shift(h) == PGDIR_SHIFT))
>  		return true;
>  	else
>  		return false;
> -- 
> 2.7.4
Anshuman Khandual Oct. 2, 2018, 12:56 p.m. UTC | #3
On 10/02/2018 06:08 PM, Suzuki K Poulose wrote:
> Hi Anshuman
> 
> On 02/10/18 13:15, Anshuman Khandual wrote:
>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
>> enabled for migration. It can be achieved through checking for PUD_SHIFT
>> order based HugeTLB pages during migration.
>>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   include/linux/hugetlb.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 6b68e34..9c1b77f 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -483,7 +483,8 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>>   {
>>   #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>>       if ((huge_page_shift(h) == PMD_SHIFT) ||
>> -        (huge_page_shift(h) == PGDIR_SHIFT))
>> +        (huge_page_shift(h) == PUD_SHIFT) ||
> 
> 
>> +            (huge_page_shift(h) == PGDIR_SHIFT))
> 
> nit: Extra Tab ^^.

The tab is there when you apply this patch; all three checks are tab-indented,
each on its own line.

> Also, if only arm64 supports PUD_SHIFT, should this be added only in the arm64 specific backend, which we introduce later ?

Even if the platform can add this in its back end, I would think having it
in the default fallback function makes it complete.
Anshuman Khandual Oct. 3, 2018, 2:16 a.m. UTC | #4
On 10/02/2018 06:09 PM, Michal Hocko wrote:
> On Tue 02-10-18 17:45:28, Anshuman Khandual wrote:
>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
>> enabled for migration. It can be achieved through checking for PUD_SHIFT
>> order based HugeTLB pages during migration.
> 
> Well a long term problem with hugepage_migration_supported is that it is
> used in two different context 1) to bail out from the migration early
> because the arch doesn't support migration at all and 2) to use movable
> zone for hugetlb pages allocation. I am especially concerned about the
> later because the mere support for migration is not really good enough.
> Are you really able to find a different giga page during the runtime to
> move an existing giga page out of the movable zone?

I pre-allocate them before trying to initiate the migration (soft offline
in my experiments). Hence the target should come from the pre-allocated
HugeTLB pool instead of from the buddy. I might be missing something here,
but do we ever allocate HugeTLB pages on the go when trying to migrate?
IIUC they always come from the pool (unless it's something related to
overcommit/surplus). Could you please explain how migration target HugeTLB
pages are allocated on the fly from the movable zone?

But even if there is some chance of runtime allocation failure from the
movable zone (as in point 2), that should not block the very initiation
of migration itself. IIUC that's not the semantics for either THP or
normal pages, so why should it be different here? If the allocation fails
we should report and abort as always. It's the caller of the migration
taking the chances; why should we prevent that?

> 
> So I guess we want to split this into two functions
> arch_hugepage_migration_supported and hugepage_movable. The later would

So the set difference between arch_hugepage_migration_supported and
hugepage_movable still remains un-migratable? Then what is the purpose
of the arch_hugepage_migration_supported page size set in the first place?
Does it mean we allow the migration at the beginning and then abort later
when the page size does not fall within the hugepage_movable subset?
Could you please explain this in more detail.

> be a reasonably migrateable subset of the former. Without that this
> patch migth introduce subtle regressions when somebody relies on movable
> zone to be really movable.

PUD based HugeTLB pages were never migratable, so how can there be any
regression here? At present we even support PGD based HugeTLB pages for
migration; I am wondering how PUD based ones are going to be any different.
Michal Hocko Oct. 3, 2018, 6:58 a.m. UTC | #5
On Wed 03-10-18 07:46:27, Anshuman Khandual wrote:
> 
> 
> On 10/02/2018 06:09 PM, Michal Hocko wrote:
> > On Tue 02-10-18 17:45:28, Anshuman Khandual wrote:
> >> Architectures like arm64 have PUD level HugeTLB pages for certain configs
> >> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
> >> enabled for migration. It can be achieved through checking for PUD_SHIFT
> >> order based HugeTLB pages during migration.
> > 
> > Well a long term problem with hugepage_migration_supported is that it is
> > used in two different context 1) to bail out from the migration early
> > because the arch doesn't support migration at all and 2) to use movable
> > zone for hugetlb pages allocation. I am especially concerned about the
> > later because the mere support for migration is not really good enough.
> > Are you really able to find a different giga page during the runtime to
> > move an existing giga page out of the movable zone?
> 
> I pre-allocate them before trying to initiate the migration (soft offline
> in my experiments). Hence it should come from the pre-allocated HugeTLB
> pool instead from the buddy. I might be missing something here but do we
> ever allocate HugeTLB on the go when trying to migrate ? IIUC it always
> came from the pool (unless its something related to ovecommit/surplus).
> Could you please kindly explain regarding how migration target HugeTLB
> pages are allocated on the fly from movable zone.

Hotplug comes to mind. You usually do not pre-allocate enough to cover a
full node going offline, and people would like to do that. Another example
is CMA: you would really like to move pages out of the way.

> But even if there are some chances of run time allocation failure from
> movable zone (as in point 2) that should not block the very initiation
> of migration itself. IIUC thats not the semantics for either THP or
> normal pages. Why should it be different here. If the allocation fails
> we should report and abort as always. Its the caller of migration taking
> the chances. why should we prevent that.

Yes I agree, hence the distinction between the arch support for
migratability and the criterion for movable zone placement.
 
> > 
> > So I guess we want to split this into two functions
> > arch_hugepage_migration_supported and hugepage_movable. The later would
> 
> So the set difference between arch_hugepage_migration_supported and 
> hugepage_movable still remains un-migratable ? Then what is the purpose
> for arch_hugepage_migration_supported page size set in the first place.
> Does it mean we allow the migration at the beginning and the abort later
> when the page size does not fall within the subset for hugepage_movable.
> Could you please kindly explain this in more detail.

The purpose of arch_hugepage_migration_supported is to tell whether it
makes any sense to even try the migration. The allocation placement is
completely independent of this choice. The latter just says whether it is
feasible to place a hugepage in the movable zone. Sure, regular 2MB pages
do not guarantee movability either, because of memory fragmentation, but
allocating 2MB is a completely different story from 1G or even larger
huge pages, isn't it?

> > be a reasonably migrateable subset of the former. Without that this
> > patch migth introduce subtle regressions when somebody relies on movable
> > zone to be really movable.
> PUD based HugeTLB pages were never migratable, then how can there be any
> regression here ?

That means that they haven't been allocated from the movable zone
before, which is going to change with this patch.

> At present we even support PGD based HugeTLB pages for
> migration.

And that is already wrong, but probably nobody cares because those are
rarely used.

> Wondering how PUD based ones are going to be any different.

It is not different, PGD is dubious already.
Anshuman Khandual Oct. 3, 2018, 9:58 a.m. UTC | #6
On 10/03/2018 12:28 PM, Michal Hocko wrote:
> On Wed 03-10-18 07:46:27, Anshuman Khandual wrote:
>>
>>
>> On 10/02/2018 06:09 PM, Michal Hocko wrote:
>>> On Tue 02-10-18 17:45:28, Anshuman Khandual wrote:
>>>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
>>>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
>>>> enabled for migration. It can be achieved through checking for PUD_SHIFT
>>>> order based HugeTLB pages during migration.
>>>
>>> Well a long term problem with hugepage_migration_supported is that it is
>>> used in two different context 1) to bail out from the migration early
>>> because the arch doesn't support migration at all and 2) to use movable
>>> zone for hugetlb pages allocation. I am especially concerned about the
>>> later because the mere support for migration is not really good enough.
>>> Are you really able to find a different giga page during the runtime to
>>> move an existing giga page out of the movable zone?
>>
>> I pre-allocate them before trying to initiate the migration (soft offline
>> in my experiments). Hence it should come from the pre-allocated HugeTLB
>> pool instead from the buddy. I might be missing something here but do we
>> ever allocate HugeTLB on the go when trying to migrate ? IIUC it always
>> came from the pool (unless its something related to ovecommit/surplus).
>> Could you please kindly explain regarding how migration target HugeTLB
>> pages are allocated on the fly from movable zone.
> 
> Hotplug comes to mind. You usually do not pre-allocate to cover full
> node going offline. And people would like to do that. Another example is
> CMA. You would really like to move pages out of the way.

You are right.

Hotplug migration:

__offline_pages
   do_migrate_range
	migrate_pages(...new_node_page...)

new_node_page
   new_page_nodemask
	alloc_huge_page_nodemask
	   dequeue_huge_page_nodemask (Getting from pool)
	or
	   alloc_migrate_huge_page    (Getting from buddy - non-gigantic)
		alloc_fresh_huge_page
		    alloc_buddy_huge_page
			__alloc_pages_nodemask ----> goes into buddy

CMA allocation:

cma_alloc
   alloc_contig_range
	__alloc_contig_migrate_range
		migrate_pages(...alloc_migrate_target...)

alloc_migrate_target
   new_page_nodemask -> __alloc_pages_nodemask ---> goes into buddy

But this is not applicable to gigantic pages, for which the allocation
backs off well before going into the buddy. With MAX_ORDER at 11 that is
anything beyond 64MB for 64K pages, 16MB for 16K pages, 4MB for 4K pages,
etc. So all the bigger huge pages like 512MB/1GB/16GB will not be part of
HugeTLB/CMA initiated migrations. I will look into the migration details
during auto NUMA, compaction, memory failure, etc. to see whether a
gigantic huge page is ever allocated from the buddy with
__alloc_pages_nodemask or with alloc_contig_range().

> 
>> But even if there are some chances of run time allocation failure from
>> movable zone (as in point 2) that should not block the very initiation
>> of migration itself. IIUC thats not the semantics for either THP or
>> normal pages. Why should it be different here. If the allocation fails
>> we should report and abort as always. Its the caller of migration taking
>> the chances. why should we prevent that.
> 
> Yes I agree, hence the distinction between the arch support for
> migrateability and the criterion on the movable zone placement.

Movable zone placement sounds very tricky here. How can the platform
(through the hook huge_movable) say beforehand whether the destination
page could be allocated from ZONE_MOVABLE without looking into the state
of the buddy at migration time (any attempt to do this is going to be
expensive)? Or does it merely indicate a willingness to live with the
possible consequence (being unable to hot-unplug or use CMA going
forward) of a migration which might end up in an unmovable area?

>  
>>>
>>> So I guess we want to split this into two functions
>>> arch_hugepage_migration_supported and hugepage_movable. The later would
>>
>> So the set difference between arch_hugepage_migration_supported and 
>> hugepage_movable still remains un-migratable ? Then what is the purpose
>> for arch_hugepage_migration_supported page size set in the first place.
>> Does it mean we allow the migration at the beginning and the abort later
>> when the page size does not fall within the subset for hugepage_movable.
>> Could you please kindly explain this in more detail.
> 
> The purpose of arch_hugepage_migration_supported is to tell whether it
> makes any sense to even try to migration. The allocation placement is

Which kind of matches what we have right now and what is being continued
with this proposal in the series.

> completely independent on this choice. The later just says whether it is
> feasible to place a hugepage to the zone movable. Sure regular 2MB pages

What exactly do you mean by feasible? Won't it depend on the state of the
buddy allocator (ZONE_MOVABLE in particular) and its ability to accommodate
a given huge page? How can the platform decide on it? Or, as I mentioned
before, is it the platform's willingness to live with unmovable huge pages
(of certain sizes) as a consequence of migration?

> do not guarantee movability as well because of the memory fragmentation.
> But allocating a 2MB is a completely different storage from 1G or even
> larger huge pages, isn't it?

Right, I understand that. Hotplug/CMA capability degrades further with
bigger huge pages being unmovable on the system.

> 
>>> be a reasonably migrateable subset of the former. Without that this
>>> patch migth introduce subtle regressions when somebody relies on movable
>>> zone to be really movable.
>> PUD based HugeTLB pages were never migratable, then how can there be any
>> regression here ?
> 
> That means that they haven't been allocated from the movable zone
> before. Which is going to change by this patch.

The source PUD huge page might have been allocated from the movable zone.
The denial of migration is explicit, because we don't check for PUD_SHIFT
in there; it has nothing to do with the zone type the source page belongs
to. But are you referring to a regression caused by something like this?

Before the patch:

- PUD huge page allocated on ZONE_MOVABLE
- Huge page is movable (Hotplug/CMA works)

After the patch:

- PUD huge page allocated on ZONE_MOVABLE
- Migration is successful without checking for destination page's zone
- PUD huge page (new) is not on ZONE_MOVABLE
- Huge page is unmovable (Hotplug/CMA does not work anymore) -> Regression!

> 
>> At present we even support PGD based HugeTLB pages for
>> migration.
> 
> And that is already wrong but nobody probably cares because those are
> rarely used.
> 
>> Wondering how PUD based ones are going to be any different.
> 
> It is not different, PGD is dubious already.
>
Got it.
Suzuki K Poulose Oct. 3, 2018, 10:22 a.m. UTC | #7
On 02/10/18 13:56, Anshuman Khandual wrote:
> 
> 
> On 10/02/2018 06:08 PM, Suzuki K Poulose wrote:
>> Hi Anshuman
>>
>> On 02/10/18 13:15, Anshuman Khandual wrote:
>>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
>>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
>>> enabled for migration. It can be achieved through checking for PUD_SHIFT
>>> order based HugeTLB pages during migration.
>>>
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>> ---
>>>    include/linux/hugetlb.h | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>>> index 6b68e34..9c1b77f 100644
>>> --- a/include/linux/hugetlb.h
>>> +++ b/include/linux/hugetlb.h
>>> @@ -483,7 +483,8 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>>>    {
>>>    #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>>>        if ((huge_page_shift(h) == PMD_SHIFT) ||
>>> -        (huge_page_shift(h) == PGDIR_SHIFT))
>>> +        (huge_page_shift(h) == PUD_SHIFT) ||
>>
>>
>>> +            (huge_page_shift(h) == PGDIR_SHIFT))
>>
>> nit: Extra Tab ^^.
> 
> The tab is in there when you apply this patch and all three checks are tab separated
> in a newline.

Well, with the patch applied, at least I can see 2 tabs for the
PUD_SHIFT check and 3 tabs for the PGDIR_SHIFT check, which seems
inconsistent. Is it just me (my mail client)?

Suzuki
Michal Hocko Oct. 3, 2018, 10:59 a.m. UTC | #8
On Wed 03-10-18 15:28:23, Anshuman Khandual wrote:
> 
> 
> On 10/03/2018 12:28 PM, Michal Hocko wrote:
> > On Wed 03-10-18 07:46:27, Anshuman Khandual wrote:
> >>
> >>
> >> On 10/02/2018 06:09 PM, Michal Hocko wrote:
> >>> On Tue 02-10-18 17:45:28, Anshuman Khandual wrote:
> >>>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
> >>>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
> >>>> enabled for migration. It can be achieved through checking for PUD_SHIFT
> >>>> order based HugeTLB pages during migration.
> >>>
> >>> Well a long term problem with hugepage_migration_supported is that it is
> >>> used in two different context 1) to bail out from the migration early
> >>> because the arch doesn't support migration at all and 2) to use movable
> >>> zone for hugetlb pages allocation. I am especially concerned about the
> >>> later because the mere support for migration is not really good enough.
> >>> Are you really able to find a different giga page during the runtime to
> >>> move an existing giga page out of the movable zone?
> >>
> >> I pre-allocate them before trying to initiate the migration (soft offline
> >> in my experiments). Hence it should come from the pre-allocated HugeTLB
> >> pool instead from the buddy. I might be missing something here but do we
> >> ever allocate HugeTLB on the go when trying to migrate ? IIUC it always
> >> came from the pool (unless its something related to ovecommit/surplus).
> >> Could you please kindly explain regarding how migration target HugeTLB
> >> pages are allocated on the fly from movable zone.
> > 
> > Hotplug comes to mind. You usually do not pre-allocate to cover full
> > node going offline. And people would like to do that. Another example is
> > CMA. You would really like to move pages out of the way.
> 
> You are right.
> 
> Hotplug migration:
> 
> __offline_pages
>    do_migrate_range
> 	migrate_pages(...new_node_page...)
> 
> new_node_page
>    new_page_nodemask
> 	alloc_huge_page_nodemask
> 	   dequeue_huge_page_nodemask (Getting from pool)
> 	or
> 	   alloc_migrate_huge_page    (Getting from buddy - non-gigantic)
> 		alloc_fresh_huge_page
> 		    alloc_buddy_huge_page
> 			__alloc_pages_nodemask ----> goes into buddy
> 
> CMA allocation:
> 
> cma_alloc
>    alloc_contig_range
> 	__alloc_contig_migrate_range
> 		migrate_pages(...alloc_migrate_target...)
> 
> alloc_migrate_target
>    new_page_nodemask -> __alloc_pages_nodemask ---> goes into buddy
> 
> But this is not applicable for gigantic pages for which it backs off way
> before going into buddy.

This is an implementation detail, mostly a missing or incomplete
hugetlb overcommit implementation IIRC. The primary point remains the
same: being able to migrate in principle and being feasible enough to
migrate to warrant placement in the movable zone are two distinct things.
[...]
> >> But even if there are some chances of run time allocation failure from
> >> movable zone (as in point 2) that should not block the very initiation
> >> of migration itself. IIUC thats not the semantics for either THP or
> >> normal pages. Why should it be different here. If the allocation fails
> >> we should report and abort as always. Its the caller of migration taking
> >> the chances. why should we prevent that.
> > 
> > Yes I agree, hence the distinction between the arch support for
> > migrateability and the criterion on the movable zone placement.
> movable zone placement sounds very tricky here. How can the platform
> (through the hook huge_movable) before hand say whether destination
> page could be allocated from the ZONE_MOVABLE without looking into the
> state of buddy at migration (any sort attempt to do this is going to
> be expensive) or it merely indicates the desire to live with possible
> consequence (unable to hot unplug/CMA going forward) for a migration
> which might end up in an unmovable area.

I do not follow. The whole point of zone_movable is to provide a
physical memory range which is more or less movable. That means that
pages allocated from this zone can be migrated away should there be a
reason for that.

> >>> So I guess we want to split this into two functions
> >>> arch_hugepage_migration_supported and hugepage_movable. The later would
> >>
> >> So the set difference between arch_hugepage_migration_supported and 
> >> hugepage_movable still remains un-migratable ? Then what is the purpose
> >> for arch_hugepage_migration_supported page size set in the first place.
> >> Does it mean we allow the migration at the beginning and the abort later
> >> when the page size does not fall within the subset for hugepage_movable.
> >> Could you please kindly explain this in more detail.
> > 
> > The purpose of arch_hugepage_migration_supported is to tell whether it
> > makes any sense to even try to migration. The allocation placement is
> 
> Which kind of matches what we have right now and being continued with this
> proposal in the series.

Except you only go halfway there, because you still consider "able to
migrate" and "feasible to migrate" to be the same thing.

> 
> > completely independent on this choice. The later just says whether it is
> > feasible to place a hugepage to the zone movable. Sure regular 2MB pages
> 
> What do you exactly mean by feasible ? Wont it depend on the state of the
> buddy allocator (ZONE_MOVABLE in particular) and it's ability to accommodate
> a given huge page. How can the platform decide on it ?

It is not the platform that decides. That is the whole point of the
distinction. It is up to us to say what is feasible and what we want to
support. Do we want to support giga pages in zone_movable? Under which
conditions? See my point?

> Or as I mentioned
> before it's platform's willingness to live with unmovable huge pages (of
> certain sizes) as a consequence of migration.

No, the platform has no say in that. The platform only says that it
supports migrating those pages in principle.
Anshuman Khandual Oct. 3, 2018, 11:10 a.m. UTC | #9
On 10/03/2018 03:52 PM, Suzuki K Poulose wrote:
> 
> 
> On 02/10/18 13:56, Anshuman Khandual wrote:
>>
>>
>> On 10/02/2018 06:08 PM, Suzuki K Poulose wrote:
>>> Hi Anshuman
>>>
>>> On 02/10/18 13:15, Anshuman Khandual wrote:
>>>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
>>>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
>>>> enabled for migration. It can be achieved through checking for PUD_SHIFT
>>>> order based HugeTLB pages during migration.
>>>>
>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>> ---
>>>>    include/linux/hugetlb.h | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>>>> index 6b68e34..9c1b77f 100644
>>>> --- a/include/linux/hugetlb.h
>>>> +++ b/include/linux/hugetlb.h
>>>> @@ -483,7 +483,8 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>>>>    {
>>>>    #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>>>>        if ((huge_page_shift(h) == PMD_SHIFT) ||
>>>> -        (huge_page_shift(h) == PGDIR_SHIFT))
>>>> +        (huge_page_shift(h) == PUD_SHIFT) ||
>>>
>>>
>>>> +            (huge_page_shift(h) == PGDIR_SHIFT))
>>>
>>> nit: Extra Tab ^^.
>>
>> The tab is in there when you apply this patch and all three checks are tab separated
>> in a newline.
> 
> Well, with the patch applied, at least I can see 2 tabs for the
> PUD_SHIFT check and 3 tabs for PGDIR_SHIFT check. Which seems
> inconsistent. Is it just me (my mail client) ?

I am sorry, you are right; I did not understand your point earlier. Yeah,
there is an increasing number of tabs for each new line with a conditional
check. Is there a problem with this style of indentation? I will be happy
to change it, though.
Suzuki K Poulose Oct. 3, 2018, 11:17 a.m. UTC | #10
On 03/10/18 12:10, Anshuman Khandual wrote:
> 
> 
> On 10/03/2018 03:52 PM, Suzuki K Poulose wrote:
>>
>>
>> On 02/10/18 13:56, Anshuman Khandual wrote:
>>>
>>>
>>> On 10/02/2018 06:08 PM, Suzuki K Poulose wrote:
>>>> Hi Anshuman
>>>>
>>>> On 02/10/18 13:15, Anshuman Khandual wrote:
>>>>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
>>>>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
>>>>> enabled for migration. It can be achieved through checking for PUD_SHIFT
>>>>> order based HugeTLB pages during migration.
>>>>>
>>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>>>>> ---
>>>>>     include/linux/hugetlb.h | 3 ++-
>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>>>>> index 6b68e34..9c1b77f 100644
>>>>> --- a/include/linux/hugetlb.h
>>>>> +++ b/include/linux/hugetlb.h
>>>>> @@ -483,7 +483,8 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>>>>>     {
>>>>>     #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>>>>>         if ((huge_page_shift(h) == PMD_SHIFT) ||
>>>>> -        (huge_page_shift(h) == PGDIR_SHIFT))
>>>>> +        (huge_page_shift(h) == PUD_SHIFT) ||
>>>>
>>>>
>>>>> +            (huge_page_shift(h) == PGDIR_SHIFT))
>>>>
>>>> nit: Extra Tab ^^.
>>>
>>> The tab is in there when you apply this patch and all three checks are tab separated
>>> in a newline.
>>
>> Well, with the patch applied, at least I can see 2 tabs for the
>> PUD_SHIFT check and 3 tabs for PGDIR_SHIFT check. Which seems
>> inconsistent. Is it just me (my mail client) ?
> 
> I am sorry, you are right. Did not understand your point earlier. Yeah there is
> increasing number of tabs for each new line with a conditional check. Is there
> a problem with this style of indentation ? Though I will be happy to change.

I have been under the impression that all the checks at the same level
should have the same indentation (i.e., 2 tabs in this case for each),
but it looks like there is no rule about it. How about replacing it with
a switch..case?

Cheers
Suzuki
Michal Hocko Oct. 3, 2018, 11:27 a.m. UTC | #11
On Wed 03-10-18 12:17:52, Suzuki K Poulose wrote:
[...]
> I have been under the idea that all the checks at the same level could
> have the same indentation. (i.e, 2 tabs in this case for each). Looks
> like there is no rule about it. How about replacing it with a
> switch..case  ?

I would simply follow the existing indentation style in that function.
Is this really worth discussing, anyway? Does the proposed change make
the code harder to read?
Anshuman Khandual Oct. 3, 2018, 11:37 a.m. UTC | #12
On 10/03/2018 04:29 PM, Michal Hocko wrote:
> On Wed 03-10-18 15:28:23, Anshuman Khandual wrote:
>>
>>
>> On 10/03/2018 12:28 PM, Michal Hocko wrote:
>>> On Wed 03-10-18 07:46:27, Anshuman Khandual wrote:
>>>>
>>>>
>>>> On 10/02/2018 06:09 PM, Michal Hocko wrote:
>>>>> On Tue 02-10-18 17:45:28, Anshuman Khandual wrote:
>>>>>> Architectures like arm64 have PUD level HugeTLB pages for certain configs
>>>>>> (1GB huge page is PUD based on ARM64_4K_PAGES base page size) that can be
>>>>>> enabled for migration. It can be achieved through checking for PUD_SHIFT
>>>>>> order based HugeTLB pages during migration.
>>>>>
>>>>> Well a long term problem with hugepage_migration_supported is that it is
>>>>> used in two different context 1) to bail out from the migration early
>>>>> because the arch doesn't support migration at all and 2) to use movable
>>>>> zone for hugetlb pages allocation. I am especially concerned about the
>>>>> later because the mere support for migration is not really good enough.
>>>>> Are you really able to find a different giga page during the runtime to
>>>>> move an existing giga page out of the movable zone?
>>>>
>>>> I pre-allocate them before trying to initiate the migration (soft offline
>>>> in my experiments). Hence it should come from the pre-allocated HugeTLB
>>>> pool instead of from the buddy. I might be missing something here but do we
>>>> ever allocate HugeTLB on the go when trying to migrate ? IIUC it always
>>>> came from the pool (unless it's something related to overcommit/surplus).
>>>> Could you please kindly explain how migration target HugeTLB
>>>> pages are allocated on the fly from the movable zone.
>>>
>>> Hotplug comes to mind. You usually do not pre-allocate to cover full
>>> node going offline. And people would like to do that. Another example is
>>> CMA. You would really like to move pages out of the way.
>>
>> You are right.
>>
>> Hotplug migration:
>>
>> __offline_pages
>>    do_migrate_range
>> 	migrate_pages(...new_node_page...)
>>
>> new_node_page
>>    new_page_nodemask
>> 	alloc_huge_page_nodemask
>> 	   dequeue_huge_page_nodemask (Getting from pool)
>> 	or
>> 	   alloc_migrate_huge_page    (Getting from buddy - non-gigantic)
>> 		alloc_fresh_huge_page
>> 		    alloc_buddy_huge_page
>> 			__alloc_pages_nodemask ----> goes into buddy
>>
>> CMA allocation:
>>
>> cma_alloc
>>    alloc_contig_range
>> 	__alloc_contig_migrate_range
>> 		migrate_pages(...alloc_migrate_target...)
>>
>> alloc_migrate_target
>>    new_page_nodemask -> __alloc_pages_nodemask ---> goes into buddy
>>
>> But this is not applicable for gigantic pages for which it backs off way
>> before going into buddy.
> 
> This is an implementation detail - mostly a missing or an incomplete
> hugetlb overcommit implementation IIRC. The primary point remains the
> same. Being able to migrate in principle and feasible enough to migrate
> to be placed in zone movable are two distinct things.

I agree. They are two distinct things.

> [...]
>>>> But even if there are some chances of run time allocation failure from
>>>> movable zone (as in point 2) that should not block the very initiation
>>>> of migration itself. IIUC thats not the semantics for either THP or
>>>> normal pages. Why should it be different here. If the allocation fails
>>>> we should report and abort as always. It's the caller of migration taking
>>>> the chances. Why should we prevent that.
>>>
>>> Yes I agree, hence the distinction between the arch support for
>>> migrateability and the criterion on the movable zone placement.
>> Movable zone placement sounds very tricky here. How can the platform
>> (through the huge_movable hook) say beforehand whether the destination
>> page can be allocated from ZONE_MOVABLE without looking into the state
>> of the buddy allocator at migration time (any such attempt is going to
>> be expensive)? Or does it merely indicate the willingness to live with
>> the possible consequence (unable to hot unplug/CMA going forward) of a
>> migration which might end up in an unmovable area?
> 
> I do not follow. The whole point of zone_movable is to provide a
> physical memory range which is more or less movable. That means that
> pages allocated from this zone can be migrated away should there be a
> reason for that.

I understand this.

> 
>>>>> So I guess we want to split this into two functions
>>>>> arch_hugepage_migration_supported and hugepage_movable. The later would
>>>>
>>>> So the set difference between arch_hugepage_migration_supported and 
>>>> hugepage_movable still remains un-migratable ? Then what is the purpose
>>>> for arch_hugepage_migration_supported page size set in the first place.
>>>> Does it mean we allow the migration at the beginning and then abort later
>>>> when the page size does not fall within the subset for hugepage_movable.
>>>> Could you please kindly explain this in more detail.
>>>
>>> The purpose of arch_hugepage_migration_supported is to tell whether it
>>> makes any sense to even try to migration. The allocation placement is
>>
>> Which kind of matches what we have right now and being continued with this
>> proposal in the series.
> 
> Except you only go half way there. Because you still consider "able to
> migrate" and "feasible to migrate" as the same thing.

Okay.

> 
>>
>>> completely independent of this choice. The latter just says whether it is
>>> feasible to place a hugepage to the zone movable. Sure regular 2MB pages
>>
>> What do you exactly mean by feasible ? Wont it depend on the state of the
>> buddy allocator (ZONE_MOVABLE in particular) and it's ability to accommodate
>> a given huge page. How can the platform decide on it ?
> 
> It is not the platform that decides. That is the whole point of the
> distinction. It is us to say what is feasible and what we want to
> support. Do we want to support giga pages in zone_movable? Under which
> conditions? See my point?

So huge_movable() is going to be a generic MM function deciding on the
feasibility for allocating a huge page of 'size' from movable zone during
migration. If the feasibility turns out to be negative, then the migration
process is aborted there.

huge_movable() will do something like these:

- Return positive right away on smaller size huge pages
- Measure movable allocation feasibility for bigger huge pages
	- Look out for free_pages in the huge page order in movable areas
	- if (order > (MAX_ORDER - 1))
		- Scan the PFN ranges in movable zone for possible allocation
	- etc
	- etc

Did I get this right ?

> 
>> Or as I mentioned
>> before it's platform's willingness to live with unmovable huge pages (of
>> certain sizes) as a consequence of migration.
> 
> No, the platform has no saying in that. The platform only says that it
> supports migrating those pages in principle.
I understand this now.
Michal Hocko Oct. 3, 2018, 11:48 a.m. UTC | #13
On Wed 03-10-18 17:07:13, Anshuman Khandual wrote:
> 
> 
> On 10/03/2018 04:29 PM, Michal Hocko wrote:
[...]
> > It is not the platform that decides. That is the whole point of the
> > distinction. It is us to say what is feasible and what we want to
> > support. Do we want to support giga pages in zone_movable? Under which
> > conditions? See my point?
> 
> So huge_movable() is going to be a generic MM function deciding on the
> feasibility for allocating a huge page of 'size' from movable zone during
> migration.

Yeah, this might be a more complex logic than just the size check. If
there is a sufficient pre-allocated pool to migrate the page off it
might be pre-reserved for future migration etc... Nothing to be done
right now of course.

> If the feasibility turns out to be negative, then migration
> process is aborted there.

You are still confusing allocation and migration here I am afraid. The
whole "feasible to migrate" is for the _allocation_ time when we decide
whether the new page should be placed in zone_movable or not.

> huge_movable() will do something like these:
> 
> - Return positive right away on smaller size huge pages
> - Measure movable allocation feasibility for bigger huge pages
> 	- Look out for free_pages in the huge page order in movable areas
> 	- if (order > (MAX_ORDER - 1))
> 		- Scan the PFN ranges in movable zone for possible allocation
> 	- etc
> 	- etc
> 
> Did I get this right ?

Well, not really. I was thinking of something like this for the
beginning
	if (!arch_hugepage_migration_supported())
		return false;
	if (hstate_is_gigantic(h))
		return false;
	return true;

further changes might be done on top of this.
Anshuman Khandual Oct. 3, 2018, 1:06 p.m. UTC | #14
On 10/03/2018 05:18 PM, Michal Hocko wrote:
> On Wed 03-10-18 17:07:13, Anshuman Khandual wrote:
>>
>>
>> On 10/03/2018 04:29 PM, Michal Hocko wrote:
> [...]
>>> It is not the platform that decides. That is the whole point of the
>>> distinction. It is us to say what is feasible and what we want to
>>> support. Do we want to support giga pages in zone_movable? Under which
>>> conditions? See my point?
>>
>> So huge_movable() is going to be a generic MM function deciding on the
>> feasibility for allocating a huge page of 'size' from movable zone during
>> migration.
> 
> Yeah, this might be a more complex logic than just the size check. If
> there is a sufficient pre-allocated pool to migrate the page off it
> might be pre-reserved for future migration etc... Nothing to be done
> right now of course.

If the huge page size has a pre-allocated pool, then it gets consumed first
through the current allocator logic (new_page_nodemask). Hence testing
for feasibility by looking into the pool and (buddy / zone) together is not
going to change the policy unless there is also a new allocator which
consumes huge pages (from the reserved pool or buddy/zone) as
envisioned by the feasibility checker. But I understand your point.
That path can be explored as well.

> 
>> If the feasibility turns out to be negative, then migration
>> process is aborted there.
> 
> You are still confusing allocation and migration here I am afraid. The
> whole "feasible to migrate" is for the _allocation_ time when we decide
> whether the new page should be placed in zone_movable or not.

migrate_pages() -> platform specific arch_hugetlb_migration (in principle) ->
generic huge_movable() feasibility check while trying to allocate the
destination page -> move source to destination -> complete !

So we have two checks here

1) platform specific arch_hugetlb_migration -> In principle go ahead

2) huge_movable() during allocation

	- If huge page does not have to be placed on movable zone

		- Allocate any where successfully and done !
 
	- If huge page *should* be placed on a movable zone

		- Try allocating on movable zone

			- Successful and done !

		- If the new page could not be allocated on movable zone
		
			- Abort the migration completely

					OR

			- Warn and fall back to non-movable


There is an important point to note here.

- Whether a huge page size should be placed in the movable zone can be
  determined by looking at the size and other parameters during the
  feasibility test

- But whether a huge page can actually be allocated from the movable zone
  might not be determinable without really allocating it, which will
  further delay the decision to successfully complete the migration,
  warn about it, or abort it at this allocation phase itself

> 
>> huge_movable() will do something like these:
>>
>> - Return positive right away on smaller size huge pages
>> - Measure movable allocation feasibility for bigger huge pages
>> 	- Look out for free_pages in the huge page order in movable areas
>> 	- if (order > (MAX_ORDER - 1))
>> 		- Scan the PFN ranges in movable zone for possible allocation
>> 	- etc
>> 	- etc
>>
>> Did I get this right ?
> 
> Well, not really. I was thinking of something like this for the
> beginning
> 	if (!arch_hugepage_migration_supported())
> 		return false;

First check: Platform check in principle as you mentioned before

> 	if (hstate_is_gigantic(h))
> 		return false;

Second check: Simplistic generic allocation feasibility check looking just at size

> 	return true;
> 
> further changes might be done on top of this.
Right.
Michal Hocko Oct. 3, 2018, 1:36 p.m. UTC | #15
On Wed 03-10-18 18:36:39, Anshuman Khandual wrote:
[...]
> So we have two checks here
> 
> 1) platform specific arch_hugetlb_migration -> In principle go ahead
> 
> 2) huge_movable() during allocation
> 
> 	- If huge page does not have to be placed on movable zone
> 
> 		- Allocate any where successfully and done !
>  
> 	- If huge page *should* be placed on a movable zone
> 
> 		- Try allocating on movable zone
> 
> 			- Successful and done !
> 
> 		- If the new page could not be allocated on movable zone
> 		
> 			- Abort the migration completely
> 
> 					OR
> 
> 			- Warn and fall back to non-movable

I guess you are still making it more complicated than necessary. The
latter is really only about __GFP_MOVABLE at this stage. I would just
make it simple for now. We do not have to implement any dynamic
heuristic right now. All that I am asking for is to split the migrate
possible part from movable part.

I should have been more clear about that I guess from my very first
reply. I do like how you moved the current coarse grained
hugepage_migration_supported to be more arch specific but I merely
wanted to point out that we need to do some other changes before we can
go that route and that thing is to distinguish movable from migration
supported.

See my point?
Anshuman Khandual Oct. 5, 2018, 7:34 a.m. UTC | #16
On 10/03/2018 07:06 PM, Michal Hocko wrote:
> On Wed 03-10-18 18:36:39, Anshuman Khandual wrote:
> [...]
>> So we have two checks here
>>
>> 1) platform specific arch_hugetlb_migration -> In principle go ahead
>>
>> 2) huge_movable() during allocation
>>
>> 	- If huge page does not have to be placed on movable zone
>>
>> 		- Allocate any where successfully and done !
>>  
>> 	- If huge page *should* be placed on a movable zone
>>
>> 		- Try allocating on movable zone
>>
>> 			- Successful and done !
>>
>> 		- If the new page could not be allocated on movable zone
>> 		
>> 			- Abort the migration completely
>>
>> 					OR
>>
>> 			- Warn and fall back to non-movable
> 
> I guess you are still making it more complicated than necessary. The
> latter is really only about __GFP_MOVABLE at this stage. I would just
> make it simple for now. We do not have to implement any dynamic
> heuristic right now. All that I am asking for is to split the migrate
> possible part from movable part.
> 
> I should have been more clear about that I guess from my very first
> reply. I do like how you moved the current coarse grained
> hugepage_migration_supported to be more arch specific but I merely
> wanted to point out that we need to do some other changes before we can
> go that route and that thing is to distinguish movable from migration
> supported.
> 
> See my point?

Does the following sound close enough to what you are looking for ?

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 9df1d59..070c419 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -504,6 +504,13 @@ static inline bool hugepage_migration_supported(struct hstate *h)
        return arch_hugetlb_migration_supported(h);
 }
 
+static inline bool hugepage_movable_required(struct hstate *h)
+{
+       if (hstate_is_gigantic(h))
+               return true;
+       return false;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
                                           struct mm_struct *mm, pte_t *pte)
 {
@@ -600,6 +607,11 @@ static inline bool hugepage_migration_supported(struct hstate *h)
        return false;
 }
 
+static inline bool hugepage_movable_required(struct hstate *h)
+{
+       return false;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
                                           struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3c21775..8b0afdc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1635,6 +1635,9 @@ struct page *alloc_huge_page_node(struct hstate *h, int nid)
        if (nid != NUMA_NO_NODE)
                gfp_mask |= __GFP_THISNODE;
 
+       if (hugepage_movable_required(h))
+               gfp_mask |= __GFP_MOVABLE;
+
        spin_lock(&hugetlb_lock);
        if (h->free_huge_pages - h->resv_huge_pages > 0)
                page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);
@@ -1652,6 +1655,9 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 {
        gfp_t gfp_mask = htlb_alloc_mask(h);
 
+       if (hugepage_movable_required(h))
+               gfp_mask |= __GFP_MOVABLE;
+
        spin_lock(&hugetlb_lock);
        if (h->free_huge_pages - h->resv_huge_pages > 0) {
                struct page *page;
Michal Hocko Oct. 9, 2018, 2:14 p.m. UTC | #17
On Fri 05-10-18 13:04:43, Anshuman Khandual wrote:
> Does the following sound close enough to what you are looking for ?

I do not think so

> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 9df1d59..070c419 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -504,6 +504,13 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>         return arch_hugetlb_migration_supported(h);
>  }
>  
> +static inline bool hugepage_movable_required(struct hstate *h)
> +{
> +       if (hstate_is_gigantic(h))
> +               return true;
> +       return false;
> +}
> +

Apart from naming (hugepage_movable_supported?), the above doesn't do the
most essential thing: query whether hugepage migration is
supported at all. Apart from that, I would expect the logic to be reversed.
We do not really support giga page migration well enough to support giga
pages in the movable zone.
> @@ -1652,6 +1655,9 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  {
>         gfp_t gfp_mask = htlb_alloc_mask(h);
>  
> +       if (hugepage_movable_required(h))
> +               gfp_mask |= __GFP_MOVABLE;
> +

And besides that, this really wants to live in htlb_alloc_mask because
this is really an allocation policy. It would be unmap_and_move_huge_page
to call hugepage_migration_supported. The latter is the one to allow for
an arch specific override.

Makes sense?
Anshuman Khandual Oct. 10, 2018, 3:09 a.m. UTC | #18
On 10/09/2018 07:44 PM, Michal Hocko wrote:
> On Fri 05-10-18 13:04:43, Anshuman Khandual wrote:
>> Does the following sound close enough to what you are looking for ?
> 
> I do not think so

Okay.

> 
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 9df1d59..070c419 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -504,6 +504,13 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>>         return arch_hugetlb_migration_supported(h);
>>  }
>>  
>> +static inline bool hugepage_movable_required(struct hstate *h)
>> +{
>> +       if (hstate_is_gigantic(h))
>> +               return true;
>> +       return false;
>> +}
>> +
> 
> Apart from naming (hugepage_movable_supported?) the above doesn't do the
> most essential thing to query whether the hugepage migration is
> supported at all. Apart from that I would expect the logic to be reversed.

My assumption was hugepage_migration_supported() should only be called from
unmap_and_move_huge_page() but we can add that here as well to limit the
set further.

> We do not really support giga pages migration enough to support them in
> movable zone.

Reversing the logic here would change gfp_t for a large number of huge pages.
The current implementation is very liberal in assigning __GFP_MOVABLE for
multiple huge page sizes (almost all of them when migration is enabled). But I
guess it should be okay because after all we are trying to control which
sizes should be movable and which should not be.

static inline bool hugepage_migration_supported(struct hstate *h)
{
#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
        if ((huge_page_shift(h) == PMD_SHIFT) ||
                (huge_page_shift(h) == PGDIR_SHIFT))
                return true;
        else
                return false;
#else
        return false;
#endif
}

static inline gfp_t htlb_alloc_mask(struct hstate *h)
{
        if (hugepage_migration_supported(h))
                return GFP_HIGHUSER_MOVABLE;
        else
                return GFP_HIGHUSER;
}

>> @@ -1652,6 +1655,9 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>>  {
>>         gfp_t gfp_mask = htlb_alloc_mask(h);
>>  
>> +       if (hugepage_movable_required(h))
>> +               gfp_mask |= __GFP_MOVABLE;
>> +
> 
> And besides that this really wants to live in htlb_alloc_mask because
> this is really an allocation policy. It would be unmap_and_move_huge_page
> to call hugepage_migration_supported. The latter is the one to allow for
> an arch specific override.
> 
> Makes sense?
> 

hugepage_migration_supported() will be checked inside hugepage_movable_supported().
A changed version ....

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 9df1d59..4bcbf1e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -504,6 +504,16 @@ static inline bool hugepage_migration_supported(struct hstate *h)
        return arch_hugetlb_migration_supported(h);
 }
 
+static inline bool hugepage_movable_supported(struct hstate *h)
+{
+       if (!hugepage_migration_supported(h)) --> calls arch override restricting the set
+               return false;
+
+       if (hstate_is_gigantic(h))	--------> restricts the set further
+               return false;
+       return true;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
                                           struct mm_struct *mm, pte_t *pte)
 {
@@ -600,6 +610,11 @@ static inline bool hugepage_migration_supported(struct hstate *h)
        return false;
 }
 
+static inline bool hugepage_movable_supported(struct hstate *h)
+{
+       return false;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
                                           struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3c21775..a5a111d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -919,7 +919,7 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
-       if (hugepage_migration_supported(h))
+       if (hugepage_movable_supported(h))
                return GFP_HIGHUSER_MOVABLE;
        else
                return GFP_HIGHUSER;


The above patch is in addition to the following later patch in the series.

    mm/hugetlb: Enable arch specific huge page size support for migration
    
    Architectures like arm64 have HugeTLB page sizes which are different than
    generic sizes at PMD, PUD, PGD level and implemented via contiguous bits.
    At present these special size HugeTLB pages cannot be identified through
    macros like (PMD|PUD|PGDIR)_SHIFT and hence are chosen not to be migrated.
    
    Enabling migration support for these special HugeTLB page sizes along with
    the generic ones (PMD|PUD|PGD) would require identifying all of them on a
    given platform. A platform specific hook can precisely enumerate all huge
    page sizes supported for migration. Instead of comparing against standard
    huge page orders, let the hugepage_migration_supported() function call a
    platform hook, arch_hugetlb_migration_supported(). The default definition
    for the platform hook maintains existing semantics, checking standard huge
    page orders.
    But an architecture can choose to override the default and provide support
    for a comprehensive set of huge page sizes.
    
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 9c1b77f..9df1d59 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -479,18 +479,29 @@ static inline pgoff_t basepage_index(struct page *page)
 extern int dissolve_free_huge_page(struct page *page);
 extern int dissolve_free_huge_pages(unsigned long start_pfn,
                                    unsigned long end_pfn);
-static inline bool hugepage_migration_supported(struct hstate *h)
-{
+
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+#ifndef arch_hugetlb_migration_supported
+static inline bool arch_hugetlb_migration_supported(struct hstate *h)
+{
        if ((huge_page_shift(h) == PMD_SHIFT) ||
                (huge_page_shift(h) == PUD_SHIFT) ||
                        (huge_page_shift(h) == PGDIR_SHIFT))
                return true;
        else
                return false;
+}
+#endif
 #else
+static inline bool arch_hugetlb_migration_supported(struct hstate *h)
+{
        return false;
+}
 #endif
+
+static inline bool hugepage_migration_supported(struct hstate *h)
+{
+       return arch_hugetlb_migration_supported(h);
 }
Michal Hocko Oct. 10, 2018, 9:39 a.m. UTC | #19
On Wed 10-10-18 08:39:22, Anshuman Khandual wrote:
[...]
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 9df1d59..4bcbf1e 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -504,6 +504,16 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>         return arch_hugetlb_migration_supported(h);
>  }
>  
> +static inline bool hugepage_movable_supported(struct hstate *h)
> +{
> +       if (!hugepage_migration_supported(h)) --> calls arch override restricting the set
> +               return false;
> +
> +       if (hstate_is_gigantic(h))	--------> restricts the set further
> +               return false;
> +       return true;
> +}
> +
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>                                            struct mm_struct *mm, pte_t *pte)
>  {
> @@ -600,6 +610,11 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>         return false;
>  }
>  
> +static inline bool hugepage_movable_supported(struct hstate *h)
> +{
> +       return false;
> +}
> +
>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>                                            struct mm_struct *mm, pte_t *pte)
>  {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3c21775..a5a111d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -919,7 +919,7 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
>  /* Movability of hugepages depends on migration support. */
>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>  {
> -       if (hugepage_migration_supported(h))
> +       if (hugepage_movable_supported(h))
>                 return GFP_HIGHUSER_MOVABLE;
>         else
>                 return GFP_HIGHUSER;

Exactly what I've had in mind. It would be great to have a comment in
hugepage_movable_supported to explain why we are not supporting giga
pages even though they are migratable and why we need that distinction.

> The above patch is in addition to the following later patch in the series.
[...]
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 9c1b77f..9df1d59 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -479,18 +479,29 @@ static inline pgoff_t basepage_index(struct page *page)
>  extern int dissolve_free_huge_page(struct page *page);
>  extern int dissolve_free_huge_pages(unsigned long start_pfn,
>                                     unsigned long end_pfn);
> -static inline bool hugepage_migration_supported(struct hstate *h)
> -{
> +
>  #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
> +#ifndef arch_hugetlb_migration_supported
> +static inline bool arch_hugetlb_migration_supported(struct hstate *h)
> +{
>         if ((huge_page_shift(h) == PMD_SHIFT) ||
>                 (huge_page_shift(h) == PUD_SHIFT) ||
>                         (huge_page_shift(h) == PGDIR_SHIFT))
>                 return true;
>         else
>                 return false;
> +}
> +#endif
>  #else
> +static inline bool arch_hugetlb_migration_supported(struct hstate *h)
> +{
>         return false;
> +}
>  #endif
> +
> +static inline bool hugepage_migration_supported(struct hstate *h)
> +{
> +       return arch_hugetlb_migration_supported(h);
>  }

Yes, making hugepage_migration_supported have an arch override is
definitely the right thing to do. Whether the above approach rather than
a weak symbol is better is a matter of taste and I do not feel strongly
about that.
Anshuman Khandual Oct. 11, 2018, 3:16 a.m. UTC | #20
On 10/10/2018 03:09 PM, Michal Hocko wrote:
> On Wed 10-10-18 08:39:22, Anshuman Khandual wrote:
> [...]
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 9df1d59..4bcbf1e 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -504,6 +504,16 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>>         return arch_hugetlb_migration_supported(h);
>>  }
>>  
>> +static inline bool hugepage_movable_supported(struct hstate *h)
>> +{
>> +       if (!hugepage_migration_supported(h)) --> calls arch override restricting the set
>> +               return false;
>> +
>> +       if (hstate_is_gigantic(h))	--------> restricts the set further
>> +               return false;
>> +       return true;
>> +}
>> +
>>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>>                                            struct mm_struct *mm, pte_t *pte)
>>  {
>> @@ -600,6 +610,11 @@ static inline bool hugepage_migration_supported(struct hstate *h)
>>         return false;
>>  }
>>  
>> +static inline bool hugepage_movable_supported(struct hstate *h)
>> +{
>> +       return false;
>> +}
>> +
>>  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>>                                            struct mm_struct *mm, pte_t *pte)
>>  {
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 3c21775..a5a111d 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -919,7 +919,7 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
>>  /* Movability of hugepages depends on migration support. */
>>  static inline gfp_t htlb_alloc_mask(struct hstate *h)
>>  {
>> -       if (hugepage_migration_supported(h))
>> +       if (hugepage_movable_supported(h))
>>                 return GFP_HIGHUSER_MOVABLE;
>>         else
>>                 return GFP_HIGHUSER;
> 
> Exactly what I've had in mind. It would be great to have a comment in
> hugepage_movable_supported to explain why we are not supporting giga
> pages even though they are migratable and why we need that distinction.
sure, will do.

> 
>> The above patch is in addition to the following later patch in the series.
> [...]
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 9c1b77f..9df1d59 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -479,18 +479,29 @@ static inline pgoff_t basepage_index(struct page *page)
>>  extern int dissolve_free_huge_page(struct page *page);
>>  extern int dissolve_free_huge_pages(unsigned long start_pfn,
>>                                     unsigned long end_pfn);
>> -static inline bool hugepage_migration_supported(struct hstate *h)
>> -{
>> +
>>  #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>> +#ifndef arch_hugetlb_migration_supported
>> +static inline bool arch_hugetlb_migration_supported(struct hstate *h)
>> +{
>>         if ((huge_page_shift(h) == PMD_SHIFT) ||
>>                 (huge_page_shift(h) == PUD_SHIFT) ||
>>                         (huge_page_shift(h) == PGDIR_SHIFT))
>>                 return true;
>>         else
>>                 return false;
>> +}
>> +#endif
>>  #else
>> +static inline bool arch_hugetlb_migration_supported(struct hstate *h)
>> +{
>>         return false;
>> +}
>>  #endif
>> +
>> +static inline bool hugepage_migration_supported(struct hstate *h)
>> +{
>> +       return arch_hugetlb_migration_supported(h);
>>  }
> 
> Yes making hugepage_migration_supported to have an arch override is
> definitely the right thing to do. Whether the above approach rather than
> a weak symbol is better is a matter of taste and I do not feel strongly
> about that.
Okay then, will carry this forward and re-spin the patch series. Thank you
for your detailed review till now.
diff mbox series

Patch

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6b68e34..9c1b77f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -483,7 +483,8 @@  static inline bool hugepage_migration_supported(struct hstate *h)
 {
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
 	if ((huge_page_shift(h) == PMD_SHIFT) ||
-		(huge_page_shift(h) == PGDIR_SHIFT))
+		(huge_page_shift(h) == PUD_SHIFT) ||
+			(huge_page_shift(h) == PGDIR_SHIFT))
 		return true;
 	else
 		return false;