diff mbox series

[v3,1/6] mm: migrate: do not migrate HugeTLB page whose refcount is one

Message ID 20210110124017.86750-2-songmuchun@bytedance.com (mailing list archive)
State New, archived
Headers show
Series Fix some bugs about HugeTLB code | expand

Commit Message

Muchun Song Jan. 10, 2021, 12:40 p.m. UTC
If the refcount is one when it is migrated, it means that the page
was freed from under us. So we are done and do not need to migrate.

This optimization is consistent with the regular pages, just like
unmap_and_move() does.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Yang Shi <shy828301@gmail.com>
---
 mm/migrate.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Michal Hocko Jan. 12, 2021, 9:42 a.m. UTC | #1
On Sun 10-01-21 20:40:12, Muchun Song wrote:
> If the refcount is one when it is migrated, it means that the page
> was freed from under us. So we are done and do not need to migrate.

I would consider the following easier to understand. Feel free to reuse.
"
All pages isolated for the migration have an elevated reference count
and therefore seeing a reference count equal to 1 means that the last
user of the page has dropped the reference and the page has became
unused and there doesn't make much sense to migrate it anymore. This has
been done for regular pages and this patch does the same for hugetlb
pages. Although the likelyhood of the race is rather small for hugetlb
pages it makes sense the two code paths in sync.
"

> 
> This optimization is consistent with the regular pages, just like
> unmap_and_move() does.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Acked-by: Yang Shi <shy828301@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/migrate.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4385f2fb5d18..a6631c4eb6a6 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  		return -ENOSYS;
>  	}
>  
> +	if (page_count(hpage) == 1) {
> +		/* page was freed from under us. So we are done. */
> +		putback_active_hugepage(hpage);
> +		return MIGRATEPAGE_SUCCESS;
> +	}
> +
>  	new_hpage = get_new_page(hpage, private);
>  	if (!new_hpage)
>  		return -ENOMEM;
> -- 
> 2.11.0
Muchun Song Jan. 12, 2021, 9:43 a.m. UTC | #2
On Tue, Jan 12, 2021 at 5:42 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Sun 10-01-21 20:40:12, Muchun Song wrote:
> > If the refcount is one when it is migrated, it means that the page
> > was freed from under us. So we are done and do not need to migrate.
>
> I would consider the following easier to understand. Feel free to reuse.
> "
> All pages isolated for the migration have an elevated reference count
> and therefore seeing a reference count equal to 1 means that the last
> user of the page has dropped the reference and the page has became
> unused and there doesn't make much sense to migrate it anymore. This has
> been done for regular pages and this patch does the same for hugetlb
> pages. Although the likelyhood of the race is rather small for hugetlb
> pages it makes sense the two code paths in sync.
> "

Thanks.

>
> >
> > This optimization is consistent with the regular pages, just like
> > unmap_and_move() does.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Acked-by: Yang Shi <shy828301@gmail.com>
>
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks.

>
> > ---
> >  mm/migrate.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 4385f2fb5d18..a6631c4eb6a6 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> >               return -ENOSYS;
> >       }
> >
> > +     if (page_count(hpage) == 1) {
> > +             /* page was freed from under us. So we are done. */
> > +             putback_active_hugepage(hpage);
> > +             return MIGRATEPAGE_SUCCESS;
> > +     }
> > +
> >       new_hpage = get_new_page(hpage, private);
> >       if (!new_hpage)
> >               return -ENOMEM;
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs
David Hildenbrand Jan. 12, 2021, 11 a.m. UTC | #3
On 10.01.21 13:40, Muchun Song wrote:
> If the refcount is one when it is migrated, it means that the page
> was freed from under us. So we are done and do not need to migrate.
> 
> This optimization is consistent with the regular pages, just like
> unmap_and_move() does.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Acked-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/migrate.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4385f2fb5d18..a6631c4eb6a6 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  		return -ENOSYS;
>  	}
>  
> +	if (page_count(hpage) == 1) {
> +		/* page was freed from under us. So we are done. */
> +		putback_active_hugepage(hpage);
> +		return MIGRATEPAGE_SUCCESS;
> +	}
> +
>  	new_hpage = get_new_page(hpage, private);
>  	if (!new_hpage)
>  		return -ENOMEM;
> 

Question: What if called via alloc_contig_range() where we even want to
"migrate" free pages, meaning, relocate it?
David Hildenbrand Jan. 12, 2021, 11:11 a.m. UTC | #4
On 12.01.21 12:00, David Hildenbrand wrote:
> On 10.01.21 13:40, Muchun Song wrote:
>> If the refcount is one when it is migrated, it means that the page
>> was freed from under us. So we are done and do not need to migrate.
>>
>> This optimization is consistent with the regular pages, just like
>> unmap_and_move() does.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Acked-by: Yang Shi <shy828301@gmail.com>
>> ---
>>  mm/migrate.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 4385f2fb5d18..a6631c4eb6a6 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>>  		return -ENOSYS;
>>  	}
>>  
>> +	if (page_count(hpage) == 1) {
>> +		/* page was freed from under us. So we are done. */
>> +		putback_active_hugepage(hpage);
>> +		return MIGRATEPAGE_SUCCESS;
>> +	}
>> +
>>  	new_hpage = get_new_page(hpage, private);
>>  	if (!new_hpage)
>>  		return -ENOMEM;
>>
> 
> Question: What if called via alloc_contig_range() where we even want to
> "migrate" free pages, meaning, relocate it?
> 

To be more precise:

a) We don't have dissolve_free_huge_pages() calls on the
alloc_contig_range() path. So we *need* migration IIUC.

b) dissolve_free_huge_pages() will fail if going below the reservation.
In that case we really want to migrate free pages. This even applies to
memory offlining.

Either I am missing something important or this patch is more dangerous
than it looks like.
Michal Hocko Jan. 12, 2021, 11:27 a.m. UTC | #5
On Tue 12-01-21 12:11:21, David Hildenbrand wrote:
> On 12.01.21 12:00, David Hildenbrand wrote:
> > On 10.01.21 13:40, Muchun Song wrote:
> >> If the refcount is one when it is migrated, it means that the page
> >> was freed from under us. So we are done and do not need to migrate.
> >>
> >> This optimization is consistent with the regular pages, just like
> >> unmap_and_move() does.
> >>
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> >> Acked-by: Yang Shi <shy828301@gmail.com>
> >> ---
> >>  mm/migrate.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> index 4385f2fb5d18..a6631c4eb6a6 100644
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> >>  		return -ENOSYS;
> >>  	}
> >>  
> >> +	if (page_count(hpage) == 1) {
> >> +		/* page was freed from under us. So we are done. */
> >> +		putback_active_hugepage(hpage);
> >> +		return MIGRATEPAGE_SUCCESS;
> >> +	}
> >> +
> >>  	new_hpage = get_new_page(hpage, private);
> >>  	if (!new_hpage)
> >>  		return -ENOMEM;
> >>
> > 
> > Question: What if called via alloc_contig_range() where we even want to
> > "migrate" free pages, meaning, relocate it?
> > 
> 
> To be more precise:
> 
> a) We don't have dissolve_free_huge_pages() calls on the
> alloc_contig_range() path. So we *need* migration IIUC.
> 
> b) dissolve_free_huge_pages() will fail if going below the reservation.
> In that case we really want to migrate free pages. This even applies to
> memory offlining.
> 
> Either I am missing something important or this patch is more dangerous
> than it looks like.

This is an interesting point. But do we try to migrate hugetlb pages in
alloc_contig_range? isolate_migratepages_block !PageLRU need to be
marked as PageMovable AFAICS. This would be quite easy to implement but
a more fundamental question is whether we really want to mess with
existing pools for alloc_contig_range.

Anyway you are quite right that this change has more side effects than
it is easy to see while it doesn't really bring any major advantage
other than the code consistency.
David Hildenbrand Jan. 12, 2021, 11:34 a.m. UTC | #6
On 12.01.21 12:27, Michal Hocko wrote:
> On Tue 12-01-21 12:11:21, David Hildenbrand wrote:
>> On 12.01.21 12:00, David Hildenbrand wrote:
>>> On 10.01.21 13:40, Muchun Song wrote:
>>>> If the refcount is one when it is migrated, it means that the page
>>>> was freed from under us. So we are done and do not need to migrate.
>>>>
>>>> This optimization is consistent with the regular pages, just like
>>>> unmap_and_move() does.
>>>>
>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>>> Acked-by: Yang Shi <shy828301@gmail.com>
>>>> ---
>>>>  mm/migrate.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 4385f2fb5d18..a6631c4eb6a6 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>>>>  		return -ENOSYS;
>>>>  	}
>>>>  
>>>> +	if (page_count(hpage) == 1) {
>>>> +		/* page was freed from under us. So we are done. */
>>>> +		putback_active_hugepage(hpage);
>>>> +		return MIGRATEPAGE_SUCCESS;
>>>> +	}
>>>> +
>>>>  	new_hpage = get_new_page(hpage, private);
>>>>  	if (!new_hpage)
>>>>  		return -ENOMEM;
>>>>
>>>
>>> Question: What if called via alloc_contig_range() where we even want to
>>> "migrate" free pages, meaning, relocate it?
>>>
>>
>> To be more precise:
>>
>> a) We don't have dissolve_free_huge_pages() calls on the
>> alloc_contig_range() path. So we *need* migration IIUC.
>>
>> b) dissolve_free_huge_pages() will fail if going below the reservation.
>> In that case we really want to migrate free pages. This even applies to
>> memory offlining.
>>
>> Either I am missing something important or this patch is more dangerous
>> than it looks like.
> 
> This is an interesting point. But do we try to migrate hugetlb pages in
> alloc_contig_range? isolate_migratepages_block !PageLRU need to be

I didn't test it so far (especially in the context of virtio-mem or
CMA), but have a TODO item on my long list of things to look at in the
future.

> marked as PageMovable AFAICS. This would be quite easy to implement but
> a more fundamental question is whether we really want to mess with
> existing pools for alloc_contig_range.

Can these pages fall onto ZONE_MOVABLE or even MIGRATE_CMA? If yes, we
really want to. And I think both is the case for "ordinary" huge pages
allocated via the buddy.

> 
> Anyway you are quite right that this change has more side effects than
> it is easy to see while it doesn't really bring any major advantage
> other than the consistency.

Free hugetlbfs pages are special. E.g., they cannot simply be skipped
when offlining. So I don't think consistency actually really applies.
Michal Hocko Jan. 12, 2021, 12:16 p.m. UTC | #7
On Tue 12-01-21 12:34:01, David Hildenbrand wrote:
> On 12.01.21 12:27, Michal Hocko wrote:
> > On Tue 12-01-21 12:11:21, David Hildenbrand wrote:
> >> On 12.01.21 12:00, David Hildenbrand wrote:
> >>> On 10.01.21 13:40, Muchun Song wrote:
> >>>> If the refcount is one when it is migrated, it means that the page
> >>>> was freed from under us. So we are done and do not need to migrate.
> >>>>
> >>>> This optimization is consistent with the regular pages, just like
> >>>> unmap_and_move() does.
> >>>>
> >>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> >>>> Acked-by: Yang Shi <shy828301@gmail.com>
> >>>> ---
> >>>>  mm/migrate.c | 6 ++++++
> >>>>  1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>>> index 4385f2fb5d18..a6631c4eb6a6 100644
> >>>> --- a/mm/migrate.c
> >>>> +++ b/mm/migrate.c
> >>>> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> >>>>  		return -ENOSYS;
> >>>>  	}
> >>>>  
> >>>> +	if (page_count(hpage) == 1) {
> >>>> +		/* page was freed from under us. So we are done. */
> >>>> +		putback_active_hugepage(hpage);
> >>>> +		return MIGRATEPAGE_SUCCESS;
> >>>> +	}
> >>>> +
> >>>>  	new_hpage = get_new_page(hpage, private);
> >>>>  	if (!new_hpage)
> >>>>  		return -ENOMEM;
> >>>>
> >>>
> >>> Question: What if called via alloc_contig_range() where we even want to
> >>> "migrate" free pages, meaning, relocate it?
> >>>
> >>
> >> To be more precise:
> >>
> >> a) We don't have dissolve_free_huge_pages() calls on the
> >> alloc_contig_range() path. So we *need* migration IIUC.
> >>
> >> b) dissolve_free_huge_pages() will fail if going below the reservation.
> >> In that case we really want to migrate free pages. This even applies to
> >> memory offlining.
> >>
> >> Either I am missing something important or this patch is more dangerous
> >> than it looks like.
> > 
> > This is an interesting point. But do we try to migrate hugetlb pages in
> > alloc_contig_range? isolate_migratepages_block !PageLRU need to be
> 
> I didn't test it so far (especially in the context of virtio-mem or
> CMA), but have a TODO item on my long list of things to look at in the
> future.

I have looked around and it seems this would more work than just to
allow the migration in a-c-r. migrate_huge_page_move_mapping resp.
hugetlbfs_migrate_page would need to special case pool pages. Likely a
non trivial work and potentially another land mine territory.

> 
> > marked as PageMovable AFAICS. This would be quite easy to implement but
> > a more fundamental question is whether we really want to mess with
> > existing pools for alloc_contig_range.
> 
> Can these pages fall onto ZONE_MOVABLE or even MIGRATE_CMA? If yes, we
> really want to. And I think both is the case for "ordinary" huge pages
> allocated via the buddy.

Yes movable hugetlb pages can sit in movable zone and CMA as well (see
htlb_modify_alloc_mask).
 
> > Anyway you are quite right that this change has more side effects than
> > it is easy to see while it doesn't really bring any major advantage
> > other than the consistency.
> 
> Free hugetlbfs pages are special. E.g., they cannot simply be skipped
> when offlining. So I don't think consistency actually really applies.

Well, currently pool pages are not migrateable but you are right that
this is likely something that we will need to look into in the future
and this optimization would stand in the way.
Muchun Song Jan. 12, 2021, 1:40 p.m. UTC | #8
On Tue, Jan 12, 2021 at 7:11 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.01.21 12:00, David Hildenbrand wrote:
> > On 10.01.21 13:40, Muchun Song wrote:
> >> If the refcount is one when it is migrated, it means that the page
> >> was freed from under us. So we are done and do not need to migrate.
> >>
> >> This optimization is consistent with the regular pages, just like
> >> unmap_and_move() does.
> >>
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> >> Acked-by: Yang Shi <shy828301@gmail.com>
> >> ---
> >>  mm/migrate.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> index 4385f2fb5d18..a6631c4eb6a6 100644
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> >>              return -ENOSYS;
> >>      }
> >>
> >> +    if (page_count(hpage) == 1) {
> >> +            /* page was freed from under us. So we are done. */
> >> +            putback_active_hugepage(hpage);
> >> +            return MIGRATEPAGE_SUCCESS;
> >> +    }
> >> +
> >>      new_hpage = get_new_page(hpage, private);
> >>      if (!new_hpage)
> >>              return -ENOMEM;
> >>
> >
> > Question: What if called via alloc_contig_range() where we even want to
> > "migrate" free pages, meaning, relocate it?
> >
>
> To be more precise:
>
> a) We don't have dissolve_free_huge_pages() calls on the
> alloc_contig_range() path. So we *need* migration IIUC.

Without this patch, if you want to migrate a HUgeTLB page,
the page is freed to the hugepage pool. With this patch,
the page is also freed to the hugepage pool.
I didn't see any different. I am missing something?


>
> b) dissolve_free_huge_pages() will fail if going below the reservation.
> In that case we really want to migrate free pages. This even applies to
> memory offlining.
>
> Either I am missing something important or this patch is more dangerous
> than it looks like.
>
> --
> Thanks,
>
> David / dhildenb
>
David Hildenbrand Jan. 12, 2021, 1:51 p.m. UTC | #9
On 12.01.21 14:40, Muchun Song wrote:
> On Tue, Jan 12, 2021 at 7:11 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 12.01.21 12:00, David Hildenbrand wrote:
>>> On 10.01.21 13:40, Muchun Song wrote:
>>>> If the refcount is one when it is migrated, it means that the page
>>>> was freed from under us. So we are done and do not need to migrate.
>>>>
>>>> This optimization is consistent with the regular pages, just like
>>>> unmap_and_move() does.
>>>>
>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>>> Acked-by: Yang Shi <shy828301@gmail.com>
>>>> ---
>>>>  mm/migrate.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 4385f2fb5d18..a6631c4eb6a6 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>>>>              return -ENOSYS;
>>>>      }
>>>>
>>>> +    if (page_count(hpage) == 1) {
>>>> +            /* page was freed from under us. So we are done. */
>>>> +            putback_active_hugepage(hpage);
>>>> +            return MIGRATEPAGE_SUCCESS;
>>>> +    }
>>>> +
>>>>      new_hpage = get_new_page(hpage, private);
>>>>      if (!new_hpage)
>>>>              return -ENOMEM;
>>>>
>>>
>>> Question: What if called via alloc_contig_range() where we even want to
>>> "migrate" free pages, meaning, relocate it?
>>>
>>
>> To be more precise:
>>
>> a) We don't have dissolve_free_huge_pages() calls on the
>> alloc_contig_range() path. So we *need* migration IIUC.
> 
> Without this patch, if you want to migrate a HUgeTLB page,
> the page is freed to the hugepage pool. With this patch,
> the page is also freed to the hugepage pool.
> I didn't see any different. I am missing something?

I am definitely not an expert on hugetlb pools, that's why I am asking.

Isn't it, that with your code, no new page is allocated - so
dissolve_free_huge_pages() might just refuse to dissolve due to
reservations, bailing out, no?

(as discussed, looks like alloc_contig_range() needs to be fixed to
handle this correctly)
Muchun Song Jan. 12, 2021, 2:17 p.m. UTC | #10
On Tue, Jan 12, 2021 at 9:51 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.01.21 14:40, Muchun Song wrote:
> > On Tue, Jan 12, 2021 at 7:11 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 12.01.21 12:00, David Hildenbrand wrote:
> >>> On 10.01.21 13:40, Muchun Song wrote:
> >>>> If the refcount is one when it is migrated, it means that the page
> >>>> was freed from under us. So we are done and do not need to migrate.
> >>>>
> >>>> This optimization is consistent with the regular pages, just like
> >>>> unmap_and_move() does.
> >>>>
> >>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> >>>> Acked-by: Yang Shi <shy828301@gmail.com>
> >>>> ---
> >>>>  mm/migrate.c | 6 ++++++
> >>>>  1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>>> index 4385f2fb5d18..a6631c4eb6a6 100644
> >>>> --- a/mm/migrate.c
> >>>> +++ b/mm/migrate.c
> >>>> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> >>>>              return -ENOSYS;
> >>>>      }
> >>>>
> >>>> +    if (page_count(hpage) == 1) {
> >>>> +            /* page was freed from under us. So we are done. */
> >>>> +            putback_active_hugepage(hpage);
> >>>> +            return MIGRATEPAGE_SUCCESS;
> >>>> +    }
> >>>> +
> >>>>      new_hpage = get_new_page(hpage, private);
> >>>>      if (!new_hpage)
> >>>>              return -ENOMEM;
> >>>>
> >>>
> >>> Question: What if called via alloc_contig_range() where we even want to
> >>> "migrate" free pages, meaning, relocate it?
> >>>
> >>
> >> To be more precise:
> >>
> >> a) We don't have dissolve_free_huge_pages() calls on the
> >> alloc_contig_range() path. So we *need* migration IIUC.
> >
> > Without this patch, if you want to migrate a HUgeTLB page,
> > the page is freed to the hugepage pool. With this patch,
> > the page is also freed to the hugepage pool.
> > I didn't see any different. I am missing something?
>
> I am definitely not an expert on hugetlb pools, that's why I am asking.
>
> Isn't it, that with your code, no new page is allocated - so
> dissolve_free_huge_pages() might just refuse to dissolve due to
> reservations, bailing out, no?

Without this patch, the new page can be allocated from the
hugepage pool. The dissolve_free_huge_pages() also
can refuse to dissolve due to reservations. Right?

>
> (as discussed, looks like alloc_contig_range() needs to be fixed to
> handle this correctly)
>
> --
> Thanks,
>
> David / dhildenb
>
Michal Hocko Jan. 12, 2021, 2:23 p.m. UTC | #11
On Tue 12-01-21 13:16:45, Michal Hocko wrote:
[...]
> Well, currently pool pages are not migrateable but you are right that
> this is likely something that we will need to look into in the future
> and this optimization would stand in the way.

After some more thinking I believe I was wrong in my last statement.
This optimization shouldn't have any effect on pages on the pool as
those stay at reference count 0 and they cannot be isolated either
(clear_page_huge_active before it is enqueued).

That being said, the migration code would still have to learn about
about this pages but that is out of scope of this discussion.

Sorry about the confusion from my side.
David Hildenbrand Jan. 12, 2021, 2:28 p.m. UTC | #12
On 12.01.21 15:17, Muchun Song wrote:
> On Tue, Jan 12, 2021 at 9:51 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 12.01.21 14:40, Muchun Song wrote:
>>> On Tue, Jan 12, 2021 at 7:11 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 12.01.21 12:00, David Hildenbrand wrote:
>>>>> On 10.01.21 13:40, Muchun Song wrote:
>>>>>> If the refcount is one when it is migrated, it means that the page
>>>>>> was freed from under us. So we are done and do not need to migrate.
>>>>>>
>>>>>> This optimization is consistent with the regular pages, just like
>>>>>> unmap_and_move() does.
>>>>>>
>>>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>>>>> Acked-by: Yang Shi <shy828301@gmail.com>
>>>>>> ---
>>>>>>  mm/migrate.c | 6 ++++++
>>>>>>  1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>> index 4385f2fb5d18..a6631c4eb6a6 100644
>>>>>> --- a/mm/migrate.c
>>>>>> +++ b/mm/migrate.c
>>>>>> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>>>>>>              return -ENOSYS;
>>>>>>      }
>>>>>>
>>>>>> +    if (page_count(hpage) == 1) {
>>>>>> +            /* page was freed from under us. So we are done. */
>>>>>> +            putback_active_hugepage(hpage);
>>>>>> +            return MIGRATEPAGE_SUCCESS;
>>>>>> +    }
>>>>>> +
>>>>>>      new_hpage = get_new_page(hpage, private);
>>>>>>      if (!new_hpage)
>>>>>>              return -ENOMEM;
>>>>>>
>>>>>
>>>>> Question: What if called via alloc_contig_range() where we even want to
>>>>> "migrate" free pages, meaning, relocate it?
>>>>>
>>>>
>>>> To be more precise:
>>>>
>>>> a) We don't have dissolve_free_huge_pages() calls on the
>>>> alloc_contig_range() path. So we *need* migration IIUC.
>>>
>>> Without this patch, if you want to migrate a HUgeTLB page,
>>> the page is freed to the hugepage pool. With this patch,
>>> the page is also freed to the hugepage pool.
>>> I didn't see any different. I am missing something?
>>
>> I am definitely not an expert on hugetlb pools, that's why I am asking.
>>
>> Isn't it, that with your code, no new page is allocated - so
>> dissolve_free_huge_pages() might just refuse to dissolve due to
>> reservations, bailing out, no?
> 
> Without this patch, the new page can be allocated from the
> hugepage pool. The dissolve_free_huge_pages() also
> can refuse to dissolve due to reservations. Right?

Oh, you mean the migration target might be coming from the pool? I guess
yes, looking at alloc_migration_target()->alloc_huge_page_nodemask().

In that case, yes, I think we run into a similar issue already.

Instead of trying to allocate new huge pages in
dissolve_free_huge_pages() to "relocate free pages", we bail out.

This all feels kind of wrong. After we migrated a huge page we should
free it back to the buddy, so most of our machinery just keeps working
without caring about free huge pages.


I can see how your patch will not change the current (IMHO broken) behavior.
David Hildenbrand Jan. 12, 2021, 2:41 p.m. UTC | #13
On 12.01.21 15:23, Michal Hocko wrote:
> On Tue 12-01-21 13:16:45, Michal Hocko wrote:
> [...]
>> Well, currently pool pages are not migrateable but you are right that
>> this is likely something that we will need to look into in the future
>> and this optimization would stand in the way.
> 
> After some more thinking I believe I was wrong in my last statement.
> This optimization shouldn't have any effect on pages on the pool as
> those stay at reference count 0 and they cannot be isolated either
> (clear_page_huge_active before it is enqueued).
> 
> That being said, the migration code would still have to learn about
> about this pages but that is out of scope of this discussion.
> 
> Sorry about the confusion from my side.
> 

At this point I am fairly confused what's working at what's not :D

I think this will require more thought, on how to teach
alloc_contig_range() (and eventually in some cases offline_pages()?) to
do the right thing.
Michal Hocko Jan. 12, 2021, 2:53 p.m. UTC | #14
On Tue 12-01-21 15:41:02, David Hildenbrand wrote:
> On 12.01.21 15:23, Michal Hocko wrote:
> > On Tue 12-01-21 13:16:45, Michal Hocko wrote:
> > [...]
> >> Well, currently pool pages are not migrateable but you are right that
> >> this is likely something that we will need to look into in the future
> >> and this optimization would stand in the way.
> > 
> > After some more thinking I believe I was wrong in my last statement.
> > This optimization shouldn't have any effect on pages on the pool as
> > those stay at reference count 0 and they cannot be isolated either
> > (clear_page_huge_active before it is enqueued).
> > 
> > That being said, the migration code would still have to learn about
> > about this pages but that is out of scope of this discussion.
> > 
> > Sorry about the confusion from my side.
> > 
> 
> At this point I am fairly confused what's working at what's not :D

heh, tell me something about that. Hugetlb is a maze full of land mines.

> I think this will require more thought, on how to teach
> alloc_contig_range() (and eventually in some cases offline_pages()?) to
> do the right thing.

Well, offlining sort of works because it retries both migrates and
dissolves. It can fail with the later due to reservations but that can
be expected. We can still try harder to rellocate/rebalance per numa
pools to keep the reservation but I strongly suspect nobody has noticed
this to be a problem so there we are.
Muchun Song Jan. 12, 2021, 2:59 p.m. UTC | #15
On Tue, Jan 12, 2021 at 10:28 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.01.21 15:17, Muchun Song wrote:
> > On Tue, Jan 12, 2021 at 9:51 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 12.01.21 14:40, Muchun Song wrote:
> >>> On Tue, Jan 12, 2021 at 7:11 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 12.01.21 12:00, David Hildenbrand wrote:
> >>>>> On 10.01.21 13:40, Muchun Song wrote:
> >>>>>> If the refcount is one when it is migrated, it means that the page
> >>>>>> was freed from under us. So we are done and do not need to migrate.
> >>>>>>
> >>>>>> This optimization is consistent with the regular pages, just like
> >>>>>> unmap_and_move() does.
> >>>>>>
> >>>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>>>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> >>>>>> Acked-by: Yang Shi <shy828301@gmail.com>
> >>>>>> ---
> >>>>>>  mm/migrate.c | 6 ++++++
> >>>>>>  1 file changed, 6 insertions(+)
> >>>>>>
> >>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>>>>> index 4385f2fb5d18..a6631c4eb6a6 100644
> >>>>>> --- a/mm/migrate.c
> >>>>>> +++ b/mm/migrate.c
> >>>>>> @@ -1279,6 +1279,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
> >>>>>>              return -ENOSYS;
> >>>>>>      }
> >>>>>>
> >>>>>> +    if (page_count(hpage) == 1) {
> >>>>>> +            /* page was freed from under us. So we are done. */
> >>>>>> +            putback_active_hugepage(hpage);
> >>>>>> +            return MIGRATEPAGE_SUCCESS;
> >>>>>> +    }
> >>>>>> +
> >>>>>>      new_hpage = get_new_page(hpage, private);
> >>>>>>      if (!new_hpage)
> >>>>>>              return -ENOMEM;
> >>>>>>
> >>>>>
> >>>>> Question: What if called via alloc_contig_range() where we even want to
> >>>>> "migrate" free pages, meaning, relocate it?
> >>>>>
> >>>>
> >>>> To be more precise:
> >>>>
> >>>> a) We don't have dissolve_free_huge_pages() calls on the
> >>>> alloc_contig_range() path. So we *need* migration IIUC.
> >>>
> >>> Without this patch, if you want to migrate a HUgeTLB page,
> >>> the page is freed to the hugepage pool. With this patch,
> >>> the page is also freed to the hugepage pool.
> >>> I didn't see any different. I am missing something?
> >>
> >> I am definitely not an expert on hugetlb pools, that's why I am asking.
> >>
> >> Isn't it, that with your code, no new page is allocated - so
> >> dissolve_free_huge_pages() might just refuse to dissolve due to
> >> reservations, bailing out, no?
> >
> > Without this patch, the new page can be allocated from the
> > hugepage pool. The dissolve_free_huge_pages() also
> > can refuse to dissolve due to reservations. Right?
>
> Oh, you mean the migration target might be coming from the pool? I guess
> yes, looking at alloc_migration_target()->alloc_huge_page_nodemask().

Yeah, you are right. If we want to free a HugeTLB page to
the buddy allocator, we should dissolve_free_huge_page()
to do that. Migrating cannot guarantee this at least now.

>
> In that case, yes, I think we run into a similar issue already.
>
> Instead of trying to allocate new huge pages in
> dissolve_free_huge_pages() to "relocate free pages", we bail out.
>
> This all feels kind of wrong. After we migrated a huge page we should
> free it back to the buddy, so most of our machinery just keeps working
> without caring about free huge pages.
>
>
> I can see how your patch will not change the current (IMHO broken) behavior.
>
> --
> Thanks,
>
> David / dhildenb
>
Mike Kravetz Jan. 12, 2021, 8:12 p.m. UTC | #16
On 1/12/21 6:53 AM, Michal Hocko wrote:
> On Tue 12-01-21 15:41:02, David Hildenbrand wrote:
>> On 12.01.21 15:23, Michal Hocko wrote:
>>> On Tue 12-01-21 13:16:45, Michal Hocko wrote:
>>> [...]
>>>> Well, currently pool pages are not migrateable but you are right that
>>>> this is likely something that we will need to look into in the future
>>>> and this optimization would stand in the way.
>>>
>>> After some more thinking I believe I was wrong in my last statement.
>>> This optimization shouldn't have any effect on pages on the pool as
>>> those stay at reference count 0 and they cannot be isolated either
>>> (clear_page_huge_active before it is enqueued).
>>>
>>> That being said, the migration code would still have to learn about
>>> about this pages but that is out of scope of this discussion.
>>>
>>> Sorry about the confusion from my side.
>>>
>>
>> At this point I am fairly confused what's working at what's not :D
> 
> heh, tell me something about that. Hugetlb is a maze full of land mines.
> 
>> I think this will require more thought, on how to teach
>> alloc_contig_range() (and eventually in some cases offline_pages()?) to
>> do the right thing.
> 
> Well, offlining sort of works because it retries both migrates and
> dissolves. It can fail with the later due to reservations but that can
> be expected. We can still try harder to rellocate/rebalance per numa
> pools to keep the reservation but I strongly suspect nobody has noticed
> this to be a problem so there we are.

Due to my time zone, I get to read all the previous comments before
commenting myself. :)

To be clear, this patch is handling a very specific case where a hugetlb
page was isolated for migration and after being isolated the last reference
to the page was dropped.  Normally, dropping the last reference would send
the page to free_huge_page processing.  free_huge_page processing would
either dissolve the page to the buddy allocator or more likely place the
page on the free list of the pool.  However, since isolation also holds
a reference to the page, processing is continued down the migration path.

Today there is no code to migrate free huge pages in the pool.  Only
active in use pages are migrated.  Without this patch, 'free pages' in
the very specific state above would be migrated.  But to be clear, that
was never the intention of any hugetlb migration code.  In that respect,
I believe this patch helps the current code work as designed.

David brings up the valid point that alloc_contig_range needs to deal
with free hugetlb pool pages.  However, that is code which needs to be
written as it does not exist today.  I suspect nobody thought about free
hugetlb pages when alloc_contig_range was written.  This patch should
in no way hinder development of that code.  Free huge pages have a ref
count of 0, and this code is checking for ref count of 1.

That is a long way of saying that I still agree with this patch.  The
only modification/suggestion would be enhancing the commit message as
suggested by Michal.
diff mbox series

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 4385f2fb5d18..a6631c4eb6a6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1279,6 +1279,12 @@  static int unmap_and_move_huge_page(new_page_t get_new_page,
 		return -ENOSYS;
 	}
 
+	if (page_count(hpage) == 1) {
+		/* page was freed from under us. So we are done. */
+		putback_active_hugepage(hpage);
+		return MIGRATEPAGE_SUCCESS;
+	}
+
 	new_hpage = get_new_page(hpage, private);
 	if (!new_hpage)
 		return -ENOMEM;