
[v4,4/8] hugetlb: create remove_hugetlb_page() to separate functionality

Message ID 20210405230043.182734-5-mike.kravetz@oracle.com (mailing list archive)
State New, archived
Series: make hugetlb put_page safe for all calling contexts

Commit Message

Mike Kravetz April 5, 2021, 11 p.m. UTC
The new remove_hugetlb_page() routine is designed to remove a hugetlb
page from hugetlbfs processing.  It will remove the page from the active
or free list, update global counters and set the compound page
destructor to NULL so that PageHuge() will return false for the 'page'.
After this call, the 'page' can be treated as a normal compound page or
a collection of base size pages.

update_and_free_page no longer decrements h->nr_huge_pages{_node} as
this is performed in remove_hugetlb_page.  The only functionality
performed by update_and_free_page is to free the base pages to the lower
level allocators.

update_and_free_page is typically called after remove_hugetlb_page.

remove_hugetlb_page is to be called with the hugetlb_lock held.
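As a minimal sketch of the resulting calling convention (h and page stand for
any hstate/page pair; in this patch both calls still run under the lock, and
the split is what later patches use to shorten lock hold times):

	spin_lock(&hugetlb_lock);
	/* unlink page and drop hugetlb accounting; PageHuge(page) is now false */
	remove_hugetlb_page(h, page, false);
	/* hand the base pages back to the lower level allocators */
	update_and_free_page(h, page);
	spin_unlock(&hugetlb_lock);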

Creating this routine and separating functionality is in preparation for
restructuring code to reduce lock hold times.  This commit should not
introduce any changes to functionality.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 88 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 51 insertions(+), 37 deletions(-)

Comments

Michal Hocko April 6, 2021, 9:56 a.m. UTC | #1
On Mon 05-04-21 16:00:39, Mike Kravetz wrote:
> The new remove_hugetlb_page() routine is designed to remove a hugetlb
> page from hugetlbfs processing.  It will remove the page from the active
> or free list, update global counters and set the compound page
> destructor to NULL so that PageHuge() will return false for the 'page'.
> After this call, the 'page' can be treated as a normal compound page or
> a collection of base size pages.
> 
> update_and_free_page no longer decrements h->nr_huge_pages{_node} as
> this is performed in remove_hugetlb_page.  The only functionality
> performed by update_and_free_page is to free the base pages to the lower
> level allocators.
> 
> update_and_free_page is typically called after remove_hugetlb_page.
> 
> remove_hugetlb_page is to be called with the hugetlb_lock held.
> 
> Creating this routine and separating functionality is in preparation for
> restructuring code to reduce lock hold times.  This commit should not
> introduce any changes to functionality.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Btw. I would prefer to reverse the ordering of this and Oscar's
patchset. This one is a bug fix which might be interesting for stable
backports, while Oscar's work can be seen as a general functionality
improvement.

> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>  		/*
>  		 * Freed from under us. Drop new_page too.
>  		 */
> +		remove_hugetlb_page(h, new_page, false);
>  		update_and_free_page(h, new_page);
>  		goto unlock;
>  	} else if (page_count(old_page)) {
> @@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>  		 * Someone has grabbed the page, try to isolate it here.
>  		 * Fail with -EBUSY if not possible.
>  		 */
> +		remove_hugetlb_page(h, new_page, false);
>  		update_and_free_page(h, new_page);
>  		spin_unlock(&hugetlb_lock);
>  		if (!isolate_huge_page(old_page, list))

the page is not enqueued anywhere here, so remove_hugetlb_page would blow up
when linked list debugging is enabled.
Oscar Salvador April 6, 2021, 12:50 p.m. UTC | #2
On Tue, Apr 06, 2021 at 11:56:11AM +0200, Michal Hocko wrote:
> Btw. I would prefer to reverse the ordering of this and Oscar's
> patchset. This one is a bug fix which might be interesting for stable
> backports while Oscar's work can be looked as a general functionality
> improvement.

If it makes things easier, I do not mind this work going in first and then I
can work on top of it.
That said, I will try to have a look at this series.
Oscar Salvador April 6, 2021, 1:41 p.m. UTC | #3
On Mon, Apr 05, 2021 at 04:00:39PM -0700, Mike Kravetz wrote:
> +static void remove_hugetlb_page(struct hstate *h, struct page *page,
> +							bool adjust_surplus)
> +{
> +	int nid = page_to_nid(page);
> +
> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> +		return;
> +
> +	list_del(&page->lru);
> +
> +	if (HPageFreed(page)) {
> +		h->free_huge_pages--;
> +		h->free_huge_pages_node[nid]--;
> +		ClearHPageFreed(page);
> +	}
> +	if (adjust_surplus) {
> +		h->surplus_huge_pages--;
> +		h->surplus_huge_pages_node[nid]--;
> +	}
> +
> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);

These checks feel a bit odd here.
I would move them above, before we start messing with the counters?

> +
> +	ClearHPageTemporary(page);

Why clear it unconditionally? I guess we do not really care, but
someone might wonder why when reading the code.
So I would either do as we used to and only clear it in the
HPageTemporary() case, or add a comment explaining it.

> +	set_page_refcounted(page);
> +	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
> +
> +	h->nr_huge_pages--;
> +	h->nr_huge_pages_node[nid]--;
> +}

As Michal pointed out, remove_hugetlb_page() from
alloc_and_dissolve_huge_page() might be problematic in some cases because
the new_page has not been enqueued yet.

Now, I guess this might be easily fixed by checking list_empty()
before going ahead with the list_del() call, or with another bool parameter
'delete', plus a fat comment explaining why we can end up calling
remove_hugetlb_page() on a page that is not in any list.

Another solution that comes to my mind is to split remove_hugetlb_page()
functionality into 1) delete from list + unaccount free and surplus pages,
and 2) reset page's state + unaccount nr_huge_pages.

But that might just be me being biased, as it would fit my use case.
So maybe a list_empty() check or the bool parameter would just do.
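For instance, a minimal sketch of the list_empty() variant (hypothetical, not
part of the posted patch):

	/*
	 * Only unlink the page if it is actually on a list.  A freshly
	 * prepped page has an INIT_LIST_HEAD()-initialized ->lru that
	 * points at itself, so list_empty() is true and list_del() is
	 * simply skipped.
	 */
	if (!list_empty(&page->lru))
		list_del(&page->lru);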
Oscar Salvador April 6, 2021, 1:44 p.m. UTC | #4
On Mon, Apr 05, 2021 at 04:00:39PM -0700, Mike Kravetz wrote:
> The new remove_hugetlb_page() routine is designed to remove a hugetlb
> page from hugetlbfs processing.  It will remove the page from the active
> or free list, update global counters and set the compound page
> destructor to NULL so that PageHuge() will return false for the 'page'.
> After this call, the 'page' can be treated as a normal compound page or
> a collection of base size pages.
> 
> update_and_free_page no longer decrements h->nr_huge_pages{_node} as
> this is performed in remove_hugetlb_page.  The only functionality
> performed by update_and_free_page is to free the base pages to the lower
> level allocators.
> 
> update_and_free_page is typically called after remove_hugetlb_page.
> 
> remove_hugetlb_page is to be called with the hugetlb_lock held.
> 
> Creating this routine and separating functionality is in preparation for
> restructuring code to reduce lock hold times.  This commit should not
> introduce any changes to functionality.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Btw, it seems you were doing just fine before realizing that my series
went in.
So, as this seems a rather urgent matter to move forward (for obvious
reasons, and also because it holds up the hotplug-vmemmap stuff), I wonder
whether it would make your life easier to just ask Andrew to drop my series
for the time being and give yours priority.

I can later work on top of that.

> ---
>  mm/hugetlb.c | 88 ++++++++++++++++++++++++++++++----------------------
>  1 file changed, 51 insertions(+), 37 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8497a3598c86..df2a3d1f632b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1055,18 +1055,13 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
>  	return false;
>  }
>  
> -static void __enqueue_huge_page(struct list_head *list, struct page *page)
> -{
> -	list_move(&page->lru, list);
> -	SetHPageFreed(page);
> -}
> -
>  static void enqueue_huge_page(struct hstate *h, struct page *page)
>  {
>  	int nid = page_to_nid(page);
> -	__enqueue_huge_page(&h->hugepage_freelists[nid], page);
> +	list_move(&page->lru, &h->hugepage_freelists[nid]);
>  	h->free_huge_pages++;
>  	h->free_huge_pages_node[nid]++;
> +	SetHPageFreed(page);
>  }
>  
>  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> @@ -1331,6 +1326,43 @@ static inline void destroy_compound_gigantic_page(struct page *page,
>  						unsigned int order) { }
>  #endif
>  
> +/*
> + * Remove hugetlb page from lists, and update dtor so that page appears
> + * as just a compound page.  A reference is held on the page.
> + *
> + * Must be called with hugetlb lock held.
> + */
> +static void remove_hugetlb_page(struct hstate *h, struct page *page,
> +							bool adjust_surplus)
> +{
> +	int nid = page_to_nid(page);
> +
> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> +		return;
> +
> +	list_del(&page->lru);
> +
> +	if (HPageFreed(page)) {
> +		h->free_huge_pages--;
> +		h->free_huge_pages_node[nid]--;
> +		ClearHPageFreed(page);
> +	}
> +	if (adjust_surplus) {
> +		h->surplus_huge_pages--;
> +		h->surplus_huge_pages_node[nid]--;
> +	}
> +
> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
> +
> +	ClearHPageTemporary(page);
> +	set_page_refcounted(page);
> +	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
> +
> +	h->nr_huge_pages--;
> +	h->nr_huge_pages_node[nid]--;
> +}
> +
>  static void update_and_free_page(struct hstate *h, struct page *page)
>  {
>  	int i;
> @@ -1339,8 +1371,6 @@ static void update_and_free_page(struct hstate *h, struct page *page)
>  	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>  		return;
>  
> -	h->nr_huge_pages--;
> -	h->nr_huge_pages_node[page_to_nid(page)]--;
>  	for (i = 0; i < pages_per_huge_page(h);
>  	     i++, subpage = mem_map_next(subpage, page, i)) {
>  		subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
> @@ -1348,10 +1378,6 @@ static void update_and_free_page(struct hstate *h, struct page *page)
>  				1 << PG_active | 1 << PG_private |
>  				1 << PG_writeback);
>  	}
> -	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
> -	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
> -	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
> -	set_page_refcounted(page);
>  	if (hstate_is_gigantic(h)) {
>  		destroy_compound_gigantic_page(page, huge_page_order(h));
>  		free_gigantic_page(page, huge_page_order(h));
> @@ -1419,15 +1445,12 @@ static void __free_huge_page(struct page *page)
>  		h->resv_huge_pages++;
>  
>  	if (HPageTemporary(page)) {
> -		list_del(&page->lru);
> -		ClearHPageTemporary(page);
> +		remove_hugetlb_page(h, page, false);
>  		update_and_free_page(h, page);
>  	} else if (h->surplus_huge_pages_node[nid]) {
>  		/* remove the page from active list */
> -		list_del(&page->lru);
> +		remove_hugetlb_page(h, page, true);
>  		update_and_free_page(h, page);
> -		h->surplus_huge_pages--;
> -		h->surplus_huge_pages_node[nid]--;
>  	} else {
>  		arch_clear_hugepage_flags(page);
>  		enqueue_huge_page(h, page);
> @@ -1712,13 +1735,7 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>  			struct page *page =
>  				list_entry(h->hugepage_freelists[node].next,
>  					  struct page, lru);
> -			list_del(&page->lru);
> -			h->free_huge_pages--;
> -			h->free_huge_pages_node[node]--;
> -			if (acct_surplus) {
> -				h->surplus_huge_pages--;
> -				h->surplus_huge_pages_node[node]--;
> -			}
> +			remove_hugetlb_page(h, page, acct_surplus);
>  			update_and_free_page(h, page);
>  			ret = 1;
>  			break;
> @@ -1756,7 +1773,6 @@ int dissolve_free_huge_page(struct page *page)
>  	if (!page_count(page)) {
>  		struct page *head = compound_head(page);
>  		struct hstate *h = page_hstate(head);
> -		int nid = page_to_nid(head);
>  		if (h->free_huge_pages - h->resv_huge_pages == 0)
>  			goto out;
>  
> @@ -1787,9 +1803,7 @@ int dissolve_free_huge_page(struct page *page)
>  			SetPageHWPoison(page);
>  			ClearPageHWPoison(head);
>  		}
> -		list_del(&head->lru);
> -		h->free_huge_pages--;
> -		h->free_huge_pages_node[nid]--;
> +		remove_hugetlb_page(h, page, false);
>  		h->max_huge_pages--;
>  		update_and_free_page(h, head);
>  		rc = 0;
> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>  		/*
>  		 * Freed from under us. Drop new_page too.
>  		 */
> +		remove_hugetlb_page(h, new_page, false);
>  		update_and_free_page(h, new_page);
>  		goto unlock;
>  	} else if (page_count(old_page)) {
> @@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>  		 * Someone has grabbed the page, try to isolate it here.
>  		 * Fail with -EBUSY if not possible.
>  		 */
> +		remove_hugetlb_page(h, new_page, false);
>  		update_and_free_page(h, new_page);
>  		spin_unlock(&hugetlb_lock);
>  		if (!isolate_huge_page(old_page, list))
> @@ -2323,13 +2339,13 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>  		/*
>  		 * Ok, old_page is still a genuine free hugepage. Replace it
>  		 * with the new one.
> +		 * Note: h->free_huge_pages{_node} counters are decremented
> +		 * in remove_hugetlb_page for old_page and incremented in
> +		 * enqueue_huge_page for new page.  Net result is no change.
>  		 */
> -		list_del(&old_page->lru);
> +		remove_hugetlb_page(h, old_page, false);
>  		update_and_free_page(h, old_page);
> -		/*
> -		 * h->free_huge_pages{_node} counters do not need to be updated.
> -		 */
> -		__enqueue_huge_page(&h->hugepage_freelists[nid], new_page);
> +		enqueue_huge_page(h, new_page);
>  	}
>  unlock:
>  	spin_unlock(&hugetlb_lock);
> @@ -2667,10 +2683,8 @@ static void try_to_free_low(struct hstate *h, unsigned long count,
>  				return;
>  			if (PageHighMem(page))
>  				continue;
> -			list_del(&page->lru);
> +			remove_hugetlb_page(h, page, false);
>  			update_and_free_page(h, page);
> -			h->free_huge_pages--;
> -			h->free_huge_pages_node[page_to_nid(page)]--;
>  		}
>  	}
>  }
> -- 
> 2.30.2
>
Mike Kravetz April 6, 2021, 4:49 p.m. UTC | #5
On 4/6/21 2:56 AM, Michal Hocko wrote:
> On Mon 05-04-21 16:00:39, Mike Kravetz wrote:
>> The new remove_hugetlb_page() routine is designed to remove a hugetlb
>> page from hugetlbfs processing.  It will remove the page from the active
>> or free list, update global counters and set the compound page
>> destructor to NULL so that PageHuge() will return false for the 'page'.
>> After this call, the 'page' can be treated as a normal compound page or
>> a collection of base size pages.
>>
>> update_and_free_page no longer decrements h->nr_huge_pages{_node} as
>> this is performed in remove_hugetlb_page.  The only functionality
>> performed by update_and_free_page is to free the base pages to the lower
>> level allocators.
>>
>> update_and_free_page is typically called after remove_hugetlb_page.
>>
>> remove_hugetlb_page is to be called with the hugetlb_lock held.
>>
>> Creating this routine and separating functionality is in preparation for
>> restructuring code to reduce lock hold times.  This commit should not
>> introduce any changes to functionality.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Btw. I would prefer to reverse the ordering of this and Oscar's
> patchset. This one is a bug fix which might be interesting for stable
> backports while Oscar's work can be looked as a general functionality
> improvement.

Ok, that makes sense.

Andrew, can we make this happen?  It would require removing Oscar's
series until it can be modified to work on top of this.
There is only one small issue with this series as it originally went
into mmotm.  There is a missing conversion of spin_lock to spin_lock_irq
in patch 7.  In addition, there are some suggested changes from Oscar to
this patch.  I do not think they are necessary, but I could make those
as well.  Let me know what I can do to help make this happen.

>> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>>  		/*
>>  		 * Freed from under us. Drop new_page too.
>>  		 */
>> +		remove_hugetlb_page(h, new_page, false);
>>  		update_and_free_page(h, new_page);
>>  		goto unlock;
>>  	} else if (page_count(old_page)) {
>> @@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>>  		 * Someone has grabbed the page, try to isolate it here.
>>  		 * Fail with -EBUSY if not possible.
>>  		 */
>> +		remove_hugetlb_page(h, new_page, false);
>>  		update_and_free_page(h, new_page);
>>  		spin_unlock(&hugetlb_lock);
>>  		if (!isolate_huge_page(old_page, list))
> 
> the page is not enqued anywhere here so remove_hugetlb_page would blow
> when linked list debugging is enabled.

I also thought this would be an issue.  However, INIT_LIST_HEAD would
have been called for the page so,

static inline void INIT_LIST_HEAD(struct list_head *list)
{
        WRITE_ONCE(list->next, list);
        list->prev = list;
}

The debug checks of concern in __list_del_entry_valid are:

            CHECK_DATA_CORRUPTION(prev->next != entry,
                        "list_del corruption. prev->next should be %px, but was %px\n",
                        entry, prev->next) ||
            CHECK_DATA_CORRUPTION(next->prev != entry,
                        "list_del corruption. next->prev should be %px, but was %px\n",
                        entry, next->prev))

Since all pointers point back to the list head itself, prev, next and entry
are all the same node, so both checks pass.  My track record with the list
routines is not so good, so I actually forced a list_del after INIT_LIST_HEAD
with list debugging enabled and did not encounter any issues.

Going forward, I agree it would be better to perhaps add a list_empty
check so that things do not blow up if the debugging code is changed.

At one time I also thought of splitting the functionality in
alloc_fresh_huge_page and prep_new_huge_page so that they would only
allocate/prep the page but not increment nr_huge_pages.  A new routine
would then increment the counter when the page was actually put into use.
I thought this could be used when doing bulk adjustments in set_max_huge_pages
but the benefit would be minimal.  This seems like something that would
be useful in Oscar's alloc_and_dissolve_huge_page routine.
Oscar Salvador April 6, 2021, 5:57 p.m. UTC | #6
On 2021-04-06 18:49, Mike Kravetz wrote:
> 
> Andrew, can we make this happen?  It would require removing Oscar's
> series until it can be modified to work on top of this.
> There is only one small issue with this series as it originally went
> into mmotm.  There is a missing conversion of spin_lock to spin_lock_irq
> in patch 7.  In addition, there are some suggested changes from Oscar to
> this patch.  I do not think they are necessary, but I could make those
> as well.  Let me know what I can do to help make this happen.

I agree that it might not be necessary to make such changes, but I would
still like to see an explanation on the points I raised (excluding the
list_del(), as you already proved that is not necessary), just to be sure
I am not missing anything.
Mike Kravetz April 6, 2021, 8:52 p.m. UTC | #7
On 4/6/21 6:41 AM, Oscar Salvador wrote:
> On Mon, Apr 05, 2021 at 04:00:39PM -0700, Mike Kravetz wrote:
>> +static void remove_hugetlb_page(struct hstate *h, struct page *page,
>> +							bool adjust_surplus)
>> +{
>> +	int nid = page_to_nid(page);
>> +
>> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>> +		return;
>> +
>> +	list_del(&page->lru);
>> +
>> +	if (HPageFreed(page)) {
>> +		h->free_huge_pages--;
>> +		h->free_huge_pages_node[nid]--;
>> +		ClearHPageFreed(page);
>> +	}
>> +	if (adjust_surplus) {
>> +		h->surplus_huge_pages--;
>> +		h->surplus_huge_pages_node[nid]--;
>> +	}
>> +
>> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
>> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
> 
> These checks feel a bit odd here.
> I would move them above, before we start messing with the counters?
> 

This routine is composed of code that was previously in update_and_free_page
and __free_huge_page.  In those routines, the VM_BUG_ON_PAGE checks came after
the counter adjustments.  That is the only reason they are positioned as they are.

I agree that it makes more sense to add them to the beginning of the routine.

>> +
>> +	ClearHPageTemporary(page);
> 
> Why clearing it unconditionally? I guess we do not really care, but
> someone might wonder why when reading the core.
> So I would either do as we used to do and only clear it in case of
> HPageTemporary(), or drop a comment.
> 

Technically, the HPage* flags are meaningless after calling this
routine.  So, there really is no need to modify them at all.  The
flag clearing code is left over from the routines in which they
previously existed.

Any clearing of HPage* flags in this routine is unnecessary and should
be removed to avoid any questions.
Michal Hocko April 7, 2021, 8:21 a.m. UTC | #8
On Tue 06-04-21 09:49:13, Mike Kravetz wrote:
> On 4/6/21 2:56 AM, Michal Hocko wrote:
> > On Mon 05-04-21 16:00:39, Mike Kravetz wrote:
[...]
> >> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
> >>  		/*
> >>  		 * Freed from under us. Drop new_page too.
> >>  		 */
> >> +		remove_hugetlb_page(h, new_page, false);
> >>  		update_and_free_page(h, new_page);
> >>  		goto unlock;
> >>  	} else if (page_count(old_page)) {
> >> @@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
> >>  		 * Someone has grabbed the page, try to isolate it here.
> >>  		 * Fail with -EBUSY if not possible.
> >>  		 */
> >> +		remove_hugetlb_page(h, new_page, false);
> >>  		update_and_free_page(h, new_page);
> >>  		spin_unlock(&hugetlb_lock);
> >>  		if (!isolate_huge_page(old_page, list))
> > 
> > the page is not enqued anywhere here so remove_hugetlb_page would blow
> > when linked list debugging is enabled.
> 
> I also thought this would be an issue.  However, INIT_LIST_HEAD would
> have been called for the page so,

OK, this is true for a freshly allocated hugetlb page (prep_new_huge_page).
It's a very subtle dependency though. In case somebody ever wants to
fortify linked lists and decides to check for list_del on an empty list, then
this would be silently waiting to blow up.
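Purely as an illustration of that kind of hardening (hypothetical, not
something list debugging did at the time of this thread), an extra check in
__list_del_entry_valid() could look like:

	/*
	 * Hypothetical extra check: catch list_del() on a self-linked
	 * node, which is exactly what a freshly INIT_LIST_HEAD()ed,
	 * never-enqueued page would trigger.
	 */
	if (CHECK_DATA_CORRUPTION(entry->prev == entry && entry->next == entry,
			"list_del on an empty (self-linked) entry %px\n", entry))
		return false;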

> Going forward, I agree it would be better to perhaps add a list_empty
> check so that things do not blow up if the debugging code is changed.

Yes, this is less tricky than a bool flag or adding more stages to the
tear down. 2 stages are more than enough IMHO.

> At one time I also thought of splitting the functionality in
> alloc_fresh_huge_page and prep_new_huge_page so that it would only
> allocate/prep the page but not increment nr_huge_pages.

We already have that distinction. alloc_buddy_huge_page is there to
allocate a fresh huge page without any hstate accounting. Considering
that giga pages are not supported for migration anyway, maybe this
would make Oscar's work slightly less tricky?

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8497a3598c86..df2a3d1f632b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1055,18 +1055,13 @@  static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
-static void __enqueue_huge_page(struct list_head *list, struct page *page)
-{
-	list_move(&page->lru, list);
-	SetHPageFreed(page);
-}
-
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-	__enqueue_huge_page(&h->hugepage_freelists[nid], page);
+	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
+	SetHPageFreed(page);
 }
 
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1331,6 +1326,43 @@  static inline void destroy_compound_gigantic_page(struct page *page,
 						unsigned int order) { }
 #endif
 
+/*
+ * Remove hugetlb page from lists, and update dtor so that page appears
+ * as just a compound page.  A reference is held on the page.
+ *
+ * Must be called with hugetlb lock held.
+ */
+static void remove_hugetlb_page(struct hstate *h, struct page *page,
+							bool adjust_surplus)
+{
+	int nid = page_to_nid(page);
+
+	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
+		return;
+
+	list_del(&page->lru);
+
+	if (HPageFreed(page)) {
+		h->free_huge_pages--;
+		h->free_huge_pages_node[nid]--;
+		ClearHPageFreed(page);
+	}
+	if (adjust_surplus) {
+		h->surplus_huge_pages--;
+		h->surplus_huge_pages_node[nid]--;
+	}
+
+	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
+	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
+
+	ClearHPageTemporary(page);
+	set_page_refcounted(page);
+	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
+
+	h->nr_huge_pages--;
+	h->nr_huge_pages_node[nid]--;
+}
+
 static void update_and_free_page(struct hstate *h, struct page *page)
 {
 	int i;
@@ -1339,8 +1371,6 @@  static void update_and_free_page(struct hstate *h, struct page *page)
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
-	h->nr_huge_pages--;
-	h->nr_huge_pages_node[page_to_nid(page)]--;
 	for (i = 0; i < pages_per_huge_page(h);
 	     i++, subpage = mem_map_next(subpage, page, i)) {
 		subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
@@ -1348,10 +1378,6 @@  static void update_and_free_page(struct hstate *h, struct page *page)
 				1 << PG_active | 1 << PG_private |
 				1 << PG_writeback);
 	}
-	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
-	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
-	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
-	set_page_refcounted(page);
 	if (hstate_is_gigantic(h)) {
 		destroy_compound_gigantic_page(page, huge_page_order(h));
 		free_gigantic_page(page, huge_page_order(h));
@@ -1419,15 +1445,12 @@  static void __free_huge_page(struct page *page)
 		h->resv_huge_pages++;
 
 	if (HPageTemporary(page)) {
-		list_del(&page->lru);
-		ClearHPageTemporary(page);
+		remove_hugetlb_page(h, page, false);
 		update_and_free_page(h, page);
 	} else if (h->surplus_huge_pages_node[nid]) {
 		/* remove the page from active list */
-		list_del(&page->lru);
+		remove_hugetlb_page(h, page, true);
 		update_and_free_page(h, page);
-		h->surplus_huge_pages--;
-		h->surplus_huge_pages_node[nid]--;
 	} else {
 		arch_clear_hugepage_flags(page);
 		enqueue_huge_page(h, page);
@@ -1712,13 +1735,7 @@  static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 			struct page *page =
 				list_entry(h->hugepage_freelists[node].next,
 					  struct page, lru);
-			list_del(&page->lru);
-			h->free_huge_pages--;
-			h->free_huge_pages_node[node]--;
-			if (acct_surplus) {
-				h->surplus_huge_pages--;
-				h->surplus_huge_pages_node[node]--;
-			}
+			remove_hugetlb_page(h, page, acct_surplus);
 			update_and_free_page(h, page);
 			ret = 1;
 			break;
@@ -1756,7 +1773,6 @@  int dissolve_free_huge_page(struct page *page)
 	if (!page_count(page)) {
 		struct page *head = compound_head(page);
 		struct hstate *h = page_hstate(head);
-		int nid = page_to_nid(head);
 		if (h->free_huge_pages - h->resv_huge_pages == 0)
 			goto out;
 
@@ -1787,9 +1803,7 @@  int dissolve_free_huge_page(struct page *page)
 			SetPageHWPoison(page);
 			ClearPageHWPoison(head);
 		}
-		list_del(&head->lru);
-		h->free_huge_pages--;
-		h->free_huge_pages_node[nid]--;
+		remove_hugetlb_page(h, page, false);
 		h->max_huge_pages--;
 		update_and_free_page(h, head);
 		rc = 0;
@@ -2298,6 +2312,7 @@  static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 		/*
 		 * Freed from under us. Drop new_page too.
 		 */
+		remove_hugetlb_page(h, new_page, false);
 		update_and_free_page(h, new_page);
 		goto unlock;
 	} else if (page_count(old_page)) {
@@ -2305,6 +2320,7 @@  static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 		 * Someone has grabbed the page, try to isolate it here.
 		 * Fail with -EBUSY if not possible.
 		 */
+		remove_hugetlb_page(h, new_page, false);
 		update_and_free_page(h, new_page);
 		spin_unlock(&hugetlb_lock);
 		if (!isolate_huge_page(old_page, list))
@@ -2323,13 +2339,13 @@  static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 		/*
 		 * Ok, old_page is still a genuine free hugepage. Replace it
 		 * with the new one.
+		 * Note: h->free_huge_pages{_node} counters are decremented
+		 * in remove_hugetlb_page for old_page and incremented in
+		 * enqueue_huge_page for new page.  Net result is no change.
 		 */
-		list_del(&old_page->lru);
+		remove_hugetlb_page(h, old_page, false);
 		update_and_free_page(h, old_page);
-		/*
-		 * h->free_huge_pages{_node} counters do not need to be updated.
-		 */
-		__enqueue_huge_page(&h->hugepage_freelists[nid], new_page);
+		enqueue_huge_page(h, new_page);
 	}
 unlock:
 	spin_unlock(&hugetlb_lock);
@@ -2667,10 +2683,8 @@  static void try_to_free_low(struct hstate *h, unsigned long count,
 				return;
 			if (PageHighMem(page))
 				continue;
-			list_del(&page->lru);
+			remove_hugetlb_page(h, page, false);
 			update_and_free_page(h, page);
-			h->free_huge_pages--;
-			h->free_huge_pages_node[page_to_nid(page)]--;
 		}
 	}
 }