Message ID: 20210405230043.182734-5-mike.kravetz@oracle.com
State: New, archived
Series: make hugetlb put_page safe for all calling contexts
On Mon 05-04-21 16:00:39, Mike Kravetz wrote:
> The new remove_hugetlb_page() routine is designed to remove a hugetlb
> page from hugetlbfs processing. It will remove the page from the active
> or free list, update global counters and set the compound page
> destructor to NULL so that PageHuge() will return false for the 'page'.
> After this call, the 'page' can be treated as a normal compound page or
> a collection of base size pages.
>
> update_and_free_page no longer decrements h->nr_huge_pages{_node} as
> this is performed in remove_hugetlb_page. The only functionality
> performed by update_and_free_page is to free the base pages to the lower
> level allocators.
>
> update_and_free_page is typically called after remove_hugetlb_page.
>
> remove_hugetlb_page is to be called with the hugetlb_lock held.
>
> Creating this routine and separating functionality is in preparation for
> restructuring code to reduce lock hold times. This commit should not
> introduce any changes to functionality.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Btw. I would prefer to reverse the ordering of this and Oscar's
patchset. This one is a bug fix which might be interesting for stable
backports while Oscar's work can be looked at as a general functionality
improvement.

> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
> 		/*
> 		 * Freed from under us. Drop new_page too.
> 		 */
> +		remove_hugetlb_page(h, new_page, false);
> 		update_and_free_page(h, new_page);
> 		goto unlock;
> 	} else if (page_count(old_page)) {
> @@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
> 		 * Someone has grabbed the page, try to isolate it here.
> 		 * Fail with -EBUSY if not possible.
> 		 */
> +		remove_hugetlb_page(h, new_page, false);
> 		update_and_free_page(h, new_page);
> 		spin_unlock(&hugetlb_lock);
> 		if (!isolate_huge_page(old_page, list))

The page is not enqueued anywhere here, so remove_hugetlb_page would blow
up when linked list debugging is enabled.
On Tue, Apr 06, 2021 at 11:56:11AM +0200, Michal Hocko wrote:
> Btw. I would prefer to reverse the ordering of this and Oscar's
> patchset. This one is a bug fix which might be interesting for stable
> backports while Oscar's work can be looked at as a general functionality
> improvement.

If it makes things easier, I do not mind this work going in first; I can
then work on top of it. That being said, I will try to have a look at
this series.
On Mon, Apr 05, 2021 at 04:00:39PM -0700, Mike Kravetz wrote:
> +static void remove_hugetlb_page(struct hstate *h, struct page *page,
> +					bool adjust_surplus)
> +{
> +	int nid = page_to_nid(page);
> +
> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> +		return;
> +
> +	list_del(&page->lru);
> +
> +	if (HPageFreed(page)) {
> +		h->free_huge_pages--;
> +		h->free_huge_pages_node[nid]--;
> +		ClearHPageFreed(page);
> +	}
> +	if (adjust_surplus) {
> +		h->surplus_huge_pages--;
> +		h->surplus_huge_pages_node[nid]--;
> +	}
> +
> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);

These checks feel a bit odd here.
I would move them above, before we start messing with the counters?

> +
> +	ClearHPageTemporary(page);

Why clear it unconditionally? I guess we do not really care, but someone
might wonder why when reading the code.
So I would either do as we used to do and only clear it in case of
HPageTemporary(), or add a comment.

> +	set_page_refcounted(page);
> +	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
> +
> +	h->nr_huge_pages--;
> +	h->nr_huge_pages_node[nid]--;
> +}

As Michal pointed out, calling remove_hugetlb_page() from
alloc_and_dissolve_huge_page() might be problematic in some cases because
the new_page has not been enqueued yet.

Now, I guess this might be easily fixed by checking list_empty() before
going ahead with the list_del() call, or with another bool parameter
'delete', with a fat comment explaining why we can get to call
remove_hugetlb_page() on a page that is not on any list.

Another solution that comes to mind is to split remove_hugetlb_page()
functionality into

 1) delete from list + unaccount free and surplus pages
 2) reset page's state + unaccount nr_huge_pages

But that might just be biased because it would fit my use case. So maybe
a list_empty() check or the bool parameter might just do.
On Mon, Apr 05, 2021 at 04:00:39PM -0700, Mike Kravetz wrote:
> The new remove_hugetlb_page() routine is designed to remove a hugetlb
> page from hugetlbfs processing. It will remove the page from the active
> or free list, update global counters and set the compound page
> destructor to NULL so that PageHuge() will return false for the 'page'.
> After this call, the 'page' can be treated as a normal compound page or
> a collection of base size pages.
>
> update_and_free_page no longer decrements h->nr_huge_pages{_node} as
> this is performed in remove_hugetlb_page. The only functionality
> performed by update_and_free_page is to free the base pages to the lower
> level allocators.
>
> update_and_free_page is typically called after remove_hugetlb_page.
>
> remove_hugetlb_page is to be called with the hugetlb_lock held.
>
> Creating this routine and separating functionality is in preparation for
> restructuring code to reduce lock hold times. This commit should not
> introduce any changes to functionality.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Btw, it seems you were doing just fine before realizing that my series
went in. So, as this seems a rather urgent matter to move forward (for
obvious reasons, and also because it holds up the hotplug-vmemmap work),
I wonder if it would make your life easier to just ask Andrew to remove
my series for the time being and give yours priority. I can later work
on top of that.
On 4/6/21 2:56 AM, Michal Hocko wrote:
> On Mon 05-04-21 16:00:39, Mike Kravetz wrote:
>> The new remove_hugetlb_page() routine is designed to remove a hugetlb
>> page from hugetlbfs processing. It will remove the page from the active
>> or free list, update global counters and set the compound page
>> destructor to NULL so that PageHuge() will return false for the 'page'.
>> After this call, the 'page' can be treated as a normal compound page or
>> a collection of base size pages.
>>
>> update_and_free_page no longer decrements h->nr_huge_pages{_node} as
>> this is performed in remove_hugetlb_page. The only functionality
>> performed by update_and_free_page is to free the base pages to the lower
>> level allocators.
>>
>> update_and_free_page is typically called after remove_hugetlb_page.
>>
>> remove_hugetlb_page is to be called with the hugetlb_lock held.
>>
>> Creating this routine and separating functionality is in preparation for
>> restructuring code to reduce lock hold times. This commit should not
>> introduce any changes to functionality.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> Btw. I would prefer to reverse the ordering of this and Oscar's
> patchset. This one is a bug fix which might be interesting for stable
> backports while Oscar's work can be looked at as a general functionality
> improvement.

Ok, that makes sense.

Andrew, can we make this happen? It would require removing Oscar's series
until it can be modified to work on top of this.

There is only one small issue with this series as it originally went into
mmotm. There is a missing conversion of spin_lock to spin_lock_irq in
patch 7. In addition, there are some suggested changes from Oscar to this
patch. I do not think they are necessary, but I could make those as well.
Let me know what I can do to help make this happen.

>> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>> 		/*
>> 		 * Freed from under us. Drop new_page too.
>> 		 */
>> +		remove_hugetlb_page(h, new_page, false);
>> 		update_and_free_page(h, new_page);
>> 		goto unlock;
>> 	} else if (page_count(old_page)) {
>> @@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>> 		 * Someone has grabbed the page, try to isolate it here.
>> 		 * Fail with -EBUSY if not possible.
>> 		 */
>> +		remove_hugetlb_page(h, new_page, false);
>> 		update_and_free_page(h, new_page);
>> 		spin_unlock(&hugetlb_lock);
>> 		if (!isolate_huge_page(old_page, list))
>
> the page is not enqueued anywhere here so remove_hugetlb_page would blow
> up when linked list debugging is enabled.

I also thought this would be an issue. However, INIT_LIST_HEAD would have
been called for the page, so:

static inline void INIT_LIST_HEAD(struct list_head *list)
{
	WRITE_ONCE(list->next, list);
	list->prev = list;
}

The debug checks of concern in __list_del_entry_valid are:

	CHECK_DATA_CORRUPTION(prev->next != entry,
			"list_del corruption. prev->next should be %px, but was %px\n",
			entry, prev->next) ||
	CHECK_DATA_CORRUPTION(next->prev != entry,
			"list_del corruption. next->prev should be %px, but was %px\n",
			entry, next->prev))

Since all pointers point back to the list (head), the checks pass.

My track record with the list routines is not so good, so I actually
forced a list_del after INIT_LIST_HEAD with list debugging enabled and
did not encounter any issues.

Going forward, I agree it would be better to perhaps add a list_empty
check so that things do not blow up if the debugging code is changed.

At one time I also thought of splitting the functionality in
alloc_fresh_huge_page and prep_new_huge_page so that it would only
allocate/prep the page but not increment nr_huge_pages. A new routine
would be used to increment the counter when the page was actually put
into use. I thought this could be used when doing bulk adjustments in
set_max_huge_pages, but the benefit would be minimal. This seems like
something that would be useful in Oscar's alloc_and_dissolve_huge_page
routine.
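As an aside (illustration only, not from the thread itself): the
behaviour Mike describes above can be modeled in a few lines of
user-space C. The self-pointers set up by an INIT_LIST_HEAD-style
initializer satisfy the two prev->next/next->prev consistency checks
quoted above, so deleting a never-enqueued entry passes them, while a
list_empty() style test still identifies the entry as not being on any
list. The helpers below are simplified stand-ins for the kernel ones.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Models INIT_LIST_HEAD(): both pointers refer back to the node itself. */
static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Models list_empty(): true when the node points back to itself. */
static bool is_list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Mirrors the two CHECK_DATA_CORRUPTION() conditions quoted above. */
static bool del_entry_valid(const struct list_head *entry)
{
	return entry->prev->next == entry && entry->next->prev == entry;
}

/* Models list_del() with the debug check in front. */
static void del_entry(struct list_head *entry)
{
	assert(del_entry_valid(entry));
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

int main(void)
{
	struct list_head lru;

	init_list_head(&lru);

	/* Both report 1: the checks pass although the node was never enqueued. */
	printf("del_entry_valid: %d\n", del_entry_valid(&lru));
	printf("is_list_empty:   %d\n", is_list_empty(&lru));

	/* The defensive form discussed in the thread: skip the del when empty. */
	if (!is_list_empty(&lru))
		del_entry(&lru);

	return 0;
}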
On 2021-04-06 18:49, Mike Kravetz wrote:
> Andrew, can we make this happen? It would require removing Oscar's
> series until it can be modified to work on top of this.
>
> There is only one small issue with this series as it originally went
> into mmotm. There is a missing conversion of spin_lock to spin_lock_irq
> in patch 7. In addition, there are some suggested changes from Oscar to
> this patch. I do not think they are necessary, but I could make those
> as well. Let me know what I can do to help make this happen.

I agree that it might not be necessary to make such changes, but I would
still like to see an explanation on the points I raised (excluding the
list_del(), as you already proved that it is not necessary), just to be
sure I am not missing anything.
On 4/6/21 6:41 AM, Oscar Salvador wrote:
> On Mon, Apr 05, 2021 at 04:00:39PM -0700, Mike Kravetz wrote:
>> +static void remove_hugetlb_page(struct hstate *h, struct page *page,
>> +					bool adjust_surplus)
>> +{
>> +	int nid = page_to_nid(page);
>> +
>> +	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
>> +		return;
>> +
>> +	list_del(&page->lru);
>> +
>> +	if (HPageFreed(page)) {
>> +		h->free_huge_pages--;
>> +		h->free_huge_pages_node[nid]--;
>> +		ClearHPageFreed(page);
>> +	}
>> +	if (adjust_surplus) {
>> +		h->surplus_huge_pages--;
>> +		h->surplus_huge_pages_node[nid]--;
>> +	}
>> +
>> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
>> +	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
>
> These checks feel a bit odd here.
> I would move them above, before we start messing with the counters?

This routine is comprised of code that was previously in
update_and_free_page and __free_huge_page. In those routines, the
VM_BUG_ON_PAGE calls came after the counter adjustments. That is the only
reason they are positioned as they are. I agree that it makes more sense
to move them to the beginning of the routine.

>> +
>> +	ClearHPageTemporary(page);
>
> Why clear it unconditionally? I guess we do not really care, but someone
> might wonder why when reading the code.
> So I would either do as we used to do and only clear it in case of
> HPageTemporary(), or add a comment.

Technically, the HPage* flags are meaningless after calling this routine,
so there really is no need to modify them at all. The flag clearing code
is left over from the routines in which it previously existed. Any
clearing of HPage* flags in this routine is unnecessary and should be
removed to avoid any questions.
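For illustration (an editorial sketch, not a posted patch): this is how
remove_hugetlb_page() might look with the two points agreed above
applied, i.e. the cgroup VM_BUG_ON_PAGE checks moved ahead of the counter
updates and the HPage* flag clearing dropped. Everything else is
unchanged from the version under review.

static void remove_hugetlb_page(struct hstate *h, struct page *page,
				bool adjust_surplus)
{
	int nid = page_to_nid(page);

	/* Sanity checks first, before any counters are touched. */
	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);

	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
		return;

	list_del(&page->lru);

	if (HPageFreed(page)) {
		h->free_huge_pages--;
		h->free_huge_pages_node[nid]--;
	}
	if (adjust_surplus) {
		h->surplus_huge_pages--;
		h->surplus_huge_pages_node[nid]--;
	}

	/*
	 * No HPage* flags are cleared here: once the destructor is reset
	 * the page is no longer a hugetlb page, so the flags are meaningless.
	 */
	set_page_refcounted(page);
	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);

	h->nr_huge_pages--;
	h->nr_huge_pages_node[nid]--;
}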
On Tue 06-04-21 09:49:13, Mike Kravetz wrote:
> On 4/6/21 2:56 AM, Michal Hocko wrote:
> > On Mon 05-04-21 16:00:39, Mike Kravetz wrote:
[...]
> >> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
> >> 		/*
> >> 		 * Freed from under us. Drop new_page too.
> >> 		 */
> >> +		remove_hugetlb_page(h, new_page, false);
> >> 		update_and_free_page(h, new_page);
> >> 		goto unlock;
> >> 	} else if (page_count(old_page)) {
> >> @@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
> >> 		 * Someone has grabbed the page, try to isolate it here.
> >> 		 * Fail with -EBUSY if not possible.
> >> 		 */
> >> +		remove_hugetlb_page(h, new_page, false);
> >> 		update_and_free_page(h, new_page);
> >> 		spin_unlock(&hugetlb_lock);
> >> 		if (!isolate_huge_page(old_page, list))
> >
> > the page is not enqueued anywhere here so remove_hugetlb_page would blow
> > up when linked list debugging is enabled.
>
> I also thought this would be an issue. However, INIT_LIST_HEAD would
> have been called for the page, so:

OK, this is true for a freshly allocated hugetlb page
(prep_new_huge_page). It's a very subtle dependency though. In case
somebody ever wants to fortify the linked list debugging and decides to
check list_del on an empty list, this would be silently waiting to blow
up.

> Going forward, I agree it would be better to perhaps add a list_empty
> check so that things do not blow up if the debugging code is changed.

Yes, this is less tricky than a bool flag or making more stages of the
tear down. Two stages are more than enough IMHO.

> At one time I also thought of splitting the functionality in
> alloc_fresh_huge_page and prep_new_huge_page so that it would only
> allocate/prep the page but not increment nr_huge_pages.

We already have that distinction. alloc_buddy_huge_page is there to
allocate a fresh huge page without any hstate accounting. Considering
that gigantic pages are not supported for the migration anyway, maybe
this would make Oscar's work slightly less tricky?
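One possible shape of the list_empty() guard discussed in this subthread
(an editorial sketch, not code that was posted): it lets
remove_hugetlb_page() tolerate a freshly allocated page, such as new_page
in alloc_and_dissolve_huge_page(), that was initialized with
INIT_LIST_HEAD() but never enqueued.

	/*
	 * Sketch only: a page coming straight from the allocator may never
	 * have been put on a free or active list, so only unlink it if it
	 * is actually queued somewhere.  For an INIT_LIST_HEAD()-initialized
	 * entry, list_empty() is true and the list_del() is skipped.
	 */
	if (!list_empty(&page->lru))
		list_del(&page->lru);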
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8497a3598c86..df2a3d1f632b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1055,18 +1055,13 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
-static void __enqueue_huge_page(struct list_head *list, struct page *page)
-{
-	list_move(&page->lru, list);
-	SetHPageFreed(page);
-}
-
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
-	__enqueue_huge_page(&h->hugepage_freelists[nid], page);
+	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
+	SetHPageFreed(page);
 }
 
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1331,6 +1326,43 @@ static inline void destroy_compound_gigantic_page(struct page *page,
 						unsigned int order) { }
 #endif
 
+/*
+ * Remove hugetlb page from lists, and update dtor so that page appears
+ * as just a compound page.  A reference is held on the page.
+ *
+ * Must be called with hugetlb lock held.
+ */
+static void remove_hugetlb_page(struct hstate *h, struct page *page,
+							bool adjust_surplus)
+{
+	int nid = page_to_nid(page);
+
+	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
+		return;
+
+	list_del(&page->lru);
+
+	if (HPageFreed(page)) {
+		h->free_huge_pages--;
+		h->free_huge_pages_node[nid]--;
+		ClearHPageFreed(page);
+	}
+	if (adjust_surplus) {
+		h->surplus_huge_pages--;
+		h->surplus_huge_pages_node[nid]--;
+	}
+
+	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
+	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
+
+	ClearHPageTemporary(page);
+	set_page_refcounted(page);
+	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
+
+	h->nr_huge_pages--;
+	h->nr_huge_pages_node[nid]--;
+}
+
 static void update_and_free_page(struct hstate *h, struct page *page)
 {
 	int i;
@@ -1339,8 +1371,6 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
-	h->nr_huge_pages--;
-	h->nr_huge_pages_node[page_to_nid(page)]--;
 	for (i = 0; i < pages_per_huge_page(h);
 	     i++, subpage = mem_map_next(subpage, page, i)) {
 		subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
@@ -1348,10 +1378,6 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 				1 << PG_active | 1 << PG_private |
 				1 << PG_writeback);
 	}
-	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
-	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
-	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
-	set_page_refcounted(page);
 	if (hstate_is_gigantic(h)) {
 		destroy_compound_gigantic_page(page, huge_page_order(h));
 		free_gigantic_page(page, huge_page_order(h));
@@ -1419,15 +1445,12 @@ static void __free_huge_page(struct page *page)
 		h->resv_huge_pages++;
 
 	if (HPageTemporary(page)) {
-		list_del(&page->lru);
-		ClearHPageTemporary(page);
+		remove_hugetlb_page(h, page, false);
 		update_and_free_page(h, page);
 	} else if (h->surplus_huge_pages_node[nid]) {
 		/* remove the page from active list */
-		list_del(&page->lru);
+		remove_hugetlb_page(h, page, true);
 		update_and_free_page(h, page);
-		h->surplus_huge_pages--;
-		h->surplus_huge_pages_node[nid]--;
 	} else {
 		arch_clear_hugepage_flags(page);
 		enqueue_huge_page(h, page);
@@ -1712,13 +1735,7 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 			struct page *page =
 				list_entry(h->hugepage_freelists[node].next,
 					   struct page, lru);
-			list_del(&page->lru);
-			h->free_huge_pages--;
-			h->free_huge_pages_node[node]--;
-			if (acct_surplus) {
-				h->surplus_huge_pages--;
-				h->surplus_huge_pages_node[node]--;
-			}
+			remove_hugetlb_page(h, page, acct_surplus);
 			update_and_free_page(h, page);
 			ret = 1;
 			break;
@@ -1756,7 +1773,6 @@ int dissolve_free_huge_page(struct page *page)
 	if (!page_count(page)) {
 		struct page *head = compound_head(page);
 		struct hstate *h = page_hstate(head);
-		int nid = page_to_nid(head);
 
 		if (h->free_huge_pages - h->resv_huge_pages == 0)
 			goto out;
@@ -1787,9 +1803,7 @@ int dissolve_free_huge_page(struct page *page)
 			SetPageHWPoison(page);
 			ClearPageHWPoison(head);
 		}
-		list_del(&head->lru);
-		h->free_huge_pages--;
-		h->free_huge_pages_node[nid]--;
+		remove_hugetlb_page(h, page, false);
 		h->max_huge_pages--;
 		update_and_free_page(h, head);
 		rc = 0;
@@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 		/*
 		 * Freed from under us. Drop new_page too.
 		 */
+		remove_hugetlb_page(h, new_page, false);
 		update_and_free_page(h, new_page);
 		goto unlock;
 	} else if (page_count(old_page)) {
@@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 		 * Someone has grabbed the page, try to isolate it here.
 		 * Fail with -EBUSY if not possible.
 		 */
+		remove_hugetlb_page(h, new_page, false);
 		update_and_free_page(h, new_page);
 		spin_unlock(&hugetlb_lock);
 		if (!isolate_huge_page(old_page, list))
@@ -2323,13 +2339,13 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
 		/*
 		 * Ok, old_page is still a genuine free hugepage. Replace it
 		 * with the new one.
+		 * Note: h->free_huge_pages{_node} counters are decremented
+		 * in remove_hugetlb_page for old_page and incremented in
+		 * enqueue_huge_page for new page. Net result is no change.
 		 */
-		list_del(&old_page->lru);
+		remove_hugetlb_page(h, old_page, false);
 		update_and_free_page(h, old_page);
-		/*
-		 * h->free_huge_pages{_node} counters do not need to be updated.
-		 */
-		__enqueue_huge_page(&h->hugepage_freelists[nid], new_page);
+		enqueue_huge_page(h, new_page);
 	}
 unlock:
 	spin_unlock(&hugetlb_lock);
@@ -2667,10 +2683,8 @@ static void try_to_free_low(struct hstate *h, unsigned long count,
 				return;
 			if (PageHighMem(page))
 				continue;
-			list_del(&page->lru);
+			remove_hugetlb_page(h, page, false);
 			update_and_free_page(h, page);
-			h->free_huge_pages--;
-			h->free_huge_pages_node[page_to_nid(page)]--;
 		}
 	}
 }
The new remove_hugetlb_page() routine is designed to remove a hugetlb
page from hugetlbfs processing. It will remove the page from the active
or free list, update global counters and set the compound page
destructor to NULL so that PageHuge() will return false for the 'page'.
After this call, the 'page' can be treated as a normal compound page or
a collection of base size pages.

update_and_free_page no longer decrements h->nr_huge_pages{_node} as
this is performed in remove_hugetlb_page. The only functionality
performed by update_and_free_page is to free the base pages to the lower
level allocators.

update_and_free_page is typically called after remove_hugetlb_page.

remove_hugetlb_page is to be called with the hugetlb_lock held.

Creating this routine and separating functionality is in preparation for
restructuring code to reduce lock hold times. This commit should not
introduce any changes to functionality.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 88 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 51 insertions(+), 37 deletions(-)