Message ID | 20240321142448.1645400-2-willy@infradead.org (mailing list archive)
---|---
State | New
Series | Various significant MM patches
On 2024/3/21 22:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list. We recently found this useful to
> rule out a source of corruption.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/huge_memory.c | 2 --
>  mm/hugetlb.c     | 3 ++-
>  mm/internal.h    | 2 ++
>  mm/memcontrol.c  | 2 ++
>  mm/page_alloc.c  | 9 +++++----
>  5 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9859aa4f7553..04fb994a7b0b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
>  {
>  	if (!folio || !folio_test_large(folio))
>  		return;
> -	if (folio_order(folio) > 1)
> -		INIT_LIST_HEAD(&folio->_deferred_list);
>  	folio_set_large_rmappable(folio);
>  }
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 23ef240ba48a..7e9a766059aa 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
>  		free_gigantic_folio(folio, huge_page_order(h));
>  	} else {
> -		__free_pages(&folio->page, huge_page_order(h));
> +		INIT_LIST_HEAD(&folio->_deferred_list);

Will it be better to add a comment to explain why INIT_LIST_HEAD is needed ?

> +		folio_put(folio);

Can all __free_pages be replaced with folio_put in mm/hugetlb.c?

Thanks.
On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list. We recently found this useful to
> rule out a source of corruption.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
On 21.03.24 15:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list. We recently found this useful to
> rule out a source of corruption.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>
On Fri, Mar 22, 2024 at 04:23:59PM +0800, Miaohe Lin wrote:
> > +++ b/mm/hugetlb.c
> > @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
> >  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
> >  		free_gigantic_folio(folio, huge_page_order(h));
> >  	} else {
> > -		__free_pages(&folio->page, huge_page_order(h));
> > +		INIT_LIST_HEAD(&folio->_deferred_list);
>
> Will it be better to add a comment to explain why INIT_LIST_HEAD is needed ?

Maybe? Something like
	/* We reused this space for our own purposes */

> > +		folio_put(folio);
>
> Can all __free_pages be replaced with folio_put in mm/hugetlb.c?

There's only one left, and indeed it can!

I'll drop this into my tree and send it as a proper patch later.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 333f6278ef63..43cc7e6bc374 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 		nodemask_t *node_alloc_noretry)
 {
 	int order = huge_page_order(h);
-	struct page *page;
+	struct folio *folio;
 	bool alloc_try_hard = true;
 	bool retry = true;
 
 	/*
-	 * By default we always try hard to allocate the page with
-	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating pages in
+	 * By default we always try hard to allocate the folio with
+	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating folios in
 	 * a loop (to adjust global huge page counts) and previous allocation
 	 * failed, do not continue to try hard on the same node. Use the
 	 * node_alloc_noretry bitmap to manage this state information.
@@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
 retry:
-	page = __alloc_pages(gfp_mask, order, nid, nmask);
+	folio = __folio_alloc(gfp_mask, order, nid, nmask);
 
-	/* Freeze head page */
-	if (page && !page_ref_freeze(page, 1)) {
-		__free_pages(page, order);
+	if (folio && !folio_ref_freeze(folio, 1)) {
+		folio_put(folio);
 		if (retry) {	/* retry once */
 			retry = false;
 			goto retry;
 		}
 		/* WOW! twice in a row. */
-		pr_warn("HugeTLB head page unexpected inflated ref count\n");
-		page = NULL;
+		pr_warn("HugeTLB unexpected inflated folio ref count\n");
+		folio = NULL;
 	}
 
 	/*
-	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
-	 * indicates an overall state change. Clear bit so that we resume
-	 * normal 'try hard' allocations.
+	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
+	 * folio this indicates an overall state change. Clear bit so
+	 * that we resume normal 'try hard' allocations.
 	 */
-	if (node_alloc_noretry && page && !alloc_try_hard)
+	if (node_alloc_noretry && folio && !alloc_try_hard)
 		node_clear(nid, *node_alloc_noretry);
 
 	/*
-	 * If we tried hard to get a page but failed, set bit so that
+	 * If we tried hard to get a folio but failed, set bit so that
 	 * subsequent attempts will not try as hard until there is an
 	 * overall state change.
 	 */
-	if (node_alloc_noretry && !page && alloc_try_hard)
+	if (node_alloc_noretry && !folio && alloc_try_hard)
 		node_set(nid, *node_alloc_noretry);
 
-	if (!page) {
+	if (!folio) {
 		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
 		return NULL;
 	}
 
 	__count_vm_event(HTLB_BUDDY_PGALLOC);
-	return page_folio(page);
+	return folio;
 }
 
 static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
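As an aside, the "freeze the refcount, retry once" pattern in the hunk above can be illustrated outside the kernel. The following is a minimal userspace sketch under C11 atomics; struct obj, ref_freeze() and alloc_frozen_obj() are invented names standing in for the folio, folio_ref_freeze() and alloc_buddy_hugetlb_folio(), not real kernel interfaces.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct obj { atomic_int refcount; };

/* Freeze succeeds only if the count is exactly 'expected' (here 1 -> 0). */
static bool ref_freeze(struct obj *o, int expected)
{
	return atomic_compare_exchange_strong(&o->refcount, &expected, 0);
}

static struct obj *alloc_obj(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->refcount, 1);
	return o;
}

/* Allocate an object and freeze its refcount, retrying once on failure. */
static struct obj *alloc_frozen_obj(void)
{
	bool retry = true;
	struct obj *o;

again:
	o = alloc_obj();
	/*
	 * In this toy the freshly allocated count is exactly 1, so the
	 * freeze always succeeds; in the kernel a concurrent speculative
	 * reference can inflate the count and make it fail.
	 */
	if (o && !ref_freeze(o, 1)) {
		free(o);		/* plays the role of folio_put() */
		if (retry) {		/* retry once, as in the hunk above */
			retry = false;
			goto again;
		}
		fprintf(stderr, "unexpected inflated ref count\n");
		o = NULL;
	}
	return o;
}

int main(void)
{
	struct obj *o = alloc_frozen_obj();

	printf("frozen object: %p\n", (void *)o);
	free(o);
	return 0;
}
```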
On 2024/3/22 21:00, Matthew Wilcox wrote:
> On Fri, Mar 22, 2024 at 04:23:59PM +0800, Miaohe Lin wrote:
>>> +++ b/mm/hugetlb.c
>>> @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>>>  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
>>>  		free_gigantic_folio(folio, huge_page_order(h));
>>>  	} else {
>>> -		__free_pages(&folio->page, huge_page_order(h));
>>> +		INIT_LIST_HEAD(&folio->_deferred_list);
>>
>> Will it be better to add a comment to explain why INIT_LIST_HEAD is needed ?

Sorry for the late reply, I was on off-the-job training last week. It was really tiring. :(

>
> Maybe? Something like
> 	/* We reused this space for our own purposes */

This one looks good to me.

>
>>> +		folio_put(folio);
>>
>> Can all __free_pages be replaced with folio_put in mm/hugetlb.c?
>
> There's only one left, and indeed it can!
>
> I'll drop this into my tree and send it as a proper patch later.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 333f6278ef63..43cc7e6bc374 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  		nodemask_t *node_alloc_noretry)
>  {
>  	int order = huge_page_order(h);
> -	struct page *page;
> +	struct folio *folio;
>  	bool alloc_try_hard = true;
>  	bool retry = true;
> 
>  	/*
> -	 * By default we always try hard to allocate the page with
> -	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating pages in
> +	 * By default we always try hard to allocate the folio with
> +	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating folios in
>  	 * a loop (to adjust global huge page counts) and previous allocation
>  	 * failed, do not continue to try hard on the same node. Use the
>  	 * node_alloc_noretry bitmap to manage this state information.
> @@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  	if (nid == NUMA_NO_NODE)
>  		nid = numa_mem_id();
>  retry:
> -	page = __alloc_pages(gfp_mask, order, nid, nmask);
> +	folio = __folio_alloc(gfp_mask, order, nid, nmask);
> 
> -	/* Freeze head page */
> -	if (page && !page_ref_freeze(page, 1)) {
> -		__free_pages(page, order);
> +	if (folio && !folio_ref_freeze(folio, 1)) {
> +		folio_put(folio);
>  		if (retry) {	/* retry once */
>  			retry = false;
>  			goto retry;
>  		}
>  		/* WOW! twice in a row. */
> -		pr_warn("HugeTLB head page unexpected inflated ref count\n");
> -		page = NULL;
> +		pr_warn("HugeTLB unexpected inflated folio ref count\n");
> +		folio = NULL;
>  	}
> 
>  	/*
> -	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
> -	 * indicates an overall state change. Clear bit so that we resume
> -	 * normal 'try hard' allocations.
> +	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
> +	 * folio this indicates an overall state change. Clear bit so
> +	 * that we resume normal 'try hard' allocations.
>  	 */
> -	if (node_alloc_noretry && page && !alloc_try_hard)
> +	if (node_alloc_noretry && folio && !alloc_try_hard)
>  		node_clear(nid, *node_alloc_noretry);
> 
>  	/*
> -	 * If we tried hard to get a page but failed, set bit so that
> +	 * If we tried hard to get a folio but failed, set bit so that
>  	 * subsequent attempts will not try as hard until there is an
>  	 * overall state change.
>  	 */
> -	if (node_alloc_noretry && !page && alloc_try_hard)
> +	if (node_alloc_noretry && !folio && alloc_try_hard)
>  		node_set(nid, *node_alloc_noretry);
> 
> -	if (!page) {
> +	if (!folio) {
>  		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
>  		return NULL;
>  	}
> 
>  	__count_vm_event(HTLB_BUDDY_PGALLOC);
> -	return page_folio(page);
> +	return folio;
>  }
> 
>  static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
> .

This also looks good to me.
Thanks for your work.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..04fb994a7b0b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
 {
 	if (!folio || !folio_test_large(folio))
 		return;
-	if (folio_order(folio) > 1)
-		INIT_LIST_HEAD(&folio->_deferred_list);
 	folio_set_large_rmappable(folio);
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 23ef240ba48a..7e9a766059aa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 		destroy_compound_gigantic_folio(folio, huge_page_order(h));
 		free_gigantic_folio(folio, huge_page_order(h));
 	} else {
-		__free_pages(&folio->page, huge_page_order(h));
+		INIT_LIST_HEAD(&folio->_deferred_list);
+		folio_put(folio);
 	}
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..10895ec52546 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -525,6 +525,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
 	atomic_set(&folio->_pincount, 0);
+	if (order > 1)
+		INIT_LIST_HEAD(&folio->_deferred_list);
 }
 
 static inline void prep_compound_tail(struct page *head, int tail_idx)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fabce2b50c69..a2a74d4ca0b1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7448,6 +7448,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
+			!list_empty(&folio->_deferred_list), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..4301146a5bf4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 		}
 		break;
 	case 2:
-		/*
-		 * the second tail page: ->mapping is
-		 * deferred_list.next -- ignore value.
-		 */
+		/* the second tail page: deferred_list overlaps ->mapping */
+		if (unlikely(!list_empty(&folio->_deferred_list))) {
+			bad_page(page, "on deferred list");
+			goto out;
+		}
 		break;
 	default:
 		if (page->mapping != TAIL_MAPPING) {
For compound pages which are at least order-2 (and hence have a
deferred_list), initialise it and then we can check at free that the
page is not part of a deferred list. We recently found this useful to
rule out a source of corruption.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 2 --
 mm/hugetlb.c     | 3 ++-
 mm/internal.h    | 2 ++
 mm/memcontrol.c  | 2 ++
 mm/page_alloc.c  | 9 +++++----
 5 files changed, 11 insertions(+), 7 deletions(-)
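The idea the commit message describes — initialise the deferred list when the compound page is prepared, then treat a non-empty list at free time as evidence of corruption — can be sketched in a few lines of plain C. Everything below (fake_folio, prep_folio, free_folio) is an illustrative userspace stand-in, not the kernel code touched by the patch.

```c
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list->prev = list;
}

static int list_empty(const struct list_head *list)
{
	return list->next == list;
}

/* Stand-in for a compound page's head; only order > 1 has the list. */
struct fake_folio {
	unsigned int order;
	struct list_head _deferred_list;
};

/* Mirrors prep_compound_head(): initialise the list at prep time. */
static void prep_folio(struct fake_folio *folio, unsigned int order)
{
	folio->order = order;
	if (order > 1)
		INIT_LIST_HEAD(&folio->_deferred_list);
}

/* Mirrors the new free_tail_page_prepare() check: complain if queued. */
static void free_folio(struct fake_folio *folio)
{
	if (folio->order > 1 && !list_empty(&folio->_deferred_list)) {
		fprintf(stderr, "bad page: on deferred list\n");
		return;
	}
	printf("freed cleanly\n");
}

int main(void)
{
	struct fake_folio folio;

	prep_folio(&folio, 2);
	free_folio(&folio);		/* list is empty: frees cleanly */

	/* Simulate corruption, or a free while still queued for splitting. */
	folio._deferred_list.next = (struct list_head *)&folio;
	free_folio(&folio);		/* now reported as a bad page */
	return 0;
}
```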