Message ID | 20240314012506.1600378-2-willy@infradead.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Reliable testing for hugetlb | expand |
On 14.03.24 02:25, Matthew Wilcox (Oracle) wrote: > The current folio_test_hugetlb() can be fooled by a concurrent folio split > into returning true for a folio which has never belonged to hugetlbfs. > This can't happen if the caller holds a refcount on it, but we have a > few places (memory-failure, compaction, procfs) which do not and should > not take a speculative reference. > > Since hugetlb pages do not use individual page mapcounts (they are always > fully mapped and use the entire_mapcount field to record the number of > mappings), the PageType field is available for use with only a minor > modification to page_mapcount() to ignore the value in this field for > hugetlb pages. Other parts of the kernel which examine page->_mapcount > directly already know about page_has_type() (eg procfs, dump_page()). > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- Should we include this hunk: diff --git a/mm/debug.c b/mm/debug.c index c1c1a6a484e4c..f521e17150a10 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -63,7 +63,8 @@ static void __dump_folio(struct folio *folio, struct page *page, * encode own info, and we must avoid calling page_folio() again. */ if (!folio_test_slab(folio)) { - mapcount = atomic_read(&page->_mapcount) + 1; + if (!folio_test_hugetlb(folio)) + mapcount = atomic_read(&page->_mapcount) + 1; if (folio_test_large(folio)) mapcount += folio_entire_mapcount(folio); } Apart from that, LGTM Reviewed-by: David Hildenbrand <david@redhat.com>
On Thu, Mar 14, 2024 at 11:26:17AM +0100, David Hildenbrand wrote: > Should we include this hunk: > > diff --git a/mm/debug.c b/mm/debug.c > index c1c1a6a484e4c..f521e17150a10 100644 > --- a/mm/debug.c > +++ b/mm/debug.c > @@ -63,7 +63,8 @@ static void __dump_folio(struct folio *folio, struct page *page, > * encode own info, and we must avoid calling page_folio() again. > */ > if (!folio_test_slab(folio)) { > - mapcount = atomic_read(&page->_mapcount) + 1; > + if (!folio_test_hugetlb(folio)) > + mapcount = atomic_read(&page->_mapcount) + 1; > if (folio_test_large(folio)) > mapcount += folio_entire_mapcount(folio); > } > > Apart from that, LGTM I actually want to do a bit more here ... +++ b/mm/debug.c @@ -58,15 +58,10 @@ static void __dump_folio(struct folio *folio, struct page *page, int mapcount = 0; char *type = ""; - /* - * page->_mapcount space in struct page is used by slab pages to - * encode own info, and we must avoid calling page_folio() again. - */ - if (!folio_test_slab(folio)) { + if (!page_has_type(page)) { mapcount = atomic_read(&page->_mapcount) + 1; - if (folio_test_large(folio)) - mapcount += folio_entire_mapcount(folio); - } + if (folio_test_large(folio)) + mapcount += folio_entire_mapcount(folio); pr_warn("page: refcount:%d mapcount:%d mapping:%p index:%#lx pfn:%#lx\n", folio_ref_count(folio), mapcount, mapping, @@ -99,7 +94,8 @@ static void __dump_folio(struct folio *folio, struct page *page, */ pr_warn("%sflags: %pGp%s\n", type, &folio->flags, is_migrate_cma_folio(folio, pfn) ? " CMA" : ""); - pr_warn("page_type: %pGt\n", &folio->page.page_type); + if (page_has_type(page)) + pr_warn("type: %pGt\n", &folio->page.page_type); print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32, sizeof(unsigned long), page, and I think that deserves its own patch. Also, I need to do: +++ b/include/trace/events/mmflags.h @@ -135,6 +135,7 @@ IF_HAVE_PG_ARCH_X(arch_3) #define DEF_PAGETYPE_NAME(_name) { PG_##_name, __stringify(_name) } #define __def_pagetype_names \ + DEF_PAGETYPE_NAME(hugetlb) \ DEF_PAGETYPE_NAME(offline), \ DEF_PAGETYPE_NAME(guard), \ DEF_PAGETYPE_NAME(table), \ (and I think having this in the include/trace directory is a mistake; putting it in page-flags.h will lead to it not being forgotten)
On 2024/3/14 22:33, Matthew Wilcox wrote: > On Thu, Mar 14, 2024 at 11:26:17AM +0100, David Hildenbrand wrote: >> Should we include this hunk: >> >> diff --git a/mm/debug.c b/mm/debug.c >> index c1c1a6a484e4c..f521e17150a10 100644 >> --- a/mm/debug.c >> +++ b/mm/debug.c >> @@ -63,7 +63,8 @@ static void __dump_folio(struct folio *folio, struct page *page, >> * encode own info, and we must avoid calling page_folio() again. >> */ >> if (!folio_test_slab(folio)) { >> - mapcount = atomic_read(&page->_mapcount) + 1; >> + if (!folio_test_hugetlb(folio)) >> + mapcount = atomic_read(&page->_mapcount) + 1; >> if (folio_test_large(folio)) >> mapcount += folio_entire_mapcount(folio); >> } >> >> Apart from that, LGTM > > I actually want to do a bit more here ... > > +++ b/mm/debug.c > @@ -58,15 +58,10 @@ static void __dump_folio(struct folio *folio, struct page *page, > int mapcount = 0; > char *type = ""; > > - /* > - * page->_mapcount space in struct page is used by slab pages to > - * encode own info, and we must avoid calling page_folio() again. > - */ > - if (!folio_test_slab(folio)) { > + if (!page_has_type(page)) { > mapcount = atomic_read(&page->_mapcount) + 1; > - if (folio_test_large(folio)) > - mapcount += folio_entire_mapcount(folio); > - } > + if (folio_test_large(folio)) > + mapcount += folio_entire_mapcount(folio); Why folio_test_large is not within the "if (!page_has_type(page))" block? I think a slab page could also be large folio. Or am I miss something? Thanks.
On Fri, Mar 15, 2024 at 02:19:10PM +0800, Miaohe Lin wrote: > On 2024/3/14 22:33, Matthew Wilcox wrote: > > I actually want to do a bit more here ... > > > > +++ b/mm/debug.c > > @@ -58,15 +58,10 @@ static void __dump_folio(struct folio *folio, struct page *page, > > int mapcount = 0; > > char *type = ""; > > > > - /* > > - * page->_mapcount space in struct page is used by slab pages to > > - * encode own info, and we must avoid calling page_folio() again. > > - */ > > - if (!folio_test_slab(folio)) { > > + if (!page_has_type(page)) { > > mapcount = atomic_read(&page->_mapcount) + 1; > > - if (folio_test_large(folio)) > > - mapcount += folio_entire_mapcount(folio); > > - } > > + if (folio_test_large(folio)) > > + mapcount += folio_entire_mapcount(folio); > > Why folio_test_large is not within the "if (!page_has_type(page))" block? I think a slab page could also be large folio. > Or am I miss something? Slab pages don't use the first tail page so folio_entire_mapcount will be 0 for them. And we don't want the folio_test_large() to be gated by page_has_type() because hugetlb pages (after this patch) will be in the page_has_type() category. So for hugetlb pages, we want to not read page->mapcount (leaving mapcount at zero), but we do want to read the entire_mapcount.
On 2024/3/15 20:32, Matthew Wilcox wrote: > On Fri, Mar 15, 2024 at 02:19:10PM +0800, Miaohe Lin wrote: >> On 2024/3/14 22:33, Matthew Wilcox wrote: >>> I actually want to do a bit more here ... >>> >>> +++ b/mm/debug.c >>> @@ -58,15 +58,10 @@ static void __dump_folio(struct folio *folio, struct page *page, >>> int mapcount = 0; >>> char *type = ""; >>> >>> - /* >>> - * page->_mapcount space in struct page is used by slab pages to >>> - * encode own info, and we must avoid calling page_folio() again. >>> - */ >>> - if (!folio_test_slab(folio)) { >>> + if (!page_has_type(page)) { >>> mapcount = atomic_read(&page->_mapcount) + 1; >>> - if (folio_test_large(folio)) >>> - mapcount += folio_entire_mapcount(folio); >>> - } >>> + if (folio_test_large(folio)) >>> + mapcount += folio_entire_mapcount(folio); >> >> Why folio_test_large is not within the "if (!page_has_type(page))" block? I think a slab page could also be large folio. >> Or am I miss something? > > Slab pages don't use the first tail page so folio_entire_mapcount will > be 0 for them. And we don't want the folio_test_large() to be gated by > page_has_type() because hugetlb pages (after this patch) will be in the > page_has_type() category. So for hugetlb pages, we want to not read > page->mapcount (leaving mapcount at zero), but we do want to read the > entire_mapcount. I see. Many thanks for your explanation. This might worth a comment to avoid future confusion. Thanks.
diff --git a/include/linux/mm.h b/include/linux/mm.h index f5a97dec5169..32281097f07e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1212,6 +1212,9 @@ static inline int page_mapcount(struct page *page) { int mapcount = atomic_read(&page->_mapcount) + 1; + /* Head page repurposes mapcount; tail page leaves at -1 */ + if (PageHeadHuge(page)) + mapcount = 0; if (unlikely(PageCompound(page))) mapcount += folio_entire_mapcount(page_folio(page)); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 735cddc13d20..6ebb573c7195 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -190,7 +190,6 @@ enum pageflags { /* At least one page in this folio has the hwpoison flag set */ PG_has_hwpoisoned = PG_error, - PG_hugetlb = PG_active, PG_large_rmappable = PG_workingset, /* anon or file-backed */ }; @@ -829,29 +828,6 @@ TESTPAGEFLAG_FALSE(LargeRmappable, large_rmappable) #define PG_head_mask ((1UL << PG_head)) -#ifdef CONFIG_HUGETLB_PAGE -int PageHuge(struct page *page); -SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND) -CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND) - -/** - * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs - * @folio: The folio to test. - * - * Context: Any context. Caller should have a reference on the folio to - * prevent it from being turned into a tail page. - * Return: True for hugetlbfs folios, false for anon folios or folios - * belonging to other filesystems. - */ -static inline bool folio_test_hugetlb(struct folio *folio) -{ - return folio_test_large(folio) && - test_bit(PG_hugetlb, folio_flags(folio, 1)); -} -#else -TESTPAGEFLAG_FALSE(Huge, hugetlb) -#endif - #ifdef CONFIG_TRANSPARENT_HUGEPAGE /* * PageHuge() only returns true for hugetlbfs pages, but not for @@ -907,18 +883,6 @@ PAGEFLAG_FALSE(HasHWPoisoned, has_hwpoisoned) TESTSCFLAG_FALSE(HasHWPoisoned, has_hwpoisoned) #endif -/* - * Check if a page is currently marked HWPoisoned. Note that this check is - * best effort only and inherently racy: there is no way to synchronize with - * failing hardware. - */ -static inline bool is_page_hwpoison(struct page *page) -{ - if (PageHWPoison(page)) - return true; - return PageHuge(page) && PageHWPoison(compound_head(page)); -} - /* * For pages that are never mapped to userspace (and aren't PageSlab), * page_type may be used. Because it is initialised to -1, we invert the @@ -935,6 +899,7 @@ static inline bool is_page_hwpoison(struct page *page) #define PG_offline 0x00000100 #define PG_table 0x00000200 #define PG_guard 0x00000400 +#define PG_hugetlb 0x00000800 #define PageType(page, flag) \ ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE) @@ -1026,6 +991,37 @@ PAGE_TYPE_OPS(Table, table, pgtable) */ PAGE_TYPE_OPS(Guard, guard, guard) +#ifdef CONFIG_HUGETLB_PAGE +PAGE_TYPE_OPS(HeadHuge, hugetlb, hugetlb) +#else +TESTPAGEFLAG_FALSE(HeadHuge, hugetlb) +#endif + +/** + * PageHuge - Determine if the page belongs to hugetlbfs + * @page: The page to test. + * + * Context: Any context. + * Return: True for hugetlbfs pages, false for anon pages or pages + * belonging to other filesystems. + */ +static inline bool PageHuge(struct page *page) +{ + return folio_test_hugetlb(page_folio(page)); +} + +/* + * Check if a page is currently marked HWPoisoned. Note that this check is + * best effort only and inherently racy: there is no way to synchronize with + * failing hardware. + */ +static inline bool is_page_hwpoison(struct page *page) +{ + if (PageHWPoison(page)) + return true; + return PageHuge(page) && PageHWPoison(compound_head(page)); +} + extern bool is_free_buddy_page(struct page *page); PAGEFLAG(Isolated, isolated, PF_ANY); @@ -1092,7 +1088,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page) */ #define PAGE_FLAGS_SECOND \ (0xffUL /* order */ | 1UL << PG_has_hwpoisoned | \ - 1UL << PG_hugetlb | 1UL << PG_large_rmappable) + 1UL << PG_large_rmappable) #define PAGE_FLAGS_PRIVATE \ (1UL << PG_private | 1UL << PG_private_2) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ed1581b670d4..23b62df21971 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1623,7 +1623,7 @@ static inline void __clear_hugetlb_destructor(struct hstate *h, { lockdep_assert_held(&hugetlb_lock); - folio_clear_hugetlb(folio); + __folio_clear_hugetlb(folio); } /* @@ -1710,7 +1710,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio, h->surplus_huge_pages_node[nid]++; } - folio_set_hugetlb(folio); + __folio_set_hugetlb(folio); folio_change_private(folio, NULL); /* * We have to set hugetlb_vmemmap_optimized again as above @@ -2048,7 +2048,7 @@ static void __prep_account_new_huge_page(struct hstate *h, int nid) static void init_new_hugetlb_folio(struct hstate *h, struct folio *folio) { - folio_set_hugetlb(folio); + __folio_set_hugetlb(folio); INIT_LIST_HEAD(&folio->lru); hugetlb_set_folio_subpool(folio, NULL); set_hugetlb_cgroup(folio, NULL); @@ -2158,22 +2158,6 @@ static bool prep_compound_gigantic_folio_for_demote(struct folio *folio, return __prep_compound_gigantic_folio(folio, order, true); } -/* - * PageHuge() only returns true for hugetlbfs pages, but not for normal or - * transparent huge pages. See the PageTransHuge() documentation for more - * details. - */ -int PageHuge(struct page *page) -{ - struct folio *folio; - - if (!PageCompound(page)) - return 0; - folio = page_folio(page); - return folio_test_hugetlb(folio); -} -EXPORT_SYMBOL_GPL(PageHuge); - /* * Find and lock address space (mapping) in write mode. *
The current folio_test_hugetlb() can be fooled by a concurrent folio split into returning true for a folio which has never belonged to hugetlbfs. This can't happen if the caller holds a refcount on it, but we have a few places (memory-failure, compaction, procfs) which do not and should not take a speculative reference. Since hugetlb pages do not use individual page mapcounts (they are always fully mapped and use the entire_mapcount field to record the number of mappings), the PageType field is available for use with only a minor modification to page_mapcount() to ignore the value in this field for hugetlb pages. Other parts of the kernel which examine page->_mapcount directly already know about page_has_type() (eg procfs, dump_page()). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> --- include/linux/mm.h | 3 ++ include/linux/page-flags.h | 70 ++++++++++++++++++-------------------- mm/hugetlb.c | 22 ++---------- 3 files changed, 39 insertions(+), 56 deletions(-)