Message ID | 20200629151918.15537-2-willy@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Improvements for dump_page() | expand |
On 2020-06-29 08:19, Matthew Wilcox (Oracle) wrote: > Tail page flags contain very little useful information. Print the head > page's flags instead (even though PageHead is a little misleading). You are right about the tail page. And the raw output provides the tail page flags, in case someone *really* needs to dig into tail page problems, so that's all good. However, I just gave this a spin, and seeing the "|head" in the list for my "tail page: dump_page test" is also slightly misleading for me, too. Sure, one can eventually work through it and conclude "my dump_page output is misleading", but I think a small tweak would avoid that: > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- > mm/debug.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/debug.c b/mm/debug.c > index 4f376514744d..44d843b3b07c 100644 > --- a/mm/debug.c > +++ b/mm/debug.c > @@ -163,7 +163,7 @@ void __dump_page(struct page *page, const char *reason) > out_mapping: > BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); > > - pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags, > + pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, > page_cma ? " CMA" : ""); Perhaps just adding "head " to the above will clear it all up: diff --git a/mm/debug.c b/mm/debug.c index fcf3a16902b2..b7ac8af71940 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -165,7 +165,7 @@ void __dump_page(struct page *page, const char *reason) out_mapping: BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); - pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, + pr_warn("head %sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, page_cma ? " CMA" : ""); hex_only: Sample output: [ 62.064271] page:000000001c8f0c3e refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 head:00000000ba9ccef2 order:9 compound_mapcount:1 compound_pincount:512 [ 62.078432] head anon flags: 0x17ffe000001000e(referenced|uptodate|dirty|head) [ 62.088454] raw: 017ffe0000000000 ffffea00209e8001 ffffea00209e8448 dead000000000400 [ 62.095356] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 62.102289] head: 017ffe000001000e ffffffff83649ca0 ffffea00209e0008 ffff88889663c001 [ 62.109286] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 [ 62.116301] page dumped because: gup_benchmark: tail page: dump_page test thanks,
On Mon, Jun 29, 2020 at 03:38:13PM -0700, John Hubbard wrote: > On 2020-06-29 08:19, Matthew Wilcox (Oracle) wrote: > > Tail page flags contain very little useful information. Print the head > > page's flags instead (even though PageHead is a little misleading). > > You are right about the tail page. And the raw output provides the tail > page flags, in case someone *really* needs to dig into tail page problems, > so that's all good. > > However, I just gave this a spin, and seeing the "|head" in the list for > my "tail page: dump_page test" is also slightly misleading for me, too. We could also do ... @@ -48,6 +48,8 @@ void __dump_page(struct page *page, const char *reason) struct address_space *mapping; bool page_poisoned = PagePoisoned(page); bool compound = PageCompound(page); + unsigned long flags = page->flags; + /* * Accessing the pageblock without the zone lock. It could change to * "isolate" again in the meantime, but since we are just dumping the @@ -165,7 +162,9 @@ void __dump_page(struct page *page, const char *reason) out_mapping: BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); - pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, + if (head != page) + flags = head->flags & ~PG_head; + pr_warn("%sflags: %#lx(%pGp)%s\n", type, flags, &flags, page_cma ? " CMA" : ""); hex_only:
On 2020-06-29 15:51, Matthew Wilcox wrote: > On Mon, Jun 29, 2020 at 03:38:13PM -0700, John Hubbard wrote: >> On 2020-06-29 08:19, Matthew Wilcox (Oracle) wrote: >>> Tail page flags contain very little useful information. Print the head >>> page's flags instead (even though PageHead is a little misleading). >> >> You are right about the tail page. And the raw output provides the tail >> page flags, in case someone *really* needs to dig into tail page problems, >> so that's all good. >> >> However, I just gave this a spin, and seeing the "|head" in the list for >> my "tail page: dump_page test" is also slightly misleading for me, too. > > We could also do ... > > @@ -48,6 +48,8 @@ void __dump_page(struct page *page, const char *reason) > struct address_space *mapping; > bool page_poisoned = PagePoisoned(page); > bool compound = PageCompound(page); > + unsigned long flags = page->flags; > + > /* > * Accessing the pageblock without the zone lock. It could change to > * "isolate" again in the meantime, but since we are just dumping the > @@ -165,7 +162,9 @@ void __dump_page(struct page *page, const char *reason) > out_mapping: > BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); > > - pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, > + if (head != page) > + flags = head->flags & ~PG_head; > + pr_warn("%sflags: %#lx(%pGp)%s\n", type, flags, &flags, > page_cma ? " CMA" : ""); > OK, I'll give this a spin, along with your line break approach... thanks,
On 2020-06-29 15:51, Matthew Wilcox wrote: > On Mon, Jun 29, 2020 at 03:38:13PM -0700, John Hubbard wrote: >> On 2020-06-29 08:19, Matthew Wilcox (Oracle) wrote: >>> Tail page flags contain very little useful information. Print the head >>> page's flags instead (even though PageHead is a little misleading). >> >> You are right about the tail page. And the raw output provides the tail >> page flags, in case someone *really* needs to dig into tail page problems, >> so that's all good. >> >> However, I just gave this a spin, and seeing the "|head" in the list for >> my "tail page: dump_page test" is also slightly misleading for me, too. > > We could also do ... > > @@ -48,6 +48,8 @@ void __dump_page(struct page *page, const char *reason) > struct address_space *mapping; > bool page_poisoned = PagePoisoned(page); > bool compound = PageCompound(page); > + unsigned long flags = page->flags; > + > /* > * Accessing the pageblock without the zone lock. It could change to > * "isolate" again in the meantime, but since we are just dumping the > @@ -165,7 +162,9 @@ void __dump_page(struct page *page, const char *reason) > out_mapping: > BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); > > - pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, > + if (head != page) > + flags = head->flags & ~PG_head; The above should be: flags = head->flags & ~(1UL << PG_head); > + pr_warn("%sflags: %#lx(%pGp)%s\n", type, flags, &flags, > page_cma ? " CMA" : ""); > > hex_only: > ...so with that fix, along with your line break approach in the other thread, a tail page dump of a FOLL_PIN page looks like this: [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 [ 38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty) [ 38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400 [ 38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641 [ 38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 [ 38.083102] page dumped because: gup_benchmark: tail page: dump_page test So, good. However, I feel that the "head " prefix approach is slightly preferable, because it's doing less processing (the more code one adds to little-exercised debug paths, the more likely the debugging has bugs) and is instead just printing out what it sees directly. And it seems a little odd to remove the PG_head bit from the output. The "head " prefix approach looks like this: [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 [ 38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head) [ 38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400 [ 38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641 [ 38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 [ 38.083102] page dumped because: gup_benchmark: tail page: dump_page test thanks,
On 6/30/20 1:35 AM, John Hubbard wrote: >> + pr_warn("%sflags: %#lx(%pGp)%s\n", type, flags, &flags, >> page_cma ? " CMA" : ""); >> >> hex_only: >> > > ...so with that fix, along with your line break approach in the other thread, > a tail page dump of a FOLL_PIN page looks like this: > > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 > [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 > [ 38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty) > [ 38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400 > [ 38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 > [ 38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641 > [ 38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 > [ 38.083102] page dumped because: gup_benchmark: tail page: dump_page test > > So, good. However, I feel that the "head " prefix approach is slightly > preferable, because it's doing less processing (the more code one > adds to little-exercised debug paths, the more likely the debugging has > bugs) and is instead just printing out what it sees directly. And it seems a little > odd to remove the PG_head bit from the output. > > The "head " prefix approach looks like this: I would also prefer this approach.It would be also nice to know the tail index. As long as page pointer wasn't hashed, it was possible to figure this out, but now it's not. Maybe print pfn of both page and head? > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 > [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 > [ 38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head) > [ 38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400 > [ 38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 > [ 38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641 > [ 38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 > [ 38.083102] page dumped because: gup_benchmark: tail page: dump_page test > > > > > thanks, >
On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote: > I would also prefer this approach.It would be also nice to know the tail index. > As long as page pointer wasn't hashed, it was possible to figure this out, but > now it's not. Maybe print pfn of both page and head? > On 6/30/20 1:35 AM, John Hubbard wrote: > > ...so with that fix, along with your line break approach in the other thread, > > a tail page dump of a FOLL_PIN page looks like this: > > > > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 index is the last thing printed on this line. If it's something like index:0x12345678, you can reduce it mod 1<<order, as printed on the next line. > > [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 > > [ 38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty) > > [ 38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400 > > [ 38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 > > [ 38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641 > > [ 38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 > > [ 38.083102] page dumped because: gup_benchmark: tail page: dump_page test > > > > So, good. However, I feel that the "head " prefix approach is slightly > > preferable, because it's doing less processing (the more code one > > adds to little-exercised debug paths, the more likely the debugging has > > bugs) and is instead just printing out what it sees directly. And it seems a little > > odd to remove the PG_head bit from the output. > > > > The "head " prefix approach looks like this: > > I would also prefer this approach.It would be also nice to know the tail index. > As long as page pointer wasn't hashed, it was possible to figure this out, but > now it's not. Maybe print pfn of both page and head? > > > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 > > [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 > > [ 38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head) How about ... [ 38.049155] flags: 0x17ffe000000000e(anon|referenced|uptodate|dirty|compound) That is, change pageflag_names[] to print 'head' as 'compound' and move the 'anon' or 'ksm' to look like a pageflag. Also CMA. Like this: +++ b/include/trace/events/mmflags.h @@ -96,7 +96,7 @@ {1UL << PG_private, "private" }, \ {1UL << PG_private_2, "private_2" }, \ {1UL << PG_writeback, "writeback" }, \ - {1UL << PG_head, "head" }, \ + {1UL << PG_head, "compound" }, \ {1UL << PG_mappedtodisk, "mappedtodisk" }, \ {1UL << PG_reclaim, "reclaim" }, \ {1UL << PG_swapbacked, "swapbacked" }, \ +++ b/mm/debug.c @@ -54,7 +54,7 @@ void __dump_page(struct page *page, const char *reason) * state for debugging, it should be fine to accept a bit of * inaccuracy here due to racing. */ - bool page_cma = is_migrate_cma_page(page); + char *cma = is_migrate_cma_page(page) ? "|cma" : ""; int mapcount; char *type = ""; @@ -71,6 +71,10 @@ void __dump_page(struct page *page, const char *reason) if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) { /* Corrupt page, cannot call page_mapping */ mapping = page->mapping; + if ((unsigned long)mapping & PAGE_MAPPING_ANON) + mapping = NULL; + mapping = (struct address_space *) + ((unsigned long)mapping & ~PAGE_MAPPING_FLAGS); head = page; compound = false; } else { [...] if (PageKsm(page)) - type = "ksm "; + type = "ksm|"; else if (PageAnon(page)) - type = "anon "; - else if (mapping) { + type = "anon|"; + + BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); + pr_warn("flags: %#lx(%s%pGp%s)\n", page->flags, type, &page->flags, + cma); + + if (mapping) { const struct inode *host; const struct address_space_operations *a_ops; @@ -163,11 +167,6 @@ void __dump_page(struct page *page, const char *reason) } } out_mapping: - BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); - - pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, - page_cma ? " CMA" : ""); - hex_only: print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32, sizeof(unsigned long), page, Can also delete the 'out_mapping' label this way. (I really like it that we're debating this ... it feels like we've been in a bit of a push-pull with the debug patches over the years)
On 2020-06-30 04:59, Matthew Wilcox wrote: ... > How about ... > [ 38.049155] flags: 0x17ffe000000000e(anon|referenced|uptodate|dirty|compound) > > That is, change pageflag_names[] to print 'head' as 'compound' and move the > 'anon' or 'ksm' to look like a pageflag. Also CMA. Like this: > > +++ b/include/trace/events/mmflags.h > @@ -96,7 +96,7 @@ > {1UL << PG_private, "private" }, \ > {1UL << PG_private_2, "private_2" }, \ > {1UL << PG_writeback, "writeback" }, \ > - {1UL << PG_head, "head" }, \ > + {1UL << PG_head, "compound" }, \ > {1UL << PG_mappedtodisk, "mappedtodisk" }, \ > {1UL << PG_reclaim, "reclaim" }, \ > {1UL << PG_swapbacked, "swapbacked" }, \ > +++ b/mm/debug.c > @@ -54,7 +54,7 @@ void __dump_page(struct page *page, const char *reason) > * state for debugging, it should be fine to accept a bit of > * inaccuracy here due to racing. > */ > - bool page_cma = is_migrate_cma_page(page); > + char *cma = is_migrate_cma_page(page) ? "|cma" : ""; > int mapcount; > char *type = ""; > > @@ -71,6 +71,10 @@ void __dump_page(struct page *page, const char *reason) > if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) { > /* Corrupt page, cannot call page_mapping */ > mapping = page->mapping; > + if ((unsigned long)mapping & PAGE_MAPPING_ANON) > + mapping = NULL; > + mapping = (struct address_space *) > + ((unsigned long)mapping & ~PAGE_MAPPING_FLAGS); > head = page; > compound = false; > } else { > [...] > if (PageKsm(page)) > - type = "ksm "; > + type = "ksm|"; > else if (PageAnon(page)) > - type = "anon "; > - else if (mapping) { > + type = "anon|"; > + > + BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); > + pr_warn("flags: %#lx(%s%pGp%s)\n", page->flags, type, &page->flags, > + cma); > + > + if (mapping) { > const struct inode *host; > const struct address_space_operations *a_ops; > @@ -163,11 +167,6 @@ void __dump_page(struct page *page, const char *reason) > } > } > out_mapping: > - BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); > - > - pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, > - page_cma ? " CMA" : ""); > - > hex_only: > print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32, > sizeof(unsigned long), page, > > Can also delete the 'out_mapping' label this way. OK, so after applying that on top of your original series, and your line-break approach, here's what the output looks like for regular and THP, head and tail pages (all involving FOLL_PIN): THP FOLL_PIN page: head page: [ 42.360473] page:0000000025f35fdb refcount:513 mapcount:1 mapping:0000000000000000 index:0x0 [ 42.368012] head:0000000025f35fdb order:9 compound_mapcount:1 compound_pincount:512 [ 42.374761] flags: 0x17ffe000001000e(anon|referenced|uptodate|dirty|compound) [ 42.380994] raw: 017ffe000001000e ffffffff83649ca0 ffffea0020c80008 ffff888898091901 [ 42.387822] raw: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 [ 42.394680] page dumped because: gup_benchmark: head page: dump_page test THP FOLL_PIN page: tail page: [ 42.408222] page:00000000803d233b refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 [ 42.415850] head:0000000025f35fdb order:9 compound_mapcount:1 compound_pincount:512 [ 42.422607] flags: 0x17ffe0000000000(anon|) [ 42.425772] raw: 017ffe0000000000 ffffea0020c98001 ffffea0020c98448 dead000000000400 [ 42.432636] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 42.439490] head: 017ffe000001000e ffffffff83649ca0 ffffea0020c80008 ffff888898091901 [ 42.446431] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 [ 42.453355] page dumped because: gup_benchmark: tail page: dump_page test Non-THP FOLL_PIN page: head page: [ 41.513677] page:00000000190e28ba refcount:1025 mapcount:1 mapping:0000000000000000 index:0x0 [ 41.521331] flags: 0x17ffe0000080034(anon|uptodate|lru|active|swapbacked) [ 41.527189] raw: 017ffe0000080034 ffffea002228db08 ffffea0022215108 ffff888898090191 [ 41.534020] raw: 0000000000000000 0000000000000000 0000040100000000 0000000000000000 [ 41.540863] page dumped because: gup_benchmark: head page: dump_page test Non-THP FOLL_PIN page: tail page: [ 41.554472] page:00000000696a8210 refcount:1025 mapcount:1 mapping:0000000000000000 index:0x11 [ 41.562230] flags: 0x17ffe0000080034(anon|uptodate|lru|active|swapbacked) [ 41.568073] raw: 017ffe0000080034 ffffea0022195688 ffffea0021a4e608 ffff888898090191 [ 41.574940] raw: 0000000000000011 0000000000000000 0000040100000000 0000000000000000 [ 41.581768] page dumped because: gup_benchmark: tail page: dump_page test > > (I really like it that we're debating this ... it feels like we've been > in a bit of a push-pull with the debug patches over the years) > Yes, it's good that we're pausing a moment to polish this up. Lots of goodness here and I like the above output. thanks,
On 6/30/20 1:59 PM, Matthew Wilcox wrote: > On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote: >> I would also prefer this approach.It would be also nice to know the tail index. >> As long as page pointer wasn't hashed, it was possible to figure this out, but >> now it's not. Maybe print pfn of both page and head? > >> On 6/30/20 1:35 AM, John Hubbard wrote: >> > ...so with that fix, along with your line break approach in the other thread, >> > a tail page dump of a FOLL_PIN page looks like this: >> > >> > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 > > index is the last thing printed on this line. If it's something like > index:0x12345678, you can reduce it mod 1<<order, as printed on the next > line. Hmm, I guess that works as long as head pages really have index aligned to 1 << order ... they should, right? But does it fail for HugeTLB? CC Kirill (git blamed) and Mike. page_to_pgoff() has for PageHeadHuge this: return page->index << compound_order(page) but page_to_index() does simply pgoff = compound_head(page)->index; pgoff += page - compound_head(page); Shouldn't it also do a <<compound_order(head) for HugeTLB? >> > [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 > >> > [ 38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty) >> > [ 38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400 >> > [ 38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 >> > [ 38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641 >> > [ 38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 >> > [ 38.083102] page dumped because: gup_benchmark: tail page: dump_page test >> > >> > So, good. However, I feel that the "head " prefix approach is slightly >> > preferable, because it's doing less processing (the more code one >> > adds to little-exercised debug paths, the more likely the debugging has >> > bugs) and is instead just printing out what it sees directly. And it seems a little >> > odd to remove the PG_head bit from the output. >> > >> > The "head " prefix approach looks like this: >> >> I would also prefer this approach.It would be also nice to know the tail index. >> As long as page pointer wasn't hashed, it was possible to figure this out, but >> now it's not. Maybe print pfn of both page and head? >> >> > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 >> > [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 >> > [ 38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head) > > How about ... > [ 38.049155] flags: 0x17ffe000000000e(anon|referenced|uptodate|dirty|compound) > > That is, change pageflag_names[] to print 'head' as 'compound' and move the > 'anon' or 'ksm' to look like a pageflag. Also CMA. Like this: > > +++ b/include/trace/events/mmflags.h > @@ -96,7 +96,7 @@ > {1UL << PG_private, "private" }, \ > {1UL << PG_private_2, "private_2" }, \ > {1UL << PG_writeback, "writeback" }, \ > - {1UL << PG_head, "head" }, \ > + {1UL << PG_head, "compound" }, \ Dunno about this, "compound" used to always mean "head or tail" AFAIK. > {1UL << PG_mappedtodisk, "mappedtodisk" }, \ > {1UL << PG_reclaim, "reclaim" }, \ > {1UL << PG_swapbacked, "swapbacked" }, \ > +++ b/mm/debug.c > @@ -54,7 +54,7 @@ void __dump_page(struct page *page, const char *reason) > * state for debugging, it should be fine to accept a bit of > * inaccuracy here due to racing. > */ > - bool page_cma = is_migrate_cma_page(page); > + char *cma = is_migrate_cma_page(page) ? "|cma" : ""; > int mapcount; > char *type = ""; > > @@ -71,6 +71,10 @@ void __dump_page(struct page *page, const char *reason) > if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) { > /* Corrupt page, cannot call page_mapping */ > mapping = page->mapping; > + if ((unsigned long)mapping & PAGE_MAPPING_ANON) > + mapping = NULL; > + mapping = (struct address_space *) > + ((unsigned long)mapping & ~PAGE_MAPPING_FLAGS); > head = page; > compound = false; > } else { > [...] > if (PageKsm(page)) > - type = "ksm "; > + type = "ksm|"; > else if (PageAnon(page)) > - type = "anon "; > - else if (mapping) { > + type = "anon|"; > + > + BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); > + pr_warn("flags: %#lx(%s%pGp%s)\n", page->flags, type, &page->flags, > + cma); > + > + if (mapping) { > const struct inode *host; > const struct address_space_operations *a_ops; > @@ -163,11 +167,6 @@ void __dump_page(struct page *page, const char *reason) > } > } > out_mapping: > - BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); > - > - pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, > - page_cma ? " CMA" : ""); > - > hex_only: > print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32, > sizeof(unsigned long), page, > > Can also delete the 'out_mapping' label this way. > > (I really like it that we're debating this ... it feels like we've been > in a bit of a push-pull with the debug patches over the years) >
On Thu, Jul 02, 2020 at 04:59:14PM +0200, Vlastimil Babka wrote: > On 6/30/20 1:59 PM, Matthew Wilcox wrote: > > On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote: > >> I would also prefer this approach.It would be also nice to know the tail index. > >> As long as page pointer wasn't hashed, it was possible to figure this out, but > >> now it's not. Maybe print pfn of both page and head? > > > >> On 6/30/20 1:35 AM, John Hubbard wrote: > >> > ...so with that fix, along with your line break approach in the other thread, > >> > a tail page dump of a FOLL_PIN page looks like this: > >> > > >> > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 > > > > index is the last thing printed on this line. If it's something like > > index:0x12345678, you can reduce it mod 1<<order, as printed on the next > > line. > > Hmm, I guess that works as long as head pages really have index aligned to 1 << > order ... they should, right? > > But does it fail for HugeTLB? CC Kirill (git blamed) and Mike. > page_to_pgoff() has for PageHeadHuge this: > return page->index << compound_order(page) > > but page_to_index() does simply > pgoff = compound_head(page)->index; > pgoff += page - compound_head(page); > > Shouldn't it also do a <<compound_order(head) for HugeTLB? PageHeadHuge() is only true for head pages, so we are fine here. > > >> > [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 > > > >> > [ 38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty) > >> > [ 38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400 > >> > [ 38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 > >> > [ 38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641 > >> > [ 38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000 > >> > [ 38.083102] page dumped because: gup_benchmark: tail page: dump_page test > >> > > >> > So, good. However, I feel that the "head " prefix approach is slightly > >> > preferable, because it's doing less processing (the more code one > >> > adds to little-exercised debug paths, the more likely the debugging has > >> > bugs) and is instead just printing out what it sees directly. And it seems a little > >> > odd to remove the PG_head bit from the output. > >> > > >> > The "head " prefix approach looks like this: > >> > >> I would also prefer this approach.It would be also nice to know the tail index. > >> As long as page pointer wasn't hashed, it was possible to figure this out, but > >> now it's not. Maybe print pfn of both page and head? > >> > >> > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 > >> > [ 38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512 > >> > [ 38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head) > > > > How about ... > > [ 38.049155] flags: 0x17ffe000000000e(anon|referenced|uptodate|dirty|compound) > > > > That is, change pageflag_names[] to print 'head' as 'compound' and move the > > 'anon' or 'ksm' to look like a pageflag. Also CMA. Like this: > > > > +++ b/include/trace/events/mmflags.h > > @@ -96,7 +96,7 @@ > > {1UL << PG_private, "private" }, \ > > {1UL << PG_private_2, "private_2" }, \ > > {1UL << PG_writeback, "writeback" }, \ > > - {1UL << PG_head, "head" }, \ > > + {1UL << PG_head, "compound" }, \ > > Dunno about this, "compound" used to always mean "head or tail" AFAIK. Yeah. I'm not convinced by the change either. show_page_flags() also uses this difinitions and I think it can get confusing in the trace.
On 7/2/20 5:53 PM, Kirill A. Shutemov wrote: > On Thu, Jul 02, 2020 at 04:59:14PM +0200, Vlastimil Babka wrote: >> On 6/30/20 1:59 PM, Matthew Wilcox wrote: >> > On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote: >> >> I would also prefer this approach.It would be also nice to know the tail index. >> >> As long as page pointer wasn't hashed, it was possible to figure this out, but >> >> now it's not. Maybe print pfn of both page and head? >> > >> >> On 6/30/20 1:35 AM, John Hubbard wrote: >> >> > ...so with that fix, along with your line break approach in the other thread, >> >> > a tail page dump of a FOLL_PIN page looks like this: >> >> > >> >> > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 >> > >> > index is the last thing printed on this line. If it's something like >> > index:0x12345678, you can reduce it mod 1<<order, as printed on the next >> > line. >> >> Hmm, I guess that works as long as head pages really have index aligned to 1 << >> order ... they should, right? >> >> But does it fail for HugeTLB? CC Kirill (git blamed) and Mike. >> page_to_pgoff() has for PageHeadHuge this: >> return page->index << compound_order(page) >> >> but page_to_index() does simply >> pgoff = compound_head(page)->index; >> pgoff += page - compound_head(page); >> >> Shouldn't it also do a <<compound_order(head) for HugeTLB? > > PageHeadHuge() is only true for head pages, so we are fine here. Really? We are calculatting index (pgoff) of tail page, which should be index of head page plus n for n'th tail page; the unit is base pages. But ff HugeTLB head pages use the unit of huge page in page->index, and page_to_pgoff() translates it to unit of base pages, then we should do the same when calculating the index of tail page, no? Otherwise we are adding up units of huge pages (from head->index) with units of base page (n'th tail) and get garbage as a result, AFAICS?
On Thu, Jul 02, 2020 at 06:19:08PM +0200, Vlastimil Babka wrote: > On 7/2/20 5:53 PM, Kirill A. Shutemov wrote: > > On Thu, Jul 02, 2020 at 04:59:14PM +0200, Vlastimil Babka wrote: > >> On 6/30/20 1:59 PM, Matthew Wilcox wrote: > >> > On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote: > >> >> I would also prefer this approach.It would be also nice to know the tail index. > >> >> As long as page pointer wasn't hashed, it was possible to figure this out, but > >> >> now it's not. Maybe print pfn of both page and head? > >> > > >> >> On 6/30/20 1:35 AM, John Hubbard wrote: > >> >> > ...so with that fix, along with your line break approach in the other thread, > >> >> > a tail page dump of a FOLL_PIN page looks like this: > >> >> > > >> >> > [ 38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 > >> > > >> > index is the last thing printed on this line. If it's something like > >> > index:0x12345678, you can reduce it mod 1<<order, as printed on the next > >> > line. > >> > >> Hmm, I guess that works as long as head pages really have index aligned to 1 << > >> order ... they should, right? > >> > >> But does it fail for HugeTLB? CC Kirill (git blamed) and Mike. > >> page_to_pgoff() has for PageHeadHuge this: > >> return page->index << compound_order(page) > >> > >> but page_to_index() does simply > >> pgoff = compound_head(page)->index; > >> pgoff += page - compound_head(page); > >> > >> Shouldn't it also do a <<compound_order(head) for HugeTLB? > > > > PageHeadHuge() is only true for head pages, so we are fine here. > > Really? We are calculatting index (pgoff) of tail page, which should be index of > head page plus n for n'th tail page; the unit is base pages. > But ff HugeTLB head pages use the unit of huge page in page->index, and > page_to_pgoff() translates it to unit of base pages, then we should do the same > when calculating the index of tail page, no? Otherwise we are adding up units of > huge pages (from head->index) with units of base page (n'th tail) and get > garbage as a result, AFAICS? You are right. I guess we can get away with this because nobody calls page_to_pgoff() on tail pages of hugetlb page. Except when something goes wrong and dump_page() has to deal with it. I'm not sure if it's worth fixing and whether the fix should be inside page_to_pgoff(). The best fix would be to remove hugetlb pages special-casing: it has to have ->index in base pagesize, not huge page.
On Thu, Jul 02, 2020 at 11:39:07PM +0300, Kirill A. Shutemov wrote: > On Thu, Jul 02, 2020 at 06:19:08PM +0200, Vlastimil Babka wrote: > > Really? We are calculatting index (pgoff) of tail page, which should be index of > > head page plus n for n'th tail page; the unit is base pages. > > But ff HugeTLB head pages use the unit of huge page in page->index, and > > page_to_pgoff() translates it to unit of base pages, then we should do the same > > when calculating the index of tail page, no? Otherwise we are adding up units of > > huge pages (from head->index) with units of base page (n'th tail) and get > > garbage as a result, AFAICS? > > You are right. I guess we can get away with this because nobody calls > page_to_pgoff() on tail pages of hugetlb page. Except when something goes > wrong and dump_page() has to deal with it. > > I'm not sure if it's worth fixing and whether the fix should be inside > page_to_pgoff(). > > The best fix would be to remove hugetlb pages special-casing: it has to > have ->index in base pagesize, not huge page. https://lore.kernel.org/linux-mm/20200629152033.16175-1-willy@infradead.org/ would be a good place to start. We'll use 8 entries for an order-9 page instead of the 1 entry that we currently do, but that's a lot better than using 512 entries.
diff --git a/mm/debug.c b/mm/debug.c index 4f376514744d..44d843b3b07c 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -163,7 +163,7 @@ void __dump_page(struct page *page, const char *reason) out_mapping: BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1); - pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags, + pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags, page_cma ? " CMA" : ""); hex_only:
Tail page flags contain very little useful information. Print the head page's flags instead (even though PageHead is a little misleading). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> --- mm/debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)