diff mbox series

[1/3] mm: Print head flags in dump_page

Message ID 20200629151918.15537-2-willy@infradead.org (mailing list archive)
State New, archived
Headers show
Series Improvements for dump_page() | expand

Commit Message

Matthew Wilcox June 29, 2020, 3:19 p.m. UTC
Tail page flags contain very little useful information.  Print the head
page's flags instead (even though PageHead is a little misleading).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

John Hubbard June 29, 2020, 10:38 p.m. UTC | #1
On 2020-06-29 08:19, Matthew Wilcox (Oracle) wrote:
> Tail page flags contain very little useful information.  Print the head
> page's flags instead (even though PageHead is a little misleading).

You are right about the tail page. And the raw output provides the tail
page flags, in case someone *really* needs to dig into tail page problems,
so that's all good.

However, I just gave this a spin, and seeing the "|head" in the list for
my "tail page: dump_page test" is also slightly misleading for me, too.

Sure, one can eventually work through it and conclude "my dump_page output
is misleading", but I think a small tweak would avoid that:

> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/debug.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/debug.c b/mm/debug.c
> index 4f376514744d..44d843b3b07c 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -163,7 +163,7 @@ void __dump_page(struct page *page, const char *reason)
>   out_mapping:
>   	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
>   
> -	pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags,
> +	pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
>   		page_cma ? " CMA" : "");


Perhaps just adding "head " to the above will clear it all up:

diff --git a/mm/debug.c b/mm/debug.c
index fcf3a16902b2..b7ac8af71940 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -165,7 +165,7 @@ void __dump_page(struct page *page, const char *reason)
  out_mapping:
  	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);

-	pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
+	pr_warn("head %sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
  		page_cma ? " CMA" : "");

  hex_only:


Sample output:

[   62.064271] page:000000001c8f0c3e refcount:513 mapcount:1 mapping:0000000000000000 index:0x11 
head:00000000ba9ccef2 order:9 compound_mapcount:1 compound_pincount:512
[   62.078432] head anon flags: 0x17ffe000001000e(referenced|uptodate|dirty|head)
[   62.088454] raw: 017ffe0000000000 ffffea00209e8001 ffffea00209e8448 dead000000000400
[   62.095356] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   62.102289] head: 017ffe000001000e ffffffff83649ca0 ffffea00209e0008 ffff88889663c001
[   62.109286] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
[   62.116301] page dumped because: gup_benchmark: tail page: dump_page test



thanks,
Matthew Wilcox June 29, 2020, 10:51 p.m. UTC | #2
On Mon, Jun 29, 2020 at 03:38:13PM -0700, John Hubbard wrote:
> On 2020-06-29 08:19, Matthew Wilcox (Oracle) wrote:
> > Tail page flags contain very little useful information.  Print the head
> > page's flags instead (even though PageHead is a little misleading).
> 
> You are right about the tail page. And the raw output provides the tail
> page flags, in case someone *really* needs to dig into tail page problems,
> so that's all good.
> 
> However, I just gave this a spin, and seeing the "|head" in the list for
> my "tail page: dump_page test" is also slightly misleading for me, too.

We could also do ...

@@ -48,6 +48,8 @@ void __dump_page(struct page *page, const char *reason)
        struct address_space *mapping;
        bool page_poisoned = PagePoisoned(page);
        bool compound = PageCompound(page);
+       unsigned long flags = page->flags;
+
        /*
         * Accessing the pageblock without the zone lock. It could change to
         * "isolate" again in the meantime, but since we are just dumping the
@@ -165,7 +162,9 @@ void __dump_page(struct page *page, const char *reason)
 out_mapping:
        BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
 
-       pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
+       if (head != page)
+               flags = head->flags & ~PG_head;
+       pr_warn("%sflags: %#lx(%pGp)%s\n", type, flags, &flags,
                page_cma ? " CMA" : "");
 
 hex_only:
John Hubbard June 29, 2020, 10:54 p.m. UTC | #3
On 2020-06-29 15:51, Matthew Wilcox wrote:
> On Mon, Jun 29, 2020 at 03:38:13PM -0700, John Hubbard wrote:
>> On 2020-06-29 08:19, Matthew Wilcox (Oracle) wrote:
>>> Tail page flags contain very little useful information.  Print the head
>>> page's flags instead (even though PageHead is a little misleading).
>>
>> You are right about the tail page. And the raw output provides the tail
>> page flags, in case someone *really* needs to dig into tail page problems,
>> so that's all good.
>>
>> However, I just gave this a spin, and seeing the "|head" in the list for
>> my "tail page: dump_page test" is also slightly misleading for me, too.
> 
> We could also do ...
> 
> @@ -48,6 +48,8 @@ void __dump_page(struct page *page, const char *reason)
>          struct address_space *mapping;
>          bool page_poisoned = PagePoisoned(page);
>          bool compound = PageCompound(page);
> +       unsigned long flags = page->flags;
> +
>          /*
>           * Accessing the pageblock without the zone lock. It could change to
>           * "isolate" again in the meantime, but since we are just dumping the
> @@ -165,7 +162,9 @@ void __dump_page(struct page *page, const char *reason)
>   out_mapping:
>          BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
>   
> -       pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
> +       if (head != page)
> +               flags = head->flags & ~PG_head;
> +       pr_warn("%sflags: %#lx(%pGp)%s\n", type, flags, &flags,
>                  page_cma ? " CMA" : "");
>   

OK, I'll give this a spin, along with your line break approach...

thanks,
John Hubbard June 29, 2020, 11:35 p.m. UTC | #4
On 2020-06-29 15:51, Matthew Wilcox wrote:
> On Mon, Jun 29, 2020 at 03:38:13PM -0700, John Hubbard wrote:
>> On 2020-06-29 08:19, Matthew Wilcox (Oracle) wrote:
>>> Tail page flags contain very little useful information.  Print the head
>>> page's flags instead (even though PageHead is a little misleading).
>>
>> You are right about the tail page. And the raw output provides the tail
>> page flags, in case someone *really* needs to dig into tail page problems,
>> so that's all good.
>>
>> However, I just gave this a spin, and seeing the "|head" in the list for
>> my "tail page: dump_page test" is also slightly misleading for me, too.
> 
> We could also do ...
> 
> @@ -48,6 +48,8 @@ void __dump_page(struct page *page, const char *reason)
>          struct address_space *mapping;
>          bool page_poisoned = PagePoisoned(page);
>          bool compound = PageCompound(page);
> +       unsigned long flags = page->flags;
> +
>          /*
>           * Accessing the pageblock without the zone lock. It could change to
>           * "isolate" again in the meantime, but since we are just dumping the
> @@ -165,7 +162,9 @@ void __dump_page(struct page *page, const char *reason)
>   out_mapping:
>          BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
>   
> -       pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
> +       if (head != page)
> +               flags = head->flags & ~PG_head;

The above should be:

                    flags = head->flags & ~(1UL << PG_head);


> +       pr_warn("%sflags: %#lx(%pGp)%s\n", type, flags, &flags,
>                  page_cma ? " CMA" : "");
>   
>   hex_only:
> 

...so with that fix, along with your line break approach in the other thread,
a tail page dump of a FOLL_PIN page looks like this:

[   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
[   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512
[   38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty)
[   38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400
[   38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641
[   38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
[   38.083102] page dumped because: gup_benchmark: tail page: dump_page test

So, good. However, I feel that the "head " prefix approach is slightly
preferable, because it's doing less processing (the more code one
adds to little-exercised debug paths, the more likely the debugging has
bugs) and is instead just printing out what it sees directly. And it seems a little
odd to remove the PG_head bit from the output.

The "head " prefix approach looks like this:


[   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
[   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512
[   38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head)
[   38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400
[   38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641
[   38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
[   38.083102] page dumped because: gup_benchmark: tail page: dump_page test




thanks,
Vlastimil Babka June 30, 2020, 8:02 a.m. UTC | #5
On 6/30/20 1:35 AM, John Hubbard wrote:
>> +       pr_warn("%sflags: %#lx(%pGp)%s\n", type, flags, &flags,
>>                  page_cma ? " CMA" : "");
>>   
>>   hex_only:
>> 
> 
> ...so with that fix, along with your line break approach in the other thread,
> a tail page dump of a FOLL_PIN page looks like this:
> 
> [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
> [   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512
> [   38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty)
> [   38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400
> [   38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [   38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641
> [   38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
> [   38.083102] page dumped because: gup_benchmark: tail page: dump_page test
> 
> So, good. However, I feel that the "head " prefix approach is slightly
> preferable, because it's doing less processing (the more code one
> adds to little-exercised debug paths, the more likely the debugging has
> bugs) and is instead just printing out what it sees directly. And it seems a little
> odd to remove the PG_head bit from the output.
> 
> The "head " prefix approach looks like this:

I would also prefer this approach.It would be also nice to know the tail index.
As long as page pointer wasn't hashed, it was possible to figure this out, but
now it's not. Maybe print pfn of both page and head?

> [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
> [   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512
> [   38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head)
> [   38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400
> [   38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [   38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641
> [   38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
> [   38.083102] page dumped because: gup_benchmark: tail page: dump_page test
> 
> 
> 
> 
> thanks,
>
Matthew Wilcox June 30, 2020, 11:59 a.m. UTC | #6
On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote:
> I would also prefer this approach.It would be also nice to know the tail index.
> As long as page pointer wasn't hashed, it was possible to figure this out, but
> now it's not. Maybe print pfn of both page and head?

> On 6/30/20 1:35 AM, John Hubbard wrote:
> > ...so with that fix, along with your line break approach in the other thread,
> > a tail page dump of a FOLL_PIN page looks like this:
> > 
> > [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11

index is the last thing printed on this line.  If it's something like
index:0x12345678, you can reduce it mod 1<<order, as printed on the next
line.

> > [   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512

> > [   38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty)
> > [   38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400
> > [   38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> > [   38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641
> > [   38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
> > [   38.083102] page dumped because: gup_benchmark: tail page: dump_page test
> > 
> > So, good. However, I feel that the "head " prefix approach is slightly
> > preferable, because it's doing less processing (the more code one
> > adds to little-exercised debug paths, the more likely the debugging has
> > bugs) and is instead just printing out what it sees directly. And it seems a little
> > odd to remove the PG_head bit from the output.
> > 
> > The "head " prefix approach looks like this:
> 
> I would also prefer this approach.It would be also nice to know the tail index.
> As long as page pointer wasn't hashed, it was possible to figure this out, but
> now it's not. Maybe print pfn of both page and head?
> 
> > [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
> > [   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512
> > [   38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head)

How about ...
[   38.049155] flags: 0x17ffe000000000e(anon|referenced|uptodate|dirty|compound)

That is, change pageflag_names[] to print 'head' as 'compound' and move the
'anon' or 'ksm' to look like a pageflag.  Also CMA.  Like this:

+++ b/include/trace/events/mmflags.h
@@ -96,7 +96,7 @@
        {1UL << PG_private,             "private"       },              \
        {1UL << PG_private_2,           "private_2"     },              \
        {1UL << PG_writeback,           "writeback"     },              \
-       {1UL << PG_head,                "head"          },              \
+       {1UL << PG_head,                "compound"      },              \
        {1UL << PG_mappedtodisk,        "mappedtodisk"  },              \
        {1UL << PG_reclaim,             "reclaim"       },              \
        {1UL << PG_swapbacked,          "swapbacked"    },              \
+++ b/mm/debug.c
@@ -54,7 +54,7 @@ void __dump_page(struct page *page, const char *reason)
         * state for debugging, it should be fine to accept a bit of
         * inaccuracy here due to racing.
         */
-       bool page_cma = is_migrate_cma_page(page);
+       char *cma = is_migrate_cma_page(page) ? "|cma" : "";
        int mapcount;
        char *type = "";
 
@@ -71,6 +71,10 @@ void __dump_page(struct page *page, const char *reason)
        if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) {
                /* Corrupt page, cannot call page_mapping */
                mapping = page->mapping;
+               if ((unsigned long)mapping & PAGE_MAPPING_ANON)
+                       mapping = NULL;
+               mapping = (struct address_space *)
+                               ((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
                head = page;
                compound = false;
        } else {
[...]
        if (PageKsm(page))
-               type = "ksm ";
+               type = "ksm|";
        else if (PageAnon(page))
-               type = "anon ";
-       else if (mapping) {
+               type = "anon|";
+
+       BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
+       pr_warn("flags: %#lx(%s%pGp%s)\n", page->flags, type, &page->flags,
+                       cma);
+
+       if (mapping) {
                const struct inode *host;
                const struct address_space_operations *a_ops;
@@ -163,11 +167,6 @@ void __dump_page(struct page *page, const char *reason)
                }
        }
 out_mapping:
-       BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
-
-       pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
-               page_cma ? " CMA" : "");
-
 hex_only:
        print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
                        sizeof(unsigned long), page,

Can also delete the 'out_mapping' label this way.

(I really like it that we're debating this ... it feels like we've been
in a bit of a push-pull with the debug patches over the years)
John Hubbard July 1, 2020, 2:12 a.m. UTC | #7
On 2020-06-30 04:59, Matthew Wilcox wrote:
...
> How about ...
> [   38.049155] flags: 0x17ffe000000000e(anon|referenced|uptodate|dirty|compound)
> 
> That is, change pageflag_names[] to print 'head' as 'compound' and move the
> 'anon' or 'ksm' to look like a pageflag.  Also CMA.  Like this:
> 
> +++ b/include/trace/events/mmflags.h
> @@ -96,7 +96,7 @@
>          {1UL << PG_private,             "private"       },              \
>          {1UL << PG_private_2,           "private_2"     },              \
>          {1UL << PG_writeback,           "writeback"     },              \
> -       {1UL << PG_head,                "head"          },              \
> +       {1UL << PG_head,                "compound"      },              \
>          {1UL << PG_mappedtodisk,        "mappedtodisk"  },              \
>          {1UL << PG_reclaim,             "reclaim"       },              \
>          {1UL << PG_swapbacked,          "swapbacked"    },              \
> +++ b/mm/debug.c
> @@ -54,7 +54,7 @@ void __dump_page(struct page *page, const char *reason)
>           * state for debugging, it should be fine to accept a bit of
>           * inaccuracy here due to racing.
>           */
> -       bool page_cma = is_migrate_cma_page(page);
> +       char *cma = is_migrate_cma_page(page) ? "|cma" : "";
>          int mapcount;
>          char *type = "";
>   
> @@ -71,6 +71,10 @@ void __dump_page(struct page *page, const char *reason)
>          if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) {
>                  /* Corrupt page, cannot call page_mapping */
>                  mapping = page->mapping;
> +               if ((unsigned long)mapping & PAGE_MAPPING_ANON)
> +                       mapping = NULL;
> +               mapping = (struct address_space *)
> +                               ((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
>                  head = page;
>                  compound = false;
>          } else {
> [...]
>          if (PageKsm(page))
> -               type = "ksm ";
> +               type = "ksm|";
>          else if (PageAnon(page))
> -               type = "anon ";
> -       else if (mapping) {
> +               type = "anon|";
> +
> +       BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
> +       pr_warn("flags: %#lx(%s%pGp%s)\n", page->flags, type, &page->flags,
> +                       cma);
> +
> +       if (mapping) {
>                  const struct inode *host;
>                  const struct address_space_operations *a_ops;
> @@ -163,11 +167,6 @@ void __dump_page(struct page *page, const char *reason)
>                  }
>          }
>   out_mapping:
> -       BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
> -
> -       pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
> -               page_cma ? " CMA" : "");
> -
>   hex_only:
>          print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
>                          sizeof(unsigned long), page,
> 
> Can also delete the 'out_mapping' label this way.

OK, so after applying that on top of your original series, and your
line-break approach, here's what the output looks like for regular
and THP, head and tail pages (all involving FOLL_PIN):

THP FOLL_PIN page: head page:

[   42.360473] page:0000000025f35fdb refcount:513 mapcount:1 mapping:0000000000000000 index:0x0
[   42.368012] head:0000000025f35fdb order:9 compound_mapcount:1 compound_pincount:512
[   42.374761] flags: 0x17ffe000001000e(anon|referenced|uptodate|dirty|compound)
[   42.380994] raw: 017ffe000001000e ffffffff83649ca0 ffffea0020c80008 ffff888898091901
[   42.387822] raw: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
[   42.394680] page dumped because: gup_benchmark: head page: dump_page test

THP FOLL_PIN page: tail page:

[   42.408222] page:00000000803d233b refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
[   42.415850] head:0000000025f35fdb order:9 compound_mapcount:1 compound_pincount:512
[   42.422607] flags: 0x17ffe0000000000(anon|)
[   42.425772] raw: 017ffe0000000000 ffffea0020c98001 ffffea0020c98448 dead000000000400
[   42.432636] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   42.439490] head: 017ffe000001000e ffffffff83649ca0 ffffea0020c80008 ffff888898091901
[   42.446431] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
[   42.453355] page dumped because: gup_benchmark: tail page: dump_page test

Non-THP FOLL_PIN page: head page:

[   41.513677] page:00000000190e28ba refcount:1025 mapcount:1 mapping:0000000000000000 index:0x0
[   41.521331] flags: 0x17ffe0000080034(anon|uptodate|lru|active|swapbacked)
[   41.527189] raw: 017ffe0000080034 ffffea002228db08 ffffea0022215108 ffff888898090191
[   41.534020] raw: 0000000000000000 0000000000000000 0000040100000000 0000000000000000
[   41.540863] page dumped because: gup_benchmark: head page: dump_page test

Non-THP FOLL_PIN page: tail page:

[   41.554472] page:00000000696a8210 refcount:1025 mapcount:1 mapping:0000000000000000 index:0x11
[   41.562230] flags: 0x17ffe0000080034(anon|uptodate|lru|active|swapbacked)
[   41.568073] raw: 017ffe0000080034 ffffea0022195688 ffffea0021a4e608 ffff888898090191
[   41.574940] raw: 0000000000000011 0000000000000000 0000040100000000 0000000000000000
[   41.581768] page dumped because: gup_benchmark: tail page: dump_page test

> 
> (I really like it that we're debating this ... it feels like we've been
> in a bit of a push-pull with the debug patches over the years)
> 

Yes, it's good that we're pausing a moment to polish this up. Lots of goodness
here and I like the above output.

thanks,
Vlastimil Babka July 2, 2020, 2:59 p.m. UTC | #8
On 6/30/20 1:59 PM, Matthew Wilcox wrote:
> On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote:
>> I would also prefer this approach.It would be also nice to know the tail index.
>> As long as page pointer wasn't hashed, it was possible to figure this out, but
>> now it's not. Maybe print pfn of both page and head?
> 
>> On 6/30/20 1:35 AM, John Hubbard wrote:
>> > ...so with that fix, along with your line break approach in the other thread,
>> > a tail page dump of a FOLL_PIN page looks like this:
>> > 
>> > [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
> 
> index is the last thing printed on this line.  If it's something like
> index:0x12345678, you can reduce it mod 1<<order, as printed on the next
> line.

Hmm, I guess that works as long as head pages really have index aligned to 1 <<
order ... they should, right?

But does it fail for HugeTLB? CC Kirill (git blamed) and Mike.
page_to_pgoff() has for PageHeadHuge this:
	return page->index << compound_order(page)

but page_to_index() does simply
        pgoff = compound_head(page)->index;
        pgoff += page - compound_head(page);

Shouldn't it also do a <<compound_order(head) for HugeTLB?

>> > [   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512
> 
>> > [   38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty)
>> > [   38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400
>> > [   38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
>> > [   38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641
>> > [   38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
>> > [   38.083102] page dumped because: gup_benchmark: tail page: dump_page test
>> > 
>> > So, good. However, I feel that the "head " prefix approach is slightly
>> > preferable, because it's doing less processing (the more code one
>> > adds to little-exercised debug paths, the more likely the debugging has
>> > bugs) and is instead just printing out what it sees directly. And it seems a little
>> > odd to remove the PG_head bit from the output.
>> > 
>> > The "head " prefix approach looks like this:
>> 
>> I would also prefer this approach.It would be also nice to know the tail index.
>> As long as page pointer wasn't hashed, it was possible to figure this out, but
>> now it's not. Maybe print pfn of both page and head?
>> 
>> > [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
>> > [   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512
>> > [   38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head)
> 
> How about ...
> [   38.049155] flags: 0x17ffe000000000e(anon|referenced|uptodate|dirty|compound)
> 
> That is, change pageflag_names[] to print 'head' as 'compound' and move the
> 'anon' or 'ksm' to look like a pageflag.  Also CMA.  Like this:
> 
> +++ b/include/trace/events/mmflags.h
> @@ -96,7 +96,7 @@
>         {1UL << PG_private,             "private"       },              \
>         {1UL << PG_private_2,           "private_2"     },              \
>         {1UL << PG_writeback,           "writeback"     },              \
> -       {1UL << PG_head,                "head"          },              \
> +       {1UL << PG_head,                "compound"      },              \

Dunno about this, "compound" used to always mean "head or tail" AFAIK.

>         {1UL << PG_mappedtodisk,        "mappedtodisk"  },              \
>         {1UL << PG_reclaim,             "reclaim"       },              \
>         {1UL << PG_swapbacked,          "swapbacked"    },              \
> +++ b/mm/debug.c
> @@ -54,7 +54,7 @@ void __dump_page(struct page *page, const char *reason)
>          * state for debugging, it should be fine to accept a bit of
>          * inaccuracy here due to racing.
>          */
> -       bool page_cma = is_migrate_cma_page(page);
> +       char *cma = is_migrate_cma_page(page) ? "|cma" : "";
>         int mapcount;
>         char *type = "";
>  
> @@ -71,6 +71,10 @@ void __dump_page(struct page *page, const char *reason)
>         if (page < head || (page >= head + MAX_ORDER_NR_PAGES)) {
>                 /* Corrupt page, cannot call page_mapping */
>                 mapping = page->mapping;
> +               if ((unsigned long)mapping & PAGE_MAPPING_ANON)
> +                       mapping = NULL;
> +               mapping = (struct address_space *)
> +                               ((unsigned long)mapping & ~PAGE_MAPPING_FLAGS);
>                 head = page;
>                 compound = false;
>         } else {
> [...]
>         if (PageKsm(page))
> -               type = "ksm ";
> +               type = "ksm|";
>         else if (PageAnon(page))
> -               type = "anon ";
> -       else if (mapping) {
> +               type = "anon|";
> +
> +       BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
> +       pr_warn("flags: %#lx(%s%pGp%s)\n", page->flags, type, &page->flags,
> +                       cma);
> +
> +       if (mapping) {
>                 const struct inode *host;
>                 const struct address_space_operations *a_ops;
> @@ -163,11 +167,6 @@ void __dump_page(struct page *page, const char *reason)
>                 }
>         }
>  out_mapping:
> -       BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
> -
> -       pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
> -               page_cma ? " CMA" : "");
> -
>  hex_only:
>         print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
>                         sizeof(unsigned long), page,
> 
> Can also delete the 'out_mapping' label this way.
> 
> (I really like it that we're debating this ... it feels like we've been
> in a bit of a push-pull with the debug patches over the years)
>
Kirill A . Shutemov July 2, 2020, 3:53 p.m. UTC | #9
On Thu, Jul 02, 2020 at 04:59:14PM +0200, Vlastimil Babka wrote:
> On 6/30/20 1:59 PM, Matthew Wilcox wrote:
> > On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote:
> >> I would also prefer this approach.It would be also nice to know the tail index.
> >> As long as page pointer wasn't hashed, it was possible to figure this out, but
> >> now it's not. Maybe print pfn of both page and head?
> > 
> >> On 6/30/20 1:35 AM, John Hubbard wrote:
> >> > ...so with that fix, along with your line break approach in the other thread,
> >> > a tail page dump of a FOLL_PIN page looks like this:
> >> > 
> >> > [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
> > 
> > index is the last thing printed on this line.  If it's something like
> > index:0x12345678, you can reduce it mod 1<<order, as printed on the next
> > line.
> 
> Hmm, I guess that works as long as head pages really have index aligned to 1 <<
> order ... they should, right?
> 
> But does it fail for HugeTLB? CC Kirill (git blamed) and Mike.
> page_to_pgoff() has for PageHeadHuge this:
> 	return page->index << compound_order(page)
> 
> but page_to_index() does simply
>         pgoff = compound_head(page)->index;
>         pgoff += page - compound_head(page);
> 
> Shouldn't it also do a <<compound_order(head) for HugeTLB?

PageHeadHuge() is only true for head pages, so we are fine here.

> 
> >> > [   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512
> > 
> >> > [   38.049155] anon flags: 0x17ffe000000000e(referenced|uptodate|dirty)
> >> > [   38.055465] raw: 017ffe0000000000 ffffea0020dd0001 ffffea0020dd0448 dead000000000400
> >> > [   38.062319] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> >> > [   38.069183] head: 017ffe000001000e ffffffff83649ca0 ffffea0020dd8008 ffff88888e0b6641
> >> > [   38.076141] head: 0000000000000000 0000000000000000 00000201ffffffff 0000000000000000
> >> > [   38.083102] page dumped because: gup_benchmark: tail page: dump_page test
> >> > 
> >> > So, good. However, I feel that the "head " prefix approach is slightly
> >> > preferable, because it's doing less processing (the more code one
> >> > adds to little-exercised debug paths, the more likely the debugging has
> >> > bugs) and is instead just printing out what it sees directly. And it seems a little
> >> > odd to remove the PG_head bit from the output.
> >> > 
> >> > The "head " prefix approach looks like this:
> >> 
> >> I would also prefer this approach.It would be also nice to know the tail index.
> >> As long as page pointer wasn't hashed, it was possible to figure this out, but
> >> now it's not. Maybe print pfn of both page and head?
> >> 
> >> > [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
> >> > [   38.035633] head:00000000675be53c order:9 compound_mapcount:1 compound_pincount:512
> >> > [   38.049155] head anon flags: 0x17ffe000000000e(referenced|uptodate|dirty|head)
> > 
> > How about ...
> > [   38.049155] flags: 0x17ffe000000000e(anon|referenced|uptodate|dirty|compound)
> > 
> > That is, change pageflag_names[] to print 'head' as 'compound' and move the
> > 'anon' or 'ksm' to look like a pageflag.  Also CMA.  Like this:
> > 
> > +++ b/include/trace/events/mmflags.h
> > @@ -96,7 +96,7 @@
> >         {1UL << PG_private,             "private"       },              \
> >         {1UL << PG_private_2,           "private_2"     },              \
> >         {1UL << PG_writeback,           "writeback"     },              \
> > -       {1UL << PG_head,                "head"          },              \
> > +       {1UL << PG_head,                "compound"      },              \
> 
> Dunno about this, "compound" used to always mean "head or tail" AFAIK.

Yeah. I'm not convinced by the change either. show_page_flags() also uses
this difinitions and I think it can get confusing in the trace.
Vlastimil Babka July 2, 2020, 4:19 p.m. UTC | #10
On 7/2/20 5:53 PM, Kirill A. Shutemov wrote:
> On Thu, Jul 02, 2020 at 04:59:14PM +0200, Vlastimil Babka wrote:
>> On 6/30/20 1:59 PM, Matthew Wilcox wrote:
>> > On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote:
>> >> I would also prefer this approach.It would be also nice to know the tail index.
>> >> As long as page pointer wasn't hashed, it was possible to figure this out, but
>> >> now it's not. Maybe print pfn of both page and head?
>> > 
>> >> On 6/30/20 1:35 AM, John Hubbard wrote:
>> >> > ...so with that fix, along with your line break approach in the other thread,
>> >> > a tail page dump of a FOLL_PIN page looks like this:
>> >> > 
>> >> > [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
>> > 
>> > index is the last thing printed on this line.  If it's something like
>> > index:0x12345678, you can reduce it mod 1<<order, as printed on the next
>> > line.
>> 
>> Hmm, I guess that works as long as head pages really have index aligned to 1 <<
>> order ... they should, right?
>> 
>> But does it fail for HugeTLB? CC Kirill (git blamed) and Mike.
>> page_to_pgoff() has for PageHeadHuge this:
>> 	return page->index << compound_order(page)
>> 
>> but page_to_index() does simply
>>         pgoff = compound_head(page)->index;
>>         pgoff += page - compound_head(page);
>> 
>> Shouldn't it also do a <<compound_order(head) for HugeTLB?
> 
> PageHeadHuge() is only true for head pages, so we are fine here.

Really? We are calculatting index (pgoff) of tail page, which should be index of
head page plus n for n'th tail page; the unit is base pages.
But ff HugeTLB head pages use the unit of huge page in page->index, and
page_to_pgoff() translates it to unit of base pages, then we should do the same
when calculating the index of tail page, no? Otherwise we are adding up units of
huge pages (from head->index) with units of base page (n'th tail) and get
garbage as a result, AFAICS?
Kirill A . Shutemov July 2, 2020, 8:39 p.m. UTC | #11
On Thu, Jul 02, 2020 at 06:19:08PM +0200, Vlastimil Babka wrote:
> On 7/2/20 5:53 PM, Kirill A. Shutemov wrote:
> > On Thu, Jul 02, 2020 at 04:59:14PM +0200, Vlastimil Babka wrote:
> >> On 6/30/20 1:59 PM, Matthew Wilcox wrote:
> >> > On Tue, Jun 30, 2020 at 10:02:50AM +0200, Vlastimil Babka wrote:
> >> >> I would also prefer this approach.It would be also nice to know the tail index.
> >> >> As long as page pointer wasn't hashed, it was possible to figure this out, but
> >> >> now it's not. Maybe print pfn of both page and head?
> >> > 
> >> >> On 6/30/20 1:35 AM, John Hubbard wrote:
> >> >> > ...so with that fix, along with your line break approach in the other thread,
> >> >> > a tail page dump of a FOLL_PIN page looks like this:
> >> >> > 
> >> >> > [   38.027987] page:00000000abaef9ae refcount:513 mapcount:1 mapping:0000000000000000 index:0x11
> >> > 
> >> > index is the last thing printed on this line.  If it's something like
> >> > index:0x12345678, you can reduce it mod 1<<order, as printed on the next
> >> > line.
> >> 
> >> Hmm, I guess that works as long as head pages really have index aligned to 1 <<
> >> order ... they should, right?
> >> 
> >> But does it fail for HugeTLB? CC Kirill (git blamed) and Mike.
> >> page_to_pgoff() has for PageHeadHuge this:
> >> 	return page->index << compound_order(page)
> >> 
> >> but page_to_index() does simply
> >>         pgoff = compound_head(page)->index;
> >>         pgoff += page - compound_head(page);
> >> 
> >> Shouldn't it also do a <<compound_order(head) for HugeTLB?
> > 
> > PageHeadHuge() is only true for head pages, so we are fine here.
> 
> Really? We are calculatting index (pgoff) of tail page, which should be index of
> head page plus n for n'th tail page; the unit is base pages.
> But ff HugeTLB head pages use the unit of huge page in page->index, and
> page_to_pgoff() translates it to unit of base pages, then we should do the same
> when calculating the index of tail page, no? Otherwise we are adding up units of
> huge pages (from head->index) with units of base page (n'th tail) and get
> garbage as a result, AFAICS?

You are right. I guess we can get away with this because nobody calls
page_to_pgoff() on tail pages of hugetlb page. Except when something goes
wrong and dump_page() has to deal with it.

I'm not sure if it's worth fixing and whether the fix should be inside
page_to_pgoff().

The best fix would be to remove hugetlb pages special-casing: it has to
have ->index in base pagesize, not huge page.
Matthew Wilcox July 2, 2020, 9:01 p.m. UTC | #12
On Thu, Jul 02, 2020 at 11:39:07PM +0300, Kirill A. Shutemov wrote:
> On Thu, Jul 02, 2020 at 06:19:08PM +0200, Vlastimil Babka wrote:
> > Really? We are calculatting index (pgoff) of tail page, which should be index of
> > head page plus n for n'th tail page; the unit is base pages.
> > But ff HugeTLB head pages use the unit of huge page in page->index, and
> > page_to_pgoff() translates it to unit of base pages, then we should do the same
> > when calculating the index of tail page, no? Otherwise we are adding up units of
> > huge pages (from head->index) with units of base page (n'th tail) and get
> > garbage as a result, AFAICS?
> 
> You are right. I guess we can get away with this because nobody calls
> page_to_pgoff() on tail pages of hugetlb page. Except when something goes
> wrong and dump_page() has to deal with it.
> 
> I'm not sure if it's worth fixing and whether the fix should be inside
> page_to_pgoff().
> 
> The best fix would be to remove hugetlb pages special-casing: it has to
> have ->index in base pagesize, not huge page.

https://lore.kernel.org/linux-mm/20200629152033.16175-1-willy@infradead.org/
would be a good place to start.  We'll use 8 entries for an order-9 page
instead of the 1 entry that we currently do, but that's a lot better
than using 512 entries.
diff mbox series

Patch

diff --git a/mm/debug.c b/mm/debug.c
index 4f376514744d..44d843b3b07c 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -163,7 +163,7 @@  void __dump_page(struct page *page, const char *reason)
 out_mapping:
 	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
 
-	pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags,
+	pr_warn("%sflags: %#lx(%pGp)%s\n", type, head->flags, &head->flags,
 		page_cma ? " CMA" : "");
 
 hex_only: