
mm/page_alloc: Add a reason for reserved pages in has_unmovable_pages()

Message ID 1570090257-25001-1-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Series mm/page_alloc: Add a reason for reserved pages in has_unmovable_pages()

Commit Message

Anshuman Khandual Oct. 3, 2019, 8:10 a.m. UTC
Having unmovable pages on a given pageblock should be reported correctly
when required with the REPORT_FAILURE flag. But there can be a scenario where
a reserved page in the pageblock gets reported with the generic "unmovable"
reason code. Instead, this should be changed to a more appropriate reason
code like "Reserved page".

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Qian Cai Oct. 3, 2019, 9:05 a.m. UTC | #1
> On Oct 3, 2019, at 4:10 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
> 
> Having unmovable pages on a given pageblock should be reported correctly
> when required with REPORT_FAILURE flag. But there can be a scenario where a
> reserved page in the page block will get reported as a generic "unmovable"
> reason code. Instead this should be changed to a more appropriate reason
> code like "Reserved page".

Isn’t this redundant as it dumps the flags in dump_page() anyway?

> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> mm/page_alloc.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 15c2050c629b..fbf93ea119d2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8206,8 +8206,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> 
>        page = pfn_to_page(check);
> 
> -        if (PageReserved(page))
> +        if (PageReserved(page)) {
> +            reason = "Reserved page";
>            goto unmovable;
> +        }
> 
>        /*
>         * If the zone is movable and we have ruled out all reserved
> -- 
> 2.20.1
>
Anshuman Khandual Oct. 3, 2019, 9:32 a.m. UTC | #2
On 10/03/2019 02:35 PM, Qian Cai wrote:
> 
> 
>> On Oct 3, 2019, at 4:10 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>
>> Having unmovable pages on a given pageblock should be reported correctly
>> when required with REPORT_FAILURE flag. But there can be a scenario where a
>> reserved page in the page block will get reported as a generic "unmovable"
>> reason code. Instead this should be changed to a more appropriate reason
>> code like "Reserved page".
> 
> Isn’t this redundant as it dumps the flags in dump_page() anyway?

Even though the page flags do contain the reserved bit information, the problem
is that we are explicitly printing the reason for this page dump. In this
case it is caused by the fact that it is a reserved page.

page dumped because: <reason>

The proposed change makes it explicit that the dump is caused by an unmovable
page with the reserved bit set. It also helps in differentiating between the
reserved bit condition and the last one, "if (found > count)".
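
For reference, both of those conditions jump to the same exit label, which
looks roughly like this (a paraphrased sketch of the current code, not the
exact upstream lines):

 unmovable:
	/*
	 * Common exit path, reached from both the PageReserved() check and
	 * the "found > count" check, so without this patch both print the
	 * same generic reason string.
	 */
	if (flags & REPORT_FAILURE)
		dump_page(pfn_to_page(pfn + iter), reason);
	return true;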

> 
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> mm/page_alloc.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 15c2050c629b..fbf93ea119d2 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8206,8 +8206,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>>
>>        page = pfn_to_page(check);
>>
>> -        if (PageReserved(page))
>> +        if (PageReserved(page)) {
>> +            reason = "Reserved page";
>>            goto unmovable;
>> +        }
>>
>>        /*
>>         * If the zone is movable and we have ruled out all reserved
>> -- 
>> 2.20.1
>>
>
Anshuman Khandual Oct. 3, 2019, 9:53 a.m. UTC | #3
On 10/03/2019 03:02 PM, Anshuman Khandual wrote:
> 
> On 10/03/2019 02:35 PM, Qian Cai wrote:
>>
>>> On Oct 3, 2019, at 4:10 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>>
>>> Having unmovable pages on a given pageblock should be reported correctly
>>> when required with REPORT_FAILURE flag. But there can be a scenario where a
>>> reserved page in the page block will get reported as a generic "unmovable"
>>> reason code. Instead this should be changed to a more appropriate reason
>>> code like "Reserved page".
>> Isn’t this redundant as it dumps the flags in dump_page() anyway?
> Even though page flags does contain reserved bit information, the problem
> is that we are explicitly printing the reason for this page dump. In this
> case it is caused by the fact that it is a reserved page.
> 
> page dumped because: <reason>
> 
> The proposed change makes it explicit that the dump is caused because a
> non movable page with reserved bit set. It also helps in differentiating 
> between reserved bit condition and the last one "if (found > count)"

Instead, will it be better to rename the reason codes as

1. "Unmovable (CMA)"
2. "Unmovable (Reserved)"
3. "Unmovable (Private/non-LRU)"
Qian Cai Oct. 3, 2019, 11:19 a.m. UTC | #4
> On Oct 3, 2019, at 5:32 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
> 
> Even though page flags does contain reserved bit information, the problem
> is that we are explicitly printing the reason for this page dump. In this
> case it is caused by the fact that it is a reserved page.
> 
> page dumped because: <reason>
> 
> The proposed change makes it explicit that the dump is caused because a
> non movable page with reserved bit set. It also helps in differentiating 
> between reserved bit condition and the last one "if (found > count)".

Then, someone later would like to add a reason for every different page flag just because they *think* it is special.
Anshuman Khandual Oct. 3, 2019, 11:32 a.m. UTC | #5
On 10/03/2019 04:49 PM, Qian Cai wrote:
> 
> 
>> On Oct 3, 2019, at 5:32 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>
>> Even though page flags does contain reserved bit information, the problem
>> is that we are explicitly printing the reason for this page dump. In this
>> case it is caused by the fact that it is a reserved page.
>>
>> page dumped because: <reason>
>>
>> The proposed change makes it explicit that the dump is caused because a
>> non movable page with reserved bit set. It also helps in differentiating 
>> between reserved bit condition and the last one "if (found > count)".
> 
> Then, someone later would like to add a reason for every different page flags just because they *think* they are special.
> 

Ohh, I never meant that the 'Reserved' bit is anything special here, but it
is a reason that makes a page unmovable, unlike many other flags.
Qian Cai Oct. 3, 2019, 11:50 a.m. UTC | #6
> On Oct 3, 2019, at 7:31 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
> 
> Ohh, never meant that the 'Reserved' bit is anything special here but it
> is a reason to make a page unmovable unlike many other flags.

But dump_page() is used everywhere, and it is better to reserve “reason” to indicate something more important rather than duplicating the page flags.

Especially since it is trivial enough right now for developers to look at the page flags dumped from has_unmovable_pages() and figure out the exact branching in the code.
Anshuman Khandual Oct. 3, 2019, 12:02 p.m. UTC | #7
On 10/03/2019 05:20 PM, Qian Cai wrote:
> 
> 
>> On Oct 3, 2019, at 7:31 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>
>> Ohh, never meant that the 'Reserved' bit is anything special here but it
>> is a reason to make a page unmovable unlike many other flags.
> 
> But dump_page() is used everywhere, and it is better to reserve “reason” to indicate something more important rather than duplicating the page flags.
> 
> Especially, it is trivial enough right now for developers look in the page flags dumping from has_unmovable_pages(), and figure out the exact branching in the code.
> 

Will something like this be better? hugepage_migration_supported() has some
uncertainty depending on the platform and the huge page size (see the sketch
after the diff below).

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15c2050c629b..8dbc86696515 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8175,7 +8175,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
        unsigned long found;
        unsigned long iter = 0;
        unsigned long pfn = page_to_pfn(page);
-       const char *reason = "unmovable page";
+       const char *reason;

        /*
         * TODO we could make this much more efficient by not checking every
@@ -8194,7 +8194,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
                if (is_migrate_cma(migratetype))
                        return false;

-               reason = "CMA page";
+               reason = "Unmovable CMA page";
                goto unmovable;
        }

@@ -8206,8 +8206,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,

                page = pfn_to_page(check);

-               if (PageReserved(page))
+               if (PageReserved(page)) {
+                       reason = "Unmovable reserved page";
                        goto unmovable;
+               }

                /*
                 * If the zone is movable and we have ruled out all reserved
@@ -8226,8 +8228,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
                        struct page *head = compound_head(page);
                        unsigned int skip_pages;

-                       if (!hugepage_migration_supported(page_hstate(head)))
+                       if (!hugepage_migration_supported(page_hstate(head))) {
+                               reason = "Unmovable HugeTLB page";
                                goto unmovable;
+                       }

                        skip_pages = compound_nr(head) - (page - head);
                        iter += skip_pages - 1;
@@ -8271,8 +8275,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
                 * is set to both of a memory hole page and a _used_ kernel
                 * page at boot.
                 */
-               if (found > count)
+               if (found > count) {
+                       reason = "Unmovable non-LRU page";
                        goto unmovable;
+               }
        }
        return false;
 unmovable:
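
For reference, that platform dependence comes from hugepage_migration_supported()
deferring to an architecture hook; the generic fallback in include/linux/hugetlb.h
is roughly the following (paraphrased sketch; architectures such as arm64 override
the hook with their own huge page size checks):

	/* Paraphrased sketch of the generic fallback; arch code may override it. */
	#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
	#ifndef arch_hugetlb_migration_supported
	static inline bool arch_hugetlb_migration_supported(struct hstate *h)
	{
		/* Only the "standard" huge page sizes are migratable by default. */
		return huge_page_shift(h) == PMD_SHIFT ||
		       huge_page_shift(h) == PUD_SHIFT ||
		       huge_page_shift(h) == PGDIR_SHIFT;
	}
	#endif
	#else
	static inline bool arch_hugetlb_migration_supported(struct hstate *h)
	{
		return false;
	}
	#endif

	static inline bool hugepage_migration_supported(struct hstate *h)
	{
		return arch_hugetlb_migration_supported(h);
	}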
Qian Cai Oct. 3, 2019, 12:14 p.m. UTC | #8
> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
> 
> Will something like this be better ?

Not really. dump_page() will dump the PageCompound information anyway, so it is trivial to figure out if it went down that path.

> hugepage_migration_supported() has got
> uncertainty depending on platform and huge page size.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 15c2050c629b..8dbc86696515 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8175,7 +8175,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>        unsigned long found;
>        unsigned long iter = 0;
>        unsigned long pfn = page_to_pfn(page);
> -       const char *reason = "unmovable page";
> +       const char *reason;
> 
>        /*
>         * TODO we could make this much more efficient by not checking every
> @@ -8194,7 +8194,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>                if (is_migrate_cma(migratetype))
>                        return false;
> 
> -               reason = "CMA page";
> +               reason = "Unmovable CMA page";
>                goto unmovable;
>        }
> 
> @@ -8206,8 +8206,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> 
>                page = pfn_to_page(check);
> 
> -               if (PageReserved(page))
> +               if (PageReserved(page)) {
> +                       reason = "Unmovable reserved page";
>                        goto unmovable;
> +               }
> 
>                /*
>                 * If the zone is movable and we have ruled out all reserved
> @@ -8226,8 +8228,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>                        struct page *head = compound_head(page);
>                        unsigned int skip_pages;
> 
> -                       if (!hugepage_migration_supported(page_hstate(head)))
> +                       if (!hugepage_migration_supported(page_hstate(head))) {
> +                               reason = "Unmovable HugeTLB page";
>                                goto unmovable;
> +                       }
> 
>                        skip_pages = compound_nr(head) - (page - head);
>                        iter += skip_pages - 1;
> @@ -8271,8 +8275,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>                 * is set to both of a memory hole page and a _used_ kernel
>                 * page at boot.
>                 */
> -               if (found > count)
> +               if (found > count) {
> +                       reason = "Unmovable non-LRU page";
>                        goto unmovable;
> +               }
>        }
>        return false;
> unmovable:
David Hildenbrand Oct. 4, 2019, 8:25 a.m. UTC | #9
On 03.10.19 14:14, Qian Cai wrote:
> 
> 
>> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>
>> Will something like this be better ?
> 
> Not really. dump_page() will dump PageCompound information anyway, so it is trivial to figure out if went in that path.
> 

I agree, I use the dump_page() output frequently to identify PG_reserved
pages. No need to duplicate that.
Michal Hocko Oct. 4, 2019, 10:58 a.m. UTC | #10
On Thu 03-10-19 13:40:57, Anshuman Khandual wrote:
> Having unmovable pages on a given pageblock should be reported correctly
> when required with REPORT_FAILURE flag. But there can be a scenario where a
> reserved page in the page block will get reported as a generic "unmovable"
> reason code. Instead this should be changed to a more appropriate reason
> code like "Reserved page".

Others have already pointed out this is just redundant, but I have a
more generic comment on the changelog. There is essentially no
information on why the current state is bad/unhelpful and why the change is
needed. All you claim is that something is a certain way and then assert
that it should be done differently. That is not what changelogs should
look like.
Anshuman Khandual Oct. 4, 2019, 11:44 a.m. UTC | #11
On 10/04/2019 04:28 PM, Michal Hocko wrote:
> On Thu 03-10-19 13:40:57, Anshuman Khandual wrote:
>> Having unmovable pages on a given pageblock should be reported correctly
>> when required with REPORT_FAILURE flag. But there can be a scenario where a
>> reserved page in the page block will get reported as a generic "unmovable"
>> reason code. Instead this should be changed to a more appropriate reason
>> code like "Reserved page".
> 
> Others have already pointed out this is just redundant but I will have a

Sure.

> more generic comment on the changelog. There is essentially no
> information why the current state is bad/unhelpful and why the chnage is

The current state is not necessarily bad or unhelpful. I just thought that it
could be improved upon. Somehow, calling out explicitly only the CMA page
failure case just felt ad hoc, whereas there are other reasons, like HugeTLB
immovability, which might depend on other factors apart from just the page flags
(though I did not propose that originally).

> needed. All you claim is that something is a certain way and then assert
> that it should be done differently. That is not how changelogs should
> look like.
> 

Okay, probably I should have explained more about why "unmovable" is less than
adequate to capture the exact reason for specific failure cases and how
"Reserved page" instead would have been better. But I got the point, will improve.
Qian Cai Oct. 4, 2019, 12:56 p.m. UTC | #12
On Fri, 2019-10-04 at 17:14 +0530, Anshuman Khandual wrote:
> 
> On 10/04/2019 04:28 PM, Michal Hocko wrote:
> > On Thu 03-10-19 13:40:57, Anshuman Khandual wrote:
> > > Having unmovable pages on a given pageblock should be reported correctly
> > > when required with REPORT_FAILURE flag. But there can be a scenario where a
> > > reserved page in the page block will get reported as a generic "unmovable"
> > > reason code. Instead this should be changed to a more appropriate reason
> > > code like "Reserved page".
> > 
> > Others have already pointed out this is just redundant but I will have a
> 
> Sure.
> 
> > more generic comment on the changelog. There is essentially no
> > information why the current state is bad/unhelpful and why the chnage is
> 
> The current state is not necessarily bad or unhelpful. I just though that it
> could be improved upon. Some how calling out explicitly only the CMA page
> failure case just felt adhoc, where as there are other reasons like HugeTLB
> immovability which might depend on other factors apart from just page flags
> (though I did not propose that originally).
> 
> > needed. All you claim is that something is a certain way and then assert
> > that it should be done differently. That is not how changelogs should
> > look like.
> > 
> 
> Okay, probably I should have explained more on why "unmovable" is less than
> adequate to capture the exact reason for specific failure cases and how
> "Reserved Page" instead would been better. But got the point, will improve.
> 

It might be a good time to rethink whether it is really a good idea to dump_page()
at all inside has_unmovable_pages(). As it is right now, there is a potential
deadlock between the console and memory offline. More details are in this thread,

https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/

01: [  672.875392] WARNING: possible circular locking dependency detected       
01: [  672.875394] 5.4.0-rc1-next-20191004+ #64 Not tainted                     
01: [  672.875396] ------------------------------------------------------       
01: [  672.875398] test.sh/1724 is trying to acquire lock:                      
01: [  672.875400] 0000000052059ec0 (console_owner){-...}, at: console_unlock+0x
01: 328/0xa30                                                                   
01: [  672.875406]                                                              
01: [  672.875408] but task is already holding lock:                            
01: [  672.875409] 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: start_iso
01: late_page_range+0x216/0x538                                                 
01: [  672.875415]                                                              
01: [  672.875417] which lock already depends on the new lock.                  
01: [  672.875418]                                                              
01: [  672.875419]                                                              
01: [  672.875421] the existing dependency chain (in reverse order) is:         
01: [  672.875423]                                                              
01: [  672.875424] -> #2 (&(&zone->lock)->rlock){-.-.}:                         
01: [  672.875430]        lock_acquire+0x21a/0x468                              
01: [  672.875431]        _raw_spin_lock+0x54/0x68                              
01: [  672.875433]        get_page_from_freelist+0x8b6/0x2d28                   
01: [  672.875435]        __alloc_pages_nodemask+0x246/0x658                    
01: [  672.875437]        __get_free_pages+0x34/0x78                            
01: [  672.875438]        sclp_init+0x106/0x690                                 
01: [  672.875440]        sclp_register+0x2e/0x248                              
01: [  672.875442]        sclp_rw_init+0x4a/0x70                                
01: [  672.875443]        sclp_console_init+0x4a/0x1b8                          
01: [  672.875445]        console_init+0x2c8/0x410                              
01: [  672.875447]        start_kernel+0x530/0x6a0                              
01: [  672.875448]        startup_continue+0x70/0xd0                            
01: [  672.875449]                                                              
01: [  672.875450] -> #1 (sclp_lock){-.-.}:                                     
01: [  672.875458]        lock_acquire+0x21a/0x468                              
01: [  672.875460]        _raw_spin_lock_irqsave+0xcc/0xe8                      
01: [  672.875462]        sclp_add_request+0x34/0x308                           
01: [  672.875464]        sclp_conbuf_emit+0x100/0x138                          
01: [  672.875465]        sclp_console_write+0x96/0x3b8                         
01: [  672.875467]        console_unlock+0x6dc/0xa30                            
01: [  672.875469]        vprintk_emit+0x184/0x3c8                              
01: [  672.875470]        vprintk_default+0x44/0x50                             
01: [  672.875472]        printk+0xa8/0xc0                                      
01: [  672.875473]        iommu_debugfs_setup+0xf2/0x108                        
01: [  672.875475]        iommu_init+0x6c/0x78                                  
01: [  672.875477]        do_one_initcall+0x162/0x680                           
01: [  672.875478]        kernel_init_freeable+0x4e8/0x5a8                      
01: [  672.875480]        kernel_init+0x2a/0x188                                
01: [  672.875484]        ret_from_fork+0x30/0x34                               
01: [  672.875486]        kernel_thread_starter+0x0/0xc                         
01: [  672.875487]                                                              
01: [  672.875488] -> #0 (console_owner){-...}:                                 
01: [  672.875495]        check_noncircular+0x338/0x3e0                         
01: [  672.875496]        __lock_acquire+0x1e66/0x2d88                          
01: [  672.875498]        lock_acquire+0x21a/0x468                              
01: [  672.875499]        console_unlock+0x3a6/0xa30                            
01: [  672.875501]        vprintk_emit+0x184/0x3c8                              
01: [  672.875503]        vprintk_default+0x44/0x50                             
01: [  672.875504]        printk+0xa8/0xc0                                      
01: [  672.875506]        __dump_page+0x1dc/0x710                               
01: [  672.875507]        dump_page+0x2e/0x58                                   
01: [  672.875509]        has_unmovable_pages+0x2e8/0x470                       
01: [  672.875511]        start_isolate_page_range+0x404/0x538                  
01: [  672.875513]        __offline_pages+0x22c/0x1338                          
01: [  672.875514]        memory_subsys_offline+0xa6/0xe8                       
01: [  672.875516]        device_offline+0xe6/0x118                             
01: [  672.875517]        state_store+0xf0/0x110                                
01: [  672.875519]        kernfs_fop_write+0x1bc/0x270                          
01: [  672.875521]        vfs_write+0xce/0x220                                  
01: [  672.875522]        ksys_write+0xea/0x190                                 
01: [  672.875524]        system_call+0xd8/0x2b4                                
01: [  672.875525]                                                              
01: [  672.875527] other info that might help us debug this:                    
01: [  672.875528]                                                              
01: [  672.875529] Chain exists of:                                             
01: [  672.875530]   console_owner --> sclp_lock --> &(&zone->lock)->rlock      
01: [  672.875538]                                                              
01: [  672.875540]  Possible unsafe locking scenario:                           
01: [  672.875541]                                                              
01: [  672.875543]        CPU0                    CPU1                          
01: [  672.875544]        ----                    ----                          
01: [  672.875545]   lock(&(&zone->lock)->rlock);                               
01: [  672.875549]                                lock(sclp_lock);              
01: [  672.875553]                                lock(&(&zone->lock)->rlock);  
01: [  672.875557]   lock(console_owner);                                       
01: [  672.875560]                                                              
01: [  672.875562]  *** DEADLOCK ***                                            
01: [  672.875563]                                                              
01: [  672.875564] 9 locks held by test.sh/1724:                                
01: [  672.875565]  #0: 000000000e925408 (sb_writers#4){.+.+}, at: vfs_write+0x2
01: 06/0x220                                                                    
01: [  672.875574]  #1: 0000000050aa4280 (&of->mutex){+.+.}, at: kernfs_fop_writ
01: e+0x154/0x270                                                               
01: [  672.875581]  #2: 0000000062e5c628 (kn->count#198){.+.+}, at: kernfs_fop_w
01: rite+0x16a/0x270                                                            
01: [  672.875590]  #3: 00000000523236a0 (device_hotplug_lock){+.+.}, at: lock_d
01: evice_hotplug_sysfs+0x30/0x80                                               
01: [  672.875598]  #4: 0000000062e70990 (&dev->mutex){....}, at: device_offline
01: +0x78/0x118                                                                 
01: [  672.875605]  #5: 0000000051fd36b0 (cpu_hotplug_lock.rw_sem){++++}, at: __
01: offline_pages+0xec/0x1338                                                   
01: [  672.875613]  #6: 00000000521ca470 (mem_hotplug_lock.rw_sem){++++}, at: pe
01: rcpu_down_write+0x38/0x210                                                  
01: [  672.875620]  #7: 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: star
01: t_isolate_page_range+0x216/0x538                                            
01: [  672.875628]  #8: 000000005205a100 (console_lock){+.+.}, at: vprintk_emit+
01: 0x178/0x3c8                                                                 
01: [  672.875635]                                                              
01: [  672.875636] stack backtrace:                                             
01: [  672.875639] CPU: 1 PID: 1724 Comm: test.sh Not tainted 5.4.0-rc1-next-201
01: 91004+ #64                                                                  
01: [  672.875640] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)                 
01: [  672.875642] Call Trace:                                                  
01: [  672.875644] ([<00000000512ae218>] show_stack+0x110/0x1b0)                
01: [  672.875645]  [<0000000051b6d506>] dump_stack+0x126/0x178                 
01: [  672.875648]  [<00000000513a4b08>] check_noncircular+0x338/0x3e0          
01: [  672.875650]  [<00000000513aaaf6>] __lock_acquire+0x1e66/0x2d88           
01: [  672.875652]  [<00000000513a7e12>] lock_acquire+0x21a/0x468               
01: [  672.875654]  [<00000000513bb2fe>] console_unlock+0x3a6/0xa30             
01: [  672.875655]  [<00000000513bde2c>] vprintk_emit+0x184/0x3c8               
01: [  672.875657]  [<00000000513be0b4>] vprintk_default+0x44/0x50              
01: [  672.875659]  [<00000000513beb60>] printk+0xa8/0xc0                       
01: [  672.875661]  [<000000005158c364>] __dump_page+0x1dc/0x710                
01: [  672.875663]  [<000000005158c8c6>] dump_page+0x2e/0x58                    
01: [  672.875665]  [<00000000515d87c8>] has_unmovable_pages+0x2e8/0x470        
01: [  672.875667]  [<000000005167072c>] start_isolate_page_range+0x404/0x538   
01: [  672.875669]  [<0000000051b96de4>] __offline_pages+0x22c/0x1338           
01: [  672.875671]  [<0000000051908586>] memory_subsys_offline+0xa6/0xe8        
01: [  672.875673]  [<00000000518e561e>] device_offline+0xe6/0x118              
01: [  672.875675]  [<0000000051908170>] state_store+0xf0/0x110                 
01: [  672.875677]  [<0000000051796384>] kernfs_fop_write+0x1bc/0x270           
01: [  672.875679]  [<000000005168972e>] vfs_write+0xce/0x220                   
01: [  672.875681]  [<0000000051689b9a>] ksys_write+0xea/0x190                  
01: [  672.875685]  [<0000000051ba9990>] system_call+0xd8/0x2b4                 
01: [  672.875687] INFO: lockdep is turned off.
Michal Hocko Oct. 4, 2019, 1:07 p.m. UTC | #13
On Fri 04-10-19 08:56:16, Qian Cai wrote:
[...]
> It might be a good time to rethink if it is really a good idea to dump_page()
> at all inside has_unmovable_pages(). As it is right now, it is a a potential
> deadlock between console vs memory offline. More details are in this thread,
> 
> https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/

Huh. That would imply we cannot do any printk from that path, no?
Qian Cai Oct. 4, 2019, 1:30 p.m. UTC | #14
On Fri, 2019-10-04 at 15:07 +0200, Michal Hocko wrote:
> On Fri 04-10-19 08:56:16, Qian Cai wrote:
> [...]
> > It might be a good time to rethink if it is really a good idea to dump_page()
> > at all inside has_unmovable_pages(). As it is right now, it is a a potential
> > deadlock between console vs memory offline. More details are in this thread,
> > 
> > https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/
> 
> Huh. That would imply we cannot do any printk from that path, no?

Yes, or use something like printk_deferred(), or it needs a rework of the
current console locking, which I have no clue about yet.
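
For illustration, the printk_deferred() route would amount to something like the
line below for each message (a sketch only; in practice all of __dump_page()'s
pr_warn() calls would need similar treatment):

	/* Sketch: defer the console output instead of printing under zone->lock. */
	printk_deferred(KERN_WARNING "page dumped because: %s\n", reason);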
Michal Hocko Oct. 4, 2019, 1:38 p.m. UTC | #15
On Fri 04-10-19 09:30:39, Qian Cai wrote:
> On Fri, 2019-10-04 at 15:07 +0200, Michal Hocko wrote:
> > On Fri 04-10-19 08:56:16, Qian Cai wrote:
> > [...]
> > > It might be a good time to rethink if it is really a good idea to dump_page()
> > > at all inside has_unmovable_pages(). As it is right now, it is a a potential
> > > deadlock between console vs memory offline. More details are in this thread,
> > > 
> > > https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/
> > 
> > Huh. That would imply we cannot do any printk from that path, no?
> 
> Yes, or use something like printk_deferred()

This is just insane. The hotplug code is in no way special wrt printk.
It is never called from the printk code AFAIK and thus there is no real
reason why this particular code should be special. Not to mention that
it calls printk indirectly from code that is shared with other code
paths.

> or it needs to rework of the current console locking which I have no
> clue yet.

Yes, if the lockdep report is really referring to a real deadlock, which I
haven't really explored.
Qian Cai Oct. 4, 2019, 1:56 p.m. UTC | #16
On Fri, 2019-10-04 at 15:38 +0200, Michal Hocko wrote:
> On Fri 04-10-19 09:30:39, Qian Cai wrote:
> > On Fri, 2019-10-04 at 15:07 +0200, Michal Hocko wrote:
> > > On Fri 04-10-19 08:56:16, Qian Cai wrote:
> > > [...]
> > > > It might be a good time to rethink if it is really a good idea to dump_page()
> > > > at all inside has_unmovable_pages(). As it is right now, it is a a potential
> > > > deadlock between console vs memory offline. More details are in this thread,
> > > > 
> > > > https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/
> > > 
> > > Huh. That would imply we cannot do any printk from that path, no?
> > 
> > Yes, or use something like printk_deferred()
> 
> This is just insane. The hotplug code is in no way special wrt printk.
> It is never called from the printk code AFAIK and thus there is no real
> reason why this particular code should be any special. Not to mention
> it calls printk indirectly from a code that is shared with other code
> paths.

Basically, calling printk() while holding the zone_lock will be problematic, as
the console does the opposite, since it always needs to allocate some memory.
Then, it will always find some way to form this chain,

console_lock -> * -> zone_lock.

> 
> > or it needs to rework of the current console locking which I have no
> > clue yet.
> 
> Yes, if the lockdep is really referring to a real deadlock which I
> haven't really explored.
>
Michal Hocko Oct. 4, 2019, 2:41 p.m. UTC | #17
On Fri 04-10-19 09:56:00, Qian Cai wrote:
> On Fri, 2019-10-04 at 15:38 +0200, Michal Hocko wrote:
> > On Fri 04-10-19 09:30:39, Qian Cai wrote:
> > > On Fri, 2019-10-04 at 15:07 +0200, Michal Hocko wrote:
> > > > On Fri 04-10-19 08:56:16, Qian Cai wrote:
> > > > [...]
> > > > > It might be a good time to rethink if it is really a good idea to dump_page()
> > > > > at all inside has_unmovable_pages(). As it is right now, it is a a potential
> > > > > deadlock between console vs memory offline. More details are in this thread,
> > > > > 
> > > > > https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/
> > > > 
> > > > Huh. That would imply we cannot do any printk from that path, no?
> > > 
> > > Yes, or use something like printk_deferred()
> > 
> > This is just insane. The hotplug code is in no way special wrt printk.
> > It is never called from the printk code AFAIK and thus there is no real
> > reason why this particular code should be any special. Not to mention
> > it calls printk indirectly from a code that is shared with other code
> > paths.
> 
> Basically, printk() while holding the zone_lock will be problematic as console
> is doing the opposite as it always needs to allocate some memory. Then, it will
> always find some way to form this chain,
> 
> console_lock -> * -> zone_lock.

So this is not so much a hotplug-specific problem; it is the zone->lock ->
printk -> alloc chain that is the problem, right? Who is doing an
allocation from this atomic context? I do not see any atomic allocation
in kernel/printk/printk.c.
Andrew Morton Oct. 5, 2019, 9:22 p.m. UTC | #18
On Fri, 4 Oct 2019 16:41:50 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> > > This is just insane. The hotplug code is in no way special wrt printk.
> > > It is never called from the printk code AFAIK and thus there is no real
> > > reason why this particular code should be any special. Not to mention
> > > it calls printk indirectly from a code that is shared with other code
> > > paths.
> > 
> > Basically, printk() while holding the zone_lock will be problematic as console
> > is doing the opposite as it always needs to allocate some memory. Then, it will
> > always find some way to form this chain,
> > 
> > console_lock -> * -> zone_lock.
> 
> So this is not as much a hotplug specific problem but zone->lock ->
> printk -> alloc chain that is a problem, right? Who is doing an
> allocation from this atomic context? I do not see any atomic allocation
> in kernel/printk/printk.c.

Apparently some console drivers can do memory allocation on the printk()
path.

This behavior is daft, IMO.  Have we identified which ones and looked
into fixing them?
Qian Cai Oct. 5, 2019, 10:38 p.m. UTC | #19
> On Oct 5, 2019, at 5:22 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> Apparently some console drivers can do memory allocation on the printk()
> path.
> 
> This behavior is daft, IMO.  Have we identified which ones and looked
> into fixing them?

Not necessarily that simple. It is more that 2+ CPUs are required to trigger the deadlock. Please see the v2 patch I sent, which has all the information; especially the thread link included there, which contains a few lockdep splat traces, and the s390 one in the patch description.
Anshuman Khandual Jan. 14, 2020, 8:19 a.m. UTC | #20
On 10/04/2019 01:55 PM, David Hildenbrand wrote:
> On 03.10.19 14:14, Qian Cai wrote:
>>
>>
>>> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>>
>>> Will something like this be better ?
>>
>> Not really. dump_page() will dump PageCompound information anyway, so it is trivial to figure out if went in that path.
>>
> 
> I agree, I use the dump_page() output frequently to identify PG_reserved
> pages. No need to duplicate that.

Here in this path there is a reserved page which is preventing
a memory section from being offlined, but unfortunately dump_page() does
not print page->flags for a reserved page, pinned there possibly
through memblock_reserve() during boot.

__offline_pages()
	start_isolate_page_range()
		set_migratetype_isolate()
			has_unmovable_pages()
				dump_page() 

[   64.920970] ------------[ cut here ]------------
[   64.921718] WARNING: CPU: 16 PID: 1116 at mm/page_alloc.c:8298 has_unmovable_pages+0x274/0x2a8
[   64.923110] Modules linked in:
[   64.923634] CPU: 16 PID: 1116 Comm: bash Not tainted 5.5.0-rc6-00006-gca544f2a11ae-dirty #281
[   64.925102] Hardware name: linux,dummy-virt (DT)
[   64.925905] pstate: 60400085 (nZCv daIf +PAN -UAO)
[   64.926742] pc : has_unmovable_pages+0x274/0x2a8
[   64.927554] lr : has_unmovable_pages+0x298/0x2a8
[   64.928359] sp : ffff800014fd3a00
[   64.928944] x29: ffff800014fd3a00 x28: fffffe0017640000 
[   64.929875] x27: 0000000000000000 x26: ffff0005fcfcda00 
[   64.930810] x25: 0000000000640000 x24: 0000000000000003 
[   64.931736] x23: 0000000019840000 x22: 0000000000001380 
[   64.932667] x21: ffff800011259000 x20: ffff0005fcfcda00 
[   64.933588] x19: 0000000000661000 x18: 0000000000000010 
[   64.934514] x17: 0000000000000000 x16: 0000000000000000 
[   64.935454] x15: ffffffffffffffff x14: ffff8000118498c8 
[   64.936377] x13: ffff800094fd3797 x12: ffff800014fd379f 
[   64.937304] x11: ffff800011861000 x10: ffff800014fd3720 
[   64.938226] x9 : 00000000ffffffd0 x8 : ffff8000106a60d0 
[   64.939156] x7 : 0000000000000000 x6 : ffff0005fc6261b0 
[   64.940078] x5 : ffff0005fc6261b0 x4 : 0000000000000000 
[   64.941003] x3 : ffff0005fc62cf80 x2 : ffffffffffffec80 
[   64.941927] x1 : ffff800011141b58 x0 : ffff0005fcfcda00 
[   64.942857] Call trace:
[   64.943298]  has_unmovable_pages+0x274/0x2a8
[   64.944056]  start_isolate_page_range+0x258/0x360
[   64.944879]  __offline_pages+0xf4/0x9e8
[   64.945554]  offline_pages+0x10/0x18
[   64.946189]  memory_block_action+0x40/0x1a0
[   64.946929]  memory_subsys_offline+0x4c/0x78
[   64.947679]  device_offline+0x98/0xc8
[   64.948328]  unprobe_store+0xa8/0x158
[   64.948976]  dev_attr_store+0x14/0x28
[   64.949628]  sysfs_kf_write+0x40/0x50
[   64.950273]  kernfs_fop_write+0x108/0x218
[   64.950983]  __vfs_write+0x18/0x40
[   64.951592]  vfs_write+0xb0/0x1d0
[   64.952175]  ksys_write+0x64/0xe8
[   64.952761]  __arm64_sys_write+0x18/0x20
[   64.953451]  el0_svc_common.constprop.2+0x88/0x150
[   64.954293]  el0_svc_handler+0x20/0x80
[   64.954963]  el0_sync_handler+0x118/0x188
[   64.955669]  el0_sync+0x140/0x180
[   64.956256] ---[ end trace b162b4d1cbea304d ]---
[   64.957063] page:fffffe0017640000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
[   64.958489] raw: 1ffff80000001000 fffffe0017640008 fffffe0017640008 0000000000000000
[   64.959839] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   64.961174] page dumped because: unmovable page

The reason is that dump_page() does not print page->flags universally;
it only does so for KSM, anon and file pages, while excluding pages
reserved at boot. Should we not make printing
page->flags universal?

- Anshuman
David Hildenbrand Jan. 14, 2020, 8:30 a.m. UTC | #21
On 14.01.20 09:19, Anshuman Khandual wrote:
> 
> 
> On 10/04/2019 01:55 PM, David Hildenbrand wrote:
>> On 03.10.19 14:14, Qian Cai wrote:
>>>
>>>
>>>> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
>>>>
>>>> Will something like this be better ?
>>>
>>> Not really. dump_page() will dump PageCompound information anyway, so it is trivial to figure out if went in that path.
>>>
>>
>> I agree, I use the dump_page() output frequently to identify PG_reserved
>> pages. No need to duplicate that.
> 
> Here in this path there is a reserved page which is preventing
> offlining a memory section but unfortunately dump_page() does
> not print page->flags for a reserved page pinned there possibly
> through memblock_reserve() during boot.
> 
> __offline_pages()
> 	start_isolate_page_range()
> 		set_migratetype_isolate()
> 			has_unmovable_pages()
> 				dump_page() 
> 
> [   64.920970] ------------[ cut here ]------------
> [   64.921718] WARNING: CPU: 16 PID: 1116 at mm/page_alloc.c:8298 has_unmovable_pages+0x274/0x2a8
> [   64.923110] Modules linked in:
> [   64.923634] CPU: 16 PID: 1116 Comm: bash Not tainted 5.5.0-rc6-00006-gca544f2a11ae-dirty #281
> [   64.925102] Hardware name: linux,dummy-virt (DT)
> [   64.925905] pstate: 60400085 (nZCv daIf +PAN -UAO)
> [   64.926742] pc : has_unmovable_pages+0x274/0x2a8
> [   64.927554] lr : has_unmovable_pages+0x298/0x2a8
> [   64.928359] sp : ffff800014fd3a00
> [   64.928944] x29: ffff800014fd3a00 x28: fffffe0017640000 
> [   64.929875] x27: 0000000000000000 x26: ffff0005fcfcda00 
> [   64.930810] x25: 0000000000640000 x24: 0000000000000003 
> [   64.931736] x23: 0000000019840000 x22: 0000000000001380 
> [   64.932667] x21: ffff800011259000 x20: ffff0005fcfcda00 
> [   64.933588] x19: 0000000000661000 x18: 0000000000000010 
> [   64.934514] x17: 0000000000000000 x16: 0000000000000000 
> [   64.935454] x15: ffffffffffffffff x14: ffff8000118498c8 
> [   64.936377] x13: ffff800094fd3797 x12: ffff800014fd379f 
> [   64.937304] x11: ffff800011861000 x10: ffff800014fd3720 
> [   64.938226] x9 : 00000000ffffffd0 x8 : ffff8000106a60d0 
> [   64.939156] x7 : 0000000000000000 x6 : ffff0005fc6261b0 
> [   64.940078] x5 : ffff0005fc6261b0 x4 : 0000000000000000 
> [   64.941003] x3 : ffff0005fc62cf80 x2 : ffffffffffffec80 
> [   64.941927] x1 : ffff800011141b58 x0 : ffff0005fcfcda00 
> [   64.942857] Call trace:
> [   64.943298]  has_unmovable_pages+0x274/0x2a8
> [   64.944056]  start_isolate_page_range+0x258/0x360
> [   64.944879]  __offline_pages+0xf4/0x9e8
> [   64.945554]  offline_pages+0x10/0x18
> [   64.946189]  memory_block_action+0x40/0x1a0
> [   64.946929]  memory_subsys_offline+0x4c/0x78
> [   64.947679]  device_offline+0x98/0xc8
> [   64.948328]  unprobe_store+0xa8/0x158
> [   64.948976]  dev_attr_store+0x14/0x28
> [   64.949628]  sysfs_kf_write+0x40/0x50
> [   64.950273]  kernfs_fop_write+0x108/0x218
> [   64.950983]  __vfs_write+0x18/0x40
> [   64.951592]  vfs_write+0xb0/0x1d0
> [   64.952175]  ksys_write+0x64/0xe8
> [   64.952761]  __arm64_sys_write+0x18/0x20
> [   64.953451]  el0_svc_common.constprop.2+0x88/0x150
> [   64.954293]  el0_svc_handler+0x20/0x80
> [   64.954963]  el0_sync_handler+0x118/0x188
> [   64.955669]  el0_sync+0x140/0x180
> [   64.956256] ---[ end trace b162b4d1cbea304d ]---
> [   64.957063] page:fffffe0017640000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
> [   64.958489] raw: 1ffff80000001000 fffffe0017640008 fffffe0017640008 0000000000000000
> [   64.959839] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [   64.961174] page dumped because: unmovable page
> 
> The reason is dump_page() does not print page->flags universally
> and only does so for KSM, Anon and File pages while excluding
> reserved pages at boot. Wondering should not we make printing
> page->flags universal ?

The thing is that "PageReserved" on a random page tells us that the
values in the memmap cannot be trusted (in some scenarios).

However, we also expose flags for reserved pages via stable_page_flags()
- /proc/kpageflags. As this is just a debugging mechanism, I think it
makes sense to also print them.
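
As a side note, a minimal userspace sketch of reading that information back via
/proc/kpageflags follows (assumptions: the PFN is passed as an example argument,
KPF_RESERVED is bit 32 as documented in Documentation/admin-guide/mm/pagemap.rst,
and the program needs root):

	/* Minimal sketch: check the reserved bit for one PFN via /proc/kpageflags. */
	#include <fcntl.h>
	#include <inttypes.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		uint64_t pfn = argc > 1 ? strtoull(argv[1], NULL, 0) : 0;
		uint64_t flags;
		int fd = open("/proc/kpageflags", O_RDONLY);

		if (fd < 0 ||
		    pread(fd, &flags, sizeof(flags), pfn * sizeof(flags)) != sizeof(flags)) {
			perror("kpageflags");
			return 1;
		}
		/* Bit 32 is KPF_RESERVED, i.e. PG_reserved as exposed by stable_page_flags(). */
		printf("pfn %" PRIu64 ": flags %#" PRIx64 " reserved=%d\n",
		       pfn, flags, (int)((flags >> 32) & 1));
		close(fd);
		return 0;
	}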
Michal Hocko Jan. 14, 2020, 9:10 a.m. UTC | #22
[Cc Ralph]

On Tue 14-01-20 13:49:14, Anshuman Khandual wrote:
> 
> 
> On 10/04/2019 01:55 PM, David Hildenbrand wrote:
> > On 03.10.19 14:14, Qian Cai wrote:
> >>
> >>
> >>> On Oct 3, 2019, at 8:01 AM, Anshuman Khandual <Anshuman.Khandual@arm.com> wrote:
> >>>
> >>> Will something like this be better ?
> >>
> >> Not really. dump_page() will dump PageCompound information anyway, so it is trivial to figure out if went in that path.
> >>
> > 
> > I agree, I use the dump_page() output frequently to identify PG_reserved
> > pages. No need to duplicate that.
> 
> Here in this path there is a reserved page which is preventing
> offlining a memory section but unfortunately dump_page() does
> not print page->flags for a reserved page pinned there possibly
> through memblock_reserve() during boot.
> 
> __offline_pages()
> 	start_isolate_page_range()
> 		set_migratetype_isolate()
> 			has_unmovable_pages()
> 				dump_page() 
> 
> [   64.920970] ------------[ cut here ]------------
> [   64.921718] WARNING: CPU: 16 PID: 1116 at mm/page_alloc.c:8298 has_unmovable_pages+0x274/0x2a8
> [   64.923110] Modules linked in:
> [   64.923634] CPU: 16 PID: 1116 Comm: bash Not tainted 5.5.0-rc6-00006-gca544f2a11ae-dirty #281
> [   64.925102] Hardware name: linux,dummy-virt (DT)
> [   64.925905] pstate: 60400085 (nZCv daIf +PAN -UAO)
> [   64.926742] pc : has_unmovable_pages+0x274/0x2a8
> [   64.927554] lr : has_unmovable_pages+0x298/0x2a8
> [   64.928359] sp : ffff800014fd3a00
> [   64.928944] x29: ffff800014fd3a00 x28: fffffe0017640000 
> [   64.929875] x27: 0000000000000000 x26: ffff0005fcfcda00 
> [   64.930810] x25: 0000000000640000 x24: 0000000000000003 
> [   64.931736] x23: 0000000019840000 x22: 0000000000001380 
> [   64.932667] x21: ffff800011259000 x20: ffff0005fcfcda00 
> [   64.933588] x19: 0000000000661000 x18: 0000000000000010 
> [   64.934514] x17: 0000000000000000 x16: 0000000000000000 
> [   64.935454] x15: ffffffffffffffff x14: ffff8000118498c8 
> [   64.936377] x13: ffff800094fd3797 x12: ffff800014fd379f 
> [   64.937304] x11: ffff800011861000 x10: ffff800014fd3720 
> [   64.938226] x9 : 00000000ffffffd0 x8 : ffff8000106a60d0 
> [   64.939156] x7 : 0000000000000000 x6 : ffff0005fc6261b0 
> [   64.940078] x5 : ffff0005fc6261b0 x4 : 0000000000000000 
> [   64.941003] x3 : ffff0005fc62cf80 x2 : ffffffffffffec80 
> [   64.941927] x1 : ffff800011141b58 x0 : ffff0005fcfcda00 
> [   64.942857] Call trace:
> [   64.943298]  has_unmovable_pages+0x274/0x2a8
> [   64.944056]  start_isolate_page_range+0x258/0x360
> [   64.944879]  __offline_pages+0xf4/0x9e8
> [   64.945554]  offline_pages+0x10/0x18
> [   64.946189]  memory_block_action+0x40/0x1a0
> [   64.946929]  memory_subsys_offline+0x4c/0x78
> [   64.947679]  device_offline+0x98/0xc8
> [   64.948328]  unprobe_store+0xa8/0x158
> [   64.948976]  dev_attr_store+0x14/0x28
> [   64.949628]  sysfs_kf_write+0x40/0x50
> [   64.950273]  kernfs_fop_write+0x108/0x218
> [   64.950983]  __vfs_write+0x18/0x40
> [   64.951592]  vfs_write+0xb0/0x1d0
> [   64.952175]  ksys_write+0x64/0xe8
> [   64.952761]  __arm64_sys_write+0x18/0x20
> [   64.953451]  el0_svc_common.constprop.2+0x88/0x150
> [   64.954293]  el0_svc_handler+0x20/0x80
> [   64.954963]  el0_sync_handler+0x118/0x188
> [   64.955669]  el0_sync+0x140/0x180
> [   64.956256] ---[ end trace b162b4d1cbea304d ]---
> [   64.957063] page:fffffe0017640000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
> [   64.958489] raw: 1ffff80000001000 fffffe0017640008 fffffe0017640008 0000000000000000
> [   64.959839] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [   64.961174] page dumped because: unmovable page
> 
> The reason is dump_page() does not print page->flags universally
> and only does so for KSM, Anon and File pages while excluding
> reserved pages at boot. Wondering should not we make printing
> page->flags universal ?

We used to do that, and this caught me by surprise when looking again.
This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an
extra line"), which is a cleanup patch, and I suspect this result was not
anticipated.

The following will do the trick but I cannot really say I like the code
duplication. pr_cont in this case sounds like a much cleaner solution to
me.

diff --git a/mm/debug.c b/mm/debug.c
index 0461df1207cb..4cbf30d03c88 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -89,6 +89,8 @@ void __dump_page(struct page *page, const char *reason)
 		} else
 			pr_warn("%ps\n", mapping->a_ops);
 		pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags);
+	} else {
+		pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags);
 	}
 	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
Vlastimil Babka Jan. 14, 2020, 10:23 a.m. UTC | #23
On 1/14/20 10:10 AM, Michal Hocko wrote:
> [Cc Ralph]
>> The reason is dump_page() does not print page->flags universally
>> and only does so for KSM, Anon and File pages while excluding
>> reserved pages at boot. Wondering should not we make printing
>> page->flags universal ?
> 
> We used to do that and this caught me as a surprise when looking again.
> This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an
> extra line") which is a cleanup patch and I suspect this result was not
> anticipated.
> 
> The following will do the trick but I cannot really say I like the code
> duplication. pr_cont in this case sounds like a much cleaner solution to
> me.

How about this then?

diff --git mm/debug.c mm/debug.c
index 0461df1207cb..6a52316af839 100644
--- mm/debug.c
+++ mm/debug.c
@@ -47,6 +47,7 @@ void __dump_page(struct page *page, const char *reason)
 	struct address_space *mapping;
 	bool page_poisoned = PagePoisoned(page);
 	int mapcount;
+	char *type = "";
 
 	/*
 	 * If struct page is poisoned don't access Page*() functions as that
@@ -78,9 +79,9 @@ void __dump_page(struct page *page, const char *reason)
 			page, page_ref_count(page), mapcount,
 			page->mapping, page_to_pgoff(page));
 	if (PageKsm(page))
-		pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags);
+		type = "ksm ";
 	else if (PageAnon(page))
-		pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags);
+		type = "anon ";
 	else if (mapping) {
 		if (mapping->host && mapping->host->i_dentry.first) {
 			struct dentry *dentry;
@@ -88,10 +89,11 @@ void __dump_page(struct page *page, const char *reason)
 			pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry);
 		} else
 			pr_warn("%ps\n", mapping->a_ops);
-		pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags);
 	}
 	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
 
+	pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags);
+
 hex_only:
 	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
 			sizeof(unsigned long), page,
Anshuman Khandual Jan. 14, 2020, 11:03 a.m. UTC | #24
On 01/14/2020 03:53 PM, Vlastimil Babka wrote:
> On 1/14/20 10:10 AM, Michal Hocko wrote:
>> [Cc Ralph]
>>> The reason is dump_page() does not print page->flags universally
>>> and only does so for KSM, Anon and File pages while excluding
>>> reserved pages at boot. Wondering should not we make printing
>>> page->flags universal ?
>>
>> We used to do that and this caught me as a surprise when looking again.
>> This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an
>> extra line") which is a cleanup patch and I suspect this result was not
>> anticipated.
>>
>> The following will do the trick but I cannot really say I like the code
>> duplication. pr_cont in this case sounds like a much cleaner solution to
>> me.
> 
> How about this then?

This looks better than what we have right now, though my initial thought
was similar to what Michal had suggested earlier.

> 
> diff --git mm/debug.c mm/debug.c
> index 0461df1207cb..6a52316af839 100644
> --- mm/debug.c
> +++ mm/debug.c
> @@ -47,6 +47,7 @@ void __dump_page(struct page *page, const char *reason)
>  	struct address_space *mapping;
>  	bool page_poisoned = PagePoisoned(page);
>  	int mapcount;
> +	char *type = "";
>  
>  	/*
>  	 * If struct page is poisoned don't access Page*() functions as that
> @@ -78,9 +79,9 @@ void __dump_page(struct page *page, const char *reason)
>  			page, page_ref_count(page), mapcount,
>  			page->mapping, page_to_pgoff(page));
>  	if (PageKsm(page))
> -		pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags);
> +		type = "ksm ";
>  	else if (PageAnon(page))
> -		pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags);
> +		type = "anon ";
>  	else if (mapping) {
>  		if (mapping->host && mapping->host->i_dentry.first) {
>  			struct dentry *dentry;
> @@ -88,10 +89,11 @@ void __dump_page(struct page *page, const char *reason)
>  			pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry);
>  		} else
>  			pr_warn("%ps\n", mapping->a_ops);
> -		pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags);
>  	}
>  	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
>  
> +	pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags);
> +
>  hex_only:
>  	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
>  			sizeof(unsigned long), page,
> 
>
Michal Hocko Jan. 14, 2020, 11:32 a.m. UTC | #25
On Tue 14-01-20 11:23:52, Vlastimil Babka wrote:
> On 1/14/20 10:10 AM, Michal Hocko wrote:
> > [Cc Ralph]
> >> The reason is dump_page() does not print page->flags universally
> >> and only does so for KSM, Anon and File pages while excluding
> >> reserved pages at boot. Wondering should not we make printing
> >> page->flags universal ?
> > 
> > We used to do that and this caught me as a surprise when looking again.
> > This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an
> > extra line") which is a cleanup patch and I suspect this result was not
> > anticipated.
> > 
> > The following will do the trick but I cannot really say I like the code
> > duplication. pr_cont in this case sounds like a much cleaner solution to
> > me.
> 
> How about this then?

Yes makes sense as well.

> diff --git mm/debug.c mm/debug.c
> index 0461df1207cb..6a52316af839 100644
> --- mm/debug.c
> +++ mm/debug.c
> @@ -47,6 +47,7 @@ void __dump_page(struct page *page, const char *reason)
>  	struct address_space *mapping;
>  	bool page_poisoned = PagePoisoned(page);
>  	int mapcount;
> +	char *type = "";
>  
>  	/*
>  	 * If struct page is poisoned don't access Page*() functions as that
> @@ -78,9 +79,9 @@ void __dump_page(struct page *page, const char *reason)
>  			page, page_ref_count(page), mapcount,
>  			page->mapping, page_to_pgoff(page));
>  	if (PageKsm(page))
> -		pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags);
> +		type = "ksm ";
>  	else if (PageAnon(page))
> -		pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags);
> +		type = "anon ";
>  	else if (mapping) {
>  		if (mapping->host && mapping->host->i_dentry.first) {
>  			struct dentry *dentry;
> @@ -88,10 +89,11 @@ void __dump_page(struct page *page, const char *reason)
>  			pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry);
>  		} else
>  			pr_warn("%ps\n", mapping->a_ops);
> -		pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags);
>  	}
>  	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
>  
> +	pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags);
> +
>  hex_only:
>  	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
>  			sizeof(unsigned long), page,
>
Ralph Campbell Jan. 14, 2020, 6:22 p.m. UTC | #26
On 1/14/20 3:32 AM, Michal Hocko wrote:
> On Tue 14-01-20 11:23:52, Vlastimil Babka wrote:
>> On 1/14/20 10:10 AM, Michal Hocko wrote:
>>> [Cc Ralph]
>>>> The reason is dump_page() does not print page->flags universally
>>>> and only does so for KSM, Anon and File pages while excluding
>>>> reserved pages at boot. Wondering should not we make printing
>>>> page->flags universal ?
>>>
>>> We used to do that and this caught me as a surprise when looking again.
>>> This is a result of 76a1850e4572 ("mm/debug.c: __dump_page() prints an
>>> extra line") which is a cleanup patch and I suspect this result was not
>>> anticipated.
>>>
>>> The following will do the trick but I cannot really say I like the code
>>> duplication. pr_cont in this case sounds like a much cleaner solution to
>>> me.
>>
>> How about this then?
> 
> Yes makes sense as well.
> 
>> diff --git mm/debug.c mm/debug.c
>> index 0461df1207cb..6a52316af839 100644
>> --- mm/debug.c
>> +++ mm/debug.c
>> @@ -47,6 +47,7 @@ void __dump_page(struct page *page, const char *reason)
>>   	struct address_space *mapping;
>>   	bool page_poisoned = PagePoisoned(page);
>>   	int mapcount;
>> +	char *type = "";
>>   
>>   	/*
>>   	 * If struct page is poisoned don't access Page*() functions as that
>> @@ -78,9 +79,9 @@ void __dump_page(struct page *page, const char *reason)
>>   			page, page_ref_count(page), mapcount,
>>   			page->mapping, page_to_pgoff(page));
>>   	if (PageKsm(page))
>> -		pr_warn("ksm flags: %#lx(%pGp)\n", page->flags, &page->flags);
>> +		type = "ksm ";
>>   	else if (PageAnon(page))
>> -		pr_warn("anon flags: %#lx(%pGp)\n", page->flags, &page->flags);
>> +		type = "anon ";
>>   	else if (mapping) {
>>   		if (mapping->host && mapping->host->i_dentry.first) {
>>   			struct dentry *dentry;
>> @@ -88,10 +89,11 @@ void __dump_page(struct page *page, const char *reason)
>>   			pr_warn("%ps name:\"%pd\"\n", mapping->a_ops, dentry);
>>   		} else
>>   			pr_warn("%ps\n", mapping->a_ops);
>> -		pr_warn("flags: %#lx(%pGp)\n", page->flags, &page->flags);
>>   	}
>>   	BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
>>   
>> +	pr_warn("%sflags: %#lx(%pGp)\n", type, page->flags, &page->flags);
>> +
>>   hex_only:
>>   	print_hex_dump(KERN_WARNING, "raw: ", DUMP_PREFIX_NONE, 32,
>>   			sizeof(unsigned long), page,
>>
> 
Looks good to me. Thanks for the cleanup.
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15c2050c629b..fbf93ea119d2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8206,8 +8206,10 @@  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 
 		page = pfn_to_page(check);
 
-		if (PageReserved(page))
+		if (PageReserved(page)) {
+			reason = "Reserved page";
 			goto unmovable;
+		}
 
 		/*
 		 * If the zone is movable and we have ruled out all reserved