[Resend,1/6] mm/memcg: warning on !memcg after readahead page charged

Message ID 1597144232-11370-1-git-send-email-alex.shi@linux.alibaba.com (mailing list archive)
State New, archived

Commit Message

Alex Shi Aug. 11, 2020, 11:10 a.m. UTC
Since readahead pages are now charged to the memcg too, in theory we no
longer have to check for this exception. Before safely removing all these
checks, add a warning for an unexpected !memcg.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/mmdebug.h | 13 +++++++++++++
 mm/memcontrol.c         | 15 ++++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)
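
For readers without the tree at hand, the warn-once pattern this patch adds can be sketched in userspace roughly as follows. This is only an illustration, assuming GCC/Clang statement expressions; WARN_ON_ONCE_CTX, dump_ctx(), struct fake_page and migrate_owner() are hypothetical names made up for the sketch, not kernel interfaces.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace analogue of a "warn once, with context" macro.
 * All names here are hypothetical; none of them are kernel APIs.
 * Relies on GCC/Clang statement expressions.
 */
static int warn_count;	/* how many warnings were actually reported */

static void dump_ctx(const void *obj, const char *reason)
{
	fprintf(stderr, "object %p dumped because: %s\n", obj, reason);
}

#define WARN_ON_ONCE_CTX(cond, obj) ({					\
	static bool warned_;	/* one flag per call site */		\
	int ret_ = !!(cond);						\
									\
	if (ret_ && !warned_) {						\
		dump_ctx(obj, "WARN_ON_ONCE_CTX(" #cond ")");		\
		warned_ = true;						\
		warn_count++;						\
	}								\
	ret_;			/* value of the whole expression */	\
})

struct fake_page {
	void *owner;		/* stands in for page->mem_cgroup */
};

/* Same shape as the patched call sites: warn once, keep the old fallback. */
static int migrate_owner(struct fake_page *oldpage, struct fake_page *newpage)
{
	void *owner = oldpage->owner;

	WARN_ON_ONCE_CTX(!owner, oldpage);
	if (!owner)
		return 0;	/* nothing to transfer */

	newpage->owner = owner;
	return 1;
}
```

Calling migrate_owner() twice with an ownerless page reports once and returns 0 both times, so the existing !memcg fallback stays in place while the unexpected case becomes visible.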

Comments

Qian Cai Aug. 20, 2020, 2:58 p.m. UTC | #1
On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> Since readahead pages are now charged to the memcg too, in theory we no
> longer have to check for this exception. Before safely removing all these
> checks, add a warning for an unexpected !memcg.
> 
> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> Acked-by: Michal Hocko <mhocko@suse.com>

This will trigger,

[ 1863.916499] LTP: starting move_pages12
[ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
[ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
[ 1863.946568] anon flags: 0x7fff800001000d(locked|uptodate|dirty|head)
[ 1863.946584] raw: 007fff800001000d c000000016ebfcd8 c000000016ebfcd8 c000001feaf46d59
[ 1863.946609] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 1863.946632] page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
[ 1863.946669] ------------[ cut here ]------------
[ 1863.946694] WARNING: CPU: 16 PID: 35307 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x5f8/0x610
[ 1863.946708] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod bnx2x tg3 ahci libahci libphy mdio firmware_class libata dm_mirror dm_region_hash dm_log dm_mod
[ 1863.946801] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
[ 1863.946834] NIP:  c0000000003fcb48 LR: c0000000003fcb38 CTR: 0000000000000000
[ 1863.946856] REGS: c000000016ebf6f0 TRAP: 0700   Not tainted  (5.9.0-rc1-next-20200820)
[ 1863.946879] MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28242882  XER: 00000000
[ 1863.946915] CFAR: c00000000032c644 IRQMASK: 0 
               GPR00: c0000000003fcb38 c000000016ebf980 c000000005923200 0000000000000031 
               GPR04: 0000000000000000 0000000000000000 0000000000000027 c000001ffd727190 
               GPR08: 0000000000000023 0000000000000001 c0000000058f3200 0000000000000001 
               GPR12: 0000000000002000 c000001ffffe3800 c000000000b26a68 0000000000000000 
               GPR16: c000000016ebfc20 c000000016ebfcd8 0000000000000020 0000000000000001 
               GPR20: c00c00080724f000 c0000000003c8770 0000000000000000 c000000016ebfcd0 
               GPR24: 0000000000000000 fffffffffffffff5 0000000000000002 0000000000000000 
               GPR28: 0000000000000000 0000000000000001 0000000000000000 c00c000007f4f000 
[ 1863.947142] NIP [c0000000003fcb48] mem_cgroup_migrate+0x5f8/0x610
[ 1863.947164] LR [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610
[ 1863.947185] Call Trace:
[ 1863.947203] [c000000016ebf980] [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610 (unreliable)
[ 1863.947241] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
[ 1863.947274] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
[ 1863.947307] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
[ 1863.947341] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
[ 1863.947365] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
[ 1863.947399] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
[ 1863.947434] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
[ 1863.947459] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 1863.947481] Instruction dump:
[ 1863.947502] 7fc3f378 4bfee82d 7c0802a6 3c82fb20 7fe3fb78 38844fc8 f8010050 4bf2fad5 
[ 1863.947527] 60000000 39200001 3d42fffd 992a82fb <0fe00000> e8010050 eb810020 7c0803a6 
[ 1863.947563] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
[ 1863.947594] Call Trace:
[ 1863.947615] [c000000016ebf4d0] [c0000000006f6008] dump_stack+0xfc/0x174 (unreliable)
[ 1863.947642] [c000000016ebf520] [c0000000000c9004] __warn+0xc4/0x14c
[ 1863.947665] [c000000016ebf5b0] [c0000000006f4b68] report_bug+0x108/0x1f0
[ 1863.947689] [c000000016ebf650] [c0000000000234f4] program_check_exception+0x104/0x2e0
[ 1863.947724] [c000000016ebf680] [c000000000009664] program_check_common_virt+0x2c4/0x310
[ 1863.947751] --- interrupt: 700 at mem_cgroup_migrate+0x5f8/0x610
                   LR = mem_cgroup_migrate+0x5e8/0x610
[ 1863.947786] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
[ 1863.947810] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
[ 1863.947843] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
[ 1863.947867] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
[ 1863.947901] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
[ 1863.947936] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
[ 1863.947969] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
[ 1863.948002] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 1863.948034] irq event stamp: 410
[ 1863.948054] hardirqs last  enabled at (409): [<c000000000184564>] console_unlock+0x6b4/0x990
[ 1863.948092] hardirqs last disabled at (410): [<c00000000000965c>] program_check_common_virt+0x2bc/0x310
[ 1863.948126] softirqs last  enabled at (0): [<c0000000000c59a8>] copy_process+0x788/0x1950
[ 1863.948229] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 1863.948316] ---[ end trace 74f8f4df751b0259 ]---

> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  include/linux/mmdebug.h | 13 +++++++++++++
>  mm/memcontrol.c         | 15 ++++++++-------
>  2 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> index 2ad72d2c8cc5..4ed52879ce55 100644
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -37,6 +37,18 @@
>  			BUG();						\
>  		}							\
>  	} while (0)
> +#define VM_WARN_ON_ONCE_PAGE(cond, page)	({			\
> +	static bool __section(.data.once) __warned;			\
> +	int __ret_warn_once = !!(cond);					\
> +									\
> +	if (unlikely(__ret_warn_once && !__warned)) {			\
> +		dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\
> +		__warned = true;					\
> +		WARN_ON(1);						\
> +	}								\
> +	unlikely(__ret_warn_once);					\
> +})
> +
>  #define VM_WARN_ON(cond) (void)WARN_ON(cond)
>  #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
>  #define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
> @@ -48,6 +60,7 @@
>  #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
>  #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
>  #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
> +#define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
>  #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
>  #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
>  #endif
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 130093bdf74b..299382fc55a9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1322,10 +1322,8 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
>  	}
>  
>  	memcg = page->mem_cgroup;
> -	/*
> -	 * Swapcache readahead pages are added to the LRU - and
> -	 * possibly migrated - before they are charged.
> -	 */
> +	/* Readahead page is charged too, to see if other page uncharged */
> +	VM_WARN_ON_ONCE_PAGE(!memcg, page);
>  	if (!memcg)
>  		memcg = root_mem_cgroup;
>  
> @@ -6906,8 +6904,9 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
>  	if (newpage->mem_cgroup)
>  		return;
>  
> -	/* Swapcache readahead pages can get replaced before being charged */
>  	memcg = oldpage->mem_cgroup;
> +	/* Readahead page is charged too, to see if other page uncharged */
> +	VM_WARN_ON_ONCE_PAGE(!memcg, oldpage);
>  	if (!memcg)
>  		return;
>  
> @@ -7104,7 +7103,8 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
>  
>  	memcg = page->mem_cgroup;
>  
> -	/* Readahead page, never charged */
> +	/* Readahead page is charged too, to see if other page uncharged */
> +	VM_WARN_ON_ONCE_PAGE(!memcg, page);
>  	if (!memcg)
>  		return;
>  
> @@ -7168,7 +7168,8 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
>  
>  	memcg = page->mem_cgroup;
>  
> -	/* Readahead page, never charged */
> +	/* Readahead page is charged too, to see if other page uncharged */
> +	VM_WARN_ON_ONCE_PAGE(!memcg, page);
>  	if (!memcg)
>  		return 0;
>  
> -- 
> 1.8.3.1
> 
>

Michal Hocko Aug. 21, 2020, 8:01 a.m. UTC | #2
On Thu 20-08-20 10:58:51, Qian Cai wrote:
> On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > Since readahead pages are now charged to the memcg too, in theory we no
> > longer have to check for this exception. Before safely removing all these
> > checks, add a warning for an unexpected !memcg.
> > 
> > Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> 
> This will trigger,

Thanks for the report!
 
> [ 1863.916499] LTP: starting move_pages12
> [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0

Hmm, this is really unexpected. How did we get an order-5 page here? Is
this some special mapping that sys_move_pages should just ignore?

> [ 1863.946568] anon flags: 0x7fff800001000d(locked|uptodate|dirty|head)
> [ 1863.946584] raw: 007fff800001000d c000000016ebfcd8 c000000016ebfcd8 c000001feaf46d59
> [ 1863.946609] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [ 1863.946632] page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
> [ 1863.946669] ------------[ cut here ]------------
> [ 1863.946694] WARNING: CPU: 16 PID: 35307 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x5f8/0x610
> [ 1863.946708] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod bnx2x tg3 ahci libahci libphy mdio firmware_class libata dm_mirror dm_region_hash dm_log dm_mod
> [ 1863.946801] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
> [ 1863.946834] NIP:  c0000000003fcb48 LR: c0000000003fcb38 CTR: 0000000000000000
> [ 1863.946856] REGS: c000000016ebf6f0 TRAP: 0700   Not tainted  (5.9.0-rc1-next-20200820)
> [ 1863.946879] MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28242882  XER: 00000000
> [ 1863.946915] CFAR: c00000000032c644 IRQMASK: 0 
>                GPR00: c0000000003fcb38 c000000016ebf980 c000000005923200 0000000000000031 
>                GPR04: 0000000000000000 0000000000000000 0000000000000027 c000001ffd727190 
>                GPR08: 0000000000000023 0000000000000001 c0000000058f3200 0000000000000001 
>                GPR12: 0000000000002000 c000001ffffe3800 c000000000b26a68 0000000000000000 
>                GPR16: c000000016ebfc20 c000000016ebfcd8 0000000000000020 0000000000000001 
>                GPR20: c00c00080724f000 c0000000003c8770 0000000000000000 c000000016ebfcd0 
>                GPR24: 0000000000000000 fffffffffffffff5 0000000000000002 0000000000000000 
>                GPR28: 0000000000000000 0000000000000001 0000000000000000 c00c000007f4f000 
> [ 1863.947142] NIP [c0000000003fcb48] mem_cgroup_migrate+0x5f8/0x610
> [ 1863.947164] LR [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610
> [ 1863.947185] Call Trace:
> [ 1863.947203] [c000000016ebf980] [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610 (unreliable)
> [ 1863.947241] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
> [ 1863.947274] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
> [ 1863.947307] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
> [ 1863.947341] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
> [ 1863.947365] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
> [ 1863.947399] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
> [ 1863.947434] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
> [ 1863.947459] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 1863.947481] Instruction dump:
> [ 1863.947502] 7fc3f378 4bfee82d 7c0802a6 3c82fb20 7fe3fb78 38844fc8 f8010050 4bf2fad5 
> [ 1863.947527] 60000000 39200001 3d42fffd 992a82fb <0fe00000> e8010050 eb810020 7c0803a6 
> [ 1863.947563] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
> [ 1863.947594] Call Trace:
> [ 1863.947615] [c000000016ebf4d0] [c0000000006f6008] dump_stack+0xfc/0x174 (unreliable)
> [ 1863.947642] [c000000016ebf520] [c0000000000c9004] __warn+0xc4/0x14c
> [ 1863.947665] [c000000016ebf5b0] [c0000000006f4b68] report_bug+0x108/0x1f0
> [ 1863.947689] [c000000016ebf650] [c0000000000234f4] program_check_exception+0x104/0x2e0
> [ 1863.947724] [c000000016ebf680] [c000000000009664] program_check_common_virt+0x2c4/0x310
> [ 1863.947751] --- interrupt: 700 at mem_cgroup_migrate+0x5f8/0x610
>                    LR = mem_cgroup_migrate+0x5e8/0x610
> [ 1863.947786] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
> [ 1863.947810] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
> [ 1863.947843] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
> [ 1863.947867] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
> [ 1863.947901] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
> [ 1863.947936] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
> [ 1863.947969] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
> [ 1863.948002] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 1863.948034] irq event stamp: 410
> [ 1863.948054] hardirqs last  enabled at (409): [<c000000000184564>] console_unlock+0x6b4/0x990
> [ 1863.948092] hardirqs last disabled at (410): [<c00000000000965c>] program_check_common_virt+0x2bc/0x310
> [ 1863.948126] softirqs last  enabled at (0): [<c0000000000c59a8>] copy_process+0x788/0x1950
> [ 1863.948229] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 1863.948316] ---[ end trace 74f8f4df751b0259 ]---

Qian Cai Aug. 21, 2020, 12:39 p.m. UTC | #3
On Fri, Aug 21, 2020 at 10:01:27AM +0200, Michal Hocko wrote:
> On Thu 20-08-20 10:58:51, Qian Cai wrote:
> > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > Since readahead pages are now charged to the memcg too, in theory we no
> > > longer have to check for this exception. Before safely removing all these
> > > checks, add a warning for an unexpected !memcg.
> > > 
> > > Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> > > Acked-by: Michal Hocko <mhocko@suse.com>
> > 
> > This will trigger,
> 
> Thanks for the report!
>  
> > [ 1863.916499] LTP: starting move_pages12
> > [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> > [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
> 
> Hmm, this is really unexpected. How did we get an order-5 page here? Is
> this some special mapping that sys_move_pages should just ignore?

Well, I thought everybody should be able to figure out where to find the LTP
test source code at this stage to see what it does. Anyway, the test simply
migrates hugepages while soft offlining, so order 5 is expected, as that is a
2M hugepage on powerpc (also reproduced on x86 below). It might be easier to
reproduce using our linux-mm random bug collection on NUMA systems.

# git clone https://gitlab.com/cailca/linux-mm
# cd linux-mm; make
# ./random 1

The main code is here:

https://gitlab.com/cailca/linux-mm/-/blob/master/random.c#L786
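
To spell out the order arithmetic mentioned above, here is a small sketch. The base page sizes are assumptions, not something stated in the logs: 64K on the powerpc box (2M / 2^5) and 4K on the x86 box. compound_bytes() is a hypothetical helper name.

```c
#include <stdio.h>

/*
 * Bytes covered by a compound page of the given order.
 * compound_bytes() is a hypothetical helper for this sketch.
 */
static unsigned long long compound_bytes(unsigned int order,
					 unsigned long long base_page_bytes)
{
	/* An order-N compound page spans 2^N base pages. */
	return base_page_bytes << order;
}

/* Print one line per configuration, e.g. for eyeballing a page dump. */
static void print_compound(const char *arch, unsigned int order,
			   unsigned long long base_page_bytes)
{
	printf("%s: order %u with %llu-byte base pages = %llu bytes\n",
	       arch, order, base_page_bytes,
	       compound_bytes(order, base_page_bytes));
}
```

Under those assumptions, compound_bytes(5, 64 * 1024) and compound_bytes(9, 4096) both come out at 2097152 bytes, so order:5 in the powerpc dump and order:9 in the x86 dump would describe the same 2M hugepage size.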

Reproduced on x86 as well:

[  314.171411][ T1762] Offlined Pages 524288
[  315.265413][ T1762] Soft offlining pfn 0x1e86e00 at process virtual address 0x7f221b800000
[  315.307179][ T1762] soft offline: 0x1e86e00: hugepage isolation failed: 0, page count 2, type 3bfffc00001000f (locked|referenced|uptodate|dirty|head)
[  315.372397][ T1762] Soft offlining pfn 0x1880000 at process virtual address 0x7f221ba00000
[  315.372788][ T1939] page:000000004b7fe362 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1e86e00
[  315.461283][ T1939] head:000000004b7fe362 order:9 compound_mapcount:0 compound_pincount:0
[  315.501050][ T1939] anon flags: 0x3bfffc00001000f(locked|referenced|uptodate|dirty|head)
[  315.539977][ T1939] raw: 03bfffc00001000f ffffc9000aaefe30 ffffc9000aaefe30 ffff888fefec5a49
[  315.580841][ T1939] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[  315.621327][ T1939] page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
[  315.677964][ T1939] WARNING: CPU: 30 PID: 1939 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x50e/0x850
[  315.722090][ T1939] Modules linked in: nls_ascii nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables sd_mod bnx2x hpsa mdio scsi_transport_sas firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs
[  315.819298][ T1939] CPU: 30 PID: 1939 Comm: random Not tainted 5.9.0-rc1-next-20200821 #2
[  315.858186][ T1939] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
[  315.894272][ T1939] RIP: 0010:mem_cgroup_migrate+0x50e/0x850
[  315.922436][ T1939] Code: 2d c0 5c 5d 06 40 80 fd 01 0f 87 3f 2d 00 00 83 e5 01 75 18 48 c7 c6 80 14 4e ad 48 89 df e8 99 4f eb ff c6 05 9b 5c 5d 06 01 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 60 04
[  316.018174][ T1939] RSP: 0018:ffffc9000aaefa98 EFLAGS: 00010296
[  316.047132][ T1939] RAX: 0000000000000000 RBX: ffffea007a1b8000 RCX: ffffffffac899a12
[  316.084567][ T1939] RDX: 1ffffd400f437007 RSI: 0000000000000000 RDI: ffffea007a1b8038
[  316.122817][ T1939] RBP: 0000000000000000 R08: ffffed120bf75e7a R09: ffffed120bf75e7a
[  316.160571][ T1939] R10: ffff88905fbaf3cf R11: ffffed120bf75e79 R12: 0000000000000000
[  316.198278][ T1939] R13: ffffea000e108038 R14: ffffea000e108008 R15: 0000000000000001
[  316.235465][ T1939] FS:  00007f221c461740(0000) GS:ffff88905fb80000(0000) knlGS:0000000000000000
[  316.277658][ T1939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  316.309711][ T1939] CR2: 000055acde5279d8 CR3: 0000000fec490003 CR4: 00000000001706e0
[  316.349521][ T1939] Call Trace:
[  316.364537][ T1939]  ? migrate_page_states+0xb4c/0x1970
[  316.389115][ T1939]  migrate_page+0xea/0x190
[  316.409463][ T1939]  move_to_new_page+0x338/0xca0
[  316.432024][ T1939]  ? remove_migration_ptes+0xd0/0xd0
[  316.456464][ T1939]  ? __page_mapcount+0x19a/0x250
[  316.479465][ T1939]  ? try_to_unmap+0x1bf/0x2d0
[  316.501801][ T1939]  ? rmap_walk_locked+0x140/0x140
[  316.524702][ T1939]  ? PageHuge+0xf/0xd0
[  316.543478][ T1939]  ? page_mapped+0x155/0x2e0
[  316.564341][ T1939]  ? hugetlb_page_mapping_lock_write+0x97/0x180
[  316.593160][ T1939]  migrate_pages+0x1496/0x2290
[  316.615520][ T1939]  ? remove_migration_pte+0xac0/0xac0
[  316.640773][ T1939]  move_pages_and_store_status.isra.47+0xd7/0x1a0
[  316.670470][ T1939]  ? migrate_pages+0x2290/0x2290
[  316.693733][ T1939]  __x64_sys_move_pages+0x8b7/0x1180
[  316.717974][ T1939]  ? move_pages_and_store_status.isra.47+0x1a0/0x1a0
[  316.749559][ T1939]  ? syscall_enter_from_user_mode+0x1b/0x210
[  316.776749][ T1939]  ? lockdep_hardirqs_on_prepare+0x33e/0x4e0
[  316.804892][ T1939]  ? syscall_enter_from_user_mode+0x20/0x210
[  316.833657][ T1939]  ? trace_hardirqs_on+0x20/0x1b5
[  316.858450][ T1939]  do_syscall_64+0x33/0x40
[  316.879205][ T1939]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  316.905709][ T1939] RIP: 0033:0x7f221bd5d6ed
[  316.925937][ T1939] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
[  317.017819][ T1939] RSP: 002b:00007ffff6c4d448 EFLAGS: 00000212 ORIG_RAX: 0000000000000117
[  317.057248][ T1939] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f221bd5d6ed
[  317.094612][ T1939] RDX: 000000000151bc40 RSI: 0000000000000400 RDI: 00000000000006e2
[  317.131770][ T1939] RBP: 00007ffff6c4d4b0 R08: 000000000151ac30 R09: 0000000000000004
[  317.169009][ T1939] R10: 0000000001519c20 R11: 0000000000000212 R12: 0000000000401cb0
[  317.206463][ T1939] R13: 00007ffff6c58970 R14: 0000000000000000 R15: 0000000000000000
[  317.243922][ T1939] CPU: 30 PID: 1939 Comm: random Not tainted 5.9.0-rc1-next-20200821 #2
[  317.282806][ T1939] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
[  317.318693][ T1939] Call Trace:
[  317.334040][ T1939]  dump_stack+0x9d/0xe0
[  317.354529][ T1939]  __warn.cold.13+0xe/0x57
[  317.376117][ T1939]  ? mem_cgroup_migrate+0x50e/0x850
[  317.400959][ T1939]  report_bug+0x1af/0x260
[  317.420960][ T1939]  handle_bug+0x44/0x80
[  317.439737][ T1939]  exc_invalid_op+0x13/0x40
[  317.460081][ T1939]  asm_exc_invalid_op+0x12/0x20
[  317.483336][ T1939] RIP: 0010:mem_cgroup_migrate+0x50e/0x850
[  317.510309][ T1939] Code: 2d c0 5c 5d 06 40 80 fd 01 0f 87 3f 2d 00 00 83 e5 01 75 18 48 c7 c6 80 14 4e ad 48 89 df e8 99 4f eb ff c6 05 9b 5c 5d 06 01 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 60 04
[  317.603479][ T1939] RSP: 0018:ffffc9000aaefa98 EFLAGS: 00010296
[  317.630605][ T1939] RAX: 0000000000000000 RBX: ffffea007a1b8000 RCX: ffffffffac899a12
[  317.666432][ T1939] RDX: 1ffffd400f437007 RSI: 0000000000000000 RDI: ffffea007a1b8038
[  317.703630][ T1939] RBP: 0000000000000000 R08: ffffed120bf75e7a R09: ffffed120bf75e7a
[  317.739491][ T1939] R10: ffff88905fbaf3cf R11: ffffed120bf75e79 R12: 0000000000000000
[  317.777498][ T1939] R13: ffffea000e108038 R14: ffffea000e108008 R15: 0000000000000001
[  317.815155][ T1939]  ? llist_add_batch+0x52/0x90
[  317.837438][ T1939]  ? mem_cgroup_migrate+0x507/0x850
[  317.862628][ T1939]  ? migrate_page_states+0xb4c/0x1970
[  317.888821][ T1939]  migrate_page+0xea/0x190
[  317.910213][ T1939]  move_to_new_page+0x338/0xca0
[  317.932614][ T1939]  ? remove_migration_ptes+0xd0/0xd0
[  317.956549][ T1939]  ? __page_mapcount+0x19a/0x250
[  317.979397][ T1939]  ? try_to_unmap+0x1bf/0x2d0
[  318.000960][ T1939]  ? rmap_walk_locked+0x140/0x140
[  318.023999][ T1939]  ? PageHuge+0xf/0xd0
[  318.042729][ T1939]  ? page_mapped+0x155/0x2e0
[  318.063725][ T1939]  ? hugetlb_page_mapping_lock_write+0x97/0x180
[  318.092655][ T1939]  migrate_pages+0x1496/0x2290
[  318.114177][ T1939]  ? remove_migration_pte+0xac0/0xac0
[  318.139797][ T1939]  move_pages_and_store_status.isra.47+0xd7/0x1a0
[  318.170609][ T1939]  ? migrate_pages+0x2290/0x2290
[  318.193481][ T1939]  __x64_sys_move_pages+0x8b7/0x1180
[  318.217652][ T1939]  ? move_pages_and_store_status.isra.47+0x1a0/0x1a0
[  318.248748][ T1939]  ? syscall_enter_from_user_mode+0x1b/0x210
[  318.276536][ T1939]  ? lockdep_hardirqs_on_prepare+0x33e/0x4e0
[  318.304077][ T1939]  ? syscall_enter_from_user_mode+0x20/0x210
[  318.331847][ T1939]  ? trace_hardirqs_on+0x20/0x1b5
[  318.355312][ T1939]  do_syscall_64+0x33/0x40
[  318.376825][ T1939]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  318.405938][ T1939] RIP: 0033:0x7f221bd5d6ed
[  318.426072][ T1939] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
[  318.519665][ T1939] RSP: 002b:00007ffff6c4d448 EFLAGS: 00000212 ORIG_RAX: 0000000000000117
[  318.560049][ T1939] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f221bd5d6ed
[  318.597747][ T1939] RDX: 000000000151bc40 RSI: 0000000000000400 RDI: 00000000000006e2
[  318.635143][ T1939] RBP: 00007ffff6c4d4b0 R08: 000000000151ac30 R09: 0000000000000004
[  318.672919][ T1939] R10: 0000000001519c20 R11: 0000000000000212 R12: 0000000000401cb0
[  318.710619][ T1939] R13: 00007ffff6c58970 R14: 0000000000000000 R15: 0000000000000000
[  318.747920][ T1939] irq event stamp: 1465
[  318.766713][ T1939] hardirqs last  enabled at (1475): [<ffffffffabe671bf>] console_unlock+0x75f/0xaf0
[  318.810473][ T1939] hardirqs last disabled at (1484): [<ffffffffabe66cad>] console_unlock+0x24d/0xaf0
[  318.854253][ T1939] softirqs last  enabled at (1464): [<ffffffffad20070f>] __do_softirq+0x70f/0xa9f
[  318.900757][ T1939] softirqs last disabled at (1455): [<ffffffffad000ec2>] asm_call_on_stack+0x12/0x20
[  318.945713][ T1939] ---[ end trace 5a58095b9439b080 ]---
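
As an aside for anyone comparing the two page dumps, the low bits of the flags words can be decoded with a sketch like the one below. The bit positions are only inferred from the names printed in the powerpc and x86 dumps in this thread (bits 0 to 3 and bit 16), so treat the table as an assumption and as incomplete. decode_flags() is a hypothetical helper, and the upper bits of the word are deliberately ignored.

```c
#include <stdio.h>

struct flag_name {
	unsigned int bit;
	const char *name;
};

/* Bit positions inferred from the two dumps in this thread; incomplete. */
static const struct flag_name flag_names[] = {
	{ 0, "locked" },
	{ 1, "referenced" },
	{ 2, "uptodate" },
	{ 3, "dirty" },
	{ 16, "head" },
};

/* Write "a|b|c" for the known bits set in flags into buf; returns buf. */
static char *decode_flags(unsigned long long flags, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
		int n;

		if (!(flags & (1ULL << flag_names[i].bit)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
	}
	return buf;
}
```

Fed the two flags values from the dumps, this reproduces the names the kernel printed; the only difference between the powerpc and x86 pages in these bits is the referenced bit.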

Michal Hocko Aug. 21, 2020, 1:48 p.m. UTC | #4
On Fri 21-08-20 08:39:37, Qian Cai wrote:
> On Fri, Aug 21, 2020 at 10:01:27AM +0200, Michal Hocko wrote:
> > On Thu 20-08-20 10:58:51, Qian Cai wrote:
> > > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > > Since readahead pages are now charged to the memcg too, in theory we no
> > > > longer have to check for this exception. Before safely removing all these
> > > > checks, add a warning for an unexpected !memcg.
> > > > 
> > > > Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> > > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > 
> > > This will trigger,
> > 
> > Thanks for the report!
> >  
> > > [ 1863.916499] LTP: starting move_pages12
> > > [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> > > [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
> > 
> > Hmm, this is really unexpected. How did we get an order-5 page here? Is
> > this some special mapping that sys_move_pages should just ignore?
> 
> Well, I thought everybody should be able to figure out where to find the LTP
> test source code at this stage to see what it does. Anyway, the test simply
> migrates hugepages while soft offlining, so order 5 is expected, as that is a
> 2M hugepage on powerpc (also reproduced on x86 below). It might be easier to
> reproduce using our linux-mm random bug collection on NUMA systems.

OK, I must have missed that this was on ppc. The order makes more sense
now. I will have a look at this next week.

Thanks!

Qian Cai Aug. 24, 2020, 2:52 p.m. UTC | #5
On Thu, Aug 20, 2020 at 10:58:50AM -0400, Qian Cai wrote:
> On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > Since readahead pages are now charged to the memcg too, in theory we no
> > longer have to check for this exception. Before safely removing all these
> > checks, add a warning for an unexpected !memcg.
> > 
> > Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> > Acked-by: Michal Hocko <mhocko@suse.com>
> 
> This will trigger,

Andrew, Stephen, can you drop this series for now? I managed to trigger this
warning on all arches: powerpc, x86 and arm64 (below).

[ 7380.751929][T73938] WARNING: CPU: 160 PID: 73938 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x5a4/0x6d8
[ 7380.761317][T73938] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio loop processor efivarfs ip_tables x_tables sd_mod ahci libahci mlx5_core firmware_class libata dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_del_mod]
[ 7380.783242][T73938] CPU: 160 PID: 73938 Comm: move_pages12 Tainted: G           O L    5.9.0-rc2-next-20200824 #1
[ 7380.793499][T73938] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.15 05/08/2020
[ 7380.803932][T73938] pstate: 00400009 (nzcv daif +PAN -UAO BTYPE=--)
[ 7380.810196][T73938] pc : mem_cgroup_migrate+0x5a4/0x6d8
[ 7380.815419][T73938] lr : mem_cgroup_migrate+0x59c/0x6d8
[ 7380.820641][T73938] sp : ffff008de9d0f6a0
[ 7380.824647][T73938] x29: ffff008de9d0f6a0 x28: 0000000000000002 
[ 7380.830661][T73938] x27: ffffffe022880018 x26: 1ffffffc04510003 
[ 7380.836674][T73938] x25: 0000000000000001 x24: 0000000000000001 
[ 7380.842687][T73938] x23: ffffffe003280038 x22: 1ffffffc00650007 
[ 7380.848701][T73938] x21: 0000000000000000 x20: ffffa0001703692c 
[ 7380.854714][T73938] x19: ffffffe022880000 x18: 0000000000000000 
[ 7380.860726][T73938] x17: 0000000000000000 x16: 0000000000000000 
[ 7380.866738][T73938] x15: 0000000000000000 x14: 0000000000000001 
[ 7380.872751][T73938] x13: ffff8011cf16f0ff x12: 1fffe011cf16f0fe 
[ 7380.878764][T73938] x11: 1fffe011cf16f0fe x10: ffff8011cf16f0fe 
[ 7380.884777][T73938] x9 : dfffa00000000000 x8 : ffff008e78b787f7 
[ 7380.890789][T73938] x7 : 0000000000000001 x6 : ffff8011cf16f0ff 
[ 7380.896802][T73938] x5 : ffff8011cf16f0ff x4 : ffff8011cf16f0ff 
[ 7380.902815][T73938] x3 : 1fffe011c1a4ae72 x2 : 1ffffffc04510007 
[ 7380.908828][T73938] x1 : 53d80e6b46c19e00 x0 : 0000000000000001 
[ 7380.914842][T73938] Call trace:
[ 7380.917982][T73938]  mem_cgroup_migrate+0x5a4/0x6d8
[ 7380.922862][T73938]  migrate_page_states+0x938/0x17c0
[ 7380.927911][T73938]  migrate_page_copy+0x6c0/0x1018
[ 7380.932787][T73938]  migrate_page+0xe0/0x1a0
[ 7380.937055][T73938]  move_to_new_page+0x2b0/0x9e8
[ 7380.941757][T73938]  migrate_pages+0x1650/0x23a0
[ 7380.946373][T73938]  move_pages_and_store_status.isra.40+0xe4/0x170
[ 7380.952638][T73938]  do_pages_move+0x484/0xb88
[ 7380.957079][T73938]  __arm64_sys_move_pages+0x3a8/0x7d0
[ 7380.962314][T73938]  do_el0_svc+0x124/0x228
[ 7380.966502][T73938]  el0_sync_handler+0x260/0x410
[ 7380.971204][T73938]  el0_sync+0x140/0x180
[ 7380.975213][T73938] CPU: 160 PID: 73938 Comm: move_pages12 Tainted: G           O L    5.9.0-rc2-next-20200824 #1
[ 7380.985469][T73938] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.15 05/08/2020
[ 7380.995898][T73938] Call trace:
[ 7380.999041][T73938]  dump_backtrace+0x0/0x398
[ 7381.003396][T73938]  show_stack+0x14/0x20
[ 7381.007412][T73938]  dump_stack+0x140/0x1c8
[ 7381.011604][T73938]  __warn+0x23c/0x2c8
[ 7381.015439][T73938]  report_bug+0x18c/0x2a8
[ 7381.019621][T73938]  bug_handler+0x34/0x88
[ 7381.023715][T73938]  brk_handler+0x138/0x240
[ 7381.027987][T73938]  do_debug_exception+0x138/0x544
[ 7381.032862][T73938]  el1_sync_handler+0x13c/0x1b8
[ 7381.037564][T73938]  el1_sync+0x7c/0x100
[ 7381.041484][T73938]  mem_cgroup_migrate+0x5a4/0x6d8
[ 7381.046359][T73938]  migrate_page_states+0x938/0x17c0
[ 7381.051408][T73938]  migrate_page_copy+0x6c0/0x1018
[ 7381.056283][T73938]  migrate_page+0xe0/0x1a0
[ 7381.060551][T73938]  move_to_new_page+0x2b0/0x9e8
[ 7381.065252][T73938]  migrate_pages+0x1650/0x23a0
[ 7381.069867][T73938]  move_pages_and_store_status.isra.40+0xe4/0x170
[ 7381.076131][T73938]  do_pages_move+0x484/0xb88
[ 7381.080573][T73938]  __arm64_sys_move_pages+0x3a8/0x7d0
[ 7381.085796][T73938]  do_el0_svc+0x124/0x228
[ 7381.089977][T73938]  el0_sync_handler+0x260/0x410
[ 7381.094678][T73938]  el0_sync+0x140/0x180
[ 7381.098684][T73938] irq event stamp: 470
[ 7381.102614][T73938] hardirqs last  enabled at (469): [<ffffa000103b5b5c>] console_unlock+0x7f4/0xf10
[ 7381.111745][T73938] hardirqs last disabled at (470): [<ffffa000101cd934>] do_debug_exception+0x304/0x544
[ 7381.121222][T73938] softirqs last  enabled at (408): [<ffffa000101a1b50>] efi_header_end+0xb50/0x14d4
[ 7381.130448][T73938] softirqs last disabled at (403): [<ffffa0001028df98>] irq_exit+0x440/0x510

== powerpc ==

> 
> [ 1863.916499] LTP: starting move_pages12
> [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
> [ 1863.946568] anon flags: 0x7fff800001000d(locked|uptodate|dirty|head)
> [ 1863.946584] raw: 007fff800001000d c000000016ebfcd8 c000000016ebfcd8 c000001feaf46d59
> [ 1863.946609] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [ 1863.946632] page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
> [ 1863.946669] ------------[ cut here ]------------
> [ 1863.946694] WARNING: CPU: 16 PID: 35307 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x5f8/0x610
> [ 1863.946708] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod bnx2x tg3 ahci libahci libphy mdio firmware_class libata dm_mirror dm_region_hash dm_log dm_mod
> [ 1863.946801] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
> [ 1863.946834] NIP:  c0000000003fcb48 LR: c0000000003fcb38 CTR: 0000000000000000
> [ 1863.946856] REGS: c000000016ebf6f0 TRAP: 0700   Not tainted  (5.9.0-rc1-next-20200820)
> [ 1863.946879] MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28242882  XER: 00000000
> [ 1863.946915] CFAR: c00000000032c644 IRQMASK: 0 
>                GPR00: c0000000003fcb38 c000000016ebf980 c000000005923200 0000000000000031 
>                GPR04: 0000000000000000 0000000000000000 0000000000000027 c000001ffd727190 
>                GPR08: 0000000000000023 0000000000000001 c0000000058f3200 0000000000000001 
>                GPR12: 0000000000002000 c000001ffffe3800 c000000000b26a68 0000000000000000 
>                GPR16: c000000016ebfc20 c000000016ebfcd8 0000000000000020 0000000000000001 
>                GPR20: c00c00080724f000 c0000000003c8770 0000000000000000 c000000016ebfcd0 
>                GPR24: 0000000000000000 fffffffffffffff5 0000000000000002 0000000000000000 
>                GPR28: 0000000000000000 0000000000000001 0000000000000000 c00c000007f4f000 
> [ 1863.947142] NIP [c0000000003fcb48] mem_cgroup_migrate+0x5f8/0x610
> [ 1863.947164] LR [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610
> [ 1863.947185] Call Trace:
> [ 1863.947203] [c000000016ebf980] [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610 (unreliable)
> [ 1863.947241] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
> [ 1863.947274] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
> [ 1863.947307] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
> [ 1863.947341] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
> [ 1863.947365] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
> [ 1863.947399] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
> [ 1863.947434] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
> [ 1863.947459] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 1863.947481] Instruction dump:
> [ 1863.947502] 7fc3f378 4bfee82d 7c0802a6 3c82fb20 7fe3fb78 38844fc8 f8010050 4bf2fad5 
> [ 1863.947527] 60000000 39200001 3d42fffd 992a82fb <0fe00000> e8010050 eb810020 7c0803a6 
> [ 1863.947563] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
> [ 1863.947594] Call Trace:
> [ 1863.947615] [c000000016ebf4d0] [c0000000006f6008] dump_stack+0xfc/0x174 (unreliable)
> [ 1863.947642] [c000000016ebf520] [c0000000000c9004] __warn+0xc4/0x14c
> [ 1863.947665] [c000000016ebf5b0] [c0000000006f4b68] report_bug+0x108/0x1f0
> [ 1863.947689] [c000000016ebf650] [c0000000000234f4] program_check_exception+0x104/0x2e0
> [ 1863.947724] [c000000016ebf680] [c000000000009664] program_check_common_virt+0x2c4/0x310
> [ 1863.947751] --- interrupt: 700 at mem_cgroup_migrate+0x5f8/0x610
>                    LR = mem_cgroup_migrate+0x5e8/0x610
> [ 1863.947786] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
> [ 1863.947810] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
> [ 1863.947843] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
> [ 1863.947867] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
> [ 1863.947901] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
> [ 1863.947936] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
> [ 1863.947969] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
> [ 1863.948002] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 1863.948034] irq event stamp: 410
> [ 1863.948054] hardirqs last  enabled at (409): [<c000000000184564>] console_unlock+0x6b4/0x990
> [ 1863.948092] hardirqs last disabled at (410): [<c00000000000965c>] program_check_common_virt+0x2bc/0x310
> [ 1863.948126] softirqs last  enabled at (0): [<c0000000000c59a8>] copy_process+0x788/0x1950
> [ 1863.948229] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 1863.948316] ---[ end trace 74f8f4df751b0259 ]---
> 
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: cgroups@vger.kernel.org
> > Cc: linux-mm@kvack.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  include/linux/mmdebug.h | 13 +++++++++++++
> >  mm/memcontrol.c         | 15 ++++++++-------
> >  2 files changed, 21 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> > index 2ad72d2c8cc5..4ed52879ce55 100644
> > --- a/include/linux/mmdebug.h
> > +++ b/include/linux/mmdebug.h
> > @@ -37,6 +37,18 @@
> >  			BUG();						\
> >  		}							\
> >  	} while (0)
> > +#define VM_WARN_ON_ONCE_PAGE(cond, page)	({			\
> > +	static bool __section(.data.once) __warned;			\
> > +	int __ret_warn_once = !!(cond);					\
> > +									\
> > +	if (unlikely(__ret_warn_once && !__warned)) {			\
> > +		dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\
> > +		__warned = true;					\
> > +		WARN_ON(1);						\
> > +	}								\
> > +	unlikely(__ret_warn_once);					\
> > +})
> > +
> >  #define VM_WARN_ON(cond) (void)WARN_ON(cond)
> >  #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
> >  #define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
> > @@ -48,6 +60,7 @@
> >  #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
> >  #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
> >  #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
> > +#define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
> >  #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
> >  #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
> >  #endif
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 130093bdf74b..299382fc55a9 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1322,10 +1322,8 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
> >  	}
> >  
> >  	memcg = page->mem_cgroup;
> > -	/*
> > -	 * Swapcache readahead pages are added to the LRU - and
> > -	 * possibly migrated - before they are charged.
> > -	 */
> > +	/* Readahead page is charged too, to see if other page uncharged */
> > +	VM_WARN_ON_ONCE_PAGE(!memcg, page);
> >  	if (!memcg)
> >  		memcg = root_mem_cgroup;
> >  
> > @@ -6906,8 +6904,9 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
> >  	if (newpage->mem_cgroup)
> >  		return;
> >  
> > -	/* Swapcache readahead pages can get replaced before being charged */
> >  	memcg = oldpage->mem_cgroup;
> > +	/* Readahead page is charged too, to see if other page uncharged */
> > +	VM_WARN_ON_ONCE_PAGE(!memcg, oldpage);
> >  	if (!memcg)
> >  		return;
> >  
> > @@ -7104,7 +7103,8 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> >  
> >  	memcg = page->mem_cgroup;
> >  
> > -	/* Readahead page, never charged */
> > +	/* Readahead page is charged too, to see if other page uncharged */
> > +	VM_WARN_ON_ONCE_PAGE(!memcg, page);
> >  	if (!memcg)
> >  		return;
> >  
> > @@ -7168,7 +7168,8 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
> >  
> >  	memcg = page->mem_cgroup;
> >  
> > -	/* Readahead page, never charged */
> > +	/* Readahead page is charged too, to see if other page uncharged */
> > +	VM_WARN_ON_ONCE_PAGE(!memcg, page);
> >  	if (!memcg)
> >  		return 0;
> >  
> > -- 
> > 1.8.3.1
> > 
> >
Michal Hocko Aug. 24, 2020, 3:10 p.m. UTC | #6
On Fri 21-08-20 15:48:44, Michal Hocko wrote:
> On Fri 21-08-20 08:39:37, Qian Cai wrote:
> > On Fri, Aug 21, 2020 at 10:01:27AM +0200, Michal Hocko wrote:
> > > On Thu 20-08-20 10:58:51, Qian Cai wrote:
> > > > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > > > Since readahead page is charged on memcg too, in theory we don't have to
> > > > > check this exception now. Before safely remove them all, add a warning
> > > > > for the unexpected !memcg.
> > > > > 
> > > > > Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> > > > > Acked-by: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > This will trigger,
> > > 
> > > Thanks for the report!
> > >  
> > > > [ 1863.916499] LTP: starting move_pages12
> > > > [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> > > > [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
> > > 
> > > Hmm, this is really unexpected. How did we get order-5 page here? Is
> > > this some special mapping that sys_move_pages should just ignore?
> > 
> > Well, I thought everybody should be able to figure out where to find the LTP
> > tests source code at this stage to see what it does. Anyway, the test simply
> > > migrates hugepages while soft offlining, so order 5 is expected as that is 2M
> > hugepage on powerpc (also reproduced on x86 below). It might be easier to
> > reproduce using our linux-mm random bug collection on NUMA systems.
> 
> OK, I must have missed that this was on ppc. The order makes more sense
> now. I will have a look at this next week.

OK, so I've had a look and I know what's going on there. The
move_pages12 test is migrating hugetlb pages. Those are not charged to
any memcg. We have completely missed this case. There are two ways
around that: drop the warning and update the comment so that we do not
forget about it, or special-case hugetlb pages.

I think the first option is better.
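
For comparison, the second option (special-casing hugetlb pages) could look
roughly like the userspace sketch below. This is only a model under stated
assumptions: struct page_model, its is_hugetlb flag and
uncharged_is_unexpected() are hypothetical names invented for illustration,
not kernel code, and the flag merely stands in for whatever PageHuge() would
report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's struct page or memcg. */
struct memcg { int id; };

struct page_model {
	struct memcg *memcg;	/* NULL when the page was never charged */
	bool is_hugetlb;	/* stand-in for what PageHuge() would report */
};

/*
 * Returns true when an uncharged page is unexpected, i.e. when a warning
 * would be justified. hugetlb pages are exempt because they are never
 * charged to a memcg.
 */
static bool uncharged_is_unexpected(const struct page_model *page)
{
	if (page->memcg)
		return false;	/* charged: nothing to report */
	if (page->is_hugetlb)
		return false;	/* uncharged by design */
	return true;		/* uncharged and not hugetlb: worth a warning */
}
```

With a predicate like this, only the third kind of page (uncharged and not
hugetlb) would reach the warning; the cost is one more special case to keep
in sync with every caller.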
Michal Hocko Aug. 24, 2020, 3:10 p.m. UTC | #7
On Mon 24-08-20 10:52:02, Qian Cai wrote:
> On Thu, Aug 20, 2020 at 10:58:50AM -0400, Qian Cai wrote:
> > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > Since readahead page is charged on memcg too, in theory we don't have to
> > > check this exception now. Before safely remove them all, add a warning
> > > for the unexpected !memcg.
> > > 
> > > Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> > > Acked-by: Michal Hocko <mhocko@suse.com>
> > 
> > This will trigger,
> 
> Andrew, Stephen, can you drop this series for now? I did manage to trigger this
> warning on all arches, powerpc, x86 and arm64 (below).

Yes, I do agree. See http://lkml.kernel.org/r/20200824151013.GB3415@dhcp22.suse.cz
Stephen Rothwell Aug. 24, 2020, 11 p.m. UTC | #8
Hi Michal,

On Mon, 24 Aug 2020 17:10:45 +0200 Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 24-08-20 10:52:02, Qian Cai wrote:
> > On Thu, Aug 20, 2020 at 10:58:50AM -0400, Qian Cai wrote:  
> > > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:  
> > > > Since readahead page is charged on memcg too, in theory we don't have to
> > > > check this exception now. Before safely remove them all, add a warning
> > > > for the unexpected !memcg.
> > > > 
> > > > Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> > > > Acked-by: Michal Hocko <mhocko@suse.com>  
> > > 
> > > This will trigger,  
> > 
> > Andrew, Stephen, can you drop this series for now? I did manage to trigger this
> > warning on all arches, powerpc, x86 and arm64 (below).  
> 
> Yes, I do agree. See http://lkml.kernel.org/r/20200824151013.GB3415@dhcp22.suse.cz

OK, I have removed the following from linux-next for today:

  c443db77c9f3 ("mm/thp: narrow lru locking")
  18bafefba73d ("mm/thp: remove code path which never got into")
  5fb6c0683017 ("mm/thp: clean up lru_add_page_tail")
  9d1d568727a8 ("mm/thp: move lru_add_page_tail func to huge_memory.c")
  47eb331560ff ("mm/memcg: bail out early from swap accounting when memcg is disabled")
  4b0d99a64d78 ("mm/memcg: warning on !memcg after readahead page charged")
Alex Shi Aug. 24, 2020, 11:14 p.m. UTC | #9
On 2020/8/25 at 7:00 AM, Stephen Rothwell wrote:
>>>> This will trigger,  
>>> Andrew, Stephen, can you drop this series for now? I did manage to trigger this
>>> warning on all arches, powerpc, x86 and arm64 (below).  
>> Yes, I do agree. See http://lkml.kernel.org/r/20200824151013.GB3415@dhcp22.suse.cz
> OK, I have removed the following from linux-next for today:
> 
>   c443db77c9f3 ("mm/thp: narrow lru locking")
>   18bafefba73d ("mm/thp: remove code path which never got into")
>   5fb6c0683017 ("mm/thp: clean up lru_add_page_tail")
>   9d1d568727a8 ("mm/thp: move lru_add_page_tail func to huge_memory.c")
>   47eb331560ff ("mm/memcg: bail out early from swap accounting when memcg is disabled")
>   4b0d99a64d78 ("mm/memcg: warning on !memcg after readahead page charged")

The first patch, 4b0d99a64d78 ("mm/memcg: warning on !memcg after readahead page charged"),
reveals hugetlb pages, which are off the LRU, on some unexpected paths. At the least, the comments are helpful.

All the others are good and functional.

Thanks
Alex
Alex Shi Aug. 25, 2020, 1:25 a.m. UTC | #10
reproduce using our linux-mm random bug collection on NUMA systems.
>>
>> OK, I must have missed that this was on ppc. The order makes more sense
>> now. I will have a look at this next week.
> 
> OK, so I've had a look and I know what's going on there. The
> move_pages12 is migrating hugetlb pages. Those are not charged to any
> memcg. We have completely missed this case. There are two ways going
> around that. Drop the warning and update the comment so that we do not
> forget about that or special case hugetlb pages.
> 
> I think the first option is better.
> 


Hi Michal,

Compared to ignoring the warning, which is doing what it was designed to do,
addressing the hugetlb charging gap seems the better solution; otherwise
memcg has no control over hugetlb memory usage. Is that right?

Thanks
Alex
Hugh Dickins Aug. 25, 2020, 2:04 a.m. UTC | #11
On Tue, 25 Aug 2020, Alex Shi wrote:
> reproduce using our linux-mm random bug collection on NUMA systems.
> >>
> >> OK, I must have missed that this was on ppc. The order makes more sense
> >> now. I will have a look at this next week.
> > 
> > OK, so I've had a look and I know what's going on there. The
> > move_pages12 is migrating hugetlb pages. Those are not charged to any
> > memcg. We have completely missed this case. There are two ways going
> > around that. Drop the warning and update the comment so that we do not
> > forget about that or special case hugetlb pages.
> > 
> > I think the first option is better.
> > 
> 
> 
> Hi Michal,
> 
> Compare to ignore the warning which is designed to give, seems addressing
> the hugetlb out of charge issue is a better solution, otherwise the memcg
> memory usage is out of control on hugetlb, is that right?

Please don't suppose that this is peculiar to hugetlb: I'm not
testing hugetlb at all (sorry), but I see the VM_WARN_ON_ONCE from
mem_cgroup_page_lruvec(), and from mem_cgroup_migrate(), and from
mem_cgroup_swapout().

In all cases seen on a PageAnon page (well, in one case PageKsm).
And not related to THP either: seen also on machine incapable of THP.

Maybe there's an independent change in 5.9-rc that's defeating
expectations here, or maybe they were never valid.  Worth
investigating, even though the patch is currently removed,
to find out why expectations were wrong.

You'll ask me for more info, stacktraces etc, and I'll say sorry,
no time today.  Please try the swapping tests I sent before.

And may I say, the comment
/* Readahead page is charged too, to see if other page uncharged */
is nonsensical to me, and much better deleted: maybe it would make
some sense if the reader could see the comment it replaces - as
they can in the patch - but not in the resulting source file.

Hugh
Michal Hocko Aug. 25, 2020, 7:25 a.m. UTC | #12
On Tue 25-08-20 09:25:01, Alex Shi wrote:
> reproduce using our linux-mm random bug collection on NUMA systems.
> >>
> >> OK, I must have missed that this was on ppc. The order makes more sense
> >> now. I will have a look at this next week.
> > 
> > OK, so I've had a look and I know what's going on there. The
> > move_pages12 is migrating hugetlb pages. Those are not charged to any
> > memcg. We have completely missed this case. There are two ways going
> > around that. Drop the warning and update the comment so that we do not
> > forget about that or special case hugetlb pages.
> > 
> > I think the first option is better.
> > 
> 
> 
> Hi Michal,
> 
> Compare to ignore the warning which is designed to give, seems addressing
> the hugetlb out of charge issue is a better solution, otherwise the memcg
> memory usage is out of control on hugetlb, is that right?

Hugetlb memory is out of memcg scope deliberately. This is not
reclaimable memory, nor something that can easily get out of control. The
memory is preallocated and overcommit is strictly controlled as well. We
have a dedicated hugetlb cgroup controller to offer better control of
the preallocated pool distribution.

Anyway this just shows that there are more subtle cases where a page
with no memcg can hit some common paths so the patch is clearly not
ready.

I should have realized that when giving my ack but same as you I got
misled by the existing comment.
Hugh Dickins Aug. 27, 2020, 6 a.m. UTC | #13
On Mon, 24 Aug 2020, Hugh Dickins wrote:
> On Tue, 25 Aug 2020, Alex Shi wrote:
> > reproduce using our linux-mm random bug collection on NUMA systems.
> > >>
> > >> OK, I must have missed that this was on ppc. The order makes more sense
> > >> now. I will have a look at this next week.
> > > 
> > > OK, so I've had a look and I know what's going on there. The
> > > move_pages12 is migrating hugetlb pages. Those are not charged to any
> > > memcg. We have completely missed this case. There are two ways going
> > > around that. Drop the warning and update the comment so that we do not
> > > forget about that or special case hugetlb pages.
> > > 
> > > I think the first option is better.
> > > 
> > 
> > 
> > Hi Michal,
> > 
> > Compare to ignore the warning which is designed to give, seems addressing
> > the hugetlb out of charge issue is a better solution, otherwise the memcg
> > memory usage is out of control on hugetlb, is that right?

I agree: it seems that hugetlb is not participating in memcg and lrus,
so it should not even be calling mem_cgroup_migrate().  That happens
because hugetlb finds the rest of migrate_page_states() useful,
but maybe there just needs to be an "if (!PageHuge(page))" or
"if (!PageHuge(newpage))" before its call to mem_cgroup_migrate() -
but I have not yet checked whether either of those actually works.

The same could be done inside mem_cgroup_migrate() instead,
but it just seems wrong for hugetlb to be getting that far,
if it has no other reason to enter mm/memcontrol.c.
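
A minimal userspace sketch of that call-site guard, assuming hypothetical
stand-ins throughout: struct page_model, model_migrate_page_states() and
model_mem_cgroup_migrate() are invented for illustration and only mimic the
shape of the kernel functions named above. Like the suggestion itself, it
says nothing about whether the guard works in the real kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for illustration only. */
struct page_model {
	bool is_hugetlb;	/* stand-in for PageHuge() */
	int memcg_id;		/* 0 means "not charged" */
	bool dirty;		/* one example of state that is always copied */
};

static int memcg_migrate_calls;	/* counts entries into the memcg model */

static void model_mem_cgroup_migrate(const struct page_model *oldpage,
				     struct page_model *newpage)
{
	memcg_migrate_calls++;
	if (newpage->memcg_id)
		return;		/* new page already charged */
	newpage->memcg_id = oldpage->memcg_id;
}

static void model_migrate_page_states(struct page_model *newpage,
				      const struct page_model *page)
{
	/* State that every page type, hugetlb included, wants copied. */
	newpage->dirty = page->dirty;

	/* The proposed guard: hugetlb never enters the memcg code. */
	if (!page->is_hugetlb)
		model_mem_cgroup_migrate(page, newpage);
}
```

In this model a hugetlb page still gets its other state copied but never
reaches the memcg function, so a warning placed inside that function could
not fire for it.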

> 
> Please don't suppose that this is peculiar to hugetlb: I'm not
> testing hugetlb at all (sorry), but I see the VM_WARN_ON_ONCE from
> mem_cgroup_page_lruvec(), and from mem_cgroup_migrate(), and from
> mem_cgroup_swapout().
> 
> In all cases seen on a PageAnon page (well, in one case PageKsm).
> And not related to THP either: seen also on machine incapable of THP.
> 
> Maybe there's an independent change in 5.9-rc that's defeating
> expectations here, or maybe they were never valid.  Worth
> investigating, even though the patch is currently removed,
> to find out why expectations were wrong.

It was very well worth investigating.  And at the time of writing
the above, I thought it was coming up very quickly on all machines,
but in fact it only came up quickly on the one exercising KSM;
on the other machines it took about an hour to appear, so no
wonder that you and others had not already seen it.

While I'd prefer to spring the answer on you all in the patch that
fixes it, there's something more there that I don't fully understand
yet, and want to sort out before posting; so I'd better not keep you
in suspense... we broke the memcg charging of ksm_might_need_to_copy()
pages a couple of releases ago, and did not notice until your warning.

What's surprising is that the same bug can affect PageAnon pages too,
even when there's been no KSM involved whatsoever.  I put in the KSM
fix, set all the machines running, expecting to get more info on the
PageAnon instances, but all of them turned out to be fixed.

> 
> You'll ask me for more info, stacktraces etc, and I'll say sorry,
> no time today.  Please try the swapping tests I sent before.
> 
> And may I say, the comment
> /* Readahead page is charged too, to see if other page uncharged */
> is nonsensical to me, and much better deleted: maybe it would make
> some sense if the reader could see the comment it replaces - as
> they can in the patch - but not in the resulting source file.

I stand by that remark; but otherwise, I think this was a helpful
commit that helped to identify a bug, just as it was intended to do.
(I say "helped to" because its warnings alerted, but did not point
to the culprit: I had to add another in lru_cache_add() to find it.)

Hugh

Patch

diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 2ad72d2c8cc5..4ed52879ce55 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -37,6 +37,18 @@ 
 			BUG();						\
 		}							\
 	} while (0)
+#define VM_WARN_ON_ONCE_PAGE(cond, page)	({			\
+	static bool __section(.data.once) __warned;			\
+	int __ret_warn_once = !!(cond);					\
+									\
+	if (unlikely(__ret_warn_once && !__warned)) {			\
+		dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\
+		__warned = true;					\
+		WARN_ON(1);						\
+	}								\
+	unlikely(__ret_warn_once);					\
+})
+
 #define VM_WARN_ON(cond) (void)WARN_ON(cond)
 #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
 #define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
@@ -48,6 +60,7 @@ 
 #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
 #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ON_ONCE_PAGE(cond, page)  BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 130093bdf74b..299382fc55a9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1322,10 +1322,8 @@  struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
 	}
 
 	memcg = page->mem_cgroup;
-	/*
-	 * Swapcache readahead pages are added to the LRU - and
-	 * possibly migrated - before they are charged.
-	 */
+	/* Readahead page is charged too, to see if other page uncharged */
+	VM_WARN_ON_ONCE_PAGE(!memcg, page);
 	if (!memcg)
 		memcg = root_mem_cgroup;
 
@@ -6906,8 +6904,9 @@  void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 	if (newpage->mem_cgroup)
 		return;
 
-	/* Swapcache readahead pages can get replaced before being charged */
 	memcg = oldpage->mem_cgroup;
+	/* Readahead page is charged too, to see if other page uncharged */
+	VM_WARN_ON_ONCE_PAGE(!memcg, oldpage);
 	if (!memcg)
 		return;
 
@@ -7104,7 +7103,8 @@  void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 
 	memcg = page->mem_cgroup;
 
-	/* Readahead page, never charged */
+	/* Readahead page is charged too, to see if other page uncharged */
+	VM_WARN_ON_ONCE_PAGE(!memcg, page);
 	if (!memcg)
 		return;
 
@@ -7168,7 +7168,8 @@  int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
 
 	memcg = page->mem_cgroup;
 
-	/* Readahead page, never charged */
+	/* Readahead page is charged too, to see if other page uncharged */
+	VM_WARN_ON_ONCE_PAGE(!memcg, page);
 	if (!memcg)
 		return 0;