Message ID | CAJtqMcZp5AVva2yOM4gJET8Gd_j_BGJDLTkcqRdJynVCiRRFxQ@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/hugetlb.c: keep the page mapping info when free_huge_page() hits the VM_BUG_ON_PAGE |
On Tue 13-11-18 20:38:16, Yongkai Wu wrote:
> It is better to keep the page mapping info when free_huge_page() hits the
> VM_BUG_ON_PAGE, so we can get more information from the coredump for
> further analysis.

The patch seems to be whitespace damaged. Putting that aside, have you
actually seen a case where preserving the page state would help to nail
down any bug?

I am not objecting to the patch, it actually makes some sense to me, I
am just curious about the background motivation.

> Signed-off-by: Yongkai Wu <nic_w@163.com>
> ---
>  mm/hugetlb.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c007fb5..ba693bb 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1248,10 +1248,11 @@ void free_huge_page(struct page *page)
>  		(struct hugepage_subpool *)page_private(page);
>  	bool restore_reserve;
>
> +	VM_BUG_ON_PAGE(page_count(page), page);
> +	VM_BUG_ON_PAGE(page_mapcount(page), page);
> +
>  	set_page_private(page, 0);
>  	page->mapping = NULL;
> -	VM_BUG_ON_PAGE(page_count(page), page);
> -	VM_BUG_ON_PAGE(page_mapcount(page), page);
>  	restore_reserve = PagePrivate(page);
>  	ClearPagePrivate(page);
>
> --
> 1.8.3.1
Dear Maintainer,

Actually I met a VM_BUG_ON_PAGE issue on CentOS 7.4 some days ago. When the
issue first happened, all I could tell was that it happened in
free_huge_page() while soft offlining a huge page. But because
page->mapping had already been set to NULL, I could not get any further
information about how the issue happened.

So I modified the code as the patch shows, applied the new code to our
production line and waited some time, and then the issue came back. This
time I could find the full path of the file that triggered the issue by
using the crash tool to get the inode, dentry and so on. That helped me
find a way to reproduce the issue quite easily, and I finally found the
root cause and solved it.

I think that if we keep page->mapping, we can even do the rmap walk to get
more detailed info by using the crash tool to analyse the coredump. So I
think preserving the page state would more or less help debugging. But if
it is not so meaningful, just let it go. ^_^

Thank you for your time.

Best Regards

On Tue, Nov 13, 2018 at 9:04 PM Michal Hocko <mhocko@kernel.org> wrote:
> On Tue 13-11-18 20:38:16, Yongkai Wu wrote:
> > It is better to keep the page mapping info when free_huge_page() hits the
> > VM_BUG_ON_PAGE, so we can get more information from the coredump for
> > further analysis.
>
> The patch seems to be whitespace damaged. Putting that aside, have you
> actually seen a case where preserving the page state would help to nail
> down any bug?
>
> I am not objecting to the patch, it actually makes some sense to me, I
> am just curious about the background motivation.
>
> > Signed-off-by: Yongkai Wu <nic_w@163.com>
> > ---
> >  mm/hugetlb.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index c007fb5..ba693bb 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1248,10 +1248,11 @@ void free_huge_page(struct page *page)
> >  		(struct hugepage_subpool *)page_private(page);
> >  	bool restore_reserve;
> >
> > +	VM_BUG_ON_PAGE(page_count(page), page);
> > +	VM_BUG_ON_PAGE(page_mapcount(page), page);
> > +
> >  	set_page_private(page, 0);
> >  	page->mapping = NULL;
> > -	VM_BUG_ON_PAGE(page_count(page), page);
> > -	VM_BUG_ON_PAGE(page_mapcount(page), page);
> >  	restore_reserve = PagePrivate(page);
> >  	ClearPagePrivate(page);
> >
> > --
> > 1.8.3.1
>
> --
> Michal Hocko
> SUSE Labs
[Please do not top-post]

On Tue 13-11-18 23:12:24, Yongkai Wu wrote:
> Dear Maintainer,
>
> Actually I met a VM_BUG_ON_PAGE issue on CentOS 7.4 some days ago. When
> the issue first happened, all I could tell was that it happened in
> free_huge_page() while soft offlining a huge page. But because
> page->mapping had already been set to NULL, I could not get any further
> information about how the issue happened.
>
> So I modified the code as the patch shows, applied the new code to our
> production line and waited some time, and then the issue came back. This
> time I could find the full path of the file that triggered the issue by
> using the crash tool to get the inode, dentry and so on. That helped me
> find a way to reproduce the issue quite easily, and I finally found the
> root cause and solved it.

OK, thanks for the clarification. Please repost without the patch being
clobbered by your email client and feel free to add

Acked-by: Michal Hocko <mhocko@suse.com>

A note about your experience in the changelog would be useful IMHO.
On 11/13/18 7:12 AM, Yongkai Wu wrote:
> Dear Maintainer,
>
> Actually I met a VM_BUG_ON_PAGE issue on CentOS 7.4 some days ago. When
> the issue first happened, all I could tell was that it happened in
> free_huge_page() while soft offlining a huge page. But because
> page->mapping had already been set to NULL, I could not get any further
> information about how the issue happened.
>
> So I modified the code as the patch shows, applied the new code to our
> production line and waited some time, and then the issue came back. This
> time I could find the full path of the file that triggered the issue by
> using the crash tool to get the inode, dentry and so on. That helped me
> find a way to reproduce the issue quite easily, and I finally found the
> root cause and solved it.

Thank you for the information and the patch.

As previously stated by Michal, please add some additional information to
the change log (commit message) and fix the formatting of the patch.

Can you tell us more about the root cause of your issue? What was the
issue? How could you reproduce it? How did you solve it?
Dear Mike,

Glad to get your reply. I am sorry for the late reply; I am on a business
trip today. I will give more detail about the issue I met later. And
because I have no experience with sending a kernel patch correctly, it may
take me some time to learn that before I resend the patch.

Best regards

On Wed, Nov 14, 2018 at 2:05 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> On 11/13/18 7:12 AM, Yongkai Wu wrote:
> > Dear Maintainer,
> >
> > Actually I met a VM_BUG_ON_PAGE issue on CentOS 7.4 some days ago. When
> > the issue first happened, all I could tell was that it happened in
> > free_huge_page() while soft offlining a huge page. But because
> > page->mapping had already been set to NULL, I could not get any further
> > information about how the issue happened.
> >
> > So I modified the code as the patch shows, applied the new code to our
> > production line and waited some time, and then the issue came back.
> > This time I could find the full path of the file that triggered the
> > issue by using the crash tool to get the inode, dentry and so on. That
> > helped me find a way to reproduce the issue quite easily, and I finally
> > found the root cause and solved it.
>
> Thank you for the information and the patch.
>
> As previously stated by Michal, please add some additional information
> to the change log (commit message) and fix the formatting of the patch.
>
> Can you tell us more about the root cause of your issue? What was the
> issue? How could you reproduce it? How did you solve it?
> --
> Mike Kravetz
Dear Maintainer,

The following is the detailed information about the issue I met. It may be
a bit long to read, but I will try to explain it in detail.

We use CentOS 7.4 (kernel version 3.10.0-693.el7.x86_64). Several days ago
I met a kernel panic; the kernel log showed:

[759990.616719] EDAC MC2: 51 CE memory read error on CPU_SrcID#1_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x4dd4f69 offset:0xcc0 grain:32 syndrome:0x0 - OVERFLOW err_code:0101:0091 socket:1 imc:0 rank:0 bg:1 ba:1 row:1371a col:1a8)
[759990.627721] soft offline: 0x4dd4f69: migration failed 1, type 6fffff00008000
[759990.627743] ------------[ cut here ]------------
[759990.627763] kernel BUG at /data/rpmbuild/BUILD/kernel-3.10.0/kernel-3.10.0/mm/hugetlb.c:1250!
[759990.628390] CPU: 27 PID: 457768 Comm: mcelog
[759990.628413] Hardware name: Lenovo HR650X /HR650X , BIOS HR6N0333 06/23/2018
[759990.628433] task: ffff882f7f4f4f10 ti: ffff883326f10000 task.ti: ffff883326f10000
[759990.628452] RIP: 0010:[<ffffffff811cc094>] [<ffffffff811cc094>] free_huge_page+0x1e4/0x200
[759990.628479] RSP: 0018:ffff883326f13d80 EFLAGS: 00010213
[759990.628493] RAX: 0000000000000001 RBX: ffffea0137000000 RCX: 0000000000000012
[759990.628511] RDX: 0000000040000000 RSI: ffffffff81f55500 RDI: ffffea0137000000
[759990.628529] RBP: ffff883326f13da8 R08: ffffffff81f4e0e8 R09: 0000000000000000
......
[759990.628741] Call Trace:
[759990.628752] [<ffffffff8169fe6a>] __put_compound_page+0x1f/0x22
[759990.628768] [<ffffffff8169fea2>] put_compound_page+0x35/0x174
[759990.628786] [<ffffffff811905a5>] put_page+0x45/0x50
[759990.629591] [<ffffffff811d09a0>] putback_active_hugepage+0xd0/0xf0
[759990.630365] [<ffffffff811fa0eb>] soft_offline_page+0x4db/0x580
[759990.631134] [<ffffffff81453595>] store_soft_offline_page+0xa5/0xe0
[759990.631900] [<ffffffff8143a298>] dev_attr_store+0x18/0x30
[759990.632660] [<ffffffff81280486>] sysfs_write_file+0xc6/0x140
[759990.633409] [<ffffffff8120101d>] vfs_write+0xbd/0x1e0
[759990.634148] [<ffffffff81201e2f>] SyS_write+0x7f/0xe0
[759990.634870] [<ffffffff816b4849>] system_call_fastpath+0x16/0x1b

I checked the coredump by disassembling free_huge_page():

0xffffffff811cbf78 <free_huge_page+0xc8>:  cmp $0xffffffff,%eax
0xffffffff811cbf7b <free_huge_page+0xcb>:  jne 0xffffffff811cc094 <free_huge_page+0x1e4>
......
0xffffffff811cc094 <free_huge_page+0x1e4>: ud2

Checking the source code, I could only tell that the panic reason was:
page->_count = 0 but page->_mapcount = 1, so we hit
BUG_ON(page_mapcount(page)); but I could not get any further clue about
how the issue happened.

So I modified the code as the patch shows, applied the new code to our
production line and waited some days; then the issue came back on another
server. This time, by analysing the coredump with the crash tool, I could
find the full path of the file that triggered the issue. For example:

crash> page.mapping ffffea02f9000000
  mapping = 0xffff88b098ae8160
crash> address_space.host 0xffff88b098ae8160
  host = 0xffff88b098ae8010
crash> inode.i_dentry 0xffff88b098ae8010
  i_dentry = {
    first = 0xffff88b0bbeb58b0
  }
crash> dentry.d_name.name -l dentry.d_alias 0xffff88b0bbeb58b0
  d_name.name = 0xffff88b0bbeb5838 "file_a"

So I knew the issue happened when soft offlining a page of the file
"file_a", and I could recover the whole file path by walking
dentry.d_parent and checking each dentry name.

Checking with another team, I learned that their user component uses
file_a all the time, so page->_mapcount not being equal to -1 seems
normal, and page->_count = 0 is what was abnormal at that time.
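In kernel terms, the chain those crash commands walk is page->mapping ->
address_space.host (the inode) -> one of the inode's dentries. A rough,
hypothetical helper doing the same walk in C might look like the sketch
below; report_page_file() is an invented name for illustration and is not
part of the patch or of the 3.10 sources.

```c
/*
 * Hypothetical debug helper (illustration only, not part of the patch):
 * walk the same chain as the crash commands above, from a pagecache page
 * to one dentry of the file it belongs to.  It only works while
 * page->mapping is still intact, which is exactly what the patch
 * preserves at VM_BUG_ON_PAGE() time.
 */
#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/dcache.h>
#include <linux/printk.h>

static void report_page_file(struct page *page)
{
	struct address_space *mapping = page->mapping;
	struct inode *inode;
	struct dentry *dentry;

	/* Anonymous pages overload ->mapping with the anon_vma pointer. */
	if (!mapping || PageAnon(page))
		return;

	inode = mapping->host;		/* address_space.host            */
	dentry = d_find_alias(inode);	/* one alias, with a reference   */
	if (dentry) {
		pr_info("page %p maps file %pd\n", page, dentry);
		dput(dentry);		/* drop the reference we took    */
	}
}
```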
I guessed that if I triggered a soft offline against the physical address
of a page used by file_a, the issue might reproduce. So I wrote a user
application that mmaps file_a and gets the physical address of the page;
the key steps are as follows:

    fd = open(FILE_A_PATH, O_RDWR, 0666);
    buf = mmap(NULL, pagesize, PROT_READ, MAP_SHARED, fd, 0);
    phys_addr = vtop((unsigned long long)buf);

In the function vtop() I use "/proc/pid/pagemap" to get the physical
address of the page (a minimal sketch of such a helper is shown at the end
of this message).

Suppose the physical address is 0xbe40000000; then I can trigger a soft
offline of that address:

    echo 0xbe40000000 > /sys/devices/system/memory/soft_offline_page

After triggering this two or more times, the issue reproduced.

Then I used systemtap to probe page->_count and page->_mapcount in the
functions soft_offline_page(), putback_active_hugepage() and
migrate_pages(). Part of my systemtap script:

function get_page_mapcount:long (page:long) %{
    struct page *page;

    page = (struct page *)STAP_ARG_page;
    if (page == NULL)
        STAP_RETURN(NULL);
    else
        STAP_RETURN(page_mapcount(page) - 1);
%}

probe kernel.function("migrate_pages")
{
    page2 = get_page_from_migrate_pages($from);
    printf("Now exec migrate_pages -- page=%p,pfn=%ld,phy_addr=0x%lx,page_flags=0x%lx\n",
           page2, get_page_pfn(page2), get_page_phy_addr(page2), get_page_flags(page2));
    printf("page->mapping=%p,page->_count=%d,page->_mapcount=%d\n",
           get_page_mapping(page2), get_page_count(page2), get_page_mapcount(page2));
    print_backtrace();
}

Then I triggered soft offline to reproduce the issue and finally found the
root cause. In CentOS 7.4, when we run into soft_offline_huge_page(),
get_any_page() only increases page->_count by 1 (isolate_huge_page() will
also increase page->_count by 1, but then put_page() releases that
reference).

Because we use 1 GB hugepages, hugepage_migration_supported() always
returns false, so soft_offline_huge_page() --> migrate_pages() -->
unmap_and_move_huge_page() calls putback_active_hugepage() to decrease
page->_count by 1, as the code shows:

static int unmap_and_move_huge_page(new_page_t get_new_page,
{
......
	if (!hugepage_migration_supported(page_hstate(hpage))) {
		putback_active_hugepage(hpage);  // ==> will decrease page->_count by 1
		return -ENOSYS;
	}
......

Then, when we return to soft_offline_huge_page(), page->_count is
decreased by 1 again by putback_active_hugepage():

static int soft_offline_huge_page(struct page *page, int flags)
{
	ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
				MIGRATE_SYNC, MR_MEMORY_FAILURE);
	if (ret) {
		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
			pfn, ret, page->flags);
		putback_active_hugepage(hpage);  // ==> here will decrease page->_count by 1 again
		......
	} else {
		......
	}
}

So we can see that when soft_offline_page() is called on a 1 GB hugepage,
page->_count is abnormally decreased by 1!

[I removed one putback_active_hugepage() call in soft_offline_huge_page()
to fix this issue.]

I also checked the latest kernel code on github (4.19); it seems this is
already fixed by the following code:

static int soft_offline_huge_page(struct page *page, int flags)
{
	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
				MIGRATE_SYNC, MR_MEMORY_FAILURE);
	if (ret) {
		pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n",
			pfn, ret, page->flags, &page->flags);
		if (!list_empty(&pagelist))  // ==> seems this code can fix the issue I met
			putback_movable_pages(&pagelist);
		if (ret > 0)
			ret = -EIO;
	} else {
	}
}

But I could not find a similar bug fix report or commit log.
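As referenced above, a minimal sketch of such a vtop() helper is shown
below. It assumes the documented /proc/self/pagemap layout (one 64-bit
entry per virtual page, bit 63 = page present, bits 0-54 = page frame
number); on recent kernels reading PFNs from pagemap requires
CAP_SYS_ADMIN. This is an illustration, not the original reproducer.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Translate a virtual address of the current process to a physical
 * address via /proc/self/pagemap.  Returns 0 on failure or if the page
 * is not present.  Illustration only; not the original reproducer.
 */
static uint64_t vtop(uint64_t vaddr)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	int fd = open("/proc/self/pagemap", O_RDONLY);
	uint64_t entry = 0;

	if (fd < 0)
		return 0;

	/* One 64-bit entry per virtual page; seek to the entry for vaddr. */
	if (pread(fd, &entry, sizeof(entry),
		  (vaddr / pagesize) * sizeof(entry)) != sizeof(entry)) {
		close(fd);
		return 0;
	}
	close(fd);

	if (!(entry & (1ULL << 63)))		/* bit 63: page present */
		return 0;

	/* Bits 0-54 hold the PFN; add the offset within the page. */
	return (entry & ((1ULL << 55) - 1)) * pagesize + vaddr % pagesize;
}
```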
<div dir="ltr">Dear Maintainer,<div>The following is the detail information of the issue i meet.</div><div>It may be too boring to read,but i try to explain it more detail.</div><div><br></div><div><br></div><div><br></div><div><div>We use centos 7.4(kernel version:3.10.0-693.el7.x86_64).</div><div>Several days ago,i met a kernel panic issue,the kernel log showed:</div><div><br></div><div>[759990.616719] EDAC MC2: 51 CE memory read error on CPU_SrcID#1_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x4dd4f69 offset:0xcc0 grain:32 syndrome:0x0 - OVERFLOW err_code:0101:0091 socket:1 imc:0 rank:0 bg:1 ba:1 row:1371a col:1a8)</div><div>[759990.627721] soft offline: 0x4dd4f69: migration failed 1, type 6fffff00008000</div><div>[759990.627743] ------------[ cut here ]------------</div><div>[759990.627763] kernel BUG at /data/rpmbuild/BUILD/kernel-3.10.0/kernel-3.10.0/mm/hugetlb.c:1250!</div><div>[759990.628390] CPU: 27 PID: 457768 Comm: mcelog</div><div>[759990.628413] Hardware name: Lenovo HR650X /HR650X , BIOS HR6N0333 06/23/2018</div><div>[759990.628433] task: ffff882f7f4f4f10 ti: ffff883326f10000 task.ti: ffff883326f10000</div><div>[759990.628452] RIP: 0010:[<ffffffff811cc094>] [<ffffffff811cc094>] free_huge_page+0x1e4/0x200</div><div>[759990.628479] RSP: 0018:ffff883326f13d80 EFLAGS: 00010213</div><div>[759990.628493] RAX: 0000000000000001 RBX: ffffea0137000000 RCX: 0000000000000012</div><div>[759990.628511] RDX: 0000000040000000 RSI: ffffffff81f55500 RDI: ffffea0137000000</div><div>[759990.628529] RBP: ffff883326f13da8 R08: ffffffff81f4e0e8 R09: 0000000000000000</div><div>......</div><div>[759990.628741] Call Trace:</div><div>[759990.628752] [<ffffffff8169fe6a>] __put_compound_page+0x1f/0x22</div><div>[759990.628768] [<ffffffff8169fea2>] put_compound_page+0x35/0x174</div><div>[759990.628786] [<ffffffff811905a5>] put_page+0x45/0x50</div><div>[759990.629591] [<ffffffff811d09a0>] putback_active_hugepage+0xd0/0xf0</div><div>[759990.630365] [<ffffffff811fa0eb>] soft_offline_page+0x4db/0x580</div><div>[759990.631134] [<ffffffff81453595>] store_soft_offline_page+0xa5/0xe0</div><div>[759990.631900] [<ffffffff8143a298>] dev_attr_store+0x18/0x30</div><div>[759990.632660] [<ffffffff81280486>] sysfs_write_file+0xc6/0x140</div><div>[759990.633409] [<ffffffff8120101d>] vfs_write+0xbd/0x1e0</div><div>[759990.634148] [<ffffffff81201e2f>] SyS_write+0x7f/0xe0</div><div>[759990.634870] [<ffffffff816b4849>] system_call_fastpath+0x16/0x1b</div><div><br></div><div><br></div><div><br></div><div><br></div><div>I check the coredump,by disassembly free_huge_page() :</div><div>0xffffffff811cbf78 <free_huge_page+0xc8>: cmp $0xffffffff,%eax</div><div>0xffffffff811cbf7b <free_huge_page+0xcb>: jne 0xffffffff811cc094 <free_huge_page+0x1e4></div><div>......</div><div>0xffffffff811cc094 <free_huge_page+0x1e4>: ud2</div><div><br></div><div><br></div><div>and check the sourcecode,i can only know that the panic reason is:</div><div>page->_count=0 but page->_mapcount=1,so hit the</div><div>BUG_ON(page_mapcount(page));</div><div>But can not get any further clue how the issue happen.</div><div><br></div><div><br></div><div><br></div><div><br></div><div>So i modify the code as the patch show,and apply the new code to our produce line and wait some days,then the issue come again on another server.</div><div>And this time,by analyse the coredump using crash tool,i can know the whole file path which trigger the issue.</div><div>For example:</div><div>crash> page.mapping ffffea02f9000000</div><div> mapping = 
0xffff88b098ae8160</div><div>crash> address_space.host 0xffff88b098ae8160</div><div> host = 0xffff88b098ae8010</div><div>crash> inode.i_dentry 0xffff88b098ae8010</div><div> i_dentry = {</div><div> first = 0xffff88b0bbeb58b0</div><div> }</div><div>crash> <a href="http://dentry.d_name.name">dentry.d_name.name</a> -l dentry.d_alias 0xffff88b0bbeb58b0</div><div> <a href="http://d_name.name">d_name.name</a> = 0xffff88b0bbeb5838 "file_a"</div><div><br></div><div>So i can know the issue happen when doing soft offline to the page of the file "file_a".</div><div>And i can also know the whole file path by list the dentry.d_parent and check the dentry name.</div><div><br></div><div><br></div><div><br></div><div><br></div><div>Check with other team,i know that their user component will use file_a all the time,</div><div>so the page->_mapcount not equal to -1 seems normal,and page->_count=0 is abnormal at that time.</div><div><br></div><div>I guess if i triggeer a soft offline to the physical addr of the page using by file_a,maybe the issue can reproduce.</div><div>So i write a user application to mmap to file_a and get the physical addr of the page,the key step just as the following:</div><div> fd = open(FILE_A_PATH,O_RDWR,0666);</div><div> buf = mmap(NULL, pagesize, PROT_READ, MAP_SHARED, fd, 0);</div><div> phys_addr = vtop((unsigned long long)buf);</div><div>In the function vtop(),i use "/proc/pid/pagemap" to get the physical addr of the page.</div><div><br></div><div>Suppose that the physical addr is 0xbe40000000,then i can trigger a soft offline to the addr:</div><div> echo 0xbe40000000 > /sys/devices/system/memory/soft_offline_page</div><div>And after i trigger two or more times,the issue reproduce.</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div>Then i use systemtap to probe page->_count and page->_mapcount in the fucntions soft_offline_page(),putback_active_hugepage() and migrate_pages()</div><div>part of my systemtap script:</div><div><br></div><div>function get_page_mapcount:long (page:long) %{</div><div> struct page *page;</div><div><br></div><div> page = (struct page *)STAP_ARG_page;</div><div> if(page == NULL)</div><div> STAP_RETURN(NULL);</div><div> else</div><div> STAP_RETURN(page_mapcount(page)-1);</div><div>%}</div><div><br></div><div>probe kernel.function("migrate_pages")</div><div>{</div><div> page2 = get_page_from_migrate_pages($from);</div><div> printf("Now exec migrate_pages -- page=%p,pfn=%ld,phy_addr=0x%lx,page_flags=0x%lx\n",page2,get_page_pfn(page2),get_page_phy_addr(page2),get_page_flags(page2));</div><div> printf("page->mapping=%p,page->_count=%d,page->_mapcount=%d\n",get_page_mapping(page2),get_page_count(page2),get_page_mapcount(page2));</div><div> print_backtrace();</div><div>}</div><div><br></div><div>Then i trigger soft offline to reproduce the issue,and finally find the root cause:</div><div>In centos 7.4,when run into soft_offline_huge_page(),get_any_page() only increase the page->_count by 1,</div><div>(isolate_huge_page() will also inc page->_count by 1 but then put_page() will release the ref)</div><div><br></div><div>because we use 1 GB size of hugepage,so hugepage_migration_supported() will always return false,</div><div>so soft_offline_huge_page() --> migrate_pages() --> unmap_and_move_huge_page() will call putback_active_hugepage() to decrease page->_count by 1 just as the code show:</div><div>static int unmap_and_move_huge_page(new_page_t get_new_page,</div><div>{</div><div>......</div><div><span style="white-space:pre"> 
</span>if (!hugepage_migration_supported(page_hstate(hpage))) {</div><div><span style="white-space:pre"> </span> putback_active_hugepage(hpage); // ==> will decrease page->_count by 1</div><div><span style="white-space:pre"> </span> return -ENOSYS;</div><div><span style="white-space:pre"> </span>}</div><div>......</div><div><br></div><div>Then when return to soft_offline_huge_page,page->_count will be decrease by 1 again by putback_active_hugepage():</div><div>static int soft_offline_huge_page(struct page *page, int flags)</div><div>{</div><div> <span style="white-space:pre"> </span>ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,</div><div> <span style="white-space:pre"> </span>MIGRATE_SYNC, MR_MEMORY_FAILURE);</div><div> <span style="white-space:pre"> </span>if (ret) {</div><div> <span style="white-space:pre"> </span>pr_info("soft offline: %#lx: migration failed %d, type %lx\n",</div><div> <span style="white-space:pre"> </span>pfn, ret, page->flags);</div><div> <span style="white-space:pre"> </span>putback_active_hugepage(hpage); // ==> here will decrease page->_count by 1 again</div><div> <span style="white-space:pre"> </span>......</div><div> <span style="white-space:pre"> </span>} else {</div><div> <span style="white-space:pre"> </span> ......</div><div> <span style="white-space:pre"> </span>}</div><div>}</div><div><br></div><div><br></div><div>So we can know when call soft_offline_page() to the 1 GB size of hugepage,page->_count will be abnormally decrease by 1!</div><div><br></div><div><br></div><div><br></div><div>【 I remove one putback_active_hugepage() in soft_offline_huge_page() to fix this issue. 】</div><div><br></div><div>And i check the latest kernel code on git hub(4.19),it seems already fix this issue by the following code:</div><div>static int soft_offline_huge_page(struct page *page, int flags)</div><div>{</div><div> <span style="white-space:pre"> </span>ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,</div><div> <span style="white-space:pre"> </span>MIGRATE_SYNC, MR_MEMORY_FAILURE);</div><div> <span style="white-space:pre"> </span>if (ret) {</div><div> <span style="white-space:pre"> </span>pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n",</div><div> <span style="white-space:pre"> </span> pfn, ret, page->flags, &page->flags);</div><div> <span style="white-space:pre"> </span>if (!list_empty(&pagelist)) // ==> seems this code can fix the issue i meet</div><div> <span style="white-space:pre"> </span> putback_movable_pages(&pagelist);</div><div> <span style="white-space:pre"> </span>if (ret > 0)</div><div> <span style="white-space:pre"> </span>ret = -EIO;</div><div> <span style="white-space:pre"> </span>} else {</div><div> <span style="white-space:pre"> </span>}</div><div>}</div><div>But i can not find a similar bug fix report or commit log.</div></div><br class="gmail-Apple-interchange-newline"></div>
On Thu 15-11-18 16:30:32, Yongkai Wu wrote:
[...]
Thanks for the clarification. It can be helpful for somebody trying to
debug a similar issue in the future.
> But I could not find a similar bug fix report or commit log.
What about 6bc9b56433b7 ("mm: fix race on soft-offlining free huge pages") ?
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c007fb5..ba693bb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1248,10 +1248,11 @@ void free_huge_page(struct page *page)
 		(struct hugepage_subpool *)page_private(page);
 	bool restore_reserve;
 
+	VM_BUG_ON_PAGE(page_count(page), page);
+	VM_BUG_ON_PAGE(page_mapcount(page), page);
+
 	set_page_private(page, 0);
 	page->mapping = NULL;
-	VM_BUG_ON_PAGE(page_count(page), page);
-	VM_BUG_ON_PAGE(page_mapcount(page), page);
 	restore_reserve = PagePrivate(page);
 	ClearPagePrivate(page);
It is better to keep the page mapping info when free_huge_page() hits the
VM_BUG_ON_PAGE, so we can get more information from the coredump for
further analysis.

Signed-off-by: Yongkai Wu <nic_w@163.com>
---
 mm/hugetlb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
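For context on why the ordering in the patch matters: VM_BUG_ON_PAGE()
dumps the page before crashing, so any field already cleared at that point
is lost both in the dmesg dump and in the coredump. The definition below
is a from-memory approximation of the macro in recent upstream kernels
with CONFIG_DEBUG_VM=y (include/linux/mmdebug.h); the exact form varies by
version, so check your own tree.

```c
/*
 * Approximate definition (recent kernels, CONFIG_DEBUG_VM=y).
 * dump_page() prints page->mapping among other fields, so clearing
 * ->mapping before this check would hide it from the report.
 */
#define VM_BUG_ON_PAGE(cond, page)					\
	do {								\
		if (unlikely(cond)) {					\
			dump_page(page, "VM_BUG_ON_PAGE(" __stringify(cond)")"); \
			BUG();						\
		}							\
	} while (0)
```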