diff mbox series

mm/page_owner: fix a crash after memory offline

Message ID 1570732366-16426-1-git-send-email-cai@lca.pw (mailing list archive)
State New, archived
Headers show
Series mm/page_owner: fix a crash after memory offline | expand

Commit Message

Qian Cai Oct. 10, 2019, 6:32 p.m. UTC
The linux-next series "mm/memory_hotplug: Shrink zones before removing
memory" [1] seems make a crash easier to reproduce while reading
/proc/pagetypeinfo after offlining a memory section. Fix it by using
pfn_to_online_page() in the PFN walker.

[1] https://lore.kernel.org/linux-mm/20191006085646.5768-1-david@redhat.com/

page:ffffea0021200000 is uninitialized and poisoned
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
There is not page extension available.
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1107!
RIP: 0010:pagetypeinfo_showmixedcount_print+0x4fb/0x680
Call Trace:
 walk_zones_in_node+0x3a/0xc0
 pagetypeinfo_show+0x260/0x2c0
 seq_read+0x27e/0x710
 proc_reg_read+0x12e/0x190
 __vfs_read+0x50/0xa0
 vfs_read+0xcb/0x1e0
 ksys_read+0xc6/0x160
 __x64_sys_read+0x43/0x50
 do_syscall_64+0xcc/0xaec
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Qian Cai <cai@lca.pw>
---
 mm/page_owner.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

David Hildenbrand Oct. 10, 2019, 6:55 p.m. UTC | #1
On 10.10.19 20:32, Qian Cai wrote:
> The linux-next series "mm/memory_hotplug: Shrink zones before removing
> memory" [1] seems make a crash easier to reproduce while reading
> /proc/pagetypeinfo after offlining a memory section. Fix it by using
> pfn_to_online_page() in the PFN walker.

Can you please rephrase the subject+description to describe the actual 
problem and drop the reference to the series?

E.g., similar to my recent patches:

"mm/page_owner: Don't access uninitialized memmaps when reading 
/proc/pagetypeinfo

Uninitialized memmaps contain garbage and in the worst case trigger 
kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
touched.

For example, when not onlining a memory block that is spanned by a zone 
and reading /proc/pagetypeinfo, we can trigger a kernel BUG: ...
"

However, you also have to justify why it is okay to no longer consider 
ZONE_DEVICE (I think walk_zones_in_node() will skip ZONE_DEVICE due to 
assert_populated == true and ZONE_DEVICE will never be populated, 
Therefore, we will never end in this code path with ZONE_DEVICE).


> 
> [1] https://lore.kernel.org/linux-mm/20191006085646.5768-1-david@redhat.com/
> 
> page:ffffea0021200000 is uninitialized and poisoned
> raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> There is not page extension available.
> ------------[ cut here ]------------
> kernel BUG at include/linux/mm.h:1107!
> RIP: 0010:pagetypeinfo_showmixedcount_print+0x4fb/0x680
> Call Trace:
>   walk_zones_in_node+0x3a/0xc0
>   pagetypeinfo_show+0x260/0x2c0
>   seq_read+0x27e/0x710
>   proc_reg_read+0x12e/0x190
>   __vfs_read+0x50/0xa0
>   vfs_read+0xcb/0x1e0
>   ksys_read+0xc6/0x160
>   __x64_sys_read+0x43/0x50
>   do_syscall_64+0xcc/0xaec
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>   mm/page_owner.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index dee931184788..03a6b19b3cdd 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -296,11 +296,10 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
>   		pageblock_mt = get_pageblock_migratetype(page);
>   

What about the pfn_valid() in the outermost loop? You can skip over the 
whole pageblock if the first page is not online.

>   		for (; pfn < block_end_pfn; pfn++) {
> -			if (!pfn_valid_within(pfn))
> +			page = pfn_to_online_page(pfn);
> +			if (!page)
>   				continue;
>   
> -			page = pfn_to_page(pfn);
> -
>   			if (page_zone(page) != zone)
>   				continue;
>   
>
Qian Cai Oct. 10, 2019, 7:12 p.m. UTC | #2
On Thu, 2019-10-10 at 20:55 +0200, David Hildenbrand wrote:
> On 10.10.19 20:32, Qian Cai wrote:
> > The linux-next series "mm/memory_hotplug: Shrink zones before removing
> > memory" [1] seems make a crash easier to reproduce while reading
> > /proc/pagetypeinfo after offlining a memory section. Fix it by using
> > pfn_to_online_page() in the PFN walker.
> 
> Can you please rephrase the subject+description to describe the actual 
> problem and drop the reference to the series?

I'd figure it is better for you to post this as you are on the top of this whole
mess. What do you think?

> 
> E.g., similar to my recent patches:
> 
> "mm/page_owner: Don't access uninitialized memmaps when reading 
> /proc/pagetypeinfo
> 
> Uninitialized memmaps contain garbage and in the worst case trigger 
> kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
> touched.
> 
> For example, when not onlining a memory block that is spanned by a zone 
> and reading /proc/pagetypeinfo, we can trigger a kernel BUG: ...
> "
> 
> However, you also have to justify why it is okay to no longer consider 
> ZONE_DEVICE (I think walk_zones_in_node() will skip ZONE_DEVICE due to 
> assert_populated == true and ZONE_DEVICE will never be populated, 
> Therefore, we will never end in this code path with ZONE_DEVICE).
> 
> 
> > 
> > [1] https://lore.kernel.org/linux-mm/20191006085646.5768-1-david@redhat.com/
> > 
> > page:ffffea0021200000 is uninitialized and poisoned
> > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> > There is not page extension available.
> > ------------[ cut here ]------------
> > kernel BUG at include/linux/mm.h:1107!
> > RIP: 0010:pagetypeinfo_showmixedcount_print+0x4fb/0x680
> > Call Trace:
> >   walk_zones_in_node+0x3a/0xc0
> >   pagetypeinfo_show+0x260/0x2c0
> >   seq_read+0x27e/0x710
> >   proc_reg_read+0x12e/0x190
> >   __vfs_read+0x50/0xa0
> >   vfs_read+0xcb/0x1e0
> >   ksys_read+0xc6/0x160
> >   __x64_sys_read+0x43/0x50
> >   do_syscall_64+0xcc/0xaec
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> >   mm/page_owner.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/page_owner.c b/mm/page_owner.c
> > index dee931184788..03a6b19b3cdd 100644
> > --- a/mm/page_owner.c
> > +++ b/mm/page_owner.c
> > @@ -296,11 +296,10 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
> >   		pageblock_mt = get_pageblock_migratetype(page);
> >   
> 
> What about the pfn_valid() in the outermost loop? You can skip over the 
> whole pageblock if the first page is not online.
> 
> >   		for (; pfn < block_end_pfn; pfn++) {
> > -			if (!pfn_valid_within(pfn))
> > +			page = pfn_to_online_page(pfn);
> > +			if (!page)
> >   				continue;
> >   
> > -			page = pfn_to_page(pfn);
> > -
> >   			if (page_zone(page) != zone)
> >   				continue;
> >   
> > 
> 
>
David Hildenbrand Oct. 11, 2019, 9:57 a.m. UTC | #3
On 10.10.19 21:12, Qian Cai wrote:
> On Thu, 2019-10-10 at 20:55 +0200, David Hildenbrand wrote:
>> On 10.10.19 20:32, Qian Cai wrote:
>>> The linux-next series "mm/memory_hotplug: Shrink zones before removing
>>> memory" [1] seems make a crash easier to reproduce while reading
>>> /proc/pagetypeinfo after offlining a memory section. Fix it by using
>>> pfn_to_online_page() in the PFN walker.
>>
>> Can you please rephrase the subject+description to describe the actual
>> problem and drop the reference to the series?
> 
> I'd figure it is better for you to post this as you are on the top of this whole
> mess. What do you think?

You mean, I found the key to an unlimited amount of undetected BUGs, so 
I should fix them? ;)

In case you don't have time to explore all the dirty details, I can take 
it from here. Just let me know.
Qian Cai Oct. 11, 2019, 11:07 a.m. UTC | #4
> On Oct 11, 2019, at 5:57 AM, David Hildenbrand <david@redhat.com> wrote:
> 
> You mean, I found the key to an unlimited amount of undetected BUGs, so I should fix them? ;)

No, it is more of for the sake of consistency, and the fact that all those bugs are only starting to show up only recently after applied to your series, so you will probably explain for a better changelog why that fact is not important.

> 
> In case you don't have time to explore all the dirty details, I can take it from here. Just let me know.

Indeed, I will appreciate that if I don’t need to dig all those dirty details again myself.
diff mbox series

Patch

diff --git a/mm/page_owner.c b/mm/page_owner.c
index dee931184788..03a6b19b3cdd 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -296,11 +296,10 @@  void pagetypeinfo_showmixedcount_print(struct seq_file *m,
 		pageblock_mt = get_pageblock_migratetype(page);
 
 		for (; pfn < block_end_pfn; pfn++) {
-			if (!pfn_valid_within(pfn))
+			page = pfn_to_online_page(pfn);
+			if (!page)
 				continue;
 
-			page = pfn_to_page(pfn);
-
 			if (page_zone(page) != zone)
 				continue;