Message ID | 20181210130712.30148-2-zaslonko@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Initialize struct pages for the full section | expand |
On Mon 10-12-18 14:07:12, Mikhail Zaslonko wrote: > If memory end is not aligned with the sparse memory section boundary, the > mapping of such a section is only partly initialized. It would be great to mention how you can end up in the situation like this(a user provided memmap or a strange HW). > This may lead to > VM_BUG_ON due to uninitialized struct page access from > is_mem_section_removable() or test_pages_in_a_zone() function triggered by > memory_hotplug sysfs handlers: > > page:000003d082008000 is uninitialized and poisoned > page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) > Call Trace: > ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) > [<00000000008f15c4>] show_valid_zones+0x5c/0x190 > [<00000000008cf9c4>] dev_attr_show+0x34/0x70 > [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 > [<00000000003e4194>] seq_read+0x204/0x480 > [<00000000003b53ea>] __vfs_read+0x32/0x178 > [<00000000003b55b2>] vfs_read+0x82/0x138 > [<00000000003b5be2>] ksys_read+0x5a/0xb0 > [<0000000000b86ba0>] system_call+0xdc/0x2d8 > Last Breaking-Event-Address: > [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 > Kernel panic - not syncing: Fatal exception: panic_on_oops > > Fix the problem by initializing the last memory section of the highest zone > in memmap_init_zone() till the very end, even if it goes beyond the zone > end. Why do we need to restrict this to the highest zone? In other words, why cannot we do what I was suggesting earlier [1]. What does prevent other zones to have an incomplete section boundary? [1] http://lkml.kernel.org/r/20181105183533.GQ4361@dhcp22.suse.cz > Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> > Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> > Cc: <stable@vger.kernel.org> > --- > mm/page_alloc.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 2ec9cc407216..41ef5508e5f1 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, > cond_resched(); > } > } > +#ifdef CONFIG_SPARSEMEM > + /* > + * If there is no zone spanning the rest of the section > + * then we should at least initialize those pages. Otherwise we > + * could blow up on a poisoned page in some paths which depend > + * on full sections being initialized (e.g. memory hotplug). > + */ > + if (end_pfn == max_pfn) { > + while (end_pfn % PAGES_PER_SECTION) { > + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, > + nid); > + end_pfn++; > + } > + } > +#endif > } > > #ifdef CONFIG_ZONE_DEVICE > -- > 2.16.4
On Mon, Dec 10, 2018 at 02:07:12PM +0100, Mikhail Zaslonko wrote: >If memory end is not aligned with the sparse memory section boundary, the >mapping of such a section is only partly initialized. This may lead to >VM_BUG_ON due to uninitialized struct page access from >is_mem_section_removable() or test_pages_in_a_zone() function triggered by >memory_hotplug sysfs handlers: > > page:000003d082008000 is uninitialized and poisoned > page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) > Call Trace: > ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) > [<00000000008f15c4>] show_valid_zones+0x5c/0x190 > [<00000000008cf9c4>] dev_attr_show+0x34/0x70 > [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 > [<00000000003e4194>] seq_read+0x204/0x480 > [<00000000003b53ea>] __vfs_read+0x32/0x178 > [<00000000003b55b2>] vfs_read+0x82/0x138 > [<00000000003b5be2>] ksys_read+0x5a/0xb0 > [<0000000000b86ba0>] system_call+0xdc/0x2d8 > Last Breaking-Event-Address: > [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 > Kernel panic - not syncing: Fatal exception: panic_on_oops > >Fix the problem by initializing the last memory section of the highest zone >in memmap_init_zone() till the very end, even if it goes beyond the zone >end. > >Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> >Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> >Cc: <stable@vger.kernel.org> >--- > mm/page_alloc.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > >diff --git a/mm/page_alloc.c b/mm/page_alloc.c >index 2ec9cc407216..41ef5508e5f1 100644 >--- a/mm/page_alloc.c >+++ b/mm/page_alloc.c >@@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, > cond_resched(); > } > } >+#ifdef CONFIG_SPARSEMEM >+ /* >+ * If there is no zone spanning the rest of the section >+ * then we should at least initialize those pages. Otherwise we >+ * could blow up on a poisoned page in some paths which depend >+ * on full sections being initialized (e.g. memory hotplug). >+ */ >+ if (end_pfn == max_pfn) { >+ while (end_pfn % PAGES_PER_SECTION) { >+ __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, >+ nid); >+ end_pfn++; >+ } >+ } >+#endif If my understanding is correct, end_pfn is not a valid range. memmap_init_zone() initialize the range [start_pfn, start_pfn + size). I am afraid this will break the syntax. And max_pfn is also not a valid one. For example, on x86, update_end_of_memory_vars() will update max_pfn, which is calculated by: end_pfn = PFN_UP(start + size); BTW, as you mentioned this apply to hotplug case. And then why this couldn't happen during boot up? What differ these two cases? > } > > #ifdef CONFIG_ZONE_DEVICE >-- >2.16.4
Hello, On 10.12.2018 14:24, Michal Hocko wrote: > On Mon 10-12-18 14:07:12, Mikhail Zaslonko wrote: >> If memory end is not aligned with the sparse memory section boundary, the >> mapping of such a section is only partly initialized. > > It would be great to mention how you can end up in the situation like > this(a user provided memmap or a strange HW). Yes, I should probably keep full failure samples from my previous patch version. > >> This may lead to >> VM_BUG_ON due to uninitialized struct page access from >> is_mem_section_removable() or test_pages_in_a_zone() function triggered by >> memory_hotplug sysfs handlers: >> >> page:000003d082008000 is uninitialized and poisoned >> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) >> Call Trace: >> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) >> [<00000000008f15c4>] show_valid_zones+0x5c/0x190 >> [<00000000008cf9c4>] dev_attr_show+0x34/0x70 >> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 >> [<00000000003e4194>] seq_read+0x204/0x480 >> [<00000000003b53ea>] __vfs_read+0x32/0x178 >> [<00000000003b55b2>] vfs_read+0x82/0x138 >> [<00000000003b5be2>] ksys_read+0x5a/0xb0 >> [<0000000000b86ba0>] system_call+0xdc/0x2d8 >> Last Breaking-Event-Address: >> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 >> Kernel panic - not syncing: Fatal exception: panic_on_oops >> >> Fix the problem by initializing the last memory section of the highest zone >> in memmap_init_zone() till the very end, even if it goes beyond the zone >> end. > > Why do we need to restrict this to the highest zone? In other words, why > cannot we do what I was suggesting earlier [1]. What does prevent other > zones to have an incomplete section boundary? Well, as you were also suggesting earlier: 'If we do not have a zone which spans the rest of the section'. I'm not sure how else we should verify that. Moreover, I was able to recreate the problem only with the highest zone (memory end is not on the section boundary). > > [1] http://lkml.kernel.org/r/20181105183533.GQ4361@dhcp22.suse.cz > >> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> >> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> >> Cc: <stable@vger.kernel.org> >> --- >> mm/page_alloc.c | 15 +++++++++++++++ >> 1 file changed, 15 insertions(+) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 2ec9cc407216..41ef5508e5f1 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, >> cond_resched(); >> } >> } >> +#ifdef CONFIG_SPARSEMEM >> + /* >> + * If there is no zone spanning the rest of the section >> + * then we should at least initialize those pages. Otherwise we >> + * could blow up on a poisoned page in some paths which depend >> + * on full sections being initialized (e.g. memory hotplug). >> + */ >> + if (end_pfn == max_pfn) { >> + while (end_pfn % PAGES_PER_SECTION) { >> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, >> + nid); >> + end_pfn++; >> + } >> + } >> +#endif >> } >> >> #ifdef CONFIG_ZONE_DEVICE >> -- >> 2.16.4 > Thanks, Mikhail Zaslonko
Hello, On 10.12.2018 16:10, Wei Yang wrote: > On Mon, Dec 10, 2018 at 02:07:12PM +0100, Mikhail Zaslonko wrote: >> If memory end is not aligned with the sparse memory section boundary, the >> mapping of such a section is only partly initialized. This may lead to >> VM_BUG_ON due to uninitialized struct page access from >> is_mem_section_removable() or test_pages_in_a_zone() function triggered by >> memory_hotplug sysfs handlers: >> >> page:000003d082008000 is uninitialized and poisoned >> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) >> Call Trace: >> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) >> [<00000000008f15c4>] show_valid_zones+0x5c/0x190 >> [<00000000008cf9c4>] dev_attr_show+0x34/0x70 >> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 >> [<00000000003e4194>] seq_read+0x204/0x480 >> [<00000000003b53ea>] __vfs_read+0x32/0x178 >> [<00000000003b55b2>] vfs_read+0x82/0x138 >> [<00000000003b5be2>] ksys_read+0x5a/0xb0 >> [<0000000000b86ba0>] system_call+0xdc/0x2d8 >> Last Breaking-Event-Address: >> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 >> Kernel panic - not syncing: Fatal exception: panic_on_oops >> >> Fix the problem by initializing the last memory section of the highest zone >> in memmap_init_zone() till the very end, even if it goes beyond the zone >> end. >> >> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> >> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> >> Cc: <stable@vger.kernel.org> >> --- >> mm/page_alloc.c | 15 +++++++++++++++ >> 1 file changed, 15 insertions(+) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 2ec9cc407216..41ef5508e5f1 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, >> cond_resched(); >> } >> } >> +#ifdef CONFIG_SPARSEMEM >> + /* >> + * If there is no zone spanning the rest of the section >> + * then we should at least initialize those pages. Otherwise we >> + * could blow up on a poisoned page in some paths which depend >> + * on full sections being initialized (e.g. memory hotplug). >> + */ >> + if (end_pfn == max_pfn) { >> + while (end_pfn % PAGES_PER_SECTION) { >> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, >> + nid); >> + end_pfn++; >> + } >> + } >> +#endif > > If my understanding is correct, end_pfn is not a valid range. > > memmap_init_zone() initialize the range [start_pfn, start_pfn + size). I > am afraid this will break the syntax. > > And max_pfn is also not a valid one. For example, on x86, I used pfn_max here to check for the highest zone. What would be a better way? > update_end_of_memory_vars() will update max_pfn, which is calculated by: > > end_pfn = PFN_UP(start + size); > > BTW, as you mentioned this apply to hotplug case. And then why this couldn't > happen during boot up? What differ these two cases? Well, the pages left uninitialized during bootup (initial problem), but the panic itself takes place when we try to process memory_hotplug sysfs attributes (thus triggering sysfs handlers). You can find more details in the original thread: https://marc.info/?t=153658306400001&r=1&w=2 > >> } >> >> #ifdef CONFIG_ZONE_DEVICE >> -- >> 2.16.4 > Thanks, Mikhail Zaslonko
On Mon 10-12-18 16:45:37, Zaslonko Mikhail wrote: > Hello, > > On 10.12.2018 14:24, Michal Hocko wrote: [...] > > Why do we need to restrict this to the highest zone? In other words, why > > cannot we do what I was suggesting earlier [1]. What does prevent other > > zones to have an incomplete section boundary? > > Well, as you were also suggesting earlier: 'If we do not have a zone which > spans the rest of the section'. I'm not sure how else we should verify that. I am not sure I follow here. Why cannot we simply drop end_pfn check and keep the rest? > Moreover, I was able to recreate the problem only with the highest zone > (memory end is not on the section boundary). What exactly prevents exactmap memmap to generate these unfinished zones? > > [1] http://lkml.kernel.org/r/20181105183533.GQ4361@dhcp22.suse.cz > > > >> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> > >> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> > >> Cc: <stable@vger.kernel.org> > >> --- > >> mm/page_alloc.c | 15 +++++++++++++++ > >> 1 file changed, 15 insertions(+) > >> > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> index 2ec9cc407216..41ef5508e5f1 100644 > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, > >> cond_resched(); > >> } > >> } > >> +#ifdef CONFIG_SPARSEMEM > >> + /* > >> + * If there is no zone spanning the rest of the section > >> + * then we should at least initialize those pages. Otherwise we > >> + * could blow up on a poisoned page in some paths which depend > >> + * on full sections being initialized (e.g. memory hotplug). > >> + */ > >> + if (end_pfn == max_pfn) { > >> + while (end_pfn % PAGES_PER_SECTION) { > >> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, > >> + nid); > >> + end_pfn++; > >> + } > >> + } > >> +#endif > >> } > >> > >> #ifdef CONFIG_ZONE_DEVICE > >> -- > >> 2.16.4 > > > > Thanks, > Mikhail Zaslonko
Hi, [This is an automated email] This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: all The bot has tested the following trees: v4.19.8, v4.14.87, v4.9.144, v4.4.166, v3.18.128, v4.19.8: Failed to apply! Possible dependencies: Unable to calculate v4.14.87: Failed to apply! Possible dependencies: Unable to calculate v4.9.144: Failed to apply! Possible dependencies: Unable to calculate v4.4.166: Failed to apply! Possible dependencies: Unable to calculate v3.18.128: Failed to apply! Possible dependencies: Unable to calculate How should we proceed with this patch? -- Thanks, Sasha
On Mon, Dec 10, 2018 at 05:14:36PM +0100, Zaslonko Mikhail wrote: >Hello, > >On 10.12.2018 16:10, Wei Yang wrote: >> On Mon, Dec 10, 2018 at 02:07:12PM +0100, Mikhail Zaslonko wrote: >>> If memory end is not aligned with the sparse memory section boundary, the >>> mapping of such a section is only partly initialized. This may lead to >>> VM_BUG_ON due to uninitialized struct page access from >>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by >>> memory_hotplug sysfs handlers: >>> >>> page:000003d082008000 is uninitialized and poisoned >>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) >>> Call Trace: >>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) >>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190 >>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70 >>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 >>> [<00000000003e4194>] seq_read+0x204/0x480 >>> [<00000000003b53ea>] __vfs_read+0x32/0x178 >>> [<00000000003b55b2>] vfs_read+0x82/0x138 >>> [<00000000003b5be2>] ksys_read+0x5a/0xb0 >>> [<0000000000b86ba0>] system_call+0xdc/0x2d8 >>> Last Breaking-Event-Address: >>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 >>> Kernel panic - not syncing: Fatal exception: panic_on_oops >>> >>> Fix the problem by initializing the last memory section of the highest zone >>> in memmap_init_zone() till the very end, even if it goes beyond the zone >>> end. >>> >>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> >>> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> >>> Cc: <stable@vger.kernel.org> >>> --- >>> mm/page_alloc.c | 15 +++++++++++++++ >>> 1 file changed, 15 insertions(+) >>> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index 2ec9cc407216..41ef5508e5f1 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, >>> cond_resched(); >>> } >>> } >>> +#ifdef CONFIG_SPARSEMEM >>> + /* >>> + * If there is no zone spanning the rest of the section >>> + * then we should at least initialize those pages. Otherwise we >>> + * could blow up on a poisoned page in some paths which depend >>> + * on full sections being initialized (e.g. memory hotplug). >>> + */ >>> + if (end_pfn == max_pfn) { >>> + while (end_pfn % PAGES_PER_SECTION) { >>> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, >>> + nid); >>> + end_pfn++; >>> + } >>> + } >>> +#endif >> >> If my understanding is correct, end_pfn is not a valid range. >> >> memmap_init_zone() initialize the range [start_pfn, start_pfn + size). I >> am afraid this will break the syntax. >> >> And max_pfn is also not a valid one. For example, on x86, >I used pfn_max here to check for the highest zone. What would be a better way? > >> update_end_of_memory_vars() will update max_pfn, which is calculated by: >> >> end_pfn = PFN_UP(start + size); >> >> BTW, as you mentioned this apply to hotplug case. And then why this couldn't >> happen during boot up? What differ these two cases? > >Well, the pages left uninitialized during bootup (initial problem), but the panic itself takes >place when we try to process memory_hotplug sysfs attributes (thus triggering sysfs handlers). >You can find more details in the original thread: >https://marc.info/?t=153658306400001&r=1&w=2 > Thanks. I took a look into the original thread and try to reproduce this on x86. My step is: 1. config page_poisoning 2. use kernel parameter mem=3075M 3. cat the last memory block device sysfs file removable eg. when mem is 3075, cat memory9/removable I don't see the Call trace. Do I miss something to reproduce it? >> >>> } >>> >>> #ifdef CONFIG_ZONE_DEVICE >>> -- >>> 2.16.4 >> > >Thanks, >Mikhail Zaslonko
On Mon, Dec 10, 2018 at 02:24:51PM +0100, Michal Hocko wrote: >On Mon 10-12-18 14:07:12, Mikhail Zaslonko wrote: >> If memory end is not aligned with the sparse memory section boundary, the >> mapping of such a section is only partly initialized. > >It would be great to mention how you can end up in the situation like >this(a user provided memmap or a strange HW). > >> This may lead to >> VM_BUG_ON due to uninitialized struct page access from >> is_mem_section_removable() or test_pages_in_a_zone() function triggered by >> memory_hotplug sysfs handlers: >> >> page:000003d082008000 is uninitialized and poisoned >> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) >> Call Trace: >> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) >> [<00000000008f15c4>] show_valid_zones+0x5c/0x190 >> [<00000000008cf9c4>] dev_attr_show+0x34/0x70 >> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 >> [<00000000003e4194>] seq_read+0x204/0x480 >> [<00000000003b53ea>] __vfs_read+0x32/0x178 >> [<00000000003b55b2>] vfs_read+0x82/0x138 >> [<00000000003b5be2>] ksys_read+0x5a/0xb0 >> [<0000000000b86ba0>] system_call+0xdc/0x2d8 >> Last Breaking-Event-Address: >> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 >> Kernel panic - not syncing: Fatal exception: panic_on_oops >> >> Fix the problem by initializing the last memory section of the highest zone >> in memmap_init_zone() till the very end, even if it goes beyond the zone >> end. > >Why do we need to restrict this to the highest zone? In other words, why >cannot we do what I was suggesting earlier [1]. What does prevent other >zones to have an incomplete section boundary? > >[1] http://lkml.kernel.org/r/20181105183533.GQ4361@dhcp22.suse.cz > I tried to go through the original list and make myself familiar with the bug. Confused why initialize the *last* end_pfn could fix this, since is_mem_section_removable() will iterate on each page of a section. This means we need to initialize all the pages left in the section. One way to fix this in my mind is to record the last pfn in mem_section. This could be done in memory_preset(), since after that we may assume the section is full. Not sure whether you would like it.
On 11.12.2018 10:49, Wei Yang wrote: > On Mon, Dec 10, 2018 at 02:24:51PM +0100, Michal Hocko wrote: >> On Mon 10-12-18 14:07:12, Mikhail Zaslonko wrote: >>> If memory end is not aligned with the sparse memory section boundary, the >>> mapping of such a section is only partly initialized. >> >> It would be great to mention how you can end up in the situation like >> this(a user provided memmap or a strange HW). >> >>> This may lead to >>> VM_BUG_ON due to uninitialized struct page access from >>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by >>> memory_hotplug sysfs handlers: >>> >>> page:000003d082008000 is uninitialized and poisoned >>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) >>> Call Trace: >>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) >>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190 >>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70 >>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 >>> [<00000000003e4194>] seq_read+0x204/0x480 >>> [<00000000003b53ea>] __vfs_read+0x32/0x178 >>> [<00000000003b55b2>] vfs_read+0x82/0x138 >>> [<00000000003b5be2>] ksys_read+0x5a/0xb0 >>> [<0000000000b86ba0>] system_call+0xdc/0x2d8 >>> Last Breaking-Event-Address: >>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 >>> Kernel panic - not syncing: Fatal exception: panic_on_oops >>> >>> Fix the problem by initializing the last memory section of the highest zone >>> in memmap_init_zone() till the very end, even if it goes beyond the zone >>> end. >> >> Why do we need to restrict this to the highest zone? In other words, why >> cannot we do what I was suggesting earlier [1]. What does prevent other >> zones to have an incomplete section boundary? >> >> [1] http://lkml.kernel.org/r/20181105183533.GQ4361@dhcp22.suse.cz >> > > I tried to go through the original list and make myself familiar with > the bug. > > Confused why initialize the *last* end_pfn could fix this, since > is_mem_section_removable() will iterate on each page of a section. This > means we need to initialize all the pages left in the section. That's exactly what the fix does. We initialize all the pages left in the section. > > One way to fix this in my mind is to record the last pfn in mem_section. Do you mean last initialized pfn? I guess we have agreed upon that the entire section should be initialized. > This could be done in memory_preset(), since after that we may assume > the section is full. Not sure whether you would like it. >
On 10.12.2018 19:19, Michal Hocko wrote: > On Mon 10-12-18 16:45:37, Zaslonko Mikhail wrote: >> Hello, >> >> On 10.12.2018 14:24, Michal Hocko wrote: > [...] >>> Why do we need to restrict this to the highest zone? In other words, why >>> cannot we do what I was suggesting earlier [1]. What does prevent other >>> zones to have an incomplete section boundary? >> >> Well, as you were also suggesting earlier: 'If we do not have a zone which >> spans the rest of the section'. I'm not sure how else we should verify that. > > I am not sure I follow here. Why cannot we simply drop end_pfn check and > keep the rest? > >> Moreover, I was able to recreate the problem only with the highest zone >> (memory end is not on the section boundary). > > What exactly prevents exactmap memmap to generate these unfinished zones? Probably you're right. I'm re-sending the patch V2. > >>> [1] http://lkml.kernel.org/r/20181105183533.GQ4361@dhcp22.suse.cz >>> >>>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> >>>> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> >>>> Cc: <stable@vger.kernel.org> >>>> --- >>>> mm/page_alloc.c | 15 +++++++++++++++ >>>> 1 file changed, 15 insertions(+) >>>> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>> index 2ec9cc407216..41ef5508e5f1 100644 >>>> --- a/mm/page_alloc.c >>>> +++ b/mm/page_alloc.c >>>> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, >>>> cond_resched(); >>>> } >>>> } >>>> +#ifdef CONFIG_SPARSEMEM >>>> + /* >>>> + * If there is no zone spanning the rest of the section >>>> + * then we should at least initialize those pages. Otherwise we >>>> + * could blow up on a poisoned page in some paths which depend >>>> + * on full sections being initialized (e.g. memory hotplug). >>>> + */ >>>> + if (end_pfn == max_pfn) { >>>> + while (end_pfn % PAGES_PER_SECTION) { >>>> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, >>>> + nid); >>>> + end_pfn++; >>>> + } >>>> + } >>>> +#endif >>>> } >>>> >>>> #ifdef CONFIG_ZONE_DEVICE >>>> -- >>>> 2.16.4 >>> >> >> Thanks, >> Mikhail Zaslonko >
Hello, On 11.12.2018 02:50, Wei Yang wrote: > On Mon, Dec 10, 2018 at 05:14:36PM +0100, Zaslonko Mikhail wrote: >> Hello, >> >> On 10.12.2018 16:10, Wei Yang wrote: >>> On Mon, Dec 10, 2018 at 02:07:12PM +0100, Mikhail Zaslonko wrote: >>>> If memory end is not aligned with the sparse memory section boundary, the >>>> mapping of such a section is only partly initialized. This may lead to >>>> VM_BUG_ON due to uninitialized struct page access from >>>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by >>>> memory_hotplug sysfs handlers: >>>> >>>> page:000003d082008000 is uninitialized and poisoned >>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) >>>> Call Trace: >>>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) >>>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190 >>>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70 >>>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 >>>> [<00000000003e4194>] seq_read+0x204/0x480 >>>> [<00000000003b53ea>] __vfs_read+0x32/0x178 >>>> [<00000000003b55b2>] vfs_read+0x82/0x138 >>>> [<00000000003b5be2>] ksys_read+0x5a/0xb0 >>>> [<0000000000b86ba0>] system_call+0xdc/0x2d8 >>>> Last Breaking-Event-Address: >>>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 >>>> Kernel panic - not syncing: Fatal exception: panic_on_oops >>>> >>>> Fix the problem by initializing the last memory section of the highest zone >>>> in memmap_init_zone() till the very end, even if it goes beyond the zone >>>> end. >>>> >>>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> >>>> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> >>>> Cc: <stable@vger.kernel.org> >>>> --- >>>> mm/page_alloc.c | 15 +++++++++++++++ >>>> 1 file changed, 15 insertions(+) >>>> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>> index 2ec9cc407216..41ef5508e5f1 100644 >>>> --- a/mm/page_alloc.c >>>> +++ b/mm/page_alloc.c >>>> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, >>>> cond_resched(); >>>> } >>>> } >>>> +#ifdef CONFIG_SPARSEMEM >>>> + /* >>>> + * If there is no zone spanning the rest of the section >>>> + * then we should at least initialize those pages. Otherwise we >>>> + * could blow up on a poisoned page in some paths which depend >>>> + * on full sections being initialized (e.g. memory hotplug). >>>> + */ >>>> + if (end_pfn == max_pfn) { >>>> + while (end_pfn % PAGES_PER_SECTION) { >>>> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, >>>> + nid); >>>> + end_pfn++; >>>> + } >>>> + } >>>> +#endif >>> >>> If my understanding is correct, end_pfn is not a valid range. >>> >>> memmap_init_zone() initialize the range [start_pfn, start_pfn + size). I >>> am afraid this will break the syntax. >>> >>> And max_pfn is also not a valid one. For example, on x86, >> I used pfn_max here to check for the highest zone. What would be a better way? >> >>> update_end_of_memory_vars() will update max_pfn, which is calculated by: >>> >>> end_pfn = PFN_UP(start + size); >>> >>> BTW, as you mentioned this apply to hotplug case. And then why this couldn't >>> happen during boot up? What differ these two cases? >> >> Well, the pages left uninitialized during bootup (initial problem), but the panic itself takes >> place when we try to process memory_hotplug sysfs attributes (thus triggering sysfs handlers). >> You can find more details in the original thread: >> https://marc.info/?t=153658306400001&r=1&w=2 >> > > Thanks. > > I took a look into the original thread and try to reproduce this on x86. > > My step is: > > 1. config page_poisoning > 2. use kernel parameter mem=3075M > 3. cat the last memory block device sysfs file removable > eg. when mem is 3075, cat memory9/removable > > I don't see the Call trace. Do I miss something to reproduce it? > No you don't. I guess there might be deviations depending on the architecture (I am on s390). As I understand, memory block size is 384 Mb on your system and memory9 is the last block on the list? BTW, do you have CONFIG_DEBUG_VM and CONFIG_DEBUG_VM_PGFLAGS on? >>> >>>> } >>>> >>>> #ifdef CONFIG_ZONE_DEVICE >>>> -- >>>> 2.16.4 >>> >> >> Thanks, >> Mikhail Zaslonko >
On Tue, Dec 11, 2018 at 04:23:05PM +0100, Zaslonko Mikhail wrote: >Hello, > >On 11.12.2018 02:50, Wei Yang wrote: >> On Mon, Dec 10, 2018 at 05:14:36PM +0100, Zaslonko Mikhail wrote: >>> Hello, >>> >>> On 10.12.2018 16:10, Wei Yang wrote: >>>> On Mon, Dec 10, 2018 at 02:07:12PM +0100, Mikhail Zaslonko wrote: >>>>> If memory end is not aligned with the sparse memory section boundary, the >>>>> mapping of such a section is only partly initialized. This may lead to >>>>> VM_BUG_ON due to uninitialized struct page access from >>>>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by >>>>> memory_hotplug sysfs handlers: >>>>> >>>>> page:000003d082008000 is uninitialized and poisoned >>>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) >>>>> Call Trace: >>>>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) >>>>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190 >>>>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70 >>>>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 >>>>> [<00000000003e4194>] seq_read+0x204/0x480 >>>>> [<00000000003b53ea>] __vfs_read+0x32/0x178 >>>>> [<00000000003b55b2>] vfs_read+0x82/0x138 >>>>> [<00000000003b5be2>] ksys_read+0x5a/0xb0 >>>>> [<0000000000b86ba0>] system_call+0xdc/0x2d8 >>>>> Last Breaking-Event-Address: >>>>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 >>>>> Kernel panic - not syncing: Fatal exception: panic_on_oops >>>>> >>>>> Fix the problem by initializing the last memory section of the highest zone >>>>> in memmap_init_zone() till the very end, even if it goes beyond the zone >>>>> end. >>>>> >>>>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> >>>>> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> >>>>> Cc: <stable@vger.kernel.org> >>>>> --- >>>>> mm/page_alloc.c | 15 +++++++++++++++ >>>>> 1 file changed, 15 insertions(+) >>>>> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>> index 2ec9cc407216..41ef5508e5f1 100644 >>>>> --- a/mm/page_alloc.c >>>>> +++ b/mm/page_alloc.c >>>>> @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, >>>>> cond_resched(); >>>>> } >>>>> } >>>>> +#ifdef CONFIG_SPARSEMEM >>>>> + /* >>>>> + * If there is no zone spanning the rest of the section >>>>> + * then we should at least initialize those pages. Otherwise we >>>>> + * could blow up on a poisoned page in some paths which depend >>>>> + * on full sections being initialized (e.g. memory hotplug). >>>>> + */ >>>>> + if (end_pfn == max_pfn) { >>>>> + while (end_pfn % PAGES_PER_SECTION) { >>>>> + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, >>>>> + nid); >>>>> + end_pfn++; >>>>> + } >>>>> + } >>>>> +#endif >>>> >>>> If my understanding is correct, end_pfn is not a valid range. >>>> >>>> memmap_init_zone() initialize the range [start_pfn, start_pfn + size). I >>>> am afraid this will break the syntax. >>>> >>>> And max_pfn is also not a valid one. For example, on x86, >>> I used pfn_max here to check for the highest zone. What would be a better way? >>> >>>> update_end_of_memory_vars() will update max_pfn, which is calculated by: >>>> >>>> end_pfn = PFN_UP(start + size); >>>> >>>> BTW, as you mentioned this apply to hotplug case. And then why this couldn't >>>> happen during boot up? What differ these two cases? >>> >>> Well, the pages left uninitialized during bootup (initial problem), but the panic itself takes >>> place when we try to process memory_hotplug sysfs attributes (thus triggering sysfs handlers). >>> You can find more details in the original thread: >>> https://marc.info/?t=153658306400001&r=1&w=2 >>> >> >> Thanks. >> >> I took a look into the original thread and try to reproduce this on x86. >> >> My step is: >> >> 1. config page_poisoning >> 2. use kernel parameter mem=3075M >> 3. cat the last memory block device sysfs file removable >> eg. when mem is 3075, cat memory9/removable >> >> I don't see the Call trace. Do I miss something to reproduce it? >> > >No you don't. I guess there might be deviations depending on the architecture (I am on s390). >As I understand, memory block size is 384 Mb on your system and memory9 is the last block on the list? Sorry, my calculation is not correct. The last memory_block is 23 instead of 9. >BTW, do you have CONFIG_DEBUG_VM and CONFIG_DEBUG_VM_PGFLAGS on? > Yes, I have set it: CONFIG_DEBUG_VM=y CONFIG_DEBUG_VM_PGFLAGS=y And the kernel cmdline is: BOOT_IMAGE=/vmlinuz-4.20.0-rc5+ root=UUID=98aa84d6-9ba6-4033-ab91-9ca6fe3dd74f ro \ resume=UUID=b7c21053-d9c1-4e58-8488-7d385f8ee107 console=ttyS0 \ LANG=en_US.UTF-8 mem=3075M > >>>> >>>>> } >>>>> >>>>> #ifdef CONFIG_ZONE_DEVICE >>>>> -- >>>>> 2.16.4 >>>> >>> >>> Thanks, >>> Mikhail Zaslonko >>
On Tue, Dec 11, 2018 at 04:17:34PM +0100, Zaslonko Mikhail wrote: > > >On 11.12.2018 10:49, Wei Yang wrote: >> On Mon, Dec 10, 2018 at 02:24:51PM +0100, Michal Hocko wrote: >>> On Mon 10-12-18 14:07:12, Mikhail Zaslonko wrote: >>>> If memory end is not aligned with the sparse memory section boundary, the >>>> mapping of such a section is only partly initialized. >>> >>> It would be great to mention how you can end up in the situation like >>> this(a user provided memmap or a strange HW). >>> >>>> This may lead to >>>> VM_BUG_ON due to uninitialized struct page access from >>>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by >>>> memory_hotplug sysfs handlers: >>>> >>>> page:000003d082008000 is uninitialized and poisoned >>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) >>>> Call Trace: >>>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) >>>> [<00000000008f15c4>] show_valid_zones+0x5c/0x190 >>>> [<00000000008cf9c4>] dev_attr_show+0x34/0x70 >>>> [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 >>>> [<00000000003e4194>] seq_read+0x204/0x480 >>>> [<00000000003b53ea>] __vfs_read+0x32/0x178 >>>> [<00000000003b55b2>] vfs_read+0x82/0x138 >>>> [<00000000003b5be2>] ksys_read+0x5a/0xb0 >>>> [<0000000000b86ba0>] system_call+0xdc/0x2d8 >>>> Last Breaking-Event-Address: >>>> [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 >>>> Kernel panic - not syncing: Fatal exception: panic_on_oops >>>> >>>> Fix the problem by initializing the last memory section of the highest zone >>>> in memmap_init_zone() till the very end, even if it goes beyond the zone >>>> end. >>> >>> Why do we need to restrict this to the highest zone? In other words, why >>> cannot we do what I was suggesting earlier [1]. What does prevent other >>> zones to have an incomplete section boundary? >>> >>> [1] http://lkml.kernel.org/r/20181105183533.GQ4361@dhcp22.suse.cz >>> >> >> I tried to go through the original list and make myself familiar with >> the bug. >> >> Confused why initialize the *last* end_pfn could fix this, since >> is_mem_section_removable() will iterate on each page of a section. This >> means we need to initialize all the pages left in the section. >That's exactly what the fix does. We initialize all the pages left in >the section. > You are right, I misunderstand your code. >> >> One way to fix this in my mind is to record the last pfn in mem_section. >Do you mean last initialized pfn? I guess we have agreed upon that the >entire section should be initialized. Well, that would be great. > >> This could be done in memory_preset(), since after that we may assume >> the section is full. Not sure whether you would like it. >>
Hi, [This is an automated email] This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: all The bot has tested the following trees: v5.1.9, v4.19.50, v4.14.125, v4.9.181, v4.4.181. v5.1.9: Build OK! v4.19.50: Failed to apply! Possible dependencies: Unable to calculate v4.14.125: Failed to apply! Possible dependencies: Unable to calculate v4.9.181: Failed to apply! Possible dependencies: Unable to calculate v4.4.181: Failed to apply! Possible dependencies: Unable to calculate How should we proceed with this patch? -- Thanks, Sasha
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2ec9cc407216..41ef5508e5f1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5542,6 +5542,21 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, cond_resched(); } } +#ifdef CONFIG_SPARSEMEM + /* + * If there is no zone spanning the rest of the section + * then we should at least initialize those pages. Otherwise we + * could blow up on a poisoned page in some paths which depend + * on full sections being initialized (e.g. memory hotplug). + */ + if (end_pfn == max_pfn) { + while (end_pfn % PAGES_PER_SECTION) { + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, + nid); + end_pfn++; + } + } +#endif } #ifdef CONFIG_ZONE_DEVICE