Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"

Message ID 20160104163528.be56a4b1.akpm@linux-foundation.org (mailing list archive)
State New, archived
Commit Message

Andrew Morton Jan. 5, 2016, 12:35 a.m. UTC
On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:

> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
> > On Mon, 4 Jan 2016 22:42:33 +0000 Mark Brown <broonie@kernel.org> wrote:
> 
> > > platforms in the kernelci.org boot tests[1].  Doing bisections with
> > > Arndale and BeagleBone Black identifies 904769ac82ebf (mm/page_alloc.c:
> > > calculate zone_start_pfn at zone_spanned_pages_in_node()) from the akpm
> > > tree as the first broken commit[2,3].  An example bootlog from the
> > > failure is:
> 
> > Thanks.  That patch has rather a blooper if
> > CONFIG_HAVE_MEMBLOCK_NODE_MAP=n.  Is that the case in your testing?
> 
> Seems to be what's making a difference from a quick run through, yes.

OK, thanks.

Stephen, can we please retain

mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node.patch
mm-introduce-kernelcore=mirror-option.patch
mm-introduce-kernelcore=mirror-option-fix.patch
mm-introduce-kernelcore=mirror-option-fix-2.patch

and add the below?

Or don't bother - I'll do an mmotm tomorrow with these in it.

I'd still like reviewing and testing from Taku Izumi please.



From: Arnd Bergmann <arnd@arndb.de>
Subject: mm/page_alloc.c: set a zone_start_pfn value in zone_spanned_pages_in_node

We got a new build warning in linux-next:

mm/page_alloc.c: In function 'free_area_init_node':
mm/page_alloc.c:5278:25: warning: 'zone_start_pfn' may be used uninitialized in this function [-Wmaybe-uninitialized]
    zone->zone_start_pfn = zone_start_pfn;
mm/page_alloc.c:5265:17: note: 'zone_start_pfn' was declared here
   unsigned long zone_start_pfn, zone_end_pfn;

The code indeed looks wrong, but this is just a guess of what the
fix might be: I have not looked at it in detail, so please treat this
as a bug report.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    2 ++
 1 file changed, 2 insertions(+)

Comments

Stephen Rothwell Jan. 5, 2016, 12:49 a.m. UTC | #1
Hi Andrew,

On Mon, 4 Jan 2016 16:35:28 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Stephen, can we please retain
> 
> mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node.patch
> mm-introduce-kernelcore=mirror-option.patch
> mm-introduce-kernelcore=mirror-option-fix.patch
> mm-introduce-kernelcore=mirror-option-fix-2.patch
> 
> and add the below?

Sure, that is easier than dropping the above patches, anyway.

Stephen Rothwell Jan. 5, 2016, 5:47 a.m. UTC | #2
Hi Andrew,

On Tue, 5 Jan 2016 11:49:18 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Mon, 4 Jan 2016 16:35:28 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > Stephen, can we please retain
> > 
> > mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node.patch
> > mm-introduce-kernelcore=mirror-option.patch
> > mm-introduce-kernelcore=mirror-option-fix.patch
> > mm-introduce-kernelcore=mirror-option-fix-2.patch
> > 
> > and add the below?  
> 
> Sure, that is easier than dropping the above patches, anyway.

I have done that *except* that
mm-introduce-kernelcore=mirror-option-fix-2.patch is not in mmotm and I
cannot find it anywhere.

Mark Brown Jan. 5, 2016, 11:45 a.m. UTC | #3
On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
> > On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:

> > > Thanks.  That patch has rather a blooper if
> > > CONFIG_HAVE_MEMBLOCK_NODE_MAP=n.  Is that the case in your testing?

> > Seems to be what's making a difference from a quick run through, yes.

> OK, thanks.

Seems like I was mistaken here somehow or there's some other problem -
I've kicked off another bisect for today's -next:

   https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console

and will follow up with any results.

Sudeep Holla Jan. 5, 2016, 12:21 p.m. UTC | #4
On 05/01/16 11:45, Mark Brown wrote:
> On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
>> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
>>> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
>
>>>> Thanks.  That patch has rather a blooper if
>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n.  Is that the case in your testing?
>
>>> Seems to be what's making a difference from a quick run through, yes.
>
>> OK, thanks.
>
> Seems like I was mistaken here somehow or there's some other problem -
> I've kicked off another bisect for today's -next:
>
>     https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
>
> and will follow up with any results.
>

With both patches applied (one is already in today's -next), I am able to
boot on an ARM64 platform, but I get a huge number of the warning below
(one for each pfn):

-->8

BUG: Bad page state in process swapper  pfn:900000
page:ffffffbde4000000 count:0 mapcount:1 mapping: (null) index:0x0
flags: 0x0()
page dumped because: nonzero mapcount
Modules linked in:
Hardware name: ARM Juno development board (r0) (DT)
Call trace:
[<ffffffc000089830>] dump_backtrace+0x0/0x180
[<ffffffc0000899c4>] show_stack+0x14/0x20
[<ffffffc000335008>] dump_stack+0x90/0xc8
[<ffffffc0001531f8>] bad_page+0xd8/0x138
[<ffffffc000153470>] free_pages_prepare+0x218/0x290
[<ffffffc000154d4c>] __free_pages_ok+0x1c/0xb8
[<ffffffc000155638>] __free_pages+0x30/0x50
[<ffffffc00092fa9c>] __free_pages_bootmem+0xa0/0xa8
[<ffffffc0009321d0>] free_all_bootmem+0x11c/0x184
[<ffffffc000925264>] mem_init+0x48/0x1b4
[<ffffffc0009217e0>] start_kernel+0x224/0x3b4
[<0000000080663000>] 0x80663000
Disabling lock debugging due to kernel taint

Mark Brown Jan. 5, 2016, 7:24 p.m. UTC | #5
On Tue, Jan 05, 2016 at 12:21:51PM +0000, Sudeep Holla wrote:
> On 05/01/16 11:45, Mark Brown wrote:
> >On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
> >>On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown <broonie@kernel.org> wrote:
> >>>On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:

> >>>>Thanks.  That patch has rather a blooper if
> >>>>CONFIG_HAVE_MEMBLOCK_NODE_MAP=n.  Is that the case in your testing?

> >>>Seems to be what's making a difference from a quick run through, yes.

> >>OK, thanks.

> >Seems like I was mistaken here somehow or there's some other problem -
> >I've kicked off another bisect for today's -next:

> >    https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console

> >and will follow up with any results.

> With both patches applied(one already in today's -next), I am able to
> boot on ARM64 platform but I get huge load(for each pfn) of below warning:

Bisect on today's -next with Arndale (an ARM platform) flags the same
patch:

  https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console

as does Juno which is an arm64 platform:

  https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/138/console

(it does get to a console but with lots of the backtraces Sudeep
indicated).
Patch

diff -puN mm/page_alloc.c~mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-calculate-zone_start_pfn-at-zone_spanned_pages_in_node-fix
+++ a/mm/page_alloc.c
@@ -5013,6 +5013,8 @@  static inline unsigned long __meminit zo
 					unsigned long *zone_end_pfn,
 					unsigned long *zones_size)
 {
+	*zone_start_pfn = node_start_pfn;
+	*zone_end_pfn = node_end_pfn;
 	return zones_size[zone_type];
 }
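
For readers who want to see why the compiler warned, here is a minimal
userspace sketch of the out-parameter pattern the hunk above touches. It is
not the kernel code: the `_sketch` names, the simplified signature and the
caller are assumptions made for illustration only. It shows a stub that
fills in both out-parameters, so a caller that declares them without an
initializer (as free_area_init_node does in the warning) reads defined
values. Whether node-wide bounds are the right values for every zone is not
settled by this sketch; later replies in the thread report that boot
problems persisted.

```c
#include <assert.h>

/*
 * Simplified, hypothetical model of the !CONFIG_HAVE_MEMBLOCK_NODE_MAP stub
 * with the two added assignments. Without them, the out-parameters would be
 * left untouched and the caller would read indeterminate values.
 */
unsigned long zone_spanned_pages_sketch(unsigned long zone_type,
					unsigned long node_start_pfn,
					unsigned long node_end_pfn,
					unsigned long *zone_start_pfn,
					unsigned long *zone_end_pfn,
					const unsigned long *zones_size)
{
	/* Mirror the patch: report the node's bounds for the zone. */
	*zone_start_pfn = node_start_pfn;
	*zone_end_pfn = node_end_pfn;
	return zones_size[zone_type];
}

/*
 * Hypothetical caller shaped like the code in the warning: the locals have
 * no initializer and rely entirely on the callee to set them.
 */
unsigned long first_zone_start_sketch(unsigned long node_start_pfn,
				      unsigned long node_end_pfn,
				      const unsigned long *zones_size)
{
	unsigned long zone_start_pfn, zone_end_pfn;

	assert(node_start_pfn <= node_end_pfn);
	(void)zone_spanned_pages_sketch(0, node_start_pfn, node_end_pfn,
					&zone_start_pfn, &zone_end_pfn,
					zones_size);
	/* Defined only because the callee assigned it. */
	return zone_start_pfn;
}
```

Calling first_zone_start_sketch(0x80000, 0x83000, sizes) with any two-entry
sizes array returns the node start pfn passed in; removing the two
assignments from the stub brings back the -Wmaybe-uninitialized situation
described in the commit message.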