[v2,2/2] memblock: do not start bottom-up allocations with kernel_end

Message ID 20201217201214.3414100-2-guro@fb.com (mailing list archive)
State New, archived
Series [v2,1/2] mm: cma: allocate cma areas bottom-up

Commit Message

Roman Gushchin Dec. 17, 2020, 8:12 p.m. UTC
With KASLR, the kernel image is placed at a random address, so starting
the bottom-up allocation at kernel_end can result in an
allocation failure and a warning like this one:

[    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[    0.002921] ------------[ cut here ]------------
[    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
[    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
[    0.002937] Modules linked in:
[    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
[    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
[    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
[    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
[    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
[    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
[    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
[    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
[    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
[    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
[    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
[    0.002956] Call Trace:
[    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
[    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
[    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
[    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
[    0.002969]  ? native_set_fixmap+0x82/0xd0
[    0.002971]  ? flat_get_apic_id+0x5/0x10
[    0.002973]  ? register_lapic_address+0x8e/0x97
[    0.002975]  ? setup_arch+0x8a5/0xc3f
[    0.002978]  ? start_kernel+0x66/0x547
[    0.002980]  ? load_ucode_bsp+0x4c/0xcd
[    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
[    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
[    0.002988] ---[ end trace f151227d0b39be70 ]---

At the same time, the kernel image is protected with memblock_reserve(),
so we can just start searching at PAGE_SIZE. In this case the
bottom-up allocation has the same chance of success as a top-down
allocation, so there is no reason to fall back to top-down in the case
of a failure. Altogether, this simplifies the logic.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/memblock.c | 49 ++++++-------------------------------------------
 1 file changed, 6 insertions(+), 43 deletions(-)
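
The reasoning in the commit message can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
kernel implementation: struct region, toy_find_bottom_up() and
toy_old_start() are hypothetical names, all memory in [start, end) is
assumed to be RAM, and the reserved list is assumed to be sorted and
non-overlapping. It shows why a search that starts at PAGE_SIZE and
skips reserved regions can succeed where a search that starts at the
end of the kernel image fails.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL

/* A half-open physical range [base, base + size). Hypothetical type. */
struct region {
	uint64_t base;
	uint64_t size;
};

/* Round @x up to a multiple of @align (which must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Toy bottom-up search: return the lowest @align-aligned address in
 * [start, end) where @size bytes fit without touching any reserved
 * region, or 0 on failure. @reserved must be sorted by base and must
 * not contain overlapping entries.
 */
static uint64_t toy_find_bottom_up(uint64_t start, uint64_t end,
				   uint64_t size, uint64_t align,
				   const struct region *reserved, size_t nr)
{
	uint64_t cand = align_up(start, align);
	size_t i;

	for (i = 0; i < nr; i++) {
		uint64_t r_end = reserved[i].base + reserved[i].size;

		if (r_end <= cand)
			continue;	/* region lies entirely below the candidate */
		if (cand + size <= reserved[i].base)
			break;		/* candidate fits in the gap below this region */
		cand = align_up(r_end, align);	/* overlap: retry just above it */
	}

	/* The candidate must still fit below @end. */
	return (cand < end && size <= end - cand) ? cand : 0;
}

/* Model of the old policy: never search below the end of the kernel image. */
static uint64_t toy_old_start(uint64_t start, uint64_t kernel_end)
{
	return start > kernel_end ? start : kernel_end;
}
```

As an illustration with made-up numbers: with 4 GiB of RAM, a kernel
image reserved at [3 GiB, 3 GiB + 64 MiB) and a 2 GiB request aligned to
4 MiB, a search starting at toy_old_start(PAGE_SIZE, kernel_end) has
less than 1 GiB left above the image and returns 0, while a search
starting at PAGE_SIZE finds room at 4 MiB, below the reserved image.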

Comments

Wonhyuk Yang Dec. 19, 2020, 2:52 p.m. UTC | #1
Hi Roman,

On Fri, Dec 18, 2020 at 5:12 AM Roman Gushchin <guro@fb.com> wrote:
>
> With kaslr the kernel image is placed at a random place, so starting
> the bottom-up allocation with the kernel_end can result in an
> allocation failure and a warning like this one:
>
> [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> [    0.002921] ------------[ cut here ]------------
> [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> [    0.002956] Call Trace:
> [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> [    0.002969]  ? native_set_fixmap+0x82/0xd0
> [    0.002971]  ? flat_get_apic_id+0x5/0x10
> [    0.002973]  ? register_lapic_address+0x8e/0x97
> [    0.002975]  ? setup_arch+0x8a5/0xc3f
> [    0.002978]  ? start_kernel+0x66/0x547
> [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
>
> At the same time, the kernel image is protected with memblock_reserve(),
> so we can just start searching at PAGE_SIZE. In this case the
> bottom-up allocation has the same chances to success as a top-down
> allocation, so there is no reason to fallback in the case of a
> failure. All together it simplifies the logic.

I figured out that this behavior was introduced by
commit 79442ed189acb ("memblock.c: introduce bottom-up allocation mode")

According to this commit, the purpose of bottom-up allocation is to
allocate memory from the unhotpluggable node.
Roman Gushchin Dec. 19, 2020, 5:05 p.m. UTC | #2
On Sat, Dec 19, 2020 at 11:52:19PM +0900, Wonhyuk Yang wrote:
> Hi Roman,
> 
> On Fri, Dec 18, 2020 at 5:12 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > With kaslr the kernel image is placed at a random place, so starting
> > the bottom-up allocation with the kernel_end can result in an
> > allocation failure and a warning like this one:
> >
> > [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> > [    0.002921] ------------[ cut here ]------------
> > [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> > [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> > [    0.002956] Call Trace:
> > [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> > [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> > [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> > [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> > [    0.002969]  ? native_set_fixmap+0x82/0xd0
> > [    0.002971]  ? flat_get_apic_id+0x5/0x10
> > [    0.002973]  ? register_lapic_address+0x8e/0x97
> > [    0.002975]  ? setup_arch+0x8a5/0xc3f
> > [    0.002978]  ? start_kernel+0x66/0x547
> > [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> > [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> > [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
> >
> > At the same time, the kernel image is protected with memblock_reserve(),
> > so we can just start searching at PAGE_SIZE. In this case the
> > bottom-up allocation has the same chances to success as a top-down
> > allocation, so there is no reason to fallback in the case of a
> > failure. All together it simplifies the logic.
> 
> I figure out that it was introduced by
> commit 79442ed189acb ("memblock.c: introduce bottom-up allocation mode")
> 
> According to this commit, The purpose of bottom up allocation is to
> allocate memory from the unhotpluggable node.

Hi Wonhyuk,

Correct! And it remains this way; we just don't need to skip
all the memory before kernel_end.

Thanks!
Mike Rapoport Dec. 20, 2020, 6:49 a.m. UTC | #3
On Thu, Dec 17, 2020 at 12:12:14PM -0800, Roman Gushchin wrote:
> With kaslr the kernel image is placed at a random place, so starting
> the bottom-up allocation with the kernel_end can result in an
> allocation failure and a warning like this one:
> 
> [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> [    0.002921] ------------[ cut here ]------------
> [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> [    0.002937] Modules linked in:
> [    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
> [    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
> [    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
> [    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
> [    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
> [    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
> [    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
> [    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
> [    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
> [    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
> [    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
> [    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
> [    0.002956] Call Trace:
> [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> [    0.002969]  ? native_set_fixmap+0x82/0xd0
> [    0.002971]  ? flat_get_apic_id+0x5/0x10
> [    0.002973]  ? register_lapic_address+0x8e/0x97
> [    0.002975]  ? setup_arch+0x8a5/0xc3f
> [    0.002978]  ? start_kernel+0x66/0x547
> [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
> [    0.002988] ---[ end trace f151227d0b39be70 ]---
> 
> At the same time, the kernel image is protected with memblock_reserve(),
> so we can just start searching at PAGE_SIZE. In this case the
> bottom-up allocation has the same chances to success as a top-down
> allocation, so there is no reason to fallback in the case of a
> failure. All together it simplifies the logic.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  mm/memblock.c | 49 ++++++-------------------------------------------
>  1 file changed, 6 insertions(+), 43 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b68ee86788af..10bd7d1ef0f4 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
>   *
>   * Find @size free area aligned to @align in the specified range and node.
>   *
> - * When allocation direction is bottom-up, the @start should be greater
> - * than the end of the kernel image. Otherwise, it will be trimmed. The
> - * reason is that we want the bottom-up allocation just near the kernel
> - * image so it is highly likely that the allocated memory and the kernel
> - * will reside in the same node.
> - *
> - * If bottom-up allocation failed, will try to allocate memory top-down.
> - *
>   * Return:
>   * Found address on success, 0 on failure.
>   */
> @@ -291,8 +283,6 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
>  					phys_addr_t end, int nid,
>  					enum memblock_flags flags)
>  {
> -	phys_addr_t kernel_end, ret;
> -
>  	/* pump up @end */
>  	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
>  	    end == MEMBLOCK_ALLOC_KASAN)
> @@ -301,40 +291,13 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
>  	/* avoid allocating the first page */
>  	start = max_t(phys_addr_t, start, PAGE_SIZE);
>  	end = max(start, end);
> -	kernel_end = __pa_symbol(_end);
> -
> -	/*
> -	 * try bottom-up allocation only when bottom-up mode
> -	 * is set and @end is above the kernel image.
> -	 */
> -	if (memblock_bottom_up() && end > kernel_end) {
> -		phys_addr_t bottom_up_start;
> -
> -		/* make sure we will allocate above the kernel */
> -		bottom_up_start = max(start, kernel_end);
>  
> -		/* ok, try bottom-up allocation first */
> -		ret = __memblock_find_range_bottom_up(bottom_up_start, end,
> -						      size, align, nid, flags);
> -		if (ret)
> -			return ret;
> -
> -		/*
> -		 * we always limit bottom-up allocation above the kernel,
> -		 * but top-down allocation doesn't have the limit, so
> -		 * retrying top-down allocation may succeed when bottom-up
> -		 * allocation failed.
> -		 *
> -		 * bottom-up allocation is expected to be fail very rarely,
> -		 * so we use WARN_ONCE() here to see the stack trace if
> -		 * fail happens.
> -		 */
> -		WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
> -			  "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
> -	}
> -
> -	return __memblock_find_range_top_down(start, end, size, align, nid,
> -					      flags);
> +	if (memblock_bottom_up())
> +		return __memblock_find_range_bottom_up(start, end, size, align,
> +						       nid, flags);
> +	else
> +		return __memblock_find_range_top_down(start, end, size, align,
> +						      nid, flags);
>  }
>  
>  /**
> -- 
> 2.26.2
>
Thiago Jung Bauermann Jan. 22, 2021, 4:37 a.m. UTC | #4
Mike Rapoport <rppt@kernel.org> writes:

> > Signed-off-by: Roman Gushchin <guro@fb.com>
> 
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
patch. This happens on some ppc64le bare metal (powernv) server machines with
CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
to solve this issue in a different way:

https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/

Since this patch solves that problem, is it possible to include it in the next
feasible v5.11-rcX, with the following tag?

Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")

This is because reverting the commit above also solves the problem on the
machines where I've seen this issue.
Andrew Morton Jan. 24, 2021, 2:09 a.m. UTC | #5
On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:

> Mike Rapoport <rppt@kernel.org> writes:
> 
> > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > 
> > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> 
> I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
> patch. This happens on some ppc64le bare metal (powernv) server machines with
> CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
> to solve this issue in a different way:
> 
> https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
> 
> Since this patch solves that problem, is it possible to include it in the next
> feasible v5.11-rcX, with the following tag?

We could do this, if we're confident that this patch doesn't depend on
[1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...

> Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")

I added that.
Mike Rapoport Jan. 24, 2021, 7:34 a.m. UTC | #6
On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
> 
> > Mike Rapoport <rppt@kernel.org> writes:
> > 
> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > 
> > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
> > patch. This happens on some ppc64le bare metal (powernv) server machines with
> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
> > to solve this issue in a different way:
> > 
> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
> > 
> > Since this patch solves that problem, is it possible to include it in the next
> > feasible v5.11-rcX, with the following tag?
> 
> We could do this, if we're confident that this patch doesn't depend on
> [1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...

I think it does not depend on CMA bottom-up allocation; it's rather the other
way around: without this patch, CMA bottom-up allocation could fail with KASLR
enabled.

Still, this patch may need updates to the way x86 does early reservations:

https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org
 
> > Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
> 
> I added that.
> 
>
Thiago Jung Bauermann Jan. 26, 2021, 12:30 a.m. UTC | #7
Mike Rapoport <rppt@kernel.org> writes:

> On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
>> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
>> 
>> > Mike Rapoport <rppt@kernel.org> writes:
>> > 
>> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
>> > > 
>> > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
>> > 
>> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
>> > patch. This happens on some ppc64le bare metal (powernv) server machines with
>> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
>> > to solve this issue in a different way:
>> > 
>> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
>> > 
>> > Since this patch solves that problem, is it possible to include it in the next
>> > feasible v5.11-rcX, with the following tag?
>> 
>> We could do this,

Thanks!

>> if we're confident that this patch doesn't depend on
>> [1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...
>
> A think it does not depend on cma bottom-up allocation, it's rather the other
> way around: without this CMA bottom-up allocation could fail with KASLR
> enabled.

I agree. Conceptually, this could have been patch 1 in this series.

> Still, this patch may need updates to the way x86 does early reservations:
>
> https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org

Ah, I wasn't aware of this. Thanks for fixing those issues. That series
seems to be well accepted.

>> > Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
>> 
>> I added that.

Thanks!
Thiago Jung Bauermann Feb. 8, 2021, 11:58 p.m. UTC | #8
Mike Rapoport <rppt@kernel.org> writes:

> On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
>> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
>> 
>> > Mike Rapoport <rppt@kernel.org> writes:
>> > 
>> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
>> > > 
>> > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
>> > 
>> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
>> > patch. This happens on some ppc64le bare metal (powernv) server machines with
>> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
>> > to solve this issue in a different way:
>> > 
>> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
>> > 
>> > Since this patch solves that problem, is it possible to include it in the next
>> > feasible v5.11-rcX, with the following tag?
>> 
>> We could do this, if we're confident that this patch doesn't depend on
>> [1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...
>
> A think it does not depend on cma bottom-up allocation, it's rather the other
> way around: without this CMA bottom-up allocation could fail with KASLR
> enabled.

I noticed that this patch is now upstream as:

2dcb39645441 memblock: do not start bottom-up allocations with kernel_end

> Still, this patch may need updates to the way x86 does early reservations:
>
> https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org

... but the patches from this link still aren't. Isn't this a potential
problem for x86?

The patch series on the link above is now superseded by v2:

https://lore.kernel.org/linux-mm/20210128105711.10428-1-rppt@kernel.org/
Florian Fainelli Feb. 28, 2021, 4:18 a.m. UTC | #9
On 12/17/2020 12:12 PM, Roman Gushchin wrote:
> With kaslr the kernel image is placed at a random place, so starting
> the bottom-up allocation with the kernel_end can result in an
> allocation failure and a warning like this one:
> 
> [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> [    0.002921] ------------[ cut here ]------------
> [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> [    0.002937] Modules linked in:
> [    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
> [    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
> [    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
> [    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
> [    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
> [    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
> [    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
> [    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
> [    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
> [    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
> [    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
> [    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
> [    0.002956] Call Trace:
> [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> [    0.002969]  ? native_set_fixmap+0x82/0xd0
> [    0.002971]  ? flat_get_apic_id+0x5/0x10
> [    0.002973]  ? register_lapic_address+0x8e/0x97
> [    0.002975]  ? setup_arch+0x8a5/0xc3f
> [    0.002978]  ? start_kernel+0x66/0x547
> [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
> [    0.002988] ---[ end trace f151227d0b39be70 ]---
> 
> At the same time, the kernel image is protected with memblock_reserve(),
> so we can just start searching at PAGE_SIZE. In this case the
> bottom-up allocation has the same chances to success as a top-down
> allocation, so there is no reason to fallback in the case of a
> failure. All together it simplifies the logic.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Hi Roman, Thomas and other linux-mips folks,

Kamal and I have been unable to boot v5.11 on MIPS since this
commit; reverting it makes our MIPS platforms boot successfully. We do
not see a warning like the one in the commit message. Instead, what
appears to happen is that the Device Tree gets corrupted, which prevents
the parsing of the "rdb" node and leads to the interrupt controllers not
being registered, and the system eventually fails to boot.

The Device Tree is built into the kernel image and resides at
arch/mips/boot/dts/brcm/bcm97435svmb.dts.

Do you have any idea what could be wrong with MIPS specifically here?

Thanks!
Mike Rapoport Feb. 28, 2021, 9 a.m. UTC | #10
Hi Florian,

On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> 
> On 12/17/2020 12:12 PM, Roman Gushchin wrote:
> > With kaslr the kernel image is placed at a random place, so starting
> > the bottom-up allocation with the kernel_end can result in an
> > allocation failure and a warning like this one:
> > 
> > [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> > [    0.002921] ------------[ cut here ]------------
> > [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> > [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> > [    0.002937] Modules linked in:
> > [    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
> > [    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
> > [    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
> > [    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
> > [    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
> > [    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
> > [    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
> > [    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
> > [    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
> > [    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
> > [    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
> > [    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
> > [    0.002956] Call Trace:
> > [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> > [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> > [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> > [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> > [    0.002969]  ? native_set_fixmap+0x82/0xd0
> > [    0.002971]  ? flat_get_apic_id+0x5/0x10
> > [    0.002973]  ? register_lapic_address+0x8e/0x97
> > [    0.002975]  ? setup_arch+0x8a5/0xc3f
> > [    0.002978]  ? start_kernel+0x66/0x547
> > [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> > [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> > [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
> > [    0.002988] ---[ end trace f151227d0b39be70 ]---
> > 
> > At the same time, the kernel image is protected with memblock_reserve(),
> > so we can just start searching at PAGE_SIZE. In this case the
> > bottom-up allocation has the same chances to success as a top-down
> > allocation, so there is no reason to fallback in the case of a
> > failure. All together it simplifies the logic.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> 
> Hi Roman, Thomas and other linux-mips folks,
> 
> Kamal and myself have been unable to boot v5.11 on MIPS since this
> commit, reverting it makes our MIPS platforms boot successfully. We do
> not see a warning like this one in the commit message, instead what
> happens appear to be a corrupted Device Tree which prevents the parsing
> of the "rdb" node and leading to the interrupt controllers not being
> registered, and the system eventually not booting.
> 
> The Device Tree is built-into the kernel image and resides at
> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> 
> Do you have any idea what could be wrong with MIPS specifically here?

Apparently there is a memblock allocation in one of the functions called
from arch_mem_init() between plat_mem_setup() and
early_init_fdt_reserve_self().

If you have a serial console available that early, we can try to track it
down by forcing memblock_debug in mm/memblock.c to 1:

diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..83034245f8d5 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -151,7 +151,7 @@ static __refdata struct memblock_type *memblock_memory = &memblock.memory;
                        pr_info(fmt, ##__VA_ARGS__);                    \
        } while (0)
 
-static int memblock_debug __initdata_memblock;
+static int memblock_debug __initdata_memblock = 1;
 static bool system_has_some_mirror __initdata_memblock = false;
 static int memblock_can_resize __initdata_memblock;
 static int memblock_memory_in_slab __initdata_memblock = 0;


Regardless, I think that moving DT self reservation just after
plat_mem_setup() is safe and it'll make things more robust.

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 279be0153f8b..f476b99a7bcd 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -623,6 +623,8 @@ static void __init arch_mem_init(char **cmdline_p)
 {
 	/* call board setup routine */
 	plat_mem_setup();
+	early_init_fdt_reserve_self();
+	early_init_fdt_scan_reserved_mem();
 	memblock_set_bottom_up(true);
 
 	bootcmdline_init();
@@ -636,9 +638,6 @@ static void __init arch_mem_init(char **cmdline_p)
 
 	check_kernel_sections_mem();
 
-	early_init_fdt_reserve_self();
-	early_init_fdt_scan_reserved_mem();
-
 #ifndef CONFIG_NUMA
 	memblock_set_node(0, PHYS_ADDR_MAX, &memblock.memory, 0);
 #endif
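
The general hazard behind moving the reservations earlier can be
sketched in a small userspace model. This is only an illustration under
stated assumptions, not MIPS or memblock code, and it makes no claim
about the actual root cause on the affected boards: struct toy_memblock,
toy_reserve(), toy_is_reserved() and toy_alloc_bottom_up() are
hypothetical names, and the "blob" stands in for any data that must be
preserved. An allocator can only avoid ranges it already knows to be
reserved, so a bottom-up allocation made before a reservation is
recorded may be handed memory that overlaps the data.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RESERVED 8

/* A half-open physical range [base, base + size). Hypothetical type. */
struct toy_range {
	uint64_t base;
	uint64_t size;
};

/* Toy allocator state: a flat, unsorted list of reserved ranges. */
struct toy_memblock {
	struct toy_range reserved[TOY_MAX_RESERVED];
	size_t nr;
};

/* Record [base, base + size) as reserved; return 0 on success, -1 if full. */
static int toy_reserve(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	if (mb->nr == TOY_MAX_RESERVED)
		return -1;
	mb->reserved[mb->nr].base = base;
	mb->reserved[mb->nr].size = size;
	mb->nr++;
	return 0;
}

/* Return non-zero if [base, base + size) intersects any reserved range. */
static int toy_is_reserved(const struct toy_memblock *mb,
			   uint64_t base, uint64_t size)
{
	size_t i;

	for (i = 0; i < mb->nr; i++) {
		const struct toy_range *r = &mb->reserved[i];

		if (base < r->base + r->size && r->base < base + size)
			return 1;
	}
	return 0;
}

/*
 * Toy bottom-up allocation: scan upward from @start in @step increments,
 * then reserve and return the first free slot that fits below @end.
 * Returns 0 on failure.
 */
static uint64_t toy_alloc_bottom_up(struct toy_memblock *mb, uint64_t start,
				    uint64_t end, uint64_t size, uint64_t step)
{
	uint64_t addr;

	for (addr = start; addr < end && size <= end - addr; addr += step) {
		if (toy_is_reserved(mb, addr, size))
			continue;
		return toy_reserve(mb, addr, size) ? 0 : addr;
	}
	return 0;
}
```

With a made-up blob at [0x2000, 0x3000) and a 0x2000-byte request
scanned from 0x1000 in 0x1000 steps: if the allocation runs before the
blob is reserved, it returns 0x1000 and the allocated range covers the
blob; if toy_reserve() records the blob first, the same request returns
0x3000 and the blob is left alone.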
Florian Fainelli Feb. 28, 2021, 6:19 p.m. UTC | #11
Hi Mike,

On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> Hi Florian,
> 
> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>
>> On 12/17/2020 12:12 PM, Roman Gushchin wrote:
>>> With kaslr the kernel image is placed at a random place, so starting
>>> the bottom-up allocation with the kernel_end can result in an
>>> allocation failure and a warning like this one:
>>>
>>> [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
>>> [    0.002921] ------------[ cut here ]------------
>>> [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
>>> [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
>>> [    0.002937] Modules linked in:
>>> [    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
>>> [    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
>>> [    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
>>> [    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
>>> [    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
>>> [    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
>>> [    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
>>> [    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
>>> [    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
>>> [    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
>>> [    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
>>> [    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
>>> [    0.002956] Call Trace:
>>> [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
>>> [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
>>> [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
>>> [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
>>> [    0.002969]  ? native_set_fixmap+0x82/0xd0
>>> [    0.002971]  ? flat_get_apic_id+0x5/0x10
>>> [    0.002973]  ? register_lapic_address+0x8e/0x97
>>> [    0.002975]  ? setup_arch+0x8a5/0xc3f
>>> [    0.002978]  ? start_kernel+0x66/0x547
>>> [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
>>> [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
>>> [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
>>> [    0.002988] ---[ end trace f151227d0b39be70 ]---
>>>
>>> At the same time, the kernel image is protected with memblock_reserve(),
>>> so we can just start searching at PAGE_SIZE. In this case the
>>> bottom-up allocation has the same chance of success as a top-down
>>> allocation, so there is no reason to fall back in the case of a
>>> failure. Altogether, this simplifies the logic.
>>>
>>> Signed-off-by: Roman Gushchin <guro@fb.com>
>>
>> Hi Roman, Thomas and other linux-mips folks,
>>
>> Kamal and I have been unable to boot v5.11 on MIPS since this
>> commit; reverting it makes our MIPS platforms boot successfully. We do
>> not see a warning like the one in the commit message. Instead, what
>> happens appears to be a corrupted Device Tree, which prevents the parsing
>> of the "rdb" node and leads to the interrupt controllers not being
>> registered, and the system eventually not booting.
>>
>> The Device Tree is built into the kernel image and resides at
>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>
>> Do you have any idea what could be wrong with MIPS specifically here?
> 
> Apparently there is a memblock allocation in one of the functions called
> from arch_mem_init() between plat_mem_setup() and
> early_init_fdt_reserve_self().
> 
> If you have serial available that early, we can try to track it down by
> forcing memblock_debug in mm/memblock.c to 1:
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..83034245f8d5 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -151,7 +151,7 @@ static __refdata struct memblock_type *memblock_memory = &memblock.memory;
>                         pr_info(fmt, ##__VA_ARGS__);                    \
>         } while (0)
>  
> -static int memblock_debug __initdata_memblock;
> +static int memblock_debug __initdata_memblock = 1;
>  static bool system_has_some_mirror __initdata_memblock = false;
>  static int memblock_can_resize __initdata_memblock;
>  static int memblock_memory_in_slab __initdata_memblock = 0;
> 
> 
> Regardless, I think that moving the DT self-reservation to just after
> plat_mem_setup() is safe, and it'll make things more robust.
> 
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 279be0153f8b..f476b99a7bcd 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -623,6 +623,8 @@ static void __init arch_mem_init(char **cmdline_p)
>  {
>  	/* call board setup routine */
>  	plat_mem_setup();
> +	early_init_fdt_reserve_self();
> +	early_init_fdt_scan_reserved_mem();
>  	memblock_set_bottom_up(true);
>  
>  	bootcmdline_init();
> @@ -636,9 +638,6 @@ static void __init arch_mem_init(char **cmdline_p)
>  
>  	check_kernel_sections_mem();
>  
> -	early_init_fdt_reserve_self();
> -	early_init_fdt_scan_reserved_mem();
> -
>  #ifndef CONFIG_NUMA
>  	memblock_set_node(0, PHYS_ADDR_MAX, &memblock.memory, 0);
>  #endif

Thanks a lot for taking a look! The current/broken memblock=debug output
looks like this:

[    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
(mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
Feb 28 10:01:50 PST 2021
[    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
[    0.000000] FPU revision is: 00130001
[    0.000000] memblock_add: [0x00000000-0x0fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x20000000-0x4fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x90000000-0xcfffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] MIPS: machine is Broadcom BCM97435SVMB
[    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
setup_arch+0x128/0x69c
[    0.000000] memblock_reserve: [0x00010000-0x018313cf]
setup_arch+0x1f8/0x69c
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
setup_arch+0x3fc/0x69c
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
bytes.
[    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
linesize 32 bytes
[    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
[    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
[    0.000000] Initmem setup node 0 [mem
0x0000000000000000-0x00000000cfffffff]
[    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000
alloc_node_mem_map.constprop.135+0x6c/0xc8
[    0.000000] memblock_reserve: [0x01831400-0x032313ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
[    0.000000]  memory.cnt  = 0x3
[    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
bytes flags: 0x0
[    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
bytes flags: 0x0
[    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0xa
[    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
bytes flags: 0x0
[    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
bytes flags: 0x0
[    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
bytes flags: 0x0
[    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
bytes flags: 0x0
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
[    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
[    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
[    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
[    0.000000] memblock_reserve: [0x03231400-0x032323ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
[    0.000000] memblock_reserve: [0x03233000-0x0327afff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x03245000-0x03244fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03257000-0x03256fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03269000-0x03268fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x0327b000-0x0327afff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
[    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
[    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
[    0.000000] memblock_reserve: [0x03232400-0x0323240f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
[    0.000000] memblock_reserve: [0x03232480-0x0323248f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
[    0.000000] memblock_reserve: [0x03232500-0x0323257f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
[    0.000000] memblock_reserve: [0x03232580-0x032325db]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
[    0.000000] memblock_reserve: [0x03232600-0x032328ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
[    0.000000] memblock_reserve: [0x03232900-0x03232c03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
[    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x0000f000-0x0000ffff]
pcpu_embed_first_chunk+0x838/0x884
[    0.000000] memblock_free: [0x03231400-0x032323ff]
pcpu_embed_first_chunk+0x850/0x884
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
[    0.000000] Kernel command line: console=ttyS0,115200 earlycon
[    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
bytes, linear)
[    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
bytes, linear)
[    0.000000] memblock_reserve: [0x00000000-0x000003ff]
trap_init+0x70/0x4e8
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
cma-reserved, 1835008K highmem)
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] OF: Bad cell count for /rdb
[    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
[    0.000000] OF: of_irq_init: children remain, but no parents
[    0.000000] random: get_random_bytes called from
start_kernel+0x444/0x654 with crng_init=0
[    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
wraps every 8589934590000000ns

With your patch applied, which unfortunately did not fix the boot, we have
the following:

[    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
(mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #86 SMP Sun
Feb 28 10:04:54 PST 2021
[    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
[    0.000000] FPU revision is: 00130001
[    0.000000] memblock_add: [0x00000000-0x0fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x20000000-0x4fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x90000000-0xcfffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] MIPS: machine is Broadcom BCM97435SVMB
[    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
setup_arch+0x60/0x6a4
[    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] memblock_reserve: [0x00010000-0x018313cf]
setup_arch+0x200/0x6a4
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
setup_arch+0x404/0x6a4
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e8/0x6a4
[    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e8/0x6a4
[    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e8/0x6a4
[    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
bytes.
[    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
linesize 32 bytes
[    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
[    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
[    0.000000] Initmem setup node 0 [mem
0x0000000000000000-0x00000000cfffffff]
[    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000
alloc_node_mem_map.constprop.135+0x6c/0xc8
[    0.000000] memblock_reserve: [0x01831400-0x032313ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
[    0.000000]  memory.cnt  = 0x3
[    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
bytes flags: 0x0
[    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
bytes flags: 0x0
[    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0xa
[    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
bytes flags: 0x0
[    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
bytes flags: 0x0
[    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
bytes flags: 0x0
[    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
bytes flags: 0x0
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
[    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
[    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
[    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
[    0.000000] memblock_reserve: [0x03231400-0x032323ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
[    0.000000] memblock_reserve: [0x03233000-0x0327afff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x03245000-0x03244fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03257000-0x03256fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03269000-0x03268fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x0327b000-0x0327afff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
[    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
[    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
[    0.000000] memblock_reserve: [0x03232400-0x0323240f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
[    0.000000] memblock_reserve: [0x03232480-0x0323248f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
[    0.000000] memblock_reserve: [0x03232500-0x0323257f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
[    0.000000] memblock_reserve: [0x03232580-0x032325db]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
[    0.000000] memblock_reserve: [0x03232600-0x032328ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
[    0.000000] memblock_reserve: [0x03232900-0x03232c03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
[    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x0000f000-0x0000ffff]
pcpu_embed_first_chunk+0x838/0x884
[    0.000000] memblock_free: [0x03231400-0x032323ff]
pcpu_embed_first_chunk+0x850/0x884
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
[    0.000000] Kernel command line: console=ttyS0,115200 earlycon
[    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
bytes, linear)
[    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
bytes, linear)
[    0.000000] memblock_reserve: [0x00000000-0x000003ff]
trap_init+0x70/0x4e8
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
cma-reserved, 1835008K highmem)
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] OF: Bad cell count for /rdb
[    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
[    0.000000] OF: of_irq_init: children remain, but no parents
[    0.000000] random: get_random_bytes called from
start_kernel+0x444/0x654 with crng_init=0
[    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
wraps every 8589934590000000ns
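
For reference, the placement of the first three allocations in the logs above
is what a bottom-up first-fit search starting at PAGE_SIZE (0x1000) would
produce, given that the kernel image is reserved at [0x00010000-0x018313cf].
Below is a minimal userspace sketch of such a search. It is only a toy model:
the names struct sim, sim_reserve() and sim_alloc_bottom_up() are made up for
illustration and are not the mm/memblock.c code, and returning 0 on failure is
just this sketch's convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_MAX_RESERVED 16

/* Half-open range [base, base + size). */
struct sim_range {
	uint64_t base;
	uint64_t size;
};

/* Toy stand-in for the reserved list; hypothetical, not the kernel type. */
struct sim {
	struct sim_range reserved[SIM_MAX_RESERVED];
	size_t cnt;
};

static uint64_t sim_align_up(uint64_t addr, uint64_t align)
{
	/* The rounding trick below only works for power-of-two alignments. */
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

static void sim_reserve(struct sim *s, uint64_t base, uint64_t size)
{
	assert(s->cnt < SIM_MAX_RESERVED);
	s->reserved[s->cnt].base = base;
	s->reserved[s->cnt].size = size;
	s->cnt++;
}

/*
 * Bottom-up first fit: find the lowest aligned address >= start such that
 * [addr, addr + size) ends at or below 'end' and overlaps no reserved range,
 * then reserve it. Returns 0 on failure (sketch convention).
 */
static uint64_t sim_alloc_bottom_up(struct sim *s, uint64_t start,
				    uint64_t end, uint64_t size,
				    uint64_t align)
{
	uint64_t cand = sim_align_up(start, align);
	size_t i = 0;

	while (i < s->cnt) {
		const struct sim_range *r = &s->reserved[i];

		if (cand < r->base + r->size && r->base < cand + size) {
			/* Overlap: move past this range and rescan all ranges. */
			cand = sim_align_up(r->base + r->size, align);
			i = 0;
		} else {
			i++;
		}
	}

	if (cand + size > end)
		return 0;

	sim_reserve(s, cand, size);
	return cand;
}
```

Seeding the toy state with the kernel reservation (base 0x00010000, size
0x018213d0) and requesting 10913 bytes align=0x40, 32680 bytes align=0x4 and
25 bytes align=0x4 with start = 0x1000 yields 0x00001000, 0x00003aa4 and
0x0000ba4c, the bases seen in the logs above. Passing start = 0x018313d0 (the
end of the kernel reservation) instead places all three above the kernel image.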

With only the revert of f787b0b4502cde50c3583432d6cb9bd8306fc242
("memblock: do not start bottom-up allocations with kernel_end") and an
unmodified arch/mips/kernel/setup.c, this boots successfully:

[    0.000000] Linux version 5.11.0-gf787b0b4502c (florian@locahost)
(mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #88 SMP Sun
Feb 28 10:13:21 PST 2021
[    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
[    0.000000] FPU revision is: 00130001
[    0.000000] memblock_add: [0x00000000-0x0fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x20000000-0x4fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x90000000-0xcfffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] MIPS: machine is Broadcom BCM97435SVMB
[    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] memblock_reserve: [0x00aa9600-0x00aac0a0]
setup_arch+0x128/0x69c
[    0.000000] memblock_reserve: [0x00010000-0x018313cf]
setup_arch+0x1f8/0x69c
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x01831400-0x01833ea0]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x01833ea4-0x0183be4b]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x018313d0-0x018313e8]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_reserve: [0x0096c000-0x0096bfff]
setup_arch+0x3fc/0x69c
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0183be80-0x0183be9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0183bf00-0x0183bf1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0183bf80-0x0183bf9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
bytes.
[    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
linesize 32 bytes
[    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0183c000-0x0183cfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0183d000-0x0183dfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0183e000-0x0183efff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
[    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
[    0.000000] Initmem setup node 0 [mem
0x0000000000000000-0x00000000cfffffff]
[    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000
alloc_node_mem_map.constprop.135+0x6c/0xc8
[    0.000000] memblock_reserve: [0x0183f000-0x0323efff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0323f000-0x0323f01f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0323f080-0x0323f1ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
[    0.000000]  memory.cnt  = 0x3
[    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
bytes flags: 0x0
[    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
bytes flags: 0x0
[    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x8
[    0.000000]  reserved[0x0]   [0x00010000-0x018313e8], 0x018213e9
bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0x01831400-0x01833ea0], 0x00002aa1
bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0x01833ea4-0x0183be4b], 0x00007fa8
bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0x0183be80-0x0183be9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0x0183bf00-0x0183bf1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x5]   [0x0183bf80-0x0183bf9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x6]   [0x0183c000-0x0323f01f], 0x01a03020
bytes flags: 0x0
[    0.000000]  reserved[0x7]   [0x0323f080-0x0323f1ff], 0x00000180
bytes flags: 0x0
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
[    0.000000] memblock_reserve: [0x0323f200-0x0323f21d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
[    0.000000] memblock_reserve: [0x0323f280-0x0323f29d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
[    0.000000] memblock_reserve: [0x03240000-0x03240fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
[    0.000000] memblock_reserve: [0x03241000-0x03241fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
[    0.000000] memblock_reserve: [0x03242000-0x03289fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x03254000-0x03253fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03266000-0x03265fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03278000-0x03277fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x0328a000-0x03289fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
[    0.000000] memblock_reserve: [0x0323f300-0x0323f303]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
[    0.000000] memblock_reserve: [0x0323f380-0x0323f383]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
[    0.000000] memblock_reserve: [0x0323f400-0x0323f40f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
[    0.000000] memblock_reserve: [0x0323f480-0x0323f48f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
[    0.000000] memblock_reserve: [0x0323f500-0x0323f57f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
[    0.000000] memblock_reserve: [0x0323f580-0x0323f5db]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
[    0.000000] memblock_reserve: [0x0323f600-0x0323f8ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
[    0.000000] memblock_reserve: [0x0323f900-0x0323fc03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
[    0.000000] memblock_reserve: [0x0323fc80-0x0323fd3f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x03240000-0x03240fff]
pcpu_embed_first_chunk+0x838/0x884
[    0.000000] memblock_free: [0x03241000-0x03241fff]
pcpu_embed_first_chunk+0x850/0x884
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
[    0.000000] Kernel command line: console=ttyS0,115200 earlycon
[    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0328a000-0x032a9fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
bytes, linear)
[    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x032aa000-0x032b9fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
bytes, linear)
[    0.000000] memblock_reserve: [0x00000000-0x000003ff]
trap_init+0x70/0x4e8
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 2045272K/2097152K available (8226K kernel code,
1078K rwdata, 1336K rodata, 13800K init, 260K bss, 51880K reserved, 0K
cma-reserved, 1835008K highmem)
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] irq_bcm7038_l1: registered BCM7038 L1 intc
(/rdb/interrupt-controller@41b500, IRQs: 128)
[    0.000000] irq_brcmstb_l2: registered L2 intc
(/rdb/interrupt-controller@403000, parent irq: 52)
[    0.000000] irq_bcm7120_l2: registered BCM7120 L2 intc
(/rdb/interrupt-controller@406780, parent IRQ(s): 2)
[    0.000000] irq_bcm7120_l2: registered BCM7120 L2 intc
(/rdb/interrupt-controller@409480, parent IRQ(s): 3)
[    0.000000] irq_brcmstb_l2: registered L2 intc
(/rdb/interrupt-controller@408440, parent irq: 54)
[    0.000000] irq_brcmstb_l2: registered L2 intc
(/rdb/interrupt-controller@41b000, parent irq: 24)
[    0.000000] irq_brcmstb_l2: registered L2 intc
(/rdb/interrupt-controller@41bd00, parent irq: 25)
[    0.000000] random: get_random_bytes called from
start_kernel+0x444/0x654 with crng_init=0
[    0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 10882621761 ns
[    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
wraps every 8589934590000000ns

The DTB is located at this offset within vmlinux:

37084: 80aac0a1      0 OBJECT  GLOBAL DEFAULT       10
__dtb_bcm97435svmb_end
48909: 80aa9600      0 OBJECT  GLOBAL DEFAULT       10
__dtb_bcm97435svmb_begin

0x8000_0000 maps to physical address 0x0 on these MIPS platforms.
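
As a cross-check of the numbers above, here is a minimal user-space sketch (not kernel code) that converts the two symbol addresses to physical addresses and computes the blob size. The helper names and the KSEG0_BASE constant are made up for illustration; the only assumption is the linear 0x8000_0000 -> 0x0 mapping stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: start of the unmapped window that maps linearly to physical 0x0. */
#define KSEG0_BASE 0x80000000u

/* Hypothetical helper: translate a KSEG0 virtual address to a physical one. */
static uint32_t kseg0_to_phys(uint32_t vaddr)
{
	/* Only addresses inside the 512 MiB KSEG0 window are valid input. */
	assert(vaddr >= KSEG0_BASE && vaddr < 0xa0000000u);
	return vaddr - KSEG0_BASE;
}

/* Hypothetical helper: size of a blob delimited by begin/end symbols. */
static uint32_t blob_size(uint32_t begin, uint32_t end)
{
	assert(end >= begin);
	return end - begin;
}
```

With the symbols listed above, __dtb_bcm97435svmb_begin (0x80aa9600) lands at physical 0x00aa9600 and the blob is 0x2aa1 = 10913 bytes long, which is the same size as the first early_init_dt_alloc_memory_arch() request seen in the failing boot log.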
Serge Semin Feb. 28, 2021, 11:08 p.m. UTC | #12
Hi folks,
What you've got here seems to be a more complicated problem than it
first looked. Please see my comments below.

(Note I've dropped the parts of the email logs that are of no interest
to the discovered problem. Please also note that I don't have any
Broadcom hardware to test the solution suggested below.)

On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> Hi Mike,
> 
> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> > Hi Florian,
> > 
> > On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>

> >> [...]

> >>
> >> Hi Roman, Thomas and other linux-mips folks,
> >>
> >> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >> commit, reverting it makes our MIPS platforms boot successfully. We do
> >> not see a warning like this one in the commit message, instead what
> >> happens appear to be a corrupted Device Tree which prevents the parsing
> >> of the "rdb" node and leading to the interrupt controllers not being
> >> registered, and the system eventually not booting.
> >>
> >> The Device Tree is built-into the kernel image and resides at
> >> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>
> >> Do you have any idea what could be wrong with MIPS specifically here?

Most likely the problem you've discovered has been there for quite
some time. The patch you are referring to just caused it to be
triggered by extending the early allocation range. Before that
patch was accepted, the early memory allocations were performed
in the range:
[kernel_end, RAM_END].
The patch changed that, so the early allocations are now done within
[RAM_START + PAGE_SIZE, RAM_END].

In normal situations it's safe to do that as long as all the critical
memory regions (including the memory residing below the kernel) have
been reserved. But as soon as memory holding some critical structures
hasn't been reserved, the kernel may allocate it, for instance for
early initialization, with unpredictable and usually unpleasant
consequences.
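
To illustrate the range change, here is a minimal user-space model of a bottom-up first-fit search. It is only a sketch and not the memblock implementation: struct range, align_up() and find_bottom_up() are hypothetical names, and the reserved list is assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a reserved region, inclusive bounds as in the log. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Round x up to the next multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a && !(a & (a - 1)));
	return (x + a - 1) & ~(a - 1);
}

/*
 * Return the lowest aligned address in [lo, hi] where size bytes fit
 * without touching any reserved range, or 0 if nothing fits.
 * rsv[] must be sorted by start and must not overlap.
 */
static uint64_t find_bottom_up(uint64_t lo, uint64_t hi, uint64_t size,
			       uint64_t align, const struct range *rsv, int n)
{
	uint64_t cand = align_up(lo, align);
	int i;

	for (i = 0; i < n; i++) {
		/* The candidate ends before this reservation: it fits. */
		if (cand + size - 1 < rsv[i].start)
			break;
		/* The candidate collides: retry right after the reservation. */
		if (cand <= rsv[i].end)
			cand = align_up(rsv[i].end + 1, align);
	}

	return (cand + size - 1 <= hi) ? cand : 0;
}
```

Feeding it the values from the logs (kernel reserved at [0x00010000-0x018313cf], a 10913-byte request with align=0x40, first bank ending at 0x0fffffff), a search starting at PAGE_SIZE (0x1000) returns 0x1000, while a search starting at kernel_end (0x018313d0) returns 0x01831400. Those are the addresses at which the 10913-byte region shows up in the failing log and in the log with the patch reverted, respectively.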

> > 
> > Apparently there is a memblock allocation in one of the functions called
> > from arch_mem_init() between plat_mem_setup() and
> > early_init_fdt_reserve_self().

Mike, alas, according to the log provided by Florian that's not the
cause of the problem. Please see my considerations below.

> [...]
> 
> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> Feb 28 10:01:50 PST 2021
> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> [    0.000000] FPU revision is: 00130001

> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> early_init_dt_scan_memory+0x160/0x1e0
> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> early_init_dt_scan_memory+0x160/0x1e0
> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> early_init_dt_scan_memory+0x160/0x1e0

Here the memory has been added to the memblock allocator.

> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> [    0.000000] printk: bootconsole [ns16550a0] enabled

> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> setup_arch+0x128/0x69c

Here the fdt memory has been reserved. (Note it's built into the
kernel.)

> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> setup_arch+0x1f8/0x69c

Here the kernel itself, together with the built-in dtb, has been
reserved. So far so good.

> [    0.000000] Initrd not found or empty - disabling initrd

> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> from=0x00000000 max_addr=0x00000000
> early_init_dt_alloc_memory_arch+0x40/0x84
> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> from=0x00000000 max_addr=0x00000000
> early_init_dt_alloc_memory_arch+0x40/0x84
> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> memblock_alloc_range_nid+0xf8/0x198

The log above most likely belongs to the call-chain:
setup_arch()
+-> arch_mem_init()
    +-> device_tree_init() - BMIPS specific method
        +-> unflatten_and_copy_device_tree()

In other words, here we've copied the fdt from its original location
[0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
it into [0x00003aa4-0x0000ba4b].

The problem is that a bit later the following call chain is executed:
setup_arch()
+-> plat_smp_setup()
    +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
        +-> if (!board_ebase_setup)
                 board_ebase_setup = &bmips_ebase_setup;

So when the CPU traps are initialized, the bmips_ebase_setup() method
is called. What trap_init() does then isn't compatible with the
allocation performed by the unflatten_and_copy_device_tree() method.
See the next comment.

> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> from=0x00000000 max_addr=0x00000000
> early_init_dt_alloc_memory_arch+0x40/0x84
> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
> setup_arch+0x3fc/0x69c
> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
> bytes.
> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
> linesize 32 bytes
> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] Zone ranges:
> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
> [    0.000000] Initmem setup node 0 [mem
> 0x0000000000000000-0x00000000cfffffff]
> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
> from=0x00000000 max_addr=0x00000000
> alloc_node_mem_map.constprop.135+0x6c/0xc8
> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
> [    0.000000]  memory.cnt  = 0x3
> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
> bytes flags: 0x0
> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
> bytes flags: 0x0
> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
> bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0xa
> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
> bytes flags: 0x0
> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
> bytes flags: 0x0
> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
> bytes flags: 0x0
> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
> bytes flags: 0x0
> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
> bytes flags: 0x0
> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
> bytes flags: 0x0
> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
> bytes flags: 0x0
> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
> bytes flags: 0x0
> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
> bytes flags: 0x0
> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
> bytes flags: 0x0
> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_free: [0x03245000-0x03244fff]
> pcpu_embed_first_chunk+0x7a0/0x884
> [    0.000000] memblock_free: [0x03257000-0x03256fff]
> pcpu_embed_first_chunk+0x7a0/0x884
> [    0.000000] memblock_free: [0x03269000-0x03268fff]
> pcpu_embed_first_chunk+0x7a0/0x884
> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
> pcpu_embed_first_chunk+0x7a0/0x884
> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
> pcpu_embed_first_chunk+0x838/0x884
> [    0.000000] memblock_free: [0x03231400-0x032323ff]
> pcpu_embed_first_chunk+0x850/0x884
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
> bytes, linear)
> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> bytes, linear)

> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> trap_init+0x70/0x4e8

Most likely the corruption happened somewhere around here. The log
line above only shows the memory for the NMI/reset vectors being
reserved:
arch/mips/kernel/traps.c: trap_init(void): Line 2373.

But then the board_ebase_setup pointer, which was set to
bmips_ebase_setup() earlier, is called, and it overwrites the ebase
variable with 0x80001000 since this is a CPU_BMIPS5000 CPU. So any
further calls to functions like
set_handler()/set_except_vector()/set_vi_srs_handler()/etc may corrupt
the memory starting at 0x80001000, which, as we have discovered, holds
the fdt copy and the unflattened device tree.
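
The overlap can be checked with a few lines of user-space C. This is only a sketch: struct range, vector_area() and ranges_overlap() are hypothetical names, the vector area size of 0x100*64 bytes is an assumption taken from the workaround suggested further down, and ebase 0x80001000 is taken as physical 0x1000 per the 0x8000_0000 -> 0x0 mapping mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a memory region, inclusive bounds as in the log. */
struct range {
	uint32_t start;
	uint32_t end;
};

/* Build the region occupied by the exception vectors. */
static struct range vector_area(uint32_t phys_base, uint32_t size)
{
	struct range r;

	assert(size > 0);
	r.start = phys_base;
	r.end = phys_base + size - 1;
	return r;
}

/* Two inclusive ranges overlap if each one starts before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}
```

Under those assumptions the vector area is [0x1000-0x4fff]. It overlaps both the fdt copy at [0x00001000-0x00003aa0] and the unflattened tree at [0x00003aa4-0x0000ba4b], but not the kernel image at [0x00010000-0x018313cf], which would be consistent with the problem staying hidden while early allocations started at kernel_end.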

> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> cma-reserved, 1835008K highmem)
> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [    0.000000] rcu: Hierarchical RCU implementation.
> [    0.000000] rcu:     RCU event tracing is enabled.
> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> is 25 jiffies.
> [    0.000000] NR_IRQS: 256

> [    0.000000] OF: Bad cell count for /rdb
> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> [    0.000000] OF: of_irq_init: children remain, but no parents

So this is the first place where the consequences of the corruption
show up. Luckily it's just the "Bad cell count" error. We could have
got a much less obvious log here, or even a crash somewhere further
on...

> [    0.000000] random: get_random_bytes called from
> start_kernel+0x444/0x654 with crng_init=0
> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> wraps every 8589934590000000ns

> 
> and with your patch applied which unfortunately did not work we have the
> following:
>
> [...]

So a patch like this should work around the corruption:

--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
 
 	__dt_setup_arch(dtb);
 
+	memblock_reserve(0x0, 0x1000 + 0x100*64);
+
 	for (q = bmips_quirk_list; q->quirk_fn; q++) {
 		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
 					     q->compatible)) {

But the main question is how to fix the problem in general. At least
for Broadcom CPUs the reservation needs to be performed before
device_tree_init() is called, since the latter is the very first
method which starts allocating from memblock. So the best candidate is
to use plat_mem_setup() for the reservation, right after the memory is
added to the memblock allocator by the __dt_setup_arch() call. In
addition, we need to take into account the amount of memory each type
of Broadcom CPU needs for the exception vectors. So a function like
this could be used to reserve the exception vectors memory:

static void bmips_ebase_reserve(void)
{
	phys_addr_t base, size = VECTORSPACING*64;

	switch (current_cpu_type()) {
	case CPU_BMIPS4350:
		return;
	case CPU_BMIPS3300:
	case CPU_BMIPS4380:
		base = 0x0400;
		break;
	case CPU_BMIPS5000:
		base = 0x1000;
		break;
	default:
		return;
	}

	memblock_reserve(base, size);
}

Though I am not sure it's correct. At least on P5600 the vector spacing
is configurable.

Anyway, all of that concerns the Broadcom CPUs. But we can hit the
same problem on other platforms whose developers weren't careful
enough to reserve all the critical memory sections in the platform
code, especially now that the patch introduced by Roman has been
merged into the kernel.

-Sergey

> 
> [...]
> -- 
> Florian
Florian Fainelli March 1, 2021, 3:50 a.m. UTC | #13
Hi Serge,

On 2/28/2021 3:08 PM, Serge Semin wrote:
> Hi folks,
> What you've got here seems a more complicated problem than it
> could originally look like. Please, see my comments below.
> 
> (Note I've discarded some of the email logs, which of no interest
> to the discovered problem. Please also note that I haven't got any
> Broadcom hardware to test out a solution suggested below.)
> 
> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
>> Hi Mike,
>>
>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
>>> Hi Florian,
>>>
>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>>>
> 
>>>> [...]
> 
>>>>
>>>> Hi Roman, Thomas and other linux-mips folks,
>>>>
>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
>>>> not see a warning like this one in the commit message, instead what
>>>> happens appear to be a corrupted Device Tree which prevents the parsing
>>>> of the "rdb" node and leading to the interrupt controllers not being
>>>> registered, and the system eventually not booting.
>>>>
>>>> The Device Tree is built-into the kernel image and resides at
>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>
>>>> Do you have any idea what could be wrong with MIPS specifically here?
> 
> Most likely the problem you've discovered has been there for quite
> some time. The patch you are referring to just caused it to be
> triggered by extending the early allocation range. See before that
> patch was accepted the early memory allocations had been performed
> in the range:
> [kernel_end, RAM_END].
> The patch changed that, so the early allocations are done within
> [RAM_START + PAGE_SIZE, RAM_END].
> 
> In normal situations it's safe to do that as long as all the critical
> memory regions (including the memory residing a space below the
> kernel) have been reserved. But as soon as a memory with some critical
> structures haven't been reserved, the kernel may allocate it to be used
> for instance for early initializations with obviously unpredictable but
> most of the times unpleasant consequences.
> 
>>>
>>> Apparently there is a memblock allocation in one of the functions called
>>> from arch_mem_init() between plat_mem_setup() and
>>> early_init_fdt_reserve_self().
> 
> Mike, alas according to the log provided by Florian that's not the reason
> of the problem. Please, see my considerations below.
> 
>> [...]
>>
>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>> Feb 28 10:01:50 PST 2021
>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>> [    0.000000] FPU revision is: 00130001
> 
>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
> 
> Here the memory has been added to the memblock allocator.
> 
>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>> [    0.000000] printk: bootconsole [ns16550a0] enabled
> 
>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>> setup_arch+0x128/0x69c
> 
> Here the fdt memory has been reserved. (Note it's built into the
> kernel.)
> 
>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
>> setup_arch+0x1f8/0x69c
> 
> Here the kernel itself together with built-in dtb have been reserved.
> So far so good.
> 
>> [    0.000000] Initrd not found or empty - disabling initrd
> 
>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>> memblock_alloc_range_nid+0xf8/0x198
> 
> The log above most likely belongs to the call-chain:
> setup_arch()
> +-> arch_mem_init()
>     +-> device_tree_init() - BMIPS specific method
>         +-> unflatten_and_copy_device_tree()
> 
> So to speak here we've copied the fdt from the original space
> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> it to [0x00003aa4-0x0000ba4b].
> 
> The problem is that a bit later the next call-chain is performed:
> setup_arch()
> +-> plat_smp_setup()
>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>         +-> if (!board_ebase_setup)
>                  board_ebase_setup = &bmips_ebase_setup;
> 
> So at the moment of the CPU traps initialization the bmips_ebase_setup()
> method is called. What trap_init() does isn't compatible with the
> allocation performed by the unflatten_and_copy_device_tree() method.
> See the next comment.
> 
>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>> setup_arch+0x3fc/0x69c
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>> bytes.
>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>> linesize 32 bytes
>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Zone ranges:
>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
>> [    0.000000] Initmem setup node 0 [mem
>> 0x0000000000000000-0x00000000cfffffff]
>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000
>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] MEMBLOCK configuration:
>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
>> [    0.000000]  memory.cnt  = 0x3
>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
>> bytes flags: 0x0
>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
>> bytes flags: 0x0
>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
>> bytes flags: 0x0
>> [    0.000000]  reserved.cnt  = 0xa
>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
>> bytes flags: 0x0
>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
>> pcpu_embed_first_chunk+0x838/0x884
>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
>> pcpu_embed_first_chunk+0x850/0x884
>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>> bytes, linear)
>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>> bytes, linear)
> 
>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
>> trap_init+0x70/0x4e8
> 
> Most likely someplace here the corruption has happened. The log above
> has just reserved a memory for NMI/reset vectors:
> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> 
> But then the board_ebase_setup() pointer is dereferenced and called,
> which has been initialized with bmips_ebase_setup() earlier and which
> overwrites the ebase variable with: 0x80001000 as this is
> CPU_BMIPS5000 CPU. So any further calls of the functions like
> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> corruption of the memory above 0x80001000, which as we have discovered
> belongs to fdt and unflattened device tree.
> 
>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>> cma-reserved, 1835008K highmem)
>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> [    0.000000] rcu: Hierarchical RCU implementation.
>> [    0.000000] rcu:     RCU event tracing is enabled.
>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>> is 25 jiffies.
>> [    0.000000] NR_IRQS: 256
> 
>> [    0.000000] OF: Bad cell count for /rdb
>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>> [    0.000000] OF: of_irq_init: children remain, but no parents
> 
> So here is the first time we have got the consequence of the corruption
> popped up. Luckily it's just the "Bad cells count" error. We could have
> got much less obvious log here up to getting a crash at some place
> further...
> 
>> [    0.000000] random: get_random_bytes called from
>> start_kernel+0x444/0x654 with crng_init=0
>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>> wraps every 8589934590000000ns
> 
>>
>> and with your patch applied which unfortunately did not work we have the
>> following:
>>
>> [...]
> 
> So a patch like this shall workaround the corruption:
> 
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {

This patch works, thanks a lot for the troubleshooting and analysis! How
about the following instead? It works as well and is more generic, since
it does not require each platform to provide an appropriate call to
memblock_reserve():

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..b0a173b500e8 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2367,10 +2367,7 @@ void __init trap_init(void)

        if (!cpu_has_mips_r2_r6) {
                ebase = CAC_BASE;
-               ebase_pa = virt_to_phys((void *)ebase);
                vec_size = 0x400;
-
-               memblock_reserve(ebase_pa, vec_size);
        } else {
                if (cpu_has_veic || cpu_has_vint)
                        vec_size = 0x200 + VECTORSPACING*64;
@@ -2410,6 +2407,14 @@ void __init trap_init(void)

        if (board_ebase_setup)
                board_ebase_setup();
+
+       /* board_ebase_setup() can change the exception base address
+        * reserve it now after changes were made.
+        */
+       if (!cpu_has_mips_r2_r6) {
+               ebase_pa = virt_to_phys((void *)ebase);
+               memblock_reserve(ebase_pa, vec_size);
+       }
        per_cpu_trap_init(true);
        memblock_set_bottom_up(false);
Serge Semin March 1, 2021, 9:22 a.m. UTC | #14
On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> Hi Serge,
> 
> On 2/28/2021 3:08 PM, Serge Semin wrote:
> > Hi folks,
> > What you've got here seems a more complicated problem than it
> > could originally look like. Please, see my comments below.
> > 
> > (Note I've discarded some of the email logs, which of no interest
> > to the discovered problem. Please also note that I haven't got any
> > Broadcom hardware to test out a solution suggested below.)
> > 
> > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >> Hi Mike,
> >>
> >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>> Hi Florian,
> >>>
> >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>
> > 
> >>>> [...]
> > 
> >>>>
> >>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>
> >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> >>>> not see a warning like this one in the commit message, instead what
> >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> >>>> of the "rdb" node and leading to the interrupt controllers not being
> >>>> registered, and the system eventually not booting.
> >>>>
> >>>> The Device Tree is built-into the kernel image and resides at
> >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>
> >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > 
> > Most likely the problem you've discovered has been there for quite
> > some time. The patch you are referring to just caused it to be
> > triggered by extending the early allocation range. See before that
> > patch was accepted the early memory allocations had been performed
> > in the range:
> > [kernel_end, RAM_END].
> > The patch changed that, so the early allocations are done within
> > [RAM_START + PAGE_SIZE, RAM_END].
> > 
> > In normal situations it's safe to do that as long as all the critical
> > memory regions (including the memory residing a space below the
> > kernel) have been reserved. But as soon as a memory with some critical
> > structures haven't been reserved, the kernel may allocate it to be used
> > for instance for early initializations with obviously unpredictable but
> > most of the times unpleasant consequences.
> > 
> >>>
> >>> Apparently there is a memblock allocation in one of the functions called
> >>> from arch_mem_init() between plat_mem_setup() and
> >>> early_init_fdt_reserve_self().
> > 
> > Mike, alas according to the log provided by Florian that's not the reason
> > of the problem. Please, see my considerations below.
> > 
> >> [...]
> >>
> >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >> Feb 28 10:01:50 PST 2021
> >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >> [    0.000000] FPU revision is: 00130001
> > 
> >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> > 
> > Here the memory has been added to the memblock allocator.
> > 
> >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > 
> >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >> setup_arch+0x128/0x69c
> > 
> > Here the fdt memory has been reserved. (Note it's built into the
> > kernel.)
> > 
> >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >> setup_arch+0x1f8/0x69c
> > 
> > Here the kernel itself together with built-in dtb have been reserved.
> > So far so good.
> > 
> >> [    0.000000] Initrd not found or empty - disabling initrd
> > 
> >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >> memblock_alloc_range_nid+0xf8/0x198
> > 
> > The log above most likely belongs to the call-chain:
> > setup_arch()
> > +-> arch_mem_init()
> >     +-> device_tree_init() - BMIPS specific method
> >         +-> unflatten_and_copy_device_tree()
> > 
> > So to speak here we've copied the fdt from the original space
> > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > it to [0x00003aa4-0x0000ba4b].
> > 
> > The problem is that a bit later the next call-chain is performed:
> > setup_arch()
> > +-> plat_smp_setup()
> >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >         +-> if (!board_ebase_setup)
> >                  board_ebase_setup = &bmips_ebase_setup;
> > 
> > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > method is called. What trap_init() does isn't compatible with the
> > allocation performed by the unflatten_and_copy_device_tree() method.
> > See the next comment.
> > 
> >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
> >> setup_arch+0x3fc/0x69c
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
> >> bytes.
> >> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
> >> linesize 32 bytes
> >> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Zone ranges:
> >> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
> >> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
> >> [    0.000000] Movable zone start for each node
> >> [    0.000000] Early memory node ranges
> >> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
> >> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
> >> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
> >> [    0.000000] Initmem setup node 0 [mem
> >> 0x0000000000000000-0x00000000cfffffff]
> >> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
> >> from=0x00000000 max_addr=0x00000000
> >> alloc_node_mem_map.constprop.135+0x6c/0xc8
> >> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
> >> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
> >> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] MEMBLOCK configuration:
> >> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
> >> [    0.000000]  memory.cnt  = 0x3
> >> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
> >> bytes flags: 0x0
> >> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
> >> bytes flags: 0x0
> >> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
> >> bytes flags: 0x0
> >> [    0.000000]  reserved.cnt  = 0xa
> >> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
> >> bytes flags: 0x0
> >> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
> >> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
> >> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
> >> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
> >> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
> >> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
> >> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_free: [0x03245000-0x03244fff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] memblock_free: [0x03257000-0x03256fff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] memblock_free: [0x03269000-0x03268fff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
> >> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
> >> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
> >> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
> >> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
> >> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
> >> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
> >> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
> >> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
> >> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
> >> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
> >> pcpu_embed_first_chunk+0x838/0x884
> >> [    0.000000] memblock_free: [0x03231400-0x032323ff]
> >> pcpu_embed_first_chunk+0x850/0x884
> >> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
> >> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
> >> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
> >> bytes, linear)
> >> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >> bytes, linear)
> > 
> >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >> trap_init+0x70/0x4e8
> > 
> > Most likely someplace here the corruption has happened. The log above
> > has just reserved a memory for NMI/reset vectors:
> > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > 
> > But then the board_ebase_setup() pointer is dereferenced and called,
> > which has been initialized with bmips_ebase_setup() earlier and which
> > overwrites the ebase variable with: 0x80001000 as this is
> > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > corruption of the memory above 0x80001000, which as we have discovered
> > belongs to fdt and unflattened device tree.
> > 
> >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >> cma-reserved, 1835008K highmem)
> >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [    0.000000] rcu: Hierarchical RCU implementation.
> >> [    0.000000] rcu:     RCU event tracing is enabled.
> >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >> is 25 jiffies.
> >> [    0.000000] NR_IRQS: 256
> > 
> >> [    0.000000] OF: Bad cell count for /rdb
> >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > 
> > So here is the first time we have got the consequence of the corruption
> > popped up. Luckily it's just the "Bad cells count" error. We could have
> > got much less obvious log here up to getting a crash at some place
> > further...
> > 
> >> [    0.000000] random: get_random_bytes called from
> >> start_kernel+0x444/0x654 with crng_init=0
> >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >> wraps every 8589934590000000ns
> > 
> >>
> >> and with your patch applied which unfortunately did not work we have the
> >> following:
> >>
> >> [...]
> > 
> > So a patch like this shall workaround the corruption:
> > 
> > --- a/arch/mips/bmips/setup.c
> > +++ b/arch/mips/bmips/setup.c
> > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >  
> >  	__dt_setup_arch(dtb);
> >  
> > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > +
> >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >  					     q->compatible)) {
> 

> This patch works, thanks a lot for the troubleshooting and analysis! How
> about the following which would be more generic and works as well and
> should be more universal since it does not require each architecture to
> provide an appropriate call to memblock_reserve():

Hm, are you sure it's working? If so, my analysis hasn't been quite
correct. My suggestion was based on the trace of memory initializations,
allocations and reservations. Here is the sequence of the most crucial
of them:
1) Memblock initialization:
   start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
   (This is the point where I suggested placing the exception memory
    reservation.)
2) Base FDT memory reservation:
   start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
   reservation:
   start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
4) Reserve the kernel itself and some critical sections like initrd and
   crash-kernel:
   start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
5) Copy and unflatten the device tree built into the kernel
   (BMIPS-platform code):
   start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
   This is the very first time an allocation from the memblock pool
   is performed. Since we haven't reserved any memory for the exception
   vectors yet, the memblock allocator is free to return that memory
   range for any other use. Needless to say, if we try to use that memory
   later without consulting memblock, we may, and in our case will,
   get into trouble.
6) Many random early memblock allocations for kernel use before the
   buddy and sl*b allocators are up and running...
   Note that even if by some fortunate chance the allocations made in 5)
   didn't overlap the exception memory, the allocations made here are
   much more likely to do so, with obviously fatal consequences once the
   same range is used by two independent parties.
7) Trap/exception vectors initialization and !memory reservation! for
   them:
   start_kernel()->trap_init()
   Only at this point do we get to reserve the memory for the vectors.
8) Init and run buddy/sl*b allocators:
   start_kernel()->mm_init()->...mem_init()...

There are a lot of allocations done in 5) and 6) before trap_init()
is called in 7). You can see that in your log. That's why I doubt that
your patch really worked. Most likely you forgot to revert the
workaround I suggested in the previous message. Could you make sure
that workaround is reverted and re-test your patch? If it still works
then I might have confused something, and it's strange that my patch
worked in the first place...
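
To illustrate the ordering problem in steps 5)-7), here is a minimal
user-space sketch. It is not the kernel's memblock implementation: all
toy_* names are hypothetical, and the allocator is a deliberately
simplified bottom-up first-fit that starts at PAGE_SIZE. The only
numbers taken from the log are the kernel reservation
[0x00010000-0x018313cf] and the first 10913-byte, align=0x40 request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE    0x1000u
#define TOY_MAX_RESERVED 16

/* Hypothetical toy types, not the kernel's struct memblock. */
struct toy_region {
	uint64_t base;
	uint64_t size;
};

struct toy_memblock {
	uint64_t ram_start;	/* inclusive */
	uint64_t ram_end;	/* exclusive */
	struct toy_region reserved[TOY_MAX_RESERVED];
	size_t nr_reserved;
};

/* Round x up to a power-of-two alignment. */
static uint64_t toy_align_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Record [base, base + size) as reserved; no merging is attempted. */
static bool toy_reserve(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	if (mb->nr_reserved == TOY_MAX_RESERVED)
		return false;
	mb->reserved[mb->nr_reserved].base = base;
	mb->reserved[mb->nr_reserved].size = size;
	mb->nr_reserved++;
	return true;
}

/*
 * Bottom-up first-fit: the lowest aligned address at or above
 * TOY_PAGE_SIZE that does not overlap a reserved region.
 * Returns 0 on failure (0 is never a valid result here).
 */
static uint64_t toy_alloc_bottom_up(struct toy_memblock *mb,
				    uint64_t size, uint64_t align)
{
	uint64_t start = mb->ram_start > TOY_PAGE_SIZE ?
			 mb->ram_start : TOY_PAGE_SIZE;
	uint64_t cand = toy_align_up(start, align);

	for (;;) {
		bool moved = false;
		size_t i;

		if (cand + size > mb->ram_end)
			return 0;

		for (i = 0; i < mb->nr_reserved; i++) {
			uint64_t r_base = mb->reserved[i].base;
			uint64_t r_end = r_base + mb->reserved[i].size;

			/* Overlap: skip past this region and rescan. */
			if (cand < r_end && r_base < cand + size) {
				cand = toy_align_up(r_end, align);
				moved = true;
				break;
			}
		}
		if (!moved)
			break;
	}
	return toy_reserve(mb, cand, size) ? cand : 0;
}
```

With only the kernel reserved, toy_alloc_bottom_up(&mb, 10913, 0x40)
returns 0x1000, which is the same place the log shows the fdt copy
landing. If [0x0, 0x1000 + 0x100*64) is reserved first, as in my
workaround, the same request is pushed up to 0x5000 and the vectors
area stays untouched.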

Some food for thought for everyone (Thomas, Mark, please join the
discussion). What we've got here is a somewhat bigger problem. AFAICS,
if bottom-up allocation is enabled (as in our case),
memblock_find_in_range_node() performs allocations only above the very
first PAGE_SIZE memory chunk (see that method's code for details). So we
are currently on the safe side for some older MIPS platforms. But
platforms with VEIC/VINT may get into the same trouble if they don't
reserve the exception memory early enough, before the kernel starts
random allocations from memblock. So we either need to provide a generic
workaround for that or make sure each platform reserves the vectors
itself, for instance in the plat_mem_setup() method.
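
The size argument above can be checked with a small stand-alone sketch.
The sketch_* names are hypothetical, VECTORSPACING is assumed to be
0x100 (the value used in the 0x100*64 term of my workaround), and the
BMIPS physical base 0x1000 assumes the usual KSEG0 mapping of
0x80001000. It only tests whether a vectors area lies entirely inside
the first page that bottom-up allocation never hands out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE       0x1000u
#define SKETCH_VECTORSPACING   0x100u	/* assumed value */

/* Sizes as they appear in the trap_init() hunk quoted in this thread. */
#define SKETCH_LEGACY_VEC_SIZE 0x400u
#define SKETCH_VEIC_VEC_SIZE   (0x200u + SKETCH_VECTORSPACING * 64u)

/*
 * True if [ebase_pa, ebase_pa + vec_size) fits entirely below
 * SKETCH_PAGE_SIZE, i.e. inside the chunk that a bottom-up allocation
 * skips anyway.
 */
static bool sketch_covered_by_first_page(uint64_t ebase_pa, uint64_t vec_size)
{
	assert(vec_size > 0);
	return ebase_pa + vec_size <= SKETCH_PAGE_SIZE;
}
```

Under these assumptions the legacy case (base 0x0, 0x400 bytes) is
covered, while a 0x4200-byte VEIC/VINT area at 0x0 is not, and neither
is anything based at physical 0x1000 as on BMIPS. Those are the cases
that need an explicit early memblock_reserve().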

-Sergey

> 
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index e0352958e2f7..b0a173b500e8 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> 
>         if (!cpu_has_mips_r2_r6) {
>                 ebase = CAC_BASE;
> -               ebase_pa = virt_to_phys((void *)ebase);
>                 vec_size = 0x400;
> -
> -               memblock_reserve(ebase_pa, vec_size);
>         } else {
>                 if (cpu_has_veic || cpu_has_vint)
>                         vec_size = 0x200 + VECTORSPACING*64;
> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> 
>         if (board_ebase_setup)
>                 board_ebase_setup();
> +
> +       /* board_ebase_setup() can change the exception base address
> +        * reserve it now after changes were made.
> +        */
> +       if (!cpu_has_mips_r2_r6) {
> +               ebase_pa = virt_to_phys((void *)ebase);
> +               memblock_reserve(ebase_pa, vec_size);
> +       }
>         per_cpu_trap_init(true);
>         memblock_set_bottom_up(false);
> -- 
> Florian
Mike Rapoport March 1, 2021, 9:45 a.m. UTC | #15
On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> Hi Serge,
> 
> On 2/28/2021 3:08 PM, Serge Semin wrote:
> > Hi folks,
> > What you've got here seems a more complicated problem than it
> > could originally look like. Please, see my comments below.
> > 
> > (Note I've discarded some of the email logs, which of no interest
> > to the discovered problem. Please also note that I haven't got any
> > Broadcom hardware to test out a solution suggested below.)
> > 
> > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >> Hi Mike,
> >>
> >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>> Hi Florian,
> >>>
> >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>
> > 
> >>>> [...]
> > 
> >>>>
> >>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>
> >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> >>>> not see a warning like this one in the commit message, instead what
> >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> >>>> of the "rdb" node and leading to the interrupt controllers not being
> >>>> registered, and the system eventually not booting.
> >>>>
> >>>> The Device Tree is built-into the kernel image and resides at
> >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>
> >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > 
> > Most likely the problem you've discovered has been there for quite
> > some time. The patch you are referring to just caused it to be
> > triggered by extending the early allocation range. See before that
> > patch was accepted the early memory allocations had been performed
> > in the range:
> > [kernel_end, RAM_END].
> > The patch changed that, so the early allocations are done within
> > [RAM_START + PAGE_SIZE, RAM_END].
> > 
> > In normal situations it's safe to do that as long as all the critical
> > memory regions (including the memory residing a space below the
> > kernel) have been reserved. But as soon as a memory with some critical
> > structures haven't been reserved, the kernel may allocate it to be used
> > for instance for early initializations with obviously unpredictable but
> > most of the times unpleasant consequences.
> > 
> >>>
> >>> Apparently there is a memblock allocation in one of the functions called
> >>> from arch_mem_init() between plat_mem_setup() and
> >>> early_init_fdt_reserve_self().
> > 
> > Mike, alas according to the log provided by Florian that's not the reason
> > of the problem. Please, see my considerations below.
> > 
> >> [...]
> >>
> >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >> Feb 28 10:01:50 PST 2021
> >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >> [    0.000000] FPU revision is: 00130001
> > 
> >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> > 
> > Here the memory has been added to the memblock allocator.
> > 
> >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > 
> >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >> setup_arch+0x128/0x69c
> > 
> > Here the fdt memory has been reserved. (Note it's built into the
> > kernel.)
> > 
> >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >> setup_arch+0x1f8/0x69c
> > 
> > Here the kernel itself together with built-in dtb have been reserved.
> > So far so good.
> > 
> >> [    0.000000] Initrd not found or empty - disabling initrd
> > 
> >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >> memblock_alloc_range_nid+0xf8/0x198
> > 
> > The log above most likely belongs to the call-chain:
> > setup_arch()
> > +-> arch_mem_init()
> >     +-> device_tree_init() - BMIPS specific method
> >         +-> unflatten_and_copy_device_tree()
> > 
> > So to speak here we've copied the fdt from the original space
> > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > it to [0x00003aa4-0x0000ba4b].
> > 
> > The problem is that a bit later the next call-chain is performed:
> > setup_arch()
> > +-> plat_smp_setup()
> >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >         +-> if (!board_ebase_setup)
> >                  board_ebase_setup = &bmips_ebase_setup;
> > 
> > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > method is called. What trap_init() does isn't compatible with the
> > allocation performed by the unflatten_and_copy_device_tree() method.
> > See the next comment.
> > 
> >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84

...

> >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >> bytes, linear)
> > 
> >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >> trap_init+0x70/0x4e8
> > 
> > Most likely someplace here the corruption has happened. The log above
> > has just reserved a memory for NMI/reset vectors:
> > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > 
> > But then the board_ebase_setup() pointer is dereferenced and called,
> > which has been initialized with bmips_ebase_setup() earlier and which
> > overwrites the ebase variable with: 0x80001000 as this is
> > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > corruption of the memory above 0x80001000, which as we have discovered
> > belongs to fdt and unflattened device tree.
> > 
> >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >> cma-reserved, 1835008K highmem)
> >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [    0.000000] rcu: Hierarchical RCU implementation.
> >> [    0.000000] rcu:     RCU event tracing is enabled.
> >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >> is 25 jiffies.
> >> [    0.000000] NR_IRQS: 256
> > 
> >> [    0.000000] OF: Bad cell count for /rdb
> >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > 
> > So here is the first time we have got the consequence of the corruption
> > popped up. Luckily it's just the "Bad cells count" error. We could have
> > got much less obvious log here up to getting a crash at some place
> > further...
> > 
> >> [    0.000000] random: get_random_bytes called from
> >> start_kernel+0x444/0x654 with crng_init=0
> >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >> wraps every 8589934590000000ns
> > 
> >>
> >> and with your patch applied which unfortunately did not work we have the
> >> following:
> >>
> >> [...]
> > 
> > So a patch like this shall workaround the corruption:
> > 
> > --- a/arch/mips/bmips/setup.c
> > +++ b/arch/mips/bmips/setup.c
> > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >  
> >  	__dt_setup_arch(dtb);
> >  
> > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > +
> >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >  					     q->compatible)) {
> 
> This patch works, thanks a lot for the troubleshooting and analysis! How
> about the following which would be more generic and works as well and
> should be more universal since it does not require each architecture to
> provide an appropriate call to memblock_reserve():
> 
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index e0352958e2f7..b0a173b500e8 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> 
>         if (!cpu_has_mips_r2_r6) {
>                 ebase = CAC_BASE;
> -               ebase_pa = virt_to_phys((void *)ebase);
>                 vec_size = 0x400;
> -
> -               memblock_reserve(ebase_pa, vec_size);
>         } else {
>                 if (cpu_has_veic || cpu_has_vint)
>                         vec_size = 0x200 + VECTORSPACING*64;
> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> 
>         if (board_ebase_setup)
>                 board_ebase_setup();
> +
> +       /* board_ebase_setup() can change the exception base address
> +        * reserve it now after changes were made.
> +        */
> +       if (!cpu_has_mips_r2_r6) {
> +               ebase_pa = virt_to_phys((void *)ebase);
> +               memblock_reserve(ebase_pa, vec_size);
> +       }

With this it's still possible to have memblock allocations around ebase_pa
before it is reserved.

I think we have two options here to solve it in a more or less generic way:

* split the reservation of ebase from trap_init() and move it earlier, to
setup_arch(). I didn't check what the board_ebase_setup() implementations
do; if they need to allocate memory, this would not work.

* add an API to memblock to set a lower limit for allocations and then set
that limit to, e.g., the kernel load address in arch_mem_init(). This may
add complexity for configurations with a relocatable kernel and KASLR.
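
The second option could look roughly like the userspace sketch below.
Everything in it is hypothetical: the toy_* names, the alloc_floor
field and the setter are invented for illustration and are not an
existing memblock interface. The sketch only shows how a floor would
combine with the existing "skip the first PAGE_SIZE" rule.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 0x1000u

/* Hypothetical allocator limits; alloc_floor models the proposed API. */
struct toy_limits {
	uint32_t ram_start;	/* first byte of RAM */
	uint32_t ram_end;	/* one past the last byte of RAM */
	uint32_t alloc_floor;	/* lowest usable address, 0 = unset */
};

/* Hypothetical setter, e.g. called once with the kernel load address. */
static void toy_set_alloc_floor(struct toy_limits *l, uint32_t floor)
{
	assert(floor < l->ram_end);
	l->alloc_floor = floor;
}

/* Lowest address a bottom-up search may return for the given alignment. */
static uint32_t toy_search_start(const struct toy_limits *l, uint32_t align)
{
	uint64_t start = l->ram_start;

	assert(align != 0 && (align & (align - 1)) == 0);

	if (start < TOY_PAGE_SIZE)
		start = TOY_PAGE_SIZE;
	if (start < l->alloc_floor)
		start = l->alloc_floor;
	return (uint32_t)((start + align - 1) & ~((uint64_t)align - 1));
}

/* Nonzero if [base, base + size) could be handed out by the allocator. */
static int toy_may_allocate(const struct toy_limits *l, uint32_t base,
			    uint32_t size)
{
	return base >= toy_search_start(l, 1) &&
	       (uint64_t)base + size <= l->ram_end;
}
```

With the floor set to 0x10000 (the start of the kernel reservation in
Florian's log), the range at 0x1000 that received the copied fdt is no
longer eligible, while ranges above the kernel still are. With a
relocatable kernel the floor would have to follow the actual load
address, which is the complexity mentioned above.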

>         per_cpu_trap_init(true);
>         memblock_set_bottom_up(false);
Roman Gushchin March 2, 2021, 3:55 a.m. UTC | #16
On Mon, Mar 01, 2021 at 11:45:42AM +0200, Mike Rapoport wrote:
> On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> > Hi Serge,
> > 
> > On 2/28/2021 3:08 PM, Serge Semin wrote:
> > > Hi folks,
> > > What you've got here seems a more complicated problem than it
> > > could originally look like. Please, see my comments below.
> > > 
> > > (Note I've discarded some of the email logs, which of no interest
> > > to the discovered problem. Please also note that I haven't got any
> > > Broadcom hardware to test out a solution suggested below.)
> > > 
> > > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> > >> Hi Mike,
> > >>
> > >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> > >>> Hi Florian,
> > >>>
> > >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> > >>>>
> > > 
> > >>>> [...]
> > > 
> > >>>>
> > >>>> Hi Roman, Thomas and other linux-mips folks,
> > >>>>
> > >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> > >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> > >>>> not see a warning like this one in the commit message, instead what
> > >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> > >>>> of the "rdb" node and leading to the interrupt controllers not being
> > >>>> registered, and the system eventually not booting.
> > >>>>
> > >>>> The Device Tree is built-into the kernel image and resides at
> > >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> > >>>>
> > >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > > 
> > > Most likely the problem you've discovered has been there for quite
> > > some time. The patch you are referring to just caused it to be
> > > triggered by extending the early allocation range. See before that
> > > patch was accepted the early memory allocations had been performed
> > > in the range:
> > > [kernel_end, RAM_END].
> > > The patch changed that, so the early allocations are done within
> > > [RAM_START + PAGE_SIZE, RAM_END].
> > > 
> > > In normal situations it's safe to do that as long as all the critical
> > > memory regions (including the memory residing a space below the
> > > kernel) have been reserved. But as soon as a memory with some critical
> > > structures haven't been reserved, the kernel may allocate it to be used
> > > for instance for early initializations with obviously unpredictable but
> > > most of the times unpleasant consequences.
> > > 
> > >>>
> > >>> Apparently there is a memblock allocation in one of the functions called
> > >>> from arch_mem_init() between plat_mem_setup() and
> > >>> early_init_fdt_reserve_self().
> > > 
> > > Mike, alas according to the log provided by Florian that's not the reason
> > > of the problem. Please, see my considerations below.
> > > 
> > >> [...]
> > >>
> > >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> > >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> > >> Feb 28 10:01:50 PST 2021
> > >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> > >> [    0.000000] FPU revision is: 00130001
> > > 
> > >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> > >> early_init_dt_scan_memory+0x160/0x1e0
> > >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> > >> early_init_dt_scan_memory+0x160/0x1e0
> > >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> > >> early_init_dt_scan_memory+0x160/0x1e0
> > > 
> > > Here the memory has been added to the memblock allocator.
> > > 
> > >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> > >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> > >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > > 
> > >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> > >> setup_arch+0x128/0x69c
> > > 
> > > Here the fdt memory has been reserved. (Note it's built into the
> > > kernel.)
> > > 
> > >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> > >> setup_arch+0x1f8/0x69c
> > > 
> > > Here the kernel itself together with built-in dtb have been reserved.
> > > So far so good.
> > > 
> > >> [    0.000000] Initrd not found or empty - disabling initrd
> > > 
> > >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> > >> memblock_alloc_range_nid+0xf8/0x198
> > >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> > >> memblock_alloc_range_nid+0xf8/0x198
> > > 
> > > The log above most likely belongs to the call-chain:
> > > setup_arch()
> > > +-> arch_mem_init()
> > >     +-> device_tree_init() - BMIPS specific method
> > >         +-> unflatten_and_copy_device_tree()
> > > 
> > > So to speak here we've copied the fdt from the original space
> > > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > > it to [0x00003aa4-0x0000ba4b].
> > > 
> > > The problem is that a bit later the next call-chain is performed:
> > > setup_arch()
> > > +-> plat_smp_setup()
> > >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> > >         +-> if (!board_ebase_setup)
> > >                  board_ebase_setup = &bmips_ebase_setup;
> > > 
> > > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > > method is called. What trap_init() does isn't compatible with the
> > > allocation performed by the unflatten_and_copy_device_tree() method.
> > > See the next comment.
> > > 
> > >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> 
> ...
> 
> > >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> > >> bytes, linear)
> > > 
> > >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> > >> trap_init+0x70/0x4e8
> > > 
> > > Most likely someplace here the corruption has happened. The log above
> > > has just reserved a memory for NMI/reset vectors:
> > > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > > 
> > > But then the board_ebase_setup() pointer is dereferenced and called,
> > > which has been initialized with bmips_ebase_setup() earlier and which
> > > overwrites the ebase variable with: 0x80001000 as this is
> > > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > > corruption of the memory above 0x80001000, which as we have discovered
> > > belongs to fdt and unflattened device tree.
> > > 
> > >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> > >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> > >> cma-reserved, 1835008K highmem)
> > >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > >> [    0.000000] rcu: Hierarchical RCU implementation.
> > >> [    0.000000] rcu:     RCU event tracing is enabled.
> > >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> > >> is 25 jiffies.
> > >> [    0.000000] NR_IRQS: 256
> > > 
> > >> [    0.000000] OF: Bad cell count for /rdb
> > >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> > >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > > 
> > > So here is the first time we have got the consequence of the corruption
> > > popped up. Luckily it's just the "Bad cells count" error. We could have
> > > got much less obvious log here up to getting a crash at some place
> > > further...
> > > 
> > >> [    0.000000] random: get_random_bytes called from
> > >> start_kernel+0x444/0x654 with crng_init=0
> > >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> > >> wraps every 8589934590000000ns
> > > 
> > >>
> > >> and with your patch applied which unfortunately did not work we have the
> > >> following:
> > >>
> > >> [...]
> > > 
> > > So a patch like this shall workaround the corruption:
> > > 
> > > --- a/arch/mips/bmips/setup.c
> > > +++ b/arch/mips/bmips/setup.c
> > > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> > >  
> > >  	__dt_setup_arch(dtb);
> > >  
> > > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > > +
> > >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> > >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> > >  					     q->compatible)) {
> > 
> > This patch works, thanks a lot for the troubleshooting and analysis! How
> > about the following which would be more generic and works as well and
> > should be more universal since it does not require each architecture to
> > provide an appropriate call to memblock_reserve():
> > 
> > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> > index e0352958e2f7..b0a173b500e8 100644
> > --- a/arch/mips/kernel/traps.c
> > +++ b/arch/mips/kernel/traps.c
> > @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> > 
> >         if (!cpu_has_mips_r2_r6) {
> >                 ebase = CAC_BASE;
> > -               ebase_pa = virt_to_phys((void *)ebase);
> >                 vec_size = 0x400;
> > -
> > -               memblock_reserve(ebase_pa, vec_size);
> >         } else {
> >                 if (cpu_has_veic || cpu_has_vint)
> >                         vec_size = 0x200 + VECTORSPACING*64;
> > @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> > 
> >         if (board_ebase_setup)
> >                 board_ebase_setup();
> > +
> > +       /* board_ebase_setup() can change the exception base address
> > +        * reserve it now after changes were made.
> > +        */
> > +       if (!cpu_has_mips_r2_r6) {
> > +               ebase_pa = virt_to_phys((void *)ebase);
> > +               memblock_reserve(ebase_pa, vec_size);
> > +       }

Hi folks!

First, I'm really sorry for breaking things and also for being silent for
the last couple of days: I was almost completely offline. Thank you for
working on this!

> 
> With this it's still possible to have memblock allocations around ebase_pa
> before it is reserved.
> 
> I think we have two options here to solve it in more or less generic way:
> 
> * split the reservation of ebase from traps_init() and move it earlier to
> setup_arch(). I didn't check what board_ebase_setup() do, if they need to
> allocate memory it would not work.

It seems that it doesn't allocate any memory, so it sounds like a good option.
But doesn't the ebase initialization depend on the memblock allocator?

I see in trap_init():
    if (!cpu_has_mips_r2_r6) {
        ...
    } else {
        ...
        ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
        ...
        if (!IS_ENABLED(CONFIG_EVA) && !WARN_ON(ebase_pa >= 0x20000000))
            ebase = CKSEG0ADDR(ebase_pa);
        else
            ebase = (unsigned long)phys_to_virt(ebase_pa);
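
To make the numbers in that allocation concrete, here is a small
userspace sketch of the alignment and address arithmetic. The toy_*
helpers are stand-ins written for this example, not the kernel's
fls() or CKSEG0ADDR(). The vector spacing of 0x100 and the 32-bit
KSEG0 mapping (mask with 0x1fffffff, OR with 0x80000000) are
assumptions on my side.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VECTORSPACING 0x100u	/* assumed, as in "0x100*64" above */

/*
 * Userspace stand-in for fls(): 1-based index of the most significant
 * set bit, 0 when x == 0.
 */
static int toy_fls(uint32_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Alignment requested for the vectors: 1 << fls(vec_size). */
static uint32_t toy_vec_align(uint32_t vec_size)
{
	assert(vec_size < 0x80000000u);	/* keep the shift below 32 bits */
	return (uint32_t)1 << toy_fls(vec_size);
}

/* Assumed 32-bit KSEG0 view of a physical address. */
static uint32_t toy_ckseg0addr(uint32_t pa)
{
	return (pa & 0x1fffffffu) | 0x80000000u;
}
```

For vec_size = 0x200 + TOY_VECTORSPACING*64 = 0x4200 this gives
fls = 15 and an alignment of 0x8000. For vec_size = 0x400 it gives
0x800. A physical address of 0x1000 maps to 0x80001000 in this model,
which is the ebase value mentioned for the BMIPS5000 case.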


> 
> * add an API to memblock to set lower limit for allocations and then set
> the lower limit, to e.g. kernel load address in arch_mem_init(). This may
> add complexity for configurations with relocatable kernel and kaslr.

This option looks more like a workaround to me, but maybe it's ok too.

Thanks!
Florian Fainelli March 2, 2021, 4:09 a.m. UTC | #17
On 3/1/2021 1:22 AM, Serge Semin wrote:
> On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
>> Hi Serge,
>>
>> On 2/28/2021 3:08 PM, Serge Semin wrote:
>>> Hi folks,
>>> What you've got here seems a more complicated problem than it
>>> could originally look like. Please, see my comments below.
>>>
>>> (Note I've discarded some of the email logs, which of no interest
>>> to the discovered problem. Please also note that I haven't got any
>>> Broadcom hardware to test out a solution suggested below.)
>>>
>>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
>>>> Hi Mike,
>>>>
>>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
>>>>> Hi Florian,
>>>>>
>>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>>>>>
>>>
>>>>>> [...]
>>>
>>>>>>
>>>>>> Hi Roman, Thomas and other linux-mips folks,
>>>>>>
>>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
>>>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
>>>>>> not see a warning like this one in the commit message, instead what
>>>>>> happens appear to be a corrupted Device Tree which prevents the parsing
>>>>>> of the "rdb" node and leading to the interrupt controllers not being
>>>>>> registered, and the system eventually not booting.
>>>>>>
>>>>>> The Device Tree is built-into the kernel image and resides at
>>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>>>
>>>>>> Do you have any idea what could be wrong with MIPS specifically here?
>>>
>>> Most likely the problem you've discovered has been there for quite
>>> some time. The patch you are referring to just caused it to be
>>> triggered by extending the early allocation range. See before that
>>> patch was accepted the early memory allocations had been performed
>>> in the range:
>>> [kernel_end, RAM_END].
>>> The patch changed that, so the early allocations are done within
>>> [RAM_START + PAGE_SIZE, RAM_END].
>>>
>>> In normal situations it's safe to do that as long as all the critical
>>> memory regions (including the memory residing a space below the
>>> kernel) have been reserved. But as soon as a memory with some critical
>>> structures haven't been reserved, the kernel may allocate it to be used
>>> for instance for early initializations with obviously unpredictable but
>>> most of the times unpleasant consequences.
>>>
>>>>>
>>>>> Apparently there is a memblock allocation in one of the functions called
>>>>> from arch_mem_init() between plat_mem_setup() and
>>>>> early_init_fdt_reserve_self().
>>>
>>> Mike, alas according to the log provided by Florian that's not the reason
>>> of the problem. Please, see my considerations below.
>>>
>>>> [...]
>>>>
>>>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>>>> Feb 28 10:01:50 PST 2021
>>>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>>>> [    0.000000] FPU revision is: 00130001
>>>
>>>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>
>>> Here the memory has been added to the memblock allocator.
>>>
>>>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
>>>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>>>> [    0.000000] printk: bootconsole [ns16550a0] enabled
>>>
>>>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>>>> setup_arch+0x128/0x69c
>>>
>>> Here the fdt memory has been reserved. (Note it's built into the
>>> kernel.)
>>>
>>>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
>>>> setup_arch+0x1f8/0x69c
>>>
>>> Here the kernel itself together with built-in dtb have been reserved.
>>> So far so good.
>>>
>>>> [    0.000000] Initrd not found or empty - disabling initrd
>>>
>>>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>
>>> The log above most likely belongs to the call-chain:
>>> setup_arch()
>>> +-> arch_mem_init()
>>>     +-> device_tree_init() - BMIPS specific method
>>>         +-> unflatten_and_copy_device_tree()
>>>
>>> So to speak here we've copied the fdt from the original space
>>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
>>> it to [0x00003aa4-0x0000ba4b].
>>>
>>> The problem is that a bit later the next call-chain is performed:
>>> setup_arch()
>>> +-> plat_smp_setup()
>>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>>>         +-> if (!board_ebase_setup)
>>>                  board_ebase_setup = &bmips_ebase_setup;
>>>
>>> So at the moment of the CPU traps initialization the bmips_ebase_setup()
>>> method is called. What trap_init() does isn't compatible with the
>>> allocation performed by the unflatten_and_copy_device_tree() method.
>>> See the next comment.
>>>
>>>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>>>> setup_arch+0x3fc/0x69c
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>>>> bytes.
>>>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>>>> linesize 32 bytes
>>>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Zone ranges:
>>>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
>>>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
>>>> [    0.000000] Movable zone start for each node
>>>> [    0.000000] Early memory node ranges
>>>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
>>>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
>>>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
>>>> [    0.000000] Initmem setup node 0 [mem
>>>> 0x0000000000000000-0x00000000cfffffff]
>>>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000
>>>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>>>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] MEMBLOCK configuration:
>>>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
>>>> [    0.000000]  memory.cnt  = 0x3
>>>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved.cnt  = 0xa
>>>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
>>>> bytes flags: 0x0
>>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>>>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>>>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>>>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>>>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>>>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>>>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>>>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>>>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>>>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>>>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>>>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
>>>> pcpu_embed_first_chunk+0x838/0x884
>>>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
>>>> pcpu_embed_first_chunk+0x850/0x884
>>>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
>>>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
>>>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>>>> bytes, linear)
>>>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>>>> bytes, linear)
>>>
>>>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
>>>> trap_init+0x70/0x4e8
>>>
>>> Most likely the corruption happened somewhere around here. The log above
>>> shows memory just being reserved for the NMI/reset vectors:
>>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
>>>
>>> But then the board_ebase_setup() pointer is dereferenced and called.
>>> It was initialized with bmips_ebase_setup() earlier, and that function
>>> overwrites the ebase variable with 0x80001000, since this is a
>>> CPU_BMIPS5000 CPU. So any further calls to functions like
>>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc. may
>>> corrupt the memory above 0x80001000, which as we have discovered
>>> belongs to the fdt and the unflattened device tree.
>>>
>>>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>>>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>>>> cma-reserved, 1835008K highmem)
>>>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>>>> [    0.000000] rcu: Hierarchical RCU implementation.
>>>> [    0.000000] rcu:     RCU event tracing is enabled.
>>>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>>>> is 25 jiffies.
>>>> [    0.000000] NR_IRQS: 256
>>>
>>>> [    0.000000] OF: Bad cell count for /rdb
>>>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>>>> [    0.000000] OF: of_irq_init: children remain, but no parents
>>>
>>> So this is the first place where a consequence of the corruption
>>> shows up. Luckily it's just the "Bad cell count" error. We could have
>>> got a much less obvious log here, or even a crash somewhere further
>>> on...
>>>
>>>> [    0.000000] random: get_random_bytes called from
>>>> start_kernel+0x444/0x654 with crng_init=0
>>>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>>>> wraps every 8589934590000000ns
>>>
>>>>
>>>> and with your patch applied which unfortunately did not work we have the
>>>> following:
>>>>
>>>> [...]
>>>
>>> So a patch like this should work around the corruption:
>>>
>>> --- a/arch/mips/bmips/setup.c
>>> +++ b/arch/mips/bmips/setup.c
>>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>>>  
>>>  	__dt_setup_arch(dtb);
>>>  
>>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
>>> +
>>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>>>  					     q->compatible)) {
>>
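
The size arithmetic of the workaround above, and how it relates to the
MEMBLOCK dump quoted earlier, can be checked with a small standalone C
sketch. The helper names are hypothetical and not kernel APIs. The two
ranges are copied from the reserved[0x0] and reserved[0x1] entries of the
log, and the address translation assumes the usual 32-bit MIPS KSEG0
mapping, where virtual 0x80000000 corresponds to physical 0x0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: an inclusive physical range [start, last]. */
struct range {
	uint32_t start;
	uint32_t last;
};

/* Size argument passed to memblock_reserve() in the workaround. */
static uint32_t workaround_size(void)
{
	return 0x1000 + 0x100 * 64;
}

/* Range covered by memblock_reserve(0x0, workaround_size()). */
static struct range workaround_range(void)
{
	return (struct range){ 0x0, workaround_size() - 1 };
}

/* Assumption: KSEG0 is the unmapped segment 0x80000000..0x9fffffff. */
static uint32_t kseg0_to_phys(uint32_t vaddr)
{
	assert(vaddr >= 0x80000000u && vaddr < 0xa0000000u);
	return vaddr - 0x80000000u;
}

/* True if the two inclusive ranges share at least one byte. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start <= b.last && b.start <= a.last;
}

/* First two reserved[] entries from the MEMBLOCK dump in the log. */
static const struct range log_reserved0 = { 0x00001000, 0x00003aa0 };
static const struct range log_reserved1 = { 0x00003aa4, 0x0000ba64 };
```

Under these assumptions the workaround reserves 0x5000 bytes, which is the
physical range [0x0, 0x4fff]. That range contains 0x1000, the physical
address behind the 0x80001000 exception base, and it overlaps both log
entries above, which is consistent with the vectors having been written
over memory that memblock had already handed out.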
> 
>> This patch works, thanks a lot for the troubleshooting and analysis! How
>> about the following, which works as well and should be more generic
>> since it does not require each architecture to provide an appropriate
>> call to memblock_reserve():
> 
> Hm, are you sure it's working?

I was, until I noticed that I was working on top of a revert of Roman's
patch. Sorry about the brain fart here.

> If so, my analysis hasn't been quite
> correct. My suggestion was based on the trace of memory initializations,
> allocations and reservations. So here is the sequence of the most
> crucial of them:
> 1) Memblock initialization:
>    start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
>    (At this point I suggested to place the exceptions memory
>     reservation.)
> 2) Base FDT memory reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
>    reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> 4) Reserve the kernel itself and some critical regions like initrd and
>    crash-kernel:
>    start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> 5) Copy and unflatten the device tree built into the kernel
>    (BMIPS-platform code):
>    start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
>    This is the very first time an allocation from the memblock pool
>    is performed. Since we haven't reserved memory for the exception
>    vectors yet, the memblock allocator is free to return that memory
>    range for any other use. Needless to say, if we try to use that memory
>    later without consulting memblock, we may, and in our case
>    will, get into trouble.
> 6) Many random early memblock allocations for kernel use before the
>    buddy and sl*b allocators are up and running...
>    Note that if for some fortunate reason the allocations made in 5)
>    didn't overlap the exception memory, here we have a much higher
>    chance of that happening, with obviously fatal consequences of the
>    range being used independently by two owners.
> 7) Trap/exception vectors initialization and !memory reservation! for
>    them:
>    start_kernel()->trap_init()
>    Only at this point do we get to reserve the memory for the vectors.
> 8) Init and run buddy/sl*b allocators:
>    start_kernel()->mm_init()->...mem_init()...
> 
> There are a lot of allocations done in 5) and 6) before the
> trap_init() is called in 7). You can see that in your log. That's why
> I have doubts that your patch worked well. Most likely you've
> forgotten to revert the workaround suggested by me in the previous
> message. Could you make sure that you didn't and re-test your patch
> again? If it still works then I might have confused something and it's
> strange that my patch worked in the first place...
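
The ordering problem described in steps 5) to 7) can be illustrated with a
toy bottom-up allocator in standalone C. This is only a sketch: the toy_*
names, the 4 KiB page size and the linear search are assumptions made for
illustration and do not reflect the real memblock implementation. The
allocation size is borrowed from the reserved[0x0] entry in the log, and
the 0x5000-byte reservation mirrors the workaround proposed earlier in the
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE    0x1000u	/* assumption: 4 KiB pages */
#define TOY_MAX_RESERVED 8

/* Toy stand-in for memblock's reserved list; not the kernel structure. */
struct toy_memblock {
	uint32_t start[TOY_MAX_RESERVED];
	uint32_t size[TOY_MAX_RESERVED];
	size_t cnt;
	uint32_t mem_end;		/* usable memory is [0, mem_end) */
};

static void toy_reserve(struct toy_memblock *mb, uint32_t start, uint32_t size)
{
	assert(mb->cnt < TOY_MAX_RESERVED && size > 0);
	mb->start[mb->cnt] = start;
	mb->size[mb->cnt] = size;
	mb->cnt++;
}

/* Return nonzero if [addr, addr + size) intersects any reserved range. */
static int toy_is_reserved(const struct toy_memblock *mb, uint32_t addr,
			   uint32_t size)
{
	for (size_t i = 0; i < mb->cnt; i++)
		if (addr < mb->start[i] + mb->size[i] &&
		    mb->start[i] < addr + size)
			return 1;
	return 0;
}

/*
 * Bottom-up search: skip the first page, then take the lowest free
 * aligned address and mark it reserved. Returns 0 on failure.
 */
static uint32_t toy_alloc_bottom_up(struct toy_memblock *mb, uint32_t size,
				    uint32_t align)
{
	assert(size > 0 && align > 0);
	uint32_t addr = (TOY_PAGE_SIZE + align - 1) / align * align;

	for (; addr + size <= mb->mem_end; addr += align) {
		if (!toy_is_reserved(mb, addr, size)) {
			toy_reserve(mb, addr, size);
			return addr;
		}
	}
	return 0;
}

/* Steps 5) then 7): allocate first, reserve the vectors afterwards. */
static uint32_t toy_alloc_before_reserve(void)
{
	struct toy_memblock mb = { .cnt = 0, .mem_end = 0x10000000u };
	uint32_t addr = toy_alloc_bottom_up(&mb, 0x2aa1, 0x1000);

	toy_reserve(&mb, 0x0, 0x5000);	/* too late, addr is already in use */
	return addr;
}

/* Workaround ordering: reserve the vectors before any allocation. */
static uint32_t toy_alloc_after_reserve(void)
{
	struct toy_memblock mb = { .cnt = 0, .mem_end = 0x10000000u };

	toy_reserve(&mb, 0x0, 0x5000);
	return toy_alloc_bottom_up(&mb, 0x2aa1, 0x1000);
}
```

In this toy model, allocating before the reservation hands out 0x1000,
right where the BMIPS5000 exception base ends up, while reserving first
pushes the same allocation up to 0x5000. The first case matches the
reserved[0x0] entry in the log, which also starts at 0x00001000.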

I would like to submit a fix for 5.12-rc1 and get it backported into
5.11 so that BMIPS machines boot again; that will essentially be your
earlier proposed fix.

BMIPS is the only "legacy" MIPS platform that defines an exception base,
so while this problem may certainly exist with other platforms, I do
wonder how likely it is there, though?

> 
> Food for thought for everyone (Thomas, Mark, please join the
> discussion). What we've got here is a somewhat bigger problem. AFAICS,
> if bottom-up allocation is enabled (as in our case), memblock_find_in_range_node()
> performs the allocation above the very first PAGE_SIZE memory chunk
> (see that function's code for details). So we are currently on the safe
> side for some older MIPS platforms. But platforms with VEIC/VINT may get
> into the same trouble if they don't reserve the exception memory
> early enough, before the kernel starts random allocations from
> memblock. So we either need to provide a generic workaround for that
> or make sure each platform reserves the vectors itself, for instance
> in the plat_mem_setup() method.
> 
> -Sergey
> 
>>
>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
>> index e0352958e2f7..b0a173b500e8 100644
>> --- a/arch/mips/kernel/traps.c
>> +++ b/arch/mips/kernel/traps.c
>> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
>>
>>         if (!cpu_has_mips_r2_r6) {
>>                 ebase = CAC_BASE;
>> -               ebase_pa = virt_to_phys((void *)ebase);
>>                 vec_size = 0x400;
>> -
>> -               memblock_reserve(ebase_pa, vec_size);
>>         } else {
>>                 if (cpu_has_veic || cpu_has_vint)
>>                         vec_size = 0x200 + VECTORSPACING*64;
>> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
>>
>>         if (board_ebase_setup)
>>                 board_ebase_setup();
>> +
>> +       /* board_ebase_setup() can change the exception base address
>> +        * reserve it now after changes were made.
>> +        */
>> +       if (!cpu_has_mips_r2_r6) {
>> +               ebase_pa = virt_to_phys((void *)ebase);
>> +               memblock_reserve(ebase_pa, vec_size);
>> +       }
>>         per_cpu_trap_init(true);
>>         memblock_set_bottom_up(false);
>> -- 
>> Florian

Patch

diff --git a/mm/memblock.c b/mm/memblock.c
index b68ee86788af..10bd7d1ef0f4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -275,14 +275,6 @@  __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
  *
  * Find @size free area aligned to @align in the specified range and node.
  *
- * When allocation direction is bottom-up, the @start should be greater
- * than the end of the kernel image. Otherwise, it will be trimmed. The
- * reason is that we want the bottom-up allocation just near the kernel
- * image so it is highly likely that the allocated memory and the kernel
- * will reside in the same node.
- *
- * If bottom-up allocation failed, will try to allocate memory top-down.
- *
  * Return:
  * Found address on success, 0 on failure.
  */
@@ -291,8 +283,6 @@  static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 					phys_addr_t end, int nid,
 					enum memblock_flags flags)
 {
-	phys_addr_t kernel_end, ret;
-
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
 	    end == MEMBLOCK_ALLOC_KASAN)
@@ -301,40 +291,13 @@  static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 	/* avoid allocating the first page */
 	start = max_t(phys_addr_t, start, PAGE_SIZE);
 	end = max(start, end);
-	kernel_end = __pa_symbol(_end);
-
-	/*
-	 * try bottom-up allocation only when bottom-up mode
-	 * is set and @end is above the kernel image.
-	 */
-	if (memblock_bottom_up() && end > kernel_end) {
-		phys_addr_t bottom_up_start;
-
-		/* make sure we will allocate above the kernel */
-		bottom_up_start = max(start, kernel_end);
 
-		/* ok, try bottom-up allocation first */
-		ret = __memblock_find_range_bottom_up(bottom_up_start, end,
-						      size, align, nid, flags);
-		if (ret)
-			return ret;
-
-		/*
-		 * we always limit bottom-up allocation above the kernel,
-		 * but top-down allocation doesn't have the limit, so
-		 * retrying top-down allocation may succeed when bottom-up
-		 * allocation failed.
-		 *
-		 * bottom-up allocation is expected to be fail very rarely,
-		 * so we use WARN_ONCE() here to see the stack trace if
-		 * fail happens.
-		 */
-		WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
-			  "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
-	}
-
-	return __memblock_find_range_top_down(start, end, size, align, nid,
-					      flags);
+	if (memblock_bottom_up())
+		return __memblock_find_range_bottom_up(start, end, size, align,
+						       nid, flags);
+	else
+		return __memblock_find_range_top_down(start, end, size, align,
+						      nid, flags);
 }
 
 /**