| Message ID | 20190423224748.3765-5-fancer.lancer@gmail.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | mips: Post-bootmem-memblock transition fixes |
Hi Serge,

On Wed, Apr 24, 2019 at 01:47:40AM +0300, Serge Semin wrote:
> The reserved_end variable had been used by the bootmem_init() code
> to find a lowest limit of memory available for memmap blob. The original
> code just tried to find a free memory space higher than kernel was placed.
> This limitation seems justified for the memmap ragion search process, but
> I can't see any obvious reason to reserve the unused space below kernel
> seeing some platforms place it much higher than standard 1MB.

There are 2 reasons I'm aware of:

 1) Older systems generally had something like an ISA bus which used
    addresses below the kernel, and bootloaders like YAMON left behind
    functions that could be called right at the start of RAM. This sort
    of thing should be accounted for by /memreserve/ in DT or similar
    platform-specific reservations though rather than generically, and
    at least Malta & SEAD-3 DTs already have /memreserve/ entries for
    it. So this part I think is OK. Some other older platforms might
    need updating, but that's fine.

 2) trap_init() only allocates memory for the exception vector if using
    a vectored interrupt mode. In other cases it just uses CAC_BASE
    which currently gets reserved as part of this region between
    PHYS_OFFSET & _text.

I think this behavior is bogus, and we should instead:

 - Allocate the exception vector memory using memblock_alloc() for
   CPUs implementing MIPSr2 or higher (ie. CPUs with a programmable
   EBase register). If we're not using vectored interrupts then
   allocating one page will do, and we already have the size
   calculation for if we are.

 - Otherwise use CAC_BASE but call memblock_reserve() on the first
   page.

I think we should make that change before this one goes in. I can
try to get to it tomorrow, but feel free to beat me to it.

Thanks,
    Paul
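For context, a minimal sketch of the approach Paul outlines, written against v5.1-era memblock and MIPS helpers. It is an illustration only, not his actual follow-up patch; choose_ebase() and vec_size are hypothetical names, with vec_size standing in for the size calculation trap_init() already performs for vectored modes.

#include <linux/init.h>
#include <linux/io.h>
#include <linux/memblock.h>
#include <asm/addrspace.h>	/* CAC_BASE */
#include <asm/cpu-features.h>	/* cpu_has_mips_r2_r6 */

/* Sketch only: pick and protect the exception vector base. */
static unsigned long __init choose_ebase(size_t vec_size)
{
	if (cpu_has_mips_r2_r6) {
		/*
		 * EBase is programmable, so take properly aligned pages
		 * from memblock; one page is enough when vectored
		 * interrupts are not in use.
		 */
		return (unsigned long)phys_to_virt(
				memblock_phys_alloc(vec_size, PAGE_SIZE));
	}

	/* Legacy CPUs must keep the architectural base: just reserve it. */
	memblock_reserve(__pa(CAC_BASE), PAGE_SIZE);
	return CAC_BASE;
}

The split reflects Paul's point: only CPUs with a writable EBase can relocate the vector and therefore consume memory handed out by memblock; everything else has to keep the architectural base and simply mark it reserved.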
On Wed, Apr 24, 2019 at 10:43:48PM +0000, Paul Burton wrote:
> Hi Serge,
>
> On Wed, Apr 24, 2019 at 01:47:40AM +0300, Serge Semin wrote:
> > The reserved_end variable had been used by the bootmem_init() code
> > to find a lowest limit of memory available for memmap blob. The original
> > code just tried to find a free memory space higher than kernel was placed.
> > This limitation seems justified for the memmap ragion search process, but
> > I can't see any obvious reason to reserve the unused space below kernel
> > seeing some platforms place it much higher than standard 1MB.
>
> There are 2 reasons I'm aware of:
>
>  1) Older systems generally had something like an ISA bus which used
>     addresses below the kernel, and bootloaders like YAMON left behind
>     functions that could be called right at the start of RAM. This sort
>     of thing should be accounted for by /memreserve/ in DT or similar
>     platform-specific reservations though rather than generically, and
>     at least Malta & SEAD-3 DTs already have /memreserve/ entries for
>     it. So this part I think is OK. Some other older platforms might
>     need updating, but that's fine.
>

Regarding ISA: as far as I remember, devices on that bus can DMA only to the
lowest 16MB. So if the kernel is too big or placed high enough, they may be
left without any reachable memory at all in the current implementation.

>  2) trap_init() only allocates memory for the exception vector if using
>     a vectored interrupt mode. In other cases it just uses CAC_BASE
>     which currently gets reserved as part of this region between
>     PHYS_OFFSET & _text.
>
> I think this behavior is bogus, and we should instead:
>
>  - Allocate the exception vector memory using memblock_alloc() for
>    CPUs implementing MIPSr2 or higher (ie. CPUs with a programmable
>    EBase register). If we're not using vectored interrupts then
>    allocating one page will do, and we already have the size
>    calculation for if we are.
>
>  - Otherwise use CAC_BASE but call memblock_reserve() on the first
>    page.
>
> I think we should make that change before this one goes in. I can
> try to get to it tomorrow, but feel free to beat me to it.
>

As far as I understood you and the code, this should be enough to fix
the problem:

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 98ca55d62201..f680253e2617 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2326,6 +2326,8 @@ void __init trap_init(void)
 				ebase += (read_c0_ebase() & 0x3ffff000);
 			}
 		}
+
+		memblock_reserve(ebase, PAGE_SIZE);
 	}
 
 	if (cpu_has_mmips) {
---

Allocation has already been implemented in the if-branch under the
(cpu_has_veic || cpu_has_vint) condition, so we don't need to change
anything there. If vectored interrupts aren't supported, the else-clause
is taken and we need to reserve whatever is set in the exception base
address variable.

I'll add this patch between the 3rd and 4th ones if you are OK with it.

-Sergey

> Thanks,
>     Paul
Hi Serge, On Fri, Apr 26, 2019 at 03:00:36AM +0300, Serge Semin wrote: > > 1) Older systems generally had something like an ISA bus which used > > addresses below the kernel, and bootloaders like YAMON left behind > > functions that could be called right at the start of RAM. This sort > > of thing should be accounted for by /memreserve/ in DT or similar > > platform-specific reservations though rather than generically, and > > at least Malta & SEAD-3 DTs already have /memreserve/ entries for > > it. So this part I think is OK. Some other older platforms might > > need updating, but that's fine. > > > > Regarding ISA. As far as I remember devices on that bus can DMA only to the > lowest 16MB. So in case if kernel is too big or placed pretty much high, > they may be left even without reachable memory at all in current > implementation. Sure - I'm not too worried about these old buses, platforms can continue to reserve the memory through DT or otherwise if they need to. > > 2) trap_init() only allocates memory for the exception vector if using > > a vectored interrupt mode. In other cases it just uses CAC_BASE > > which currently gets reserved as part of this region between > > PHYS_OFFSET & _text. > > > > I think this behavior is bogus, and we should instead: > > > > - Allocate the exception vector memory using memblock_alloc() for > > CPUs implementing MIPSr2 or higher (ie. CPUs with a programmable > > EBase register). If we're not using vectored interrupts then > > allocating one page will do, and we already have the size > > calculation for if we are. > > > > - Otherwise use CAC_BASE but call memblock_reserve() on the first > > page. > > > > I think we should make that change before this one goes in. I can > > try to get to it tomorrow, but feel free to beat me to it. > > > > As far as I understood you and the code this should be enough to fix > the problem: > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > index 98ca55d62201..f680253e2617 100644 > --- a/arch/mips/kernel/traps.c > +++ b/arch/mips/kernel/traps.c > @@ -2326,6 +2326,8 @@ void __init trap_init(void) > ebase += (read_c0_ebase() & 0x3ffff000); > } > } > + > + memblock_reserve(ebase, PAGE_SIZE); > } > > if (cpu_has_mmips) { > --- > > Allocation has already been implemented in the if-branch under the > (cpu_has_veic || cpu_has_vint) condition. So we don't need to change > there anything. > In case if vectored interrupts aren't supported the else-clause is > taken and we need to reserve whatever is set in the exception base > address variable. > > I'll add this patch between 3d and 4th ones if you are ok with it. I think that would work, but I have other motivations to allocate the memory in non-vectored cases anyway. I just sent a series that does that & cleans up a little [1]. If you could take a look that would be great. With that change made I think this patch will be good to apply. Thanks, Paul [1] https://lore.kernel.org/linux-mips/20190430225216.7164-1-paul.burton@mips.com/T/#t
On Tue, Apr 30, 2019 at 10:58:33PM +0000, Paul Burton wrote: Hello Paul > Hi Serge, > > On Fri, Apr 26, 2019 at 03:00:36AM +0300, Serge Semin wrote: > > > 1) Older systems generally had something like an ISA bus which used > > > addresses below the kernel, and bootloaders like YAMON left behind > > > functions that could be called right at the start of RAM. This sort > > > of thing should be accounted for by /memreserve/ in DT or similar > > > platform-specific reservations though rather than generically, and > > > at least Malta & SEAD-3 DTs already have /memreserve/ entries for > > > it. So this part I think is OK. Some other older platforms might > > > need updating, but that's fine. > > > > > > > Regarding ISA. As far as I remember devices on that bus can DMA only to the > > lowest 16MB. So in case if kernel is too big or placed pretty much high, > > they may be left even without reachable memory at all in current > > implementation. > > Sure - I'm not too worried about these old buses, platforms can continue > to reserve the memory through DT or otherwise if they need to. > > > > 2) trap_init() only allocates memory for the exception vector if using > > > a vectored interrupt mode. In other cases it just uses CAC_BASE > > > which currently gets reserved as part of this region between > > > PHYS_OFFSET & _text. > > > > > > I think this behavior is bogus, and we should instead: > > > > > > - Allocate the exception vector memory using memblock_alloc() for > > > CPUs implementing MIPSr2 or higher (ie. CPUs with a programmable > > > EBase register). If we're not using vectored interrupts then > > > allocating one page will do, and we already have the size > > > calculation for if we are. > > > > > > - Otherwise use CAC_BASE but call memblock_reserve() on the first > > > page. > > > > > > I think we should make that change before this one goes in. I can > > > try to get to it tomorrow, but feel free to beat me to it. > > > > > > > As far as I understood you and the code this should be enough to fix > > the problem: > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > index 98ca55d62201..f680253e2617 100644 > > --- a/arch/mips/kernel/traps.c > > +++ b/arch/mips/kernel/traps.c > > @@ -2326,6 +2326,8 @@ void __init trap_init(void) > > ebase += (read_c0_ebase() & 0x3ffff000); > > } > > } > > + > > + memblock_reserve(ebase, PAGE_SIZE); > > } > > > > if (cpu_has_mmips) { > > --- > > > > Allocation has already been implemented in the if-branch under the > > (cpu_has_veic || cpu_has_vint) condition. So we don't need to change > > there anything. > > In case if vectored interrupts aren't supported the else-clause is > > taken and we need to reserve whatever is set in the exception base > > address variable. > > > > I'll add this patch between 3d and 4th ones if you are ok with it. > > I think that would work, but I have other motivations to allocate the > memory in non-vectored cases anyway. I just sent a series that does that > & cleans up a little [1]. If you could take a look that would be great. > With that change made I think this patch will be good to apply. > Just reviewed and tested your series on my machine. I tagged the whole series in a response to the cover-letter of [1]. Could you please proceed with this patchset review procedure? There are also eight more patches left without your tag or comment. This patch is also left with no explicit tag. 
BTW, I see you have already applied patches 1-3 to mips-next, so what shall I
do when sending a v2 patchset with the fixes requested for patch 12 (and
possibly for others in the future)? Shall I just resend the series without the
already-applied patches, or send them again with your Acked-by tags?

-Sergey

> Thanks,
>     Paul
Hello,

Serge Semin wrote:
> The reserved_end variable had been used by the bootmem_init() code
> to find a lowest limit of memory available for memmap blob. The original
> code just tried to find a free memory space higher than kernel was placed.
> This limitation seems justified for the memmap ragion search process, but
> I can't see any obvious reason to reserve the unused space below kernel
> seeing some platforms place it much higher than standard 1MB. Moreover
> the RELOCATION config enables it to be loaded at any memory address.
> So lets reserve the memory occupied by the kernel only, leaving the region
> below being free for allocations. After doing this we can now discard the
> code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> since it's going to be free anyway (unless marked as reserved by
> platforms).
>
> Signed-off-by: Serge Semin <fancer.lancer@gmail.com>

Applied to mips-next.

Thanks,
    Paul

[ This message was auto-generated; if you believe anything is incorrect
  then please email paul.burton@mips.com to report it. ]
Hi Serge,

On Thu, May 02, 2019 at 05:24:37PM +0300, Serge Semin wrote:
> Just reviewed and tested your series on my machine. I tagged the whole series
> in a response to the cover-letter of [1].

Thanks for the review & testing; that series is now in mips-next.

> Could you please proceed with this patchset review procedure? There are
> also eight more patches left without your tag or comment. This patch
> is also left with no explicit tag.
>
> BTW I see you already applied patches 1-3 to the mips-next, so what shall I
> do when sending a v2 patchset with fixes asked to be provided for patch 12
> and possibly for others in future? Shall I just resend the series without that
> applied patches or send them over with your acked-by tagges?

I've so far applied patches 1-7 of your series to mips-next, and stopped
at patch 8 which has a comment to address.

My preference would be if you could send a v2 which just contains the
remaining patches (ie. patches 8-12 become patches 1-5), ideally atop
the mips-next branch.

The series looks good to me once the review comments are addressed, but
no need to add an Acked-by - it'll be implicit when I apply them to
mips-next.

Thanks,
    Paul
Hello Paul On Thu, May 02, 2019 at 06:45:39PM +0000, Paul Burton wrote: > Hi Serge, > > On Thu, May 02, 2019 at 05:24:37PM +0300, Serge Semin wrote: > > Just reviewed and tested your series on my machine. I tagged the whole series > > in a response to the cover-letter of [1]. > > Thanks for the review & testing; that series is now in mips-next. > > > Could you please proceed with this patchset review procedure? There are > > also eight more patches left without your tag or comment. This patch > > is also left with no explicit tag. > > > > BTW I see you already applied patches 1-3 to the mips-next, so what shall I > > do when sending a v2 patchset with fixes asked to be provided for patch 12 > > and possibly for others in future? Shall I just resend the series without that > > applied patches or send them over with your acked-by tagges? > > I've so far applied patches 1-7 of your series to mips-next, and stopped > at patch 8 which has a comment to address. > > My preference would be if you could send a v2 which just contains the > remaining patches (ie. patches 8-12 become patches 1-5), ideally atop > the mips-next branch. > > The series looks good to me once the review comments are addressed, but > no need to add an Acked-by - it'll be implicit when I apply them to > mips-next. > Agreed. I'll do it shortly. -Sergey > Thanks, > Paul
Hi Serge,

On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> The reserved_end variable had been used by the bootmem_init() code
> to find a lowest limit of memory available for memmap blob. The original
> code just tried to find a free memory space higher than kernel was placed.
> This limitation seems justified for the memmap ragion search process, but
> I can't see any obvious reason to reserve the unused space below kernel
> seeing some platforms place it much higher than standard 1MB. Moreover
> the RELOCATION config enables it to be loaded at any memory address.
> So lets reserve the memory occupied by the kernel only, leaving the region
> below being free for allocations. After doing this we can now discard the
> code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> since it's going to be free anyway (unless marked as reserved by
> platforms).
>
> Signed-off-by: Serge Semin <fancer.lancer@gmail.com>

This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:

VFS: Mounted root (nfs filesystem) on device 0:13.
devtmpfs: mounted
BUG: Bad page state in process swapper pfn:00001
page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
flags: 0x0()
raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c 00000001 804a3490
        00000001 81000000 0030f231 80148558 00000003 10008400 87c1dd80 7599ee13
        00000000 00000000 804b0000 00000000 00000007 00000000 00000085 00000000
        62722d31 00000084 804b0000 39347874 00000000 804b7820 8040cef8 81000010
        00000001 00000007 00000001 81000000 00000008 8021de24 00000000 804a0000
        ...
Call Trace:
[<8010adec>] show_stack+0x74/0x104
[<801a5e44>] bad_page+0x130/0x138
[<801a654c>] free_pcppages_bulk+0x17c/0x3b0
[<801a789c>] free_unref_page+0x40/0x68
[<801120f4>] free_init_pages+0xec/0x104
[<803bdde8>] free_initmem+0x10/0x58
[<803bdb8c>] kernel_init+0x20/0x100
[<801057c8>] ret_from_kernel_thread+0x14/0x1c
Disabling lock debugging due to kernel taint
BUG: Bad page state in process swapper pfn:00002
[...]

CONFIG_RELOCATABLE is not set, so the only relevant part is the
change quoted below.

> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -371,7 +371,6 @@ static void __init bootmem_init(void)
>
>  static void __init bootmem_init(void)
>  {
> -	unsigned long reserved_end;
>  	phys_addr_t ramstart = PHYS_ADDR_MAX;
>  	int i;
>
> @@ -382,10 +381,10 @@ static void __init bootmem_init(void)
>  	 * will reserve the area used for the initrd.
>  	 */
>  	init_initrd();
> -	reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
>
> -	memblock_reserve(PHYS_OFFSET,
> -			 (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
> +	/* Reserve memory occupied by kernel. */
> +	memblock_reserve(__pa_symbol(&_text),
> +			 __pa_symbol(&_end) - __pa_symbol(&_text));
>
>  	/*
>  	 * max_low_pfn is not a number of pages. The number of pages

With some debug code added:

Determined physical RAM map:
 memory: 08000000 @ 00000000 (usable)
bootmem_init:390: PHYS_OFFSET = 0x0
bootmem_init:391: __pa_symbol(&_text) = 0x100000
bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8
bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8

Hence the old code reserved 1 MiB extra at the beginning.

Note that the new code also dropped the rounding up of the memory block
size to a multiple of PAGE_SIZE. I'm not sure whether the latter actually
matters or not.

Do you have a clue? Thanks!

Gr{oetje,eeting}s,

                        Geert
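For reference, the bootmem_init:NNN lines above could come from instrumentation along these lines near the top of bootmem_init() in arch/mips/kernel/setup.c; this is an assumption about Geert's local debug patch, which is not shown in the thread.

	/* Hypothetical debug prints, not Geert's actual patch. */
	pr_info("%s:%d: PHYS_OFFSET = 0x%lx\n", __func__, __LINE__,
		(unsigned long)PHYS_OFFSET);
	pr_info("%s:%d: __pa_symbol(&_text) = 0x%lx\n", __func__, __LINE__,
		(unsigned long)__pa_symbol(&_text));
	pr_info("%s:%d: __pa_symbol(&_end) = 0x%lx\n", __func__, __LINE__,
		(unsigned long)__pa_symbol(&_end));
	pr_info("%s:%d: PFN_UP(__pa_symbol(&_end)) = 0x%lx\n", __func__, __LINE__,
		(unsigned long)PFN_UP(__pa_symbol(&_end)));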
Hi Geert, On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote: > Hi Serge, > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <fancer.lancer@gmail.com> wrote: > > The reserved_end variable had been used by the bootmem_init() code > > to find a lowest limit of memory available for memmap blob. The original > > code just tried to find a free memory space higher than kernel was placed. > > This limitation seems justified for the memmap ragion search process, but > > I can't see any obvious reason to reserve the unused space below kernel > > seeing some platforms place it much higher than standard 1MB. Moreover > > the RELOCATION config enables it to be loaded at any memory address. > > So lets reserve the memory occupied by the kernel only, leaving the region > > below being free for allocations. After doing this we can now discard the > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols > > since it's going to be free anyway (unless marked as reserved by > > platforms). > > > > Signed-off-by: Serge Semin <fancer.lancer@gmail.com> > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot: > > VFS: Mounted root (nfs filesystem) on device 0:13. > devtmpfs: mounted > BUG: Bad page state in process swapper pfn:00001 > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1 > flags: 0x0() > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000 > page dumped because: nonzero mapcount > Modules linked in: > CPU: 0 PID: 1 Comm: swapper Not tainted > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137 > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c > 00000001 804a3490 > 00000001 81000000 0030f231 80148558 00000003 10008400 > 87c1dd80 7599ee13 > 00000000 00000000 804b0000 00000000 00000007 00000000 > 00000085 00000000 > 62722d31 00000084 804b0000 39347874 00000000 804b7820 > 8040cef8 81000010 > 00000001 00000007 00000001 81000000 00000008 8021de24 > 00000000 804a0000 > ... > Call Trace: > [<8010adec>] show_stack+0x74/0x104 > [<801a5e44>] bad_page+0x130/0x138 > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0 > [<801a789c>] free_unref_page+0x40/0x68 > [<801120f4>] free_init_pages+0xec/0x104 > [<803bdde8>] free_initmem+0x10/0x58 > [<803bdb8c>] kernel_init+0x20/0x100 > [<801057c8>] ret_from_kernel_thread+0x14/0x1c > Disabling lock debugging due to kernel taint > BUG: Bad page state in process swapper pfn:00002 > [...] > > CONFIG_RELOCATABLE is not set, so the only relevant part is the > change quoted below. > > > --- a/arch/mips/kernel/setup.c > > +++ b/arch/mips/kernel/setup.c > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void) > > > > static void __init bootmem_init(void) > > { > > - unsigned long reserved_end; > > phys_addr_t ramstart = PHYS_ADDR_MAX; > > int i; > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void) > > * will reserve the area used for the initrd. > > */ > > init_initrd(); > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end)); > > > > - memblock_reserve(PHYS_OFFSET, > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET); > > + /* Reserve memory occupied by kernel. */ > > + memblock_reserve(__pa_symbol(&_text), > > + __pa_symbol(&_end) - __pa_symbol(&_text)); > > > > /* > > * max_low_pfn is not a number of pages. 
The number of pages > > With some debug code added: > > Determined physical RAM map: > memory: 08000000 @ 00000000 (usable) > bootmem_init:390: PHYS_OFFSET = 0x0 > bootmem_init:391: __pa_symbol(&_text) = 0x100000 > bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8 > bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8 Have you tried adding memblock=debug to the command line? Not sure it'll help, but still :) > Hence the old code reserved 1 MiB extra at the beginning. > > Note that the new code also dropped the rounding up of the memory block > size to a multiple of PAGE_SIZE. I'm not sure the latter actually > matters or not. I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M. > Do you have a clue? Thanks! > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds >
Hello Geert, Mike

On Tue, May 21, 2019 at 06:53:10PM +0300, Mike Rapoport wrote:
> Hi Geert,
>
> On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote:
> > Hi Serge,
> >
> > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <fancer.lancer@gmail.com> wrote:
> > > The reserved_end variable had been used by the bootmem_init() code
> > > to find a lowest limit of memory available for memmap blob. The original
> > > code just tried to find a free memory space higher than kernel was placed.
> > > This limitation seems justified for the memmap ragion search process, but
> > > I can't see any obvious reason to reserve the unused space below kernel
> > > seeing some platforms place it much higher than standard 1MB. Moreover
> > > the RELOCATION config enables it to be loaded at any memory address.
> > > So lets reserve the memory occupied by the kernel only, leaving the region
> > > below being free for allocations. After doing this we can now discard the
> > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols
> > > since it's going to be free anyway (unless marked as reserved by
> > > platforms).
> > >
> > > Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
> >
> > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel
> > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot:
> >
> > VFS: Mounted root (nfs filesystem) on device 0:13.
> > devtmpfs: mounted
> > BUG: Bad page state in process swapper pfn:00001
> > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1
> > flags: 0x0()
> > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000
> > page dumped because: nonzero mapcount
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper Not tainted 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137
> > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c 00000001 804a3490
> >         00000001 81000000 0030f231 80148558 00000003 10008400 87c1dd80 7599ee13
> >         00000000 00000000 804b0000 00000000 00000007 00000000 00000085 00000000
> >         62722d31 00000084 804b0000 39347874 00000000 804b7820 8040cef8 81000010
> >         00000001 00000007 00000001 81000000 00000008 8021de24 00000000 804a0000
> >         ...
> > Call Trace:
> > [<8010adec>] show_stack+0x74/0x104
> > [<801a5e44>] bad_page+0x130/0x138
> > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0
> > [<801a789c>] free_unref_page+0x40/0x68
> > [<801120f4>] free_init_pages+0xec/0x104
> > [<803bdde8>] free_initmem+0x10/0x58
> > [<803bdb8c>] kernel_init+0x20/0x100
> > [<801057c8>] ret_from_kernel_thread+0x14/0x1c
> > Disabling lock debugging due to kernel taint
> > BUG: Bad page state in process swapper pfn:00002
> > [...]
> >

The root cause of the problem is most likely the prom_free_prom_memory()
method in arch/mips/txx9/generic/setup.c:

void __init prom_free_prom_memory(void)
{
	unsigned long saddr = PAGE_SIZE;
	unsigned long eaddr = __pa_symbol(&_text);

	if (saddr < eaddr)
		free_init_pages("prom memory", saddr, eaddr);
}

As you can see, the txx9 platform tries to free memory which was never
reserved and is free from the very beginning due to the patch above. So as
soon as you remove the free_init_pages("prom memory", ...) call, the problem
should be fixed. Could you try it and let us know whether it helped?

Regards,
-Sergey

> > CONFIG_RELOCATABLE is not set, so the only relevant part is the
> > change quoted below.
> > > > > --- a/arch/mips/kernel/setup.c > > > +++ b/arch/mips/kernel/setup.c > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void) > > > > > > static void __init bootmem_init(void) > > > { > > > - unsigned long reserved_end; > > > phys_addr_t ramstart = PHYS_ADDR_MAX; > > > int i; > > > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void) > > > * will reserve the area used for the initrd. > > > */ > > > init_initrd(); > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end)); > > > > > > - memblock_reserve(PHYS_OFFSET, > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET); > > > + /* Reserve memory occupied by kernel. */ > > > + memblock_reserve(__pa_symbol(&_text), > > > + __pa_symbol(&_end) - __pa_symbol(&_text)); > > > > > > /* > > > * max_low_pfn is not a number of pages. The number of pages > > > > With some debug code added: > > > > Determined physical RAM map: > > memory: 08000000 @ 00000000 (usable) > > bootmem_init:390: PHYS_OFFSET = 0x0 > > bootmem_init:391: __pa_symbol(&_text) = 0x100000 > > bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8 > > bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8 > > Have you tried adding memblock=debug to the command line? > Not sure it'll help, but still :) > > > Hence the old code reserved 1 MiB extra at the beginning. > > > > Note that the new code also dropped the rounding up of the memory block > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually > > matters or not. > > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M. > > > Do you have a clue? Thanks! > > > > Gr{oetje,eeting}s, > > > > Geert > > > > -- > > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > > > In personal conversations with technical people, I call myself a hacker. But > > when I'm talking to journalists I just say "programmer" or something like that. > > -- Linus Torvalds > > > > -- > Sincerely yours, > Mike. >
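A minimal sketch of the change Serge suggests above for arch/mips/txx9/generic/setup.c, assuming the reservation below the kernel is simply no longer needed there; this is not necessarily the patch Geert went on to send. With bootmem_init() no longer reserving that region, freeing it a second time is exactly what trips the bad-page checks.

void __init prom_free_prom_memory(void)
{
	/*
	 * Nothing to do any more: the memory below the kernel is not
	 * reserved by bootmem_init(), so it is already available to the
	 * page allocator.
	 */
}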
Hi Mike, On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote: > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <fancer.lancer@gmail.com> wrote: > > > The reserved_end variable had been used by the bootmem_init() code > > > to find a lowest limit of memory available for memmap blob. The original > > > code just tried to find a free memory space higher than kernel was placed. > > > This limitation seems justified for the memmap ragion search process, but > > > I can't see any obvious reason to reserve the unused space below kernel > > > seeing some platforms place it much higher than standard 1MB. Moreover > > > the RELOCATION config enables it to be loaded at any memory address. > > > So lets reserve the memory occupied by the kernel only, leaving the region > > > below being free for allocations. After doing this we can now discard the > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols > > > since it's going to be free anyway (unless marked as reserved by > > > platforms). > > > > > > Signed-off-by: Serge Semin <fancer.lancer@gmail.com> > > > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot: > > > > VFS: Mounted root (nfs filesystem) on device 0:13. > > devtmpfs: mounted > > BUG: Bad page state in process swapper pfn:00001 > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1 > > flags: 0x0() > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000 > > page dumped because: nonzero mapcount > > Modules linked in: > > CPU: 0 PID: 1 Comm: swapper Not tainted > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137 > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c > > 00000001 804a3490 > > 00000001 81000000 0030f231 80148558 00000003 10008400 > > 87c1dd80 7599ee13 > > 00000000 00000000 804b0000 00000000 00000007 00000000 > > 00000085 00000000 > > 62722d31 00000084 804b0000 39347874 00000000 804b7820 > > 8040cef8 81000010 > > 00000001 00000007 00000001 81000000 00000008 8021de24 > > 00000000 804a0000 > > ... > > Call Trace: > > [<8010adec>] show_stack+0x74/0x104 > > [<801a5e44>] bad_page+0x130/0x138 > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0 > > [<801a789c>] free_unref_page+0x40/0x68 > > [<801120f4>] free_init_pages+0xec/0x104 > > [<803bdde8>] free_initmem+0x10/0x58 > > [<803bdb8c>] kernel_init+0x20/0x100 > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c > > Disabling lock debugging due to kernel taint > > BUG: Bad page state in process swapper pfn:00002 > > [...] > > > > CONFIG_RELOCATABLE is not set, so the only relevant part is the > > change quoted below. > > > > > --- a/arch/mips/kernel/setup.c > > > +++ b/arch/mips/kernel/setup.c > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void) > > > > > > static void __init bootmem_init(void) > > > { > > > - unsigned long reserved_end; > > > phys_addr_t ramstart = PHYS_ADDR_MAX; > > > int i; > > > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void) > > > * will reserve the area used for the initrd. > > > */ > > > init_initrd(); > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end)); > > > > > > - memblock_reserve(PHYS_OFFSET, > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET); > > > + /* Reserve memory occupied by kernel. 
*/ > > > + memblock_reserve(__pa_symbol(&_text), > > > + __pa_symbol(&_end) - __pa_symbol(&_text)); > > > > > > /* > > > * max_low_pfn is not a number of pages. The number of pages > > > > With some debug code added: > > > > Determined physical RAM map: > > memory: 08000000 @ 00000000 (usable) > > bootmem_init:390: PHYS_OFFSET = 0x0 > > bootmem_init:391: __pa_symbol(&_text) = 0x100000 > > bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8 > > bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8 > > Have you tried adding memblock=debug to the command line? > Not sure it'll help, but still :) Thanks! Output below... Determined physical RAM map: memory: 08000000 @ 00000000 (usable) +memblock_reserve: [0x00100000-0x004b77c7] setup_arch+0x258/0x8e4 Initrd not found or empty - disabling initrd +memblock_reserve: [0x00448000-0x00447fff] setup_arch+0x5ac/0x8e4 +MEMBLOCK configuration: + memory size = 0x08000000 reserved size = 0x003b77c8 + memory.cnt = 0x1 + memory[0x0] [0x00000000-0x07ffffff], 0x08000000 bytes on node 0 flags: 0x0 + reserved.cnt = 0x1 + reserved[0x0] [0x00100000-0x004b77c7], 0x003b77c8 bytes flags: 0x0 +memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x7ec/0x8e4 +memblock_reserve: [0x004b77e0-0x004b77ff] memblock_alloc_range_nid+0x130/0x178 Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes. Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes Zone ranges: @@ -16,10 +26,48 @@ Movable zone start for each node Early memory node ranges node 0: [mem 0x0000000000000000-0x0000000007ffffff] Initmem setup node 0 [mem 0x0000000000000000-0x0000000007ffffff] +memblock_alloc_try_nid: 1048576 bytes align=0x20 nid=0 from=0x00000000 max_addr=0x00000000 alloc_node_mem_map.constprop.31+0x6c/0xc8 +memblock_reserve: [0x004b7800-0x005b77ff] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 16 bytes align=0x20 nid=0 from=0x00000000 max_addr=0x00000000 setup_usemap.isra.13+0x68/0xa0 +memblock_reserve: [0x005b7800-0x005b780f] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0xb0/0x508 +memblock_reserve: [0x005b7820-0x005b7893] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0xf0/0x508 +memblock_reserve: [0x005b78a0-0x005b7913] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0x114/0x508 +memblock_reserve: [0x005b7920-0x005b7993] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_alloc_info+0x60/0xb8 +memblock_reserve: [0x005b8000-0x005b8fff] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 32768 bytes align=0x1000 nid=-1 from=0x01000000 max_addr=0x00000000 setup_per_cpu_areas+0x38/0xa8 +memblock_reserve: [0x01000000-0x01007fff] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x200/0x588 +memblock_reserve: [0x005b79a0-0x005b79a3] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x22c/0x588 +memblock_reserve: [0x005b79c0-0x005b79c3] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 
pcpu_setup_first_chunk+0x250/0x588 +memblock_reserve: [0x005b79e0-0x005b79e3] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x288/0x588 +memblock_reserve: [0x005b7a00-0x005b7a03] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 120 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x47c/0x588 +memblock_reserve: [0x005b7a20-0x005b7a97] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 89 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x88/0x2e0 +memblock_reserve: [0x005b7aa0-0x005b7af8] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 1024 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xd8/0x2e0 +memblock_reserve: [0x005b7b00-0x005b7eff] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 1028 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x118/0x2e0 +memblock_reserve: [0x005b9000-0x005b9403] memblock_alloc_range_nid+0x130/0x178 +memblock_alloc_try_nid: 256 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x154/0x2e0 +memblock_reserve: [0x005b7f00-0x005b7fff] memblock_alloc_range_nid+0x130/0x178 + memblock_free: [0x005b8000-0x005b8fff] start_kernel+0x164/0x508 Built 1 zonelists, mobility grouping on. Total pages: 32512 -Kernel command line: console=ttyS0,9600 ip=on root=/dev/nfs rw nfsroot=192.168.97.29:/nas/rbtx4927/debian-mipsel,tcp,v3 +Kernel command line: console=ttyS0,9600 ip=on root=/dev/nfs rw nfsroot=192.168.97.29:/nas/rbtx4927/debian-mipsel,tcp,v3 memblock=debug +memblock_alloc_try_nid: 65536 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x270/0x478 +memblock_reserve: [0x005b9420-0x005c941f] memblock_alloc_range_nid+0x130/0x178 Dentry cache hash table entries: 16384 (order: 4, 65536 bytes) +memblock_alloc_try_nid: 32768 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x270/0x478 +memblock_reserve: [0x005c9420-0x005d141f] memblock_alloc_range_nid+0x130/0x178 Inode-cache hash table entries: 8192 (order: 3, 32768 bytes) +memblock_reserve: [0x00000000-0x000003ff] trap_init+0x58/0x474 Memory: 126100K/131072K available (2830K kernel code, 147K rwdata, 508K rodata, 220K init, 93K bss, 4972K reserved, 0K cma-reserved) > > Hence the old code reserved 1 MiB extra at the beginning. > > > > Note that the new code also dropped the rounding up of the memory block > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually > > matters or not. > > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M. Yes, by prom_free_prom_memory(), as pointed out by Serge. Gr{oetje,eeting}s, Geert
Hi Serge, On Tue, May 21, 2019 at 6:39 PM Serge Semin <fancer.lancer@gmail.com> wrote: > On Tue, May 21, 2019 at 06:53:10PM +0300, Mike Rapoport wrote: > > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote: > > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <fancer.lancer@gmail.com> wrote: > > > > The reserved_end variable had been used by the bootmem_init() code > > > > to find a lowest limit of memory available for memmap blob. The original > > > > code just tried to find a free memory space higher than kernel was placed. > > > > This limitation seems justified for the memmap ragion search process, but > > > > I can't see any obvious reason to reserve the unused space below kernel > > > > seeing some platforms place it much higher than standard 1MB. Moreover > > > > the RELOCATION config enables it to be loaded at any memory address. > > > > So lets reserve the memory occupied by the kernel only, leaving the region > > > > below being free for allocations. After doing this we can now discard the > > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols > > > > since it's going to be free anyway (unless marked as reserved by > > > > platforms). > > > > > > > > Signed-off-by: Serge Semin <fancer.lancer@gmail.com> > > > > > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel > > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot: > > > > > > VFS: Mounted root (nfs filesystem) on device 0:13. > > > devtmpfs: mounted > > > BUG: Bad page state in process swapper pfn:00001 > > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1 > > > flags: 0x0() > > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000 > > > page dumped because: nonzero mapcount > > > Modules linked in: > > > CPU: 0 PID: 1 Comm: swapper Not tainted > > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137 > > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c > > > 00000001 804a3490 > > > 00000001 81000000 0030f231 80148558 00000003 10008400 > > > 87c1dd80 7599ee13 > > > 00000000 00000000 804b0000 00000000 00000007 00000000 > > > 00000085 00000000 > > > 62722d31 00000084 804b0000 39347874 00000000 804b7820 > > > 8040cef8 81000010 > > > 00000001 00000007 00000001 81000000 00000008 8021de24 > > > 00000000 804a0000 > > > ... > > > Call Trace: > > > [<8010adec>] show_stack+0x74/0x104 > > > [<801a5e44>] bad_page+0x130/0x138 > > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0 > > > [<801a789c>] free_unref_page+0x40/0x68 > > > [<801120f4>] free_init_pages+0xec/0x104 > > > [<803bdde8>] free_initmem+0x10/0x58 > > > [<803bdb8c>] kernel_init+0x20/0x100 > > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c > > > Disabling lock debugging due to kernel taint > > > BUG: Bad page state in process swapper pfn:00002 > > > [...] > > > > > The root cause of the problem most likely is in prom_free_prom_memory() method of > arch/mips/txx9/generic/setup.c: > void __init prom_free_prom_memory(void) > { > unsigned long saddr = PAGE_SIZE; > unsigned long eaddr = __pa_symbol(&_text); > > if (saddr < eaddr) > free_init_pages("prom memory", saddr, eaddr); > } > > As you can see the txx9 platform tries to free a memory which isn't reserved > and set free from the very beginning due to the patch above. So as soon as you > remove the free_init_pages("prom memory", ...) the problem shall be fixed. > Could you try it and send a result to us whether it helped? Thanks, that does the trick! Will send a patch shortly... 
Gr{oetje,eeting}s, Geert
On Wed, May 22, 2019 at 09:47:04AM +0200, Geert Uytterhoeven wrote: > Hi Mike, > > On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote: > > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <fancer.lancer@gmail.com> wrote: > > > > The reserved_end variable had been used by the bootmem_init() code > > > > to find a lowest limit of memory available for memmap blob. The original > > > > code just tried to find a free memory space higher than kernel was placed. > > > > This limitation seems justified for the memmap ragion search process, but > > > > I can't see any obvious reason to reserve the unused space below kernel > > > > seeing some platforms place it much higher than standard 1MB. Moreover > > > > the RELOCATION config enables it to be loaded at any memory address. > > > > So lets reserve the memory occupied by the kernel only, leaving the region > > > > below being free for allocations. After doing this we can now discard the > > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols > > > > since it's going to be free anyway (unless marked as reserved by > > > > platforms). > > > > > > > > Signed-off-by: Serge Semin <fancer.lancer@gmail.com> > > > > > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel > > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot: > > > > > > VFS: Mounted root (nfs filesystem) on device 0:13. > > > devtmpfs: mounted > > > BUG: Bad page state in process swapper pfn:00001 > > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1 > > > flags: 0x0() > > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000 > > > page dumped because: nonzero mapcount > > > Modules linked in: > > > CPU: 0 PID: 1 Comm: swapper Not tainted > > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137 > > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c > > > 00000001 804a3490 > > > 00000001 81000000 0030f231 80148558 00000003 10008400 > > > 87c1dd80 7599ee13 > > > 00000000 00000000 804b0000 00000000 00000007 00000000 > > > 00000085 00000000 > > > 62722d31 00000084 804b0000 39347874 00000000 804b7820 > > > 8040cef8 81000010 > > > 00000001 00000007 00000001 81000000 00000008 8021de24 > > > 00000000 804a0000 > > > ... > > > Call Trace: > > > [<8010adec>] show_stack+0x74/0x104 > > > [<801a5e44>] bad_page+0x130/0x138 > > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0 > > > [<801a789c>] free_unref_page+0x40/0x68 > > > [<801120f4>] free_init_pages+0xec/0x104 > > > [<803bdde8>] free_initmem+0x10/0x58 > > > [<803bdb8c>] kernel_init+0x20/0x100 > > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c > > > Disabling lock debugging due to kernel taint > > > BUG: Bad page state in process swapper pfn:00002 > > > [...] > > > > > > CONFIG_RELOCATABLE is not set, so the only relevant part is the > > > change quoted below. > > > > > > > --- a/arch/mips/kernel/setup.c > > > > +++ b/arch/mips/kernel/setup.c > > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void) > > > > > > > > static void __init bootmem_init(void) > > > > { > > > > - unsigned long reserved_end; > > > > phys_addr_t ramstart = PHYS_ADDR_MAX; > > > > int i; > > > > > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void) > > > > * will reserve the area used for the initrd. 
> > > > */ > > > > init_initrd(); > > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end)); > > > > > > > > - memblock_reserve(PHYS_OFFSET, > > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET); > > > > + /* Reserve memory occupied by kernel. */ > > > > + memblock_reserve(__pa_symbol(&_text), > > > > + __pa_symbol(&_end) - __pa_symbol(&_text)); > > > > > > > > /* > > > > * max_low_pfn is not a number of pages. The number of pages > > > > > > With some debug code added: > > > > > > Determined physical RAM map: > > > memory: 08000000 @ 00000000 (usable) > > > bootmem_init:390: PHYS_OFFSET = 0x0 > > > bootmem_init:391: __pa_symbol(&_text) = 0x100000 > > > bootmem_init:392: __pa_symbol(&_end) = 0x4b77c8 > > > bootmem_init:393: PFN_UP(__pa_symbol(&_end)) = 0x4b8 > > > > Have you tried adding memblock=debug to the command line? > > Not sure it'll help, but still :) > > Thanks! Output below... > > Determined physical RAM map: > memory: 08000000 @ 00000000 (usable) > +memblock_reserve: [0x00100000-0x004b77c7] setup_arch+0x258/0x8e4 > Initrd not found or empty - disabling initrd > +memblock_reserve: [0x00448000-0x00447fff] setup_arch+0x5ac/0x8e4 > +MEMBLOCK configuration: > + memory size = 0x08000000 reserved size = 0x003b77c8 > + memory.cnt = 0x1 > + memory[0x0] [0x00000000-0x07ffffff], 0x08000000 bytes on node 0 flags: 0x0 > + reserved.cnt = 0x1 > + reserved[0x0] [0x00100000-0x004b77c7], 0x003b77c8 bytes flags: 0x0 > +memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 setup_arch+0x7ec/0x8e4 > +memblock_reserve: [0x004b77e0-0x004b77ff] memblock_alloc_range_nid+0x130/0x178 > Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes. > Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes > Zone ranges: > @@ -16,10 +26,48 @@ Movable zone start for each node > Early memory node ranges > node 0: [mem 0x0000000000000000-0x0000000007ffffff] > Initmem setup node 0 [mem 0x0000000000000000-0x0000000007ffffff] > +memblock_alloc_try_nid: 1048576 bytes align=0x20 nid=0 > from=0x00000000 max_addr=0x00000000 > alloc_node_mem_map.constprop.31+0x6c/0xc8 > +memblock_reserve: [0x004b7800-0x005b77ff] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 16 bytes align=0x20 nid=0 from=0x00000000 > max_addr=0x00000000 setup_usemap.isra.13+0x68/0xa0 > +memblock_reserve: [0x005b7800-0x005b780f] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 start_kernel+0xb0/0x508 > +memblock_reserve: [0x005b7820-0x005b7893] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 start_kernel+0xf0/0x508 > +memblock_reserve: [0x005b78a0-0x005b7913] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 116 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 start_kernel+0x114/0x508 > +memblock_reserve: [0x005b7920-0x005b7993] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_alloc_alloc_info+0x60/0xb8 > +memblock_reserve: [0x005b8000-0x005b8fff] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 32768 bytes align=0x1000 nid=-1 > from=0x01000000 max_addr=0x00000000 setup_per_cpu_areas+0x38/0xa8 > +memblock_reserve: [0x01000000-0x01007fff] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 
pcpu_setup_first_chunk+0x200/0x588 > +memblock_reserve: [0x005b79a0-0x005b79a3] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 pcpu_setup_first_chunk+0x22c/0x588 > +memblock_reserve: [0x005b79c0-0x005b79c3] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 pcpu_setup_first_chunk+0x250/0x588 > +memblock_reserve: [0x005b79e0-0x005b79e3] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 pcpu_setup_first_chunk+0x288/0x588 > +memblock_reserve: [0x005b7a00-0x005b7a03] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 120 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 pcpu_setup_first_chunk+0x47c/0x588 > +memblock_reserve: [0x005b7a20-0x005b7a97] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 89 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 pcpu_alloc_first_chunk+0x88/0x2e0 > +memblock_reserve: [0x005b7aa0-0x005b7af8] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 1024 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 pcpu_alloc_first_chunk+0xd8/0x2e0 > +memblock_reserve: [0x005b7b00-0x005b7eff] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 1028 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 pcpu_alloc_first_chunk+0x118/0x2e0 > +memblock_reserve: [0x005b9000-0x005b9403] memblock_alloc_range_nid+0x130/0x178 > +memblock_alloc_try_nid: 256 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 pcpu_alloc_first_chunk+0x154/0x2e0 > +memblock_reserve: [0x005b7f00-0x005b7fff] memblock_alloc_range_nid+0x130/0x178 > + memblock_free: [0x005b8000-0x005b8fff] start_kernel+0x164/0x508 > Built 1 zonelists, mobility grouping on. Total pages: 32512 > -Kernel command line: console=ttyS0,9600 ip=on root=/dev/nfs rw > nfsroot=192.168.97.29:/nas/rbtx4927/debian-mipsel,tcp,v3 > +Kernel command line: console=ttyS0,9600 ip=on root=/dev/nfs rw > nfsroot=192.168.97.29:/nas/rbtx4927/debian-mipsel,tcp,v3 > memblock=debug > +memblock_alloc_try_nid: 65536 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 alloc_large_system_hash+0x270/0x478 > +memblock_reserve: [0x005b9420-0x005c941f] memblock_alloc_range_nid+0x130/0x178 > Dentry cache hash table entries: 16384 (order: 4, 65536 bytes) > +memblock_alloc_try_nid: 32768 bytes align=0x20 nid=-1 from=0x00000000 > max_addr=0x00000000 alloc_large_system_hash+0x270/0x478 > +memblock_reserve: [0x005c9420-0x005d141f] memblock_alloc_range_nid+0x130/0x178 > Inode-cache hash table entries: 8192 (order: 3, 32768 bytes) > +memblock_reserve: [0x00000000-0x000003ff] trap_init+0x58/0x474 > Memory: 126100K/131072K available (2830K kernel code, 147K rwdata, > 508K rodata, 220K init, 93K bss, 4972K reserved, 0K cma-reserved) Presuming your system is !cpu_has_mips_r2_r6 and CAC_BASE is 0 the log looks completely sane > > > Hence the old code reserved 1 MiB extra at the beginning. > > > > > > Note that the new code also dropped the rounding up of the memory block > > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually > > > matters or not. > > > > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M. > > Yes, by prom_free_prom_memory(), as pointed out by Serge. 
I wonder how other MIPS variants would react to the fact that the memory
below the kernel is not reserved ;-)

> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
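To illustrate Mike's concern: a platform that still depends on firmware or bootloader data below the kernel now has to reserve that region itself. A sketch of such a platform-side reservation using the generic plat_mem_setup() hook and memblock_reserve(); the hook body and the reserved range are assumptions for illustration, not code from any in-tree board.

#include <linux/init.h>
#include <linux/memblock.h>
#include <asm/bootinfo.h>
#include <asm/page.h>
#include <asm/sections.h>

void __init plat_mem_setup(void)
{
	/* ... existing platform memory registration ... */

	/*
	 * Keep the firmware/bootloader area between the start of RAM and
	 * the kernel image out of the page allocator, mirroring what the
	 * old generic reservation used to do for this board.
	 */
	memblock_reserve(PHYS_OFFSET, __pa_symbol(&_text) - PHYS_OFFSET);
}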
Hi Mike, On Wed, May 22, 2019 at 10:08 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > On Wed, May 22, 2019 at 09:47:04AM +0200, Geert Uytterhoeven wrote: > > On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote: > > > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <fancer.lancer@gmail.com> wrote: > > > > > The reserved_end variable had been used by the bootmem_init() code > > > > > to find a lowest limit of memory available for memmap blob. The original > > > > > code just tried to find a free memory space higher than kernel was placed. > > > > > This limitation seems justified for the memmap ragion search process, but > > > > > I can't see any obvious reason to reserve the unused space below kernel > > > > > seeing some platforms place it much higher than standard 1MB. Moreover > > > > > the RELOCATION config enables it to be loaded at any memory address. > > > > > So lets reserve the memory occupied by the kernel only, leaving the region > > > > > below being free for allocations. After doing this we can now discard the > > > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols > > > > > since it's going to be free anyway (unless marked as reserved by > > > > > platforms). > > > > > > > > > > Signed-off-by: Serge Semin <fancer.lancer@gmail.com> > > > > > > > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel > > > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot: > > > > > > > > VFS: Mounted root (nfs filesystem) on device 0:13. > > > > devtmpfs: mounted > > > > BUG: Bad page state in process swapper pfn:00001 > > > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1 > > > > flags: 0x0() > > > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000 > > > > page dumped because: nonzero mapcount > > > > Modules linked in: > > > > CPU: 0 PID: 1 Comm: swapper Not tainted > > > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137 > > > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c > > > > 00000001 804a3490 > > > > 00000001 81000000 0030f231 80148558 00000003 10008400 > > > > 87c1dd80 7599ee13 > > > > 00000000 00000000 804b0000 00000000 00000007 00000000 > > > > 00000085 00000000 > > > > 62722d31 00000084 804b0000 39347874 00000000 804b7820 > > > > 8040cef8 81000010 > > > > 00000001 00000007 00000001 81000000 00000008 8021de24 > > > > 00000000 804a0000 > > > > ... > > > > Call Trace: > > > > [<8010adec>] show_stack+0x74/0x104 > > > > [<801a5e44>] bad_page+0x130/0x138 > > > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0 > > > > [<801a789c>] free_unref_page+0x40/0x68 > > > > [<801120f4>] free_init_pages+0xec/0x104 > > > > [<803bdde8>] free_initmem+0x10/0x58 > > > > [<803bdb8c>] kernel_init+0x20/0x100 > > > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c > > > > Disabling lock debugging due to kernel taint > > > > BUG: Bad page state in process swapper pfn:00002 > > > > [...] > > > > > > > > CONFIG_RELOCATABLE is not set, so the only relevant part is the > > > > change quoted below. 
> > > > > > > > > --- a/arch/mips/kernel/setup.c > > > > > +++ b/arch/mips/kernel/setup.c > > > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void) > > > > > > > > > > static void __init bootmem_init(void) > > > > > { > > > > > - unsigned long reserved_end; > > > > > phys_addr_t ramstart = PHYS_ADDR_MAX; > > > > > int i; > > > > > > > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void) > > > > > * will reserve the area used for the initrd. > > > > > */ > > > > > init_initrd(); > > > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end)); > > > > > > > > > > - memblock_reserve(PHYS_OFFSET, > > > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET); > > > > > + /* Reserve memory occupied by kernel. */ > > > > > + memblock_reserve(__pa_symbol(&_text), > > > > > + __pa_symbol(&_end) - __pa_symbol(&_text)); > > > > > > > > > > /* > > > > > * max_low_pfn is not a number of pages. The number of pages > > > > Hence the old code reserved 1 MiB extra at the beginning. > > > > > > > > Note that the new code also dropped the rounding up of the memory block > > > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually > > > > matters or not. > > > > > > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M. > > > > Yes, by prom_free_prom_memory(), as pointed out by Serge. > > I wonder how other MIPS variants would react to the fact that the memory > below the kernel is not reserved ;-) Exactly... Looks like at least arch/mips/dec/prom/memory.c needs a similar but\ more complicated fix, due to declance handling... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Hello folks, On Wed, May 22, 2019 at 10:14:47AM +0200, Geert Uytterhoeven wrote: > Hi Mike, > > On Wed, May 22, 2019 at 10:08 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > > On Wed, May 22, 2019 at 09:47:04AM +0200, Geert Uytterhoeven wrote: > > > On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote: > > > > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <fancer.lancer@gmail.com> wrote: > > > > > > The reserved_end variable had been used by the bootmem_init() code > > > > > > to find a lowest limit of memory available for memmap blob. The original > > > > > > code just tried to find a free memory space higher than kernel was placed. > > > > > > This limitation seems justified for the memmap ragion search process, but > > > > > > I can't see any obvious reason to reserve the unused space below kernel > > > > > > seeing some platforms place it much higher than standard 1MB. Moreover > > > > > > the RELOCATION config enables it to be loaded at any memory address. > > > > > > So lets reserve the memory occupied by the kernel only, leaving the region > > > > > > below being free for allocations. After doing this we can now discard the > > > > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols > > > > > > since it's going to be free anyway (unless marked as reserved by > > > > > > platforms). > > > > > > > > > > > > Signed-off-by: Serge Semin <fancer.lancer@gmail.com> > > > > > > > > > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel > > > > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot: > > > > > > > > > > VFS: Mounted root (nfs filesystem) on device 0:13. > > > > > devtmpfs: mounted > > > > > BUG: Bad page state in process swapper pfn:00001 > > > > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1 > > > > > flags: 0x0() > > > > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000 > > > > > page dumped because: nonzero mapcount > > > > > Modules linked in: > > > > > CPU: 0 PID: 1 Comm: swapper Not tainted > > > > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137 > > > > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c > > > > > 00000001 804a3490 > > > > > 00000001 81000000 0030f231 80148558 00000003 10008400 > > > > > 87c1dd80 7599ee13 > > > > > 00000000 00000000 804b0000 00000000 00000007 00000000 > > > > > 00000085 00000000 > > > > > 62722d31 00000084 804b0000 39347874 00000000 804b7820 > > > > > 8040cef8 81000010 > > > > > 00000001 00000007 00000001 81000000 00000008 8021de24 > > > > > 00000000 804a0000 > > > > > ... > > > > > Call Trace: > > > > > [<8010adec>] show_stack+0x74/0x104 > > > > > [<801a5e44>] bad_page+0x130/0x138 > > > > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0 > > > > > [<801a789c>] free_unref_page+0x40/0x68 > > > > > [<801120f4>] free_init_pages+0xec/0x104 > > > > > [<803bdde8>] free_initmem+0x10/0x58 > > > > > [<803bdb8c>] kernel_init+0x20/0x100 > > > > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c > > > > > Disabling lock debugging due to kernel taint > > > > > BUG: Bad page state in process swapper pfn:00002 > > > > > [...] > > > > > > > > > > CONFIG_RELOCATABLE is not set, so the only relevant part is the > > > > > change quoted below. 
> > > > > > > > > > > --- a/arch/mips/kernel/setup.c > > > > > > +++ b/arch/mips/kernel/setup.c > > > > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void) > > > > > > > > > > > > static void __init bootmem_init(void) > > > > > > { > > > > > > - unsigned long reserved_end; > > > > > > phys_addr_t ramstart = PHYS_ADDR_MAX; > > > > > > int i; > > > > > > > > > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void) > > > > > > * will reserve the area used for the initrd. > > > > > > */ > > > > > > init_initrd(); > > > > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end)); > > > > > > > > > > > > - memblock_reserve(PHYS_OFFSET, > > > > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET); > > > > > > + /* Reserve memory occupied by kernel. */ > > > > > > + memblock_reserve(__pa_symbol(&_text), > > > > > > + __pa_symbol(&_end) - __pa_symbol(&_text)); > > > > > > > > > > > > /* > > > > > > * max_low_pfn is not a number of pages. The number of pages > > > > > > Hence the old code reserved 1 MiB extra at the beginning. > > > > > > > > > > Note that the new code also dropped the rounding up of the memory block > > > > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually > > > > > matters or not. > > > > > > > > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M. > > > > > > Yes, by prom_free_prom_memory(), as pointed out by Serge. > > > > I wonder how other MIPS variants would react to the fact that the memory > > below the kernel is not reserved ;-) > > Exactly... > > Looks like at least arch/mips/dec/prom/memory.c needs a similar but\ > more complicated fix, due to declance handling... > The problem might be fixed there by the next patch: --- diff --git a/arch/mips/dec/prom/memory.c b/arch/mips/dec/prom/memory.c index 5073d2ed78bb..5a0c734b5d04 100644 --- a/arch/mips/dec/prom/memory.c +++ b/arch/mips/dec/prom/memory.c @@ -91,29 +91,14 @@ void __init prom_meminit(u32 magic) pmax_setup_memory_region(); else rex_setup_memory_region(); -} - -void __init prom_free_prom_memory(void) -{ - unsigned long end; - - /* - * Free everything below the kernel itself but leave - * the first page reserved for the exception handlers. - */ #if IS_ENABLED(CONFIG_DECLANCE) /* - * Leave 128 KB reserved for Lance memory for - * IOASIC DECstations. + * Reserve 128 KB for Lance memory for IOASIC DECstations. * * XXX: save this address for use in dec_lance.c? */ if (IOASIC) - end = __pa(&_text) - 0x00020000; - else + memblock_reserve(__pa_symbol(&_text), 0x00020000); #endif - end = __pa(&_text); - - free_init_pages("unused PROM memory", PAGE_SIZE, end); } --- Didn't wanna use prom_FREE_prom_memory to actually reserve a memory chunk, so I moved the reservation into the prom_meminit() method. Regarding the first page for the exception handlers. We don't need to reserve it here, since it is already done in arch/mips/kernel/traps.c . Regards, -Sergey > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds
Hi Serge, On Wed, May 22, 2019 at 3:34 PM Serge Semin <fancer.lancer@gmail.com> wrote: > On Wed, May 22, 2019 at 10:14:47AM +0200, Geert Uytterhoeven wrote: > > On Wed, May 22, 2019 at 10:08 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > On Wed, May 22, 2019 at 09:47:04AM +0200, Geert Uytterhoeven wrote: > > > > On Tue, May 21, 2019 at 5:53 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > > > On Tue, May 21, 2019 at 04:56:39PM +0200, Geert Uytterhoeven wrote: > > > > > > On Wed, Apr 24, 2019 at 12:50 AM Serge Semin <fancer.lancer@gmail.com> wrote: > > > > > > > The reserved_end variable had been used by the bootmem_init() code > > > > > > > to find a lowest limit of memory available for memmap blob. The original > > > > > > > code just tried to find a free memory space higher than kernel was placed. > > > > > > > This limitation seems justified for the memmap ragion search process, but > > > > > > > I can't see any obvious reason to reserve the unused space below kernel > > > > > > > seeing some platforms place it much higher than standard 1MB. Moreover > > > > > > > the RELOCATION config enables it to be loaded at any memory address. > > > > > > > So lets reserve the memory occupied by the kernel only, leaving the region > > > > > > > below being free for allocations. After doing this we can now discard the > > > > > > > code freeing a space between kernel _text and VMLINUX_LOAD_ADDRESS symbols > > > > > > > since it's going to be free anyway (unless marked as reserved by > > > > > > > platforms). > > > > > > > > > > > > > > Signed-off-by: Serge Semin <fancer.lancer@gmail.com> > > > > > > > > > > > > This is now commit b93ddc4f9156205e ("mips: Reserve memory for the kernel > > > > > > image resources") in v5.2-rc1, which causes rbtx4927 to crash during boot: > > > > > > > > > > > > VFS: Mounted root (nfs filesystem) on device 0:13. > > > > > > devtmpfs: mounted > > > > > > BUG: Bad page state in process swapper pfn:00001 > > > > > > page:804b7820 refcount:0 mapcount:-128 mapping:00000000 index:0x1 > > > > > > flags: 0x0() > > > > > > raw: 00000000 00000100 00000200 00000000 00000001 00000000 ffffff7f 00000000 > > > > > > page dumped because: nonzero mapcount > > > > > > Modules linked in: > > > > > > CPU: 0 PID: 1 Comm: swapper Not tainted > > > > > > 5.2.0-rc1-rbtx4927-00468-g3c05ea3d4077b756-dirty #137 > > > > > > Stack : 00000000 10008400 8040dd2c 87c1b974 8044af63 8040dd2c > > > > > > 00000001 804a3490 > > > > > > 00000001 81000000 0030f231 80148558 00000003 10008400 > > > > > > 87c1dd80 7599ee13 > > > > > > 00000000 00000000 804b0000 00000000 00000007 00000000 > > > > > > 00000085 00000000 > > > > > > 62722d31 00000084 804b0000 39347874 00000000 804b7820 > > > > > > 8040cef8 81000010 > > > > > > 00000001 00000007 00000001 81000000 00000008 8021de24 > > > > > > 00000000 804a0000 > > > > > > ... > > > > > > Call Trace: > > > > > > [<8010adec>] show_stack+0x74/0x104 > > > > > > [<801a5e44>] bad_page+0x130/0x138 > > > > > > [<801a654c>] free_pcppages_bulk+0x17c/0x3b0 > > > > > > [<801a789c>] free_unref_page+0x40/0x68 > > > > > > [<801120f4>] free_init_pages+0xec/0x104 > > > > > > [<803bdde8>] free_initmem+0x10/0x58 > > > > > > [<803bdb8c>] kernel_init+0x20/0x100 > > > > > > [<801057c8>] ret_from_kernel_thread+0x14/0x1c > > > > > > Disabling lock debugging due to kernel taint > > > > > > BUG: Bad page state in process swapper pfn:00002 > > > > > > [...] > > > > > > > > > > > > CONFIG_RELOCATABLE is not set, so the only relevant part is the > > > > > > change quoted below. 
> > > > > > > > > > > > > --- a/arch/mips/kernel/setup.c > > > > > > > +++ b/arch/mips/kernel/setup.c > > > > > > > @@ -371,7 +371,6 @@ static void __init bootmem_init(void) > > > > > > > > > > > > > > static void __init bootmem_init(void) > > > > > > > { > > > > > > > - unsigned long reserved_end; > > > > > > > phys_addr_t ramstart = PHYS_ADDR_MAX; > > > > > > > int i; > > > > > > > > > > > > > > @@ -382,10 +381,10 @@ static void __init bootmem_init(void) > > > > > > > * will reserve the area used for the initrd. > > > > > > > */ > > > > > > > init_initrd(); > > > > > > > - reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end)); > > > > > > > > > > > > > > - memblock_reserve(PHYS_OFFSET, > > > > > > > - (reserved_end << PAGE_SHIFT) - PHYS_OFFSET); > > > > > > > + /* Reserve memory occupied by kernel. */ > > > > > > > + memblock_reserve(__pa_symbol(&_text), > > > > > > > + __pa_symbol(&_end) - __pa_symbol(&_text)); > > > > > > > > > > > > > > /* > > > > > > > * max_low_pfn is not a number of pages. The number of pages > > > > > > > > Hence the old code reserved 1 MiB extra at the beginning. > > > > > > > > > > > > Note that the new code also dropped the rounding up of the memory block > > > > > > size to a multiple of PAGE_SIZE. I'm not sure the latter actually > > > > > > matters or not. > > > > > > > > > > I'd say that bad page state for pfn 1 is caused by "freeing" the first 1M. > > > > > > > > Yes, by prom_free_prom_memory(), as pointed out by Serge. > > > > > > I wonder how other MIPS variants would react to the fact that the memory > > > below the kernel is not reserved ;-) > > > > Exactly... > > > > Looks like at least arch/mips/dec/prom/memory.c needs a similar but\ > > more complicated fix, due to declance handling... > > > > The problem might be fixed there by the next patch: > --- > diff --git a/arch/mips/dec/prom/memory.c b/arch/mips/dec/prom/memory.c > index 5073d2ed78bb..5a0c734b5d04 100644 > --- a/arch/mips/dec/prom/memory.c > +++ b/arch/mips/dec/prom/memory.c > @@ -91,29 +91,14 @@ void __init prom_meminit(u32 magic) > pmax_setup_memory_region(); > else > rex_setup_memory_region(); > -} > - > -void __init prom_free_prom_memory(void) > -{ > - unsigned long end; > - > - /* > - * Free everything below the kernel itself but leave > - * the first page reserved for the exception handlers. > - */ > > #if IS_ENABLED(CONFIG_DECLANCE) > /* > - * Leave 128 KB reserved for Lance memory for > - * IOASIC DECstations. > + * Reserve 128 KB for Lance memory for IOASIC DECstations. > * > * XXX: save this address for use in dec_lance.c? > */ > if (IOASIC) > - end = __pa(&_text) - 0x00020000; > - else > + memblock_reserve(__pa_symbol(&_text), 0x00020000); Shouldn't that be memblock_reserve(__pa_symbol(&_text) - 0x00020000, 0x00020000); ? > #endif > - end = __pa(&_text); > - > - free_init_pages("unused PROM memory", PAGE_SIZE, end); > } > --- > > Didn't wanna use prom_FREE_prom_memory to actually reserve a memory > chunk, so I moved the reservation into the prom_meminit() method. I guess Maciej will test this on real hardware, eventually... > Regarding the first page for the exception handlers. We don't need > to reserve it here, since it is already done in arch/mips/kernel/traps.c . Thanks for checking! That was actually something I was still wondering about... Gr{oetje,eeting}s, Geert
On Wed, May 22, 2019 at 03:44:47PM +0200, Geert Uytterhoeven wrote: > Hi Serge, > > ... > > > > > The problem might be fixed there by the next patch: > > --- > > diff --git a/arch/mips/dec/prom/memory.c b/arch/mips/dec/prom/memory.c > > index 5073d2ed78bb..5a0c734b5d04 100644 > > --- a/arch/mips/dec/prom/memory.c > > +++ b/arch/mips/dec/prom/memory.c > > @@ -91,29 +91,14 @@ void __init prom_meminit(u32 magic) > > pmax_setup_memory_region(); > > else > > rex_setup_memory_region(); > > -} > > - > > -void __init prom_free_prom_memory(void) > > -{ > > - unsigned long end; > > - > > - /* > > - * Free everything below the kernel itself but leave > > - * the first page reserved for the exception handlers. > > - */ > > > > #if IS_ENABLED(CONFIG_DECLANCE) > > /* > > - * Leave 128 KB reserved for Lance memory for > > - * IOASIC DECstations. > > + * Reserve 128 KB for Lance memory for IOASIC DECstations. > > * > > * XXX: save this address for use in dec_lance.c? > > */ > > if (IOASIC) > > - end = __pa(&_text) - 0x00020000; > > - else > > + memblock_reserve(__pa_symbol(&_text), 0x00020000); > > Shouldn't that be > > memblock_reserve(__pa_symbol(&_text) - 0x00020000, 0x00020000); > > ? > Right. Thanks. The updated version of the patch is attached to this email. -Sergey > > #endif > > - end = __pa(&_text); > > - > > - free_init_pages("unused PROM memory", PAGE_SIZE, end); > > } > > --- > > > > Didn't wanna use prom_FREE_prom_memory to actually reserve a memory > > chunk, so I moved the reservation into the prom_meminit() method. > > I guess Maciej will test this on real hardware, eventually... > > > Regarding the first page for the exception handlers. We don't need > > to reserve it here, since it is already done in arch/mips/kernel/traps.c . > > Thanks for checking! That was actually something I was still wondering > about... > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds diff --git a/arch/mips/dec/prom/memory.c b/arch/mips/dec/prom/memory.c index 5073d2ed78bb..031c0cd45d85 100644 --- a/arch/mips/dec/prom/memory.c +++ b/arch/mips/dec/prom/memory.c @@ -91,29 +91,14 @@ void __init prom_meminit(u32 magic) pmax_setup_memory_region(); else rex_setup_memory_region(); -} - -void __init prom_free_prom_memory(void) -{ - unsigned long end; - - /* - * Free everything below the kernel itself but leave - * the first page reserved for the exception handlers. - */ #if IS_ENABLED(CONFIG_DECLANCE) /* - * Leave 128 KB reserved for Lance memory for - * IOASIC DECstations. + * Reserve 128 KB for Lance memory for IOASIC DECstations. * * XXX: save this address for use in dec_lance.c? */ if (IOASIC) - end = __pa(&_text) - 0x00020000; - else + memblock_reserve(__pa_symbol(&_text) - 0x00020000, 0x00020000); #endif - end = __pa(&_text); - - free_init_pages("unused PROM memory", PAGE_SIZE, end); }
On Wed, 22 May 2019, Serge Semin wrote: > > > The problem might be fixed there by the next patch: > > > --- > > > diff --git a/arch/mips/dec/prom/memory.c b/arch/mips/dec/prom/memory.c > > > index 5073d2ed78bb..5a0c734b5d04 100644 > > > --- a/arch/mips/dec/prom/memory.c > > > +++ b/arch/mips/dec/prom/memory.c > > > @@ -91,29 +91,14 @@ void __init prom_meminit(u32 magic) > > > pmax_setup_memory_region(); > > > else > > > rex_setup_memory_region(); > > > -} > > > - > > > -void __init prom_free_prom_memory(void) > > > -{ > > > - unsigned long end; > > > - > > > - /* > > > - * Free everything below the kernel itself but leave > > > - * the first page reserved for the exception handlers. > > > - */ > > > > > > #if IS_ENABLED(CONFIG_DECLANCE) > > > /* > > > - * Leave 128 KB reserved for Lance memory for > > > - * IOASIC DECstations. > > > + * Reserve 128 KB for Lance memory for IOASIC DECstations. > > > * > > > * XXX: save this address for use in dec_lance.c? > > > */ > > > if (IOASIC) > > > - end = __pa(&_text) - 0x00020000; > > > - else > > > + memblock_reserve(__pa_symbol(&_text), 0x00020000); > > > > Shouldn't that be > > > > memblock_reserve(__pa_symbol(&_text) - 0x00020000, 0x00020000); > > > > ? > > > > Right. Thanks. The updated version of the patch is attached to this email. > > -Sergey > > > > #endif > > > - end = __pa(&_text); > > > - > > > - free_init_pages("unused PROM memory", PAGE_SIZE, end); > > > } > > > --- > > > > > > Didn't wanna use prom_FREE_prom_memory to actually reserve a memory > > > chunk, so I moved the reservation into the prom_meminit() method. > > > > I guess Maciej will test this on real hardware, eventually... I finally got to it as I was hit by it the hard way (and I do have kept the thread in my inbox!), however this is the wrong fix. With DEC hardware the whole 128KiB region is reserved as firmware working memory area and we call into the firmware throughout bootstrap on several occasions. Therefore we have to stay away from it until we know we won't need any firmware services any longer, which is at this point. So this piece has to stay, and the removed reservation has to be reinstated in platform code. I'll be posting a fix separately. NB I suspect CFE platforms may need a similar fix, but I don't have access to my SWARM now, so I'll verify it on another occasion. Other platforms may need it too; at least up to a point an assumption was the kernel load address is just at the end of any firmware working area typically allocated right beyond the exception vector area, hence the reservation. I realise the assumption may have changed at one point and the oldtimers who have known it may have been away or not paying enough attention while the newcomers did not realise that. Maciej
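To make the platform-side direction concrete, here is a rough sketch of the kind of reservation Maciej describes, reinstated from DEC platform code rather than done generically. It only illustrates the idea: the helper name is hypothetical, and the choice of the PHYS_OFFSET.._text range and of doing this at platform memory-setup time are assumptions; Maciej's actual fix, posted separately, may look different.

#include <linux/init.h>
#include <linux/memblock.h>
#include <asm/page.h>		/* PHYS_OFFSET, __pa_symbol() */
#include <asm/sections.h>	/* _text */

/* Hypothetical helper, called early from platform memory setup. */
static void __init dec_reserve_prom_memory(void)
{
	/*
	 * The firmware keeps using its working area below the kernel
	 * (including the 128 KiB right under _text) for as long as the
	 * kernel calls into it, so keep the whole range out of
	 * memblock's hands instead of handing it to the allocator.
	 */
	memblock_reserve(PHYS_OFFSET,
			 __pa_symbol(&_text) - PHYS_OFFSET);
}

Reserving the whole range below the kernel, rather than only the 128 KiB Lance area, follows from the point above that the firmware working memory must stay untouched until the kernel no longer needs any firmware services.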
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 185e0e42e009..f71a7d32a687 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -371,7 +371,6 @@ static void __init bootmem_init(void)
 
 static void __init bootmem_init(void)
 {
-	unsigned long reserved_end;
 	phys_addr_t ramstart = PHYS_ADDR_MAX;
 	int i;
 
@@ -382,10 +381,10 @@ static void __init bootmem_init(void)
 	 * will reserve the area used for the initrd.
 	 */
	init_initrd();
-	reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
 
-	memblock_reserve(PHYS_OFFSET,
-			 (reserved_end << PAGE_SHIFT) - PHYS_OFFSET);
+	/* Reserve memory occupied by kernel. */
+	memblock_reserve(__pa_symbol(&_text),
+			 __pa_symbol(&_end) - __pa_symbol(&_text));
 
 	/*
 	 * max_low_pfn is not a number of pages. The number of pages
@@ -501,29 +500,6 @@ static void __init bootmem_init(void)
 		memory_present(0, start, end);
 	}
 
-#ifdef CONFIG_RELOCATABLE
-	/*
-	 * The kernel reserves all memory below its _end symbol as bootmem,
-	 * but the kernel may now be at a much higher address. The memory
-	 * between the original and new locations may be returned to the system.
-	 */
-	if (__pa_symbol(_text) > __pa_symbol(VMLINUX_LOAD_ADDRESS)) {
-		unsigned long offset;
-		extern void show_kernel_relocation(const char *level);
-
-		offset = __pa_symbol(_text) - __pa_symbol(VMLINUX_LOAD_ADDRESS);
-		memblock_free(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset);
-
-#if defined(CONFIG_DEBUG_KERNEL) && defined(CONFIG_DEBUG_INFO)
-		/*
-		 * This information is necessary when debugging the kernel
-		 * But is a security vulnerability otherwise!
-		 */
-		show_kernel_relocation(KERN_INFO);
-#endif
-	}
-#endif
-
 	/*
 	 * Reserve initrd memory if needed.
 	 */
The reserved_end variable had been used by the bootmem_init() code to find the lowest limit of memory available for the memmap blob. The original code simply looked for free memory above where the kernel was placed. That limitation seems justified for the memmap region search, but there is no obvious reason to also reserve the unused space below the kernel, given that some platforms place it much higher than the standard 1 MB. Moreover, the RELOCATION config allows the kernel to be loaded at any memory address. So let's reserve only the memory occupied by the kernel itself, leaving the region below it free for allocations. After doing this we can also discard the code freeing the space between the kernel _text and VMLINUX_LOAD_ADDRESS symbols, since that region is now free anyway (unless marked as reserved by platforms).

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
---
 arch/mips/kernel/setup.c | 30 +++---------------------------
 1 file changed, 3 insertions(+), 27 deletions(-)