Message ID | alpine.LFD.2.21.2010142230090.866917@eddie.linux-mips.org (mailing list archive) |
---|---|
State | Accepted |
Commit | cf3af0a4d3b62ab48e0b90180ea161d0f5d4953f |
Headers | show |
Series | [v2] MIPS: DEC: Restore bootmem reservation for firmware working memory area | expand |
On Wed, Oct 14, 2020 at 10:34:56PM +0100, Maciej W. Rozycki wrote: > Fix a crash on DEC platforms starting with: > > VFS: Mounted root (nfs filesystem) on device 0:11. > Freeing unused PROM memory: 124k freed > BUG: Bad page state in process swapper pfn:00001 > page:(ptrval) refcount:0 mapcount:-128 mapping:00000000 index:0x1 pfn:0x1 > flags: 0x0() > raw: 00000000 00000100 00000122 00000000 00000001 00000000 ffffff7f 00000000 > page dumped because: nonzero mapcount > Modules linked in: > CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-00858-g865c50e1d279 #1 > Stack : 8065dc48 0000000b 8065d2b8 9bc27dcc 80645bfc 9bc259a4 806a1b97 80703124 > 80710000 8064a900 00000001 80099574 806b116c 1000ec00 9bc27d88 806a6f30 > 00000000 00000000 80645bfc 00000000 31232039 80706ba4 2e392e35 8039f348 > 2d383538 00000070 0000000a 35363867 00000000 806c2830 80710000 806b0000 > 80710000 8064a900 00000001 81000000 00000000 00000000 8035af2c 80700000 > ... > Call Trace: > [<8004bc5c>] show_stack+0x34/0x104 > [<8015675c>] bad_page+0xfc/0x128 > [<80157714>] free_pcppages_bulk+0x1f4/0x5dc > [<801591cc>] free_unref_page+0xc0/0x130 > [<8015cb04>] free_reserved_area+0x144/0x1d8 > [<805abd78>] kernel_init+0x20/0x100 > [<80046070>] ret_from_kernel_thread+0x14/0x1c > Disabling lock debugging due to kernel taint > > caused by an attempt to free bootmem space that as from commit > b93ddc4f9156 ("mips: Reserve memory for the kernel image resources") has > not been anymore reserved due to the removal of generic MIPS arch code > that used to reserve all the memory from the beginning of RAM up to the > kernel load address. > > This memory does need to be reserved on DEC platforms however as it is > used by REX firmware as working area, as per the TURBOchannel firmware > specification[1]: > > Table 2-2 REX Memory Regions > ------------------------------------------------------------------------- > Starting Ending > Region Address Address Use > ------------------------------------------------------------------------- > 0 0xa0000000 0xa000ffff Restart block, exception vectors, > REX stack and bss > 1 0xa0010000 0xa0017fff Keyboard or tty drivers > > 2 0xa0018000 0xa001f3ff 1) CRT driver > > 3 0xa0020000 0xa002ffff boot, cnfg, init and t objects > > 4 0xa0020000 0xa002ffff 64KB scratch space > ------------------------------------------------------------------------- > 1) Note that the last 3 Kbytes of region 2 are reserved for backward > compatibility with previous system software. > ------------------------------------------------------------------------- > > (this table uses KSEG2 unmapped virtual addresses, which in the MIPS > architecture are offset from physical addresses by a fixed value of > 0xa0000000 and therefore the regions referred do correspond to the > beginning of the physical address space) and we call into the firmware > on several occasions throughout the bootstrap process. It is believed > that pre-REX firmware used with non-TURBOchannel DEC platforms has the > same requirements, as hinted by note #1 cited. > > Recreate the discarded reservation then, in DEC platform code, removing > the crash. > > References: > > [1] "TURBOchannel Firmware Specification", On-line version, > EK-TCAAD-FS-004, Digital Equipment Corporation, January 1993, > Chapter 2 "System Module Firmware", p. 2-5 > > Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> > Fixes: b93ddc4f9156 ("mips: Reserve memory for the kernel image resources") > Cc: stable@vger.kernel.org # v5.2+ > --- > Changes from v1: > > - Fix 2nd argument of the call to `memblock_reserve' (thanks Serge!). > --- > arch/mips/dec/setup.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) applied to mips-next. Thomas.
Maciej, Regardless of a comment below feel free to add: Reviewed-by: Serge Semin <fancer.lancer@gmail.com> On Wed, Oct 14, 2020 at 10:34:56PM +0100, Maciej W. Rozycki wrote: > Fix a crash on DEC platforms starting with: > > VFS: Mounted root (nfs filesystem) on device 0:11. > Freeing unused PROM memory: 124k freed > BUG: Bad page state in process swapper pfn:00001 > page:(ptrval) refcount:0 mapcount:-128 mapping:00000000 index:0x1 pfn:0x1 > flags: 0x0() > raw: 00000000 00000100 00000122 00000000 00000001 00000000 ffffff7f 00000000 > page dumped because: nonzero mapcount > Modules linked in: > CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-00858-g865c50e1d279 #1 > Stack : 8065dc48 0000000b 8065d2b8 9bc27dcc 80645bfc 9bc259a4 806a1b97 80703124 > 80710000 8064a900 00000001 80099574 806b116c 1000ec00 9bc27d88 806a6f30 > 00000000 00000000 80645bfc 00000000 31232039 80706ba4 2e392e35 8039f348 > 2d383538 00000070 0000000a 35363867 00000000 806c2830 80710000 806b0000 > 80710000 8064a900 00000001 81000000 00000000 00000000 8035af2c 80700000 > ... > Call Trace: > [<8004bc5c>] show_stack+0x34/0x104 > [<8015675c>] bad_page+0xfc/0x128 > [<80157714>] free_pcppages_bulk+0x1f4/0x5dc > [<801591cc>] free_unref_page+0xc0/0x130 > [<8015cb04>] free_reserved_area+0x144/0x1d8 > [<805abd78>] kernel_init+0x20/0x100 > [<80046070>] ret_from_kernel_thread+0x14/0x1c > Disabling lock debugging due to kernel taint > > caused by an attempt to free bootmem space that as from commit > b93ddc4f9156 ("mips: Reserve memory for the kernel image resources") has > not been anymore reserved due to the removal of generic MIPS arch code > that used to reserve all the memory from the beginning of RAM up to the > kernel load address. > > This memory does need to be reserved on DEC platforms however as it is > used by REX firmware as working area, as per the TURBOchannel firmware > specification[1]: > > Table 2-2 REX Memory Regions > ------------------------------------------------------------------------- > Starting Ending > Region Address Address Use > ------------------------------------------------------------------------- > 0 0xa0000000 0xa000ffff Restart block, exception vectors, > REX stack and bss > 1 0xa0010000 0xa0017fff Keyboard or tty drivers > > 2 0xa0018000 0xa001f3ff 1) CRT driver > > 3 0xa0020000 0xa002ffff boot, cnfg, init and t objects > > 4 0xa0020000 0xa002ffff 64KB scratch space > ------------------------------------------------------------------------- > 1) Note that the last 3 Kbytes of region 2 are reserved for backward > compatibility with previous system software. > ------------------------------------------------------------------------- > ... > @@ -146,6 +150,9 @@ void __init plat_mem_setup(void) > > ioport_resource.start = ~0UL; > ioport_resource.end = 0UL; > + > + /* Stay away from the firmware working memory area for now. */ > + memblock_reserve(PHYS_OFFSET, __pa_symbol(&_text) - PHYS_OFFSET); I am just wondering... Here we reserve a region within [PHYS_OFFSET, __pa_symbol(&_text)]. But then in the prom_free_prom_memory() method we'll get to free a region as either [PAGE_SIZE, __pa_symbol(&_text)] or [PAGE_SIZE, __pa_symbol(&_text) - 0x00020000]. First of all the start address of the being freed region is PAGE_SIZE, which doesn't take the PHYS_OFFSET into account. I assume it won't cause problems because PHYS_OFFSET is set to 0 for DEC. If so then we either shouldn't use PHYS_OFFSET here or should take PHYS_OFFSET into account there on freeing or should just forget about that since other than confusion it won't cause any problem.) (I should have posted this question to v1 of this patch instead of pointing out on the confusing size argument of the memblock_reserve() method. Sorry about that.) Secondly why is PAGE_SIZE selected as the start address? It belongs to the Region 1 in accordance with "Table 2-2 REX Memory Regions" cited above. Thus we get to keep reserved just a part of the Region 1. Most likely it's the restart block and the exception vectors. Even assuming that the DEC developers knew what they were doing, I am wondering can we be sure that a single page is enough for all that data?.. -Sergey > } > > /*
On Thu, 15 Oct 2020, Serge Semin wrote: > > Table 2-2 REX Memory Regions > > ------------------------------------------------------------------------- > > Starting Ending > > Region Address Address Use > > ------------------------------------------------------------------------- > > 0 0xa0000000 0xa000ffff Restart block, exception vectors, > > REX stack and bss > > 1 0xa0010000 0xa0017fff Keyboard or tty drivers > > > > 2 0xa0018000 0xa001f3ff 1) CRT driver > > > > 3 0xa0020000 0xa002ffff boot, cnfg, init and t objects > > > > 4 0xa0020000 0xa002ffff 64KB scratch space > > ------------------------------------------------------------------------- > > 1) Note that the last 3 Kbytes of region 2 are reserved for backward > > compatibility with previous system software. > > ------------------------------------------------------------------------- > > > > ... > > > @@ -146,6 +150,9 @@ void __init plat_mem_setup(void) > > > > ioport_resource.start = ~0UL; > > ioport_resource.end = 0UL; > > > + > > + /* Stay away from the firmware working memory area for now. */ > > + memblock_reserve(PHYS_OFFSET, __pa_symbol(&_text) - PHYS_OFFSET); > > I am just wondering... > Here we reserve a region within [PHYS_OFFSET, __pa_symbol(&_text)]. But then in > the prom_free_prom_memory() method we'll get to free a region as either > [PAGE_SIZE, __pa_symbol(&_text)] or [PAGE_SIZE, __pa_symbol(&_text) - 0x00020000]. > > First of all the start address of the being freed region is PAGE_SIZE, which doesn't > take the PHYS_OFFSET into account. I assume it won't cause problems because > PHYS_OFFSET is set to 0 for DEC. If so then we either shouldn't use PHYS_OFFSET > here or should take PHYS_OFFSET into account there on freeing or should just forget > about that since other than confusion it won't cause any problem.) > (I should have posted this question to v1 of this patch instead of pointing out > on the confusing size argument of the memblock_reserve() method. Sorry about > that.) Technically, yes, we could use PHYS_OFFSET here, though as a separate change, as it's not related to the regression addressed. Please mind that this is very old code, which long predates the existence of PHYS_OFFSET, introduced with commit 6f284a2ce7b8 ("[MIPS] FLATMEM: introduce PHYS_OFFSET.") back in 2007 only. While `prom_free_prom_memory' code dates back to commit b5766e7e6177 ("o bootmem fixes for DECstations") (no proper change heading here; this is from CVS repo days) from 2000, and I fiddled with it myself not so long afterwards with commit e25ac8bd2505 ("DECstation fixes from Maciej") in 2001 (both in the old LMO GIT repo). So if anytime it should have been updated in the course of a code audit that was surely due across all platforms with the introduction of PHYS_OFFSET. Of course it doesn't mean it must not or cannot be fixed now, but it has been functionally correct even if semantically broken, so no one saw the urge to change it (let alone notice the problem in the first place -- you are the first one). > Secondly why is PAGE_SIZE selected as the start address? It belongs to the > Region 1 in accordance with "Table 2-2 REX Memory Regions" cited above. Thus we > get to keep reserved just a part of the Region 1. Most likely it's the restart > block and the exception vectors. Even assuming that the DEC developers knew what > they were doing, I am wondering can we be sure that a single page is enough for > all that data?.. Again this is so old as to predate the existence of synthesised exception handlers we currently use (which has been a blessing BTW), which I reckon take a little less space than the preassembled handlers we previously had used to, and stay well within even the smallest supported page size of 4KiB. Anything else we can just overwrite as we do not mean to call into the firmware at this point anymore; we couldn't trust it to do the right thing anyway once we have booted (e.g. not to keep interrupts masked for too long, etc.). I've been occasionally thinking about a better solution in place of the LANCE hack here, needed because the chip has only its low 16 out of 24 address lines wired, due to a limitation of the IOASIC DMA controller (it also drives one half of the IOASIC's 16-bit data bus only, communicating with every other byte of system memory only and hence the requirement for a 128KiB allocation, with every other byte wasted, rather than a 64KiB one). Had all 24 lines been used, we could use dynamic ZONE_DMA buffers as with PC ISA DMA, as the LANCE implements proper DMA rings, but with low 16 only it would not really play. I have not come up with any solution however that would be significantly better than what we have, and the current one works, so I have left it as it is. Do these explanations address your concerns? Maciej
On Thu, Oct 15, 2020 at 01:03:10AM +0100, Maciej W. Rozycki wrote: > On Thu, 15 Oct 2020, Serge Semin wrote: > > > > Table 2-2 REX Memory Regions > > > ------------------------------------------------------------------------- > > > Starting Ending > > > Region Address Address Use > > > ------------------------------------------------------------------------- > > > 0 0xa0000000 0xa000ffff Restart block, exception vectors, > > > REX stack and bss > > > 1 0xa0010000 0xa0017fff Keyboard or tty drivers > > > > > > 2 0xa0018000 0xa001f3ff 1) CRT driver > > > > > > 3 0xa0020000 0xa002ffff boot, cnfg, init and t objects > > > > > > 4 0xa0020000 0xa002ffff 64KB scratch space > > > ------------------------------------------------------------------------- > > > 1) Note that the last 3 Kbytes of region 2 are reserved for backward > > > compatibility with previous system software. > > > ------------------------------------------------------------------------- > > > > > > > ... > > > > > @@ -146,6 +150,9 @@ void __init plat_mem_setup(void) > > > > > > ioport_resource.start = ~0UL; > > > ioport_resource.end = 0UL; > > > > > + > > > + /* Stay away from the firmware working memory area for now. */ > > > + memblock_reserve(PHYS_OFFSET, __pa_symbol(&_text) - PHYS_OFFSET); > > > > I am just wondering... > > Here we reserve a region within [PHYS_OFFSET, __pa_symbol(&_text)]. But then in > > the prom_free_prom_memory() method we'll get to free a region as either > > [PAGE_SIZE, __pa_symbol(&_text)] or [PAGE_SIZE, __pa_symbol(&_text) - 0x00020000]. > > > > First of all the start address of the being freed region is PAGE_SIZE, which doesn't > > take the PHYS_OFFSET into account. I assume it won't cause problems because > > PHYS_OFFSET is set to 0 for DEC. If so then we either shouldn't use PHYS_OFFSET > > here or should take PHYS_OFFSET into account there on freeing or should just forget > > about that since other than confusion it won't cause any problem.) > > (I should have posted this question to v1 of this patch instead of pointing out > > on the confusing size argument of the memblock_reserve() method. Sorry about > > that.) > > Technically, yes, we could use PHYS_OFFSET here, though as a separate > change, as it's not related to the regression addressed. > > Please mind that this is very old code, which long predates the existence > of PHYS_OFFSET, introduced with commit 6f284a2ce7b8 ("[MIPS] FLATMEM: > introduce PHYS_OFFSET.") back in 2007 only. While `prom_free_prom_memory' > code dates back to commit b5766e7e6177 ("o bootmem fixes for DECstations") > (no proper change heading here; this is from CVS repo days) from 2000, and > I fiddled with it myself not so long afterwards with commit e25ac8bd2505 > ("DECstation fixes from Maciej") in 2001 (both in the old LMO GIT repo). > So if anytime it should have been updated in the course of a code audit > that was surely due across all platforms with the introduction of > PHYS_OFFSET. > > Of course it doesn't mean it must not or cannot be fixed now, but it has > been functionally correct even if semantically broken, so no one saw the > urge to change it (let alone notice the problem in the first place -- you > are the first one). > > > Secondly why is PAGE_SIZE selected as the start address? It belongs to the > > Region 1 in accordance with "Table 2-2 REX Memory Regions" cited above. Thus we > > get to keep reserved just a part of the Region 1. Most likely it's the restart > > block and the exception vectors. Even assuming that the DEC developers knew what > > they were doing, I am wondering can we be sure that a single page is enough for > > all that data?.. > > Again this is so old as to predate the existence of synthesised exception > handlers we currently use (which has been a blessing BTW), which I reckon > take a little less space than the preassembled handlers we previously had > used to, and stay well within even the smallest supported page size of > 4KiB. Anything else we can just overwrite as we do not mean to call into > the firmware at this point anymore; we couldn't trust it to do the right > thing anyway once we have booted (e.g. not to keep interrupts masked for > too long, etc.). > > I've been occasionally thinking about a better solution in place of the > LANCE hack here, needed because the chip has only its low 16 out of 24 > address lines wired, due to a limitation of the IOASIC DMA controller (it > also drives one half of the IOASIC's 16-bit data bus only, communicating > with every other byte of system memory only and hence the requirement for > a 128KiB allocation, with every other byte wasted, rather than a 64KiB > one). > > Had all 24 lines been used, we could use dynamic ZONE_DMA buffers as with > PC ISA DMA, as the LANCE implements proper DMA rings, but with low 16 only > it would not really play. I have not come up with any solution however > that would be significantly better than what we have, and the current one > works, so I have left it as it is. > > Do these explanations address your concerns? Yep, completely. Thanks for the very detailed response. -Sergey > > Maciej
Index: linux-3maxp/arch/mips/dec/setup.c =================================================================== --- linux-3maxp.orig/arch/mips/dec/setup.c +++ linux-3maxp/arch/mips/dec/setup.c @@ -6,7 +6,7 @@ * for more details. * * Copyright (C) 1998 Harald Koerfgen - * Copyright (C) 2000, 2001, 2002, 2003, 2005 Maciej W. Rozycki + * Copyright (C) 2000, 2001, 2002, 2003, 2005, 2020 Maciej W. Rozycki */ #include <linux/console.h> #include <linux/export.h> @@ -15,6 +15,7 @@ #include <linux/ioport.h> #include <linux/irq.h> #include <linux/irqnr.h> +#include <linux/memblock.h> #include <linux/param.h> #include <linux/percpu-defs.h> #include <linux/sched.h> @@ -22,6 +23,7 @@ #include <linux/types.h> #include <linux/pm.h> +#include <asm/addrspace.h> #include <asm/bootinfo.h> #include <asm/cpu.h> #include <asm/cpu-features.h> @@ -29,7 +31,9 @@ #include <asm/irq.h> #include <asm/irq_cpu.h> #include <asm/mipsregs.h> +#include <asm/page.h> #include <asm/reboot.h> +#include <asm/sections.h> #include <asm/time.h> #include <asm/traps.h> #include <asm/wbflush.h> @@ -146,6 +150,9 @@ void __init plat_mem_setup(void) ioport_resource.start = ~0UL; ioport_resource.end = 0UL; + + /* Stay away from the firmware working memory area for now. */ + memblock_reserve(PHYS_OFFSET, __pa_symbol(&_text) - PHYS_OFFSET); } /*
Fix a crash on DEC platforms starting with: VFS: Mounted root (nfs filesystem) on device 0:11. Freeing unused PROM memory: 124k freed BUG: Bad page state in process swapper pfn:00001 page:(ptrval) refcount:0 mapcount:-128 mapping:00000000 index:0x1 pfn:0x1 flags: 0x0() raw: 00000000 00000100 00000122 00000000 00000001 00000000 ffffff7f 00000000 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 5.9.0-00858-g865c50e1d279 #1 Stack : 8065dc48 0000000b 8065d2b8 9bc27dcc 80645bfc 9bc259a4 806a1b97 80703124 80710000 8064a900 00000001 80099574 806b116c 1000ec00 9bc27d88 806a6f30 00000000 00000000 80645bfc 00000000 31232039 80706ba4 2e392e35 8039f348 2d383538 00000070 0000000a 35363867 00000000 806c2830 80710000 806b0000 80710000 8064a900 00000001 81000000 00000000 00000000 8035af2c 80700000 ... Call Trace: [<8004bc5c>] show_stack+0x34/0x104 [<8015675c>] bad_page+0xfc/0x128 [<80157714>] free_pcppages_bulk+0x1f4/0x5dc [<801591cc>] free_unref_page+0xc0/0x130 [<8015cb04>] free_reserved_area+0x144/0x1d8 [<805abd78>] kernel_init+0x20/0x100 [<80046070>] ret_from_kernel_thread+0x14/0x1c Disabling lock debugging due to kernel taint caused by an attempt to free bootmem space that as from commit b93ddc4f9156 ("mips: Reserve memory for the kernel image resources") has not been anymore reserved due to the removal of generic MIPS arch code that used to reserve all the memory from the beginning of RAM up to the kernel load address. This memory does need to be reserved on DEC platforms however as it is used by REX firmware as working area, as per the TURBOchannel firmware specification[1]: Table 2-2 REX Memory Regions ------------------------------------------------------------------------- Starting Ending Region Address Address Use ------------------------------------------------------------------------- 0 0xa0000000 0xa000ffff Restart block, exception vectors, REX stack and bss 1 0xa0010000 0xa0017fff Keyboard or tty drivers 2 0xa0018000 0xa001f3ff 1) CRT driver 3 0xa0020000 0xa002ffff boot, cnfg, init and t objects 4 0xa0020000 0xa002ffff 64KB scratch space ------------------------------------------------------------------------- 1) Note that the last 3 Kbytes of region 2 are reserved for backward compatibility with previous system software. ------------------------------------------------------------------------- (this table uses KSEG2 unmapped virtual addresses, which in the MIPS architecture are offset from physical addresses by a fixed value of 0xa0000000 and therefore the regions referred do correspond to the beginning of the physical address space) and we call into the firmware on several occasions throughout the bootstrap process. It is believed that pre-REX firmware used with non-TURBOchannel DEC platforms has the same requirements, as hinted by note #1 cited. Recreate the discarded reservation then, in DEC platform code, removing the crash. References: [1] "TURBOchannel Firmware Specification", On-line version, EK-TCAAD-FS-004, Digital Equipment Corporation, January 1993, Chapter 2 "System Module Firmware", p. 2-5 Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Fixes: b93ddc4f9156 ("mips: Reserve memory for the kernel image resources") Cc: stable@vger.kernel.org # v5.2+ --- Changes from v1: - Fix 2nd argument of the call to `memblock_reserve' (thanks Serge!). --- arch/mips/dec/setup.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) linux-dec-prom-memblock-reserve.diff