diff mbox series

MIPS: reserve the memblock right after the kernel

Message ID 20201106141001.57637-1-alexander.sverdlin@nokia.com (mailing list archive)
State Superseded
Headers show
Series MIPS: reserve the memblock right after the kernel | expand

Commit Message

Alexander Sverdlin Nov. 6, 2020, 2:10 p.m. UTC
From: Alexander Sverdlin <alexander.sverdlin@nokia.com>

Linux doesn't own the memory immediately after the kernel image. On Octeon
bootloader places a shared structure right close after the kernel _end,
refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.

If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
memory block overlapping with the above octeon_bootinfo structure, which
is being overwritten afterwards.

Cc: stable@vger.kernel.org
Fixes: a94e4f24ec83 ("MIPS: init: Drop boot_mem_map")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
---
 arch/mips/kernel/setup.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Thomas Bogendoerfer Nov. 7, 2020, 9:40 a.m. UTC | #1
On Fri, Nov 06, 2020 at 03:10:01PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> 
> Linux doesn't own the memory immediately after the kernel image. On Octeon
> bootloader places a shared structure right close after the kernel _end,
> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.
> 
> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
> memory block overlapping with the above octeon_bootinfo structure, which
> is being overwritten afterwards.

as this special for Octeon how about added the memblock_reserve
in octen specific code ?

Thomas.
Alexander Sverdlin Nov. 9, 2020, 10:34 a.m. UTC | #2
Hello Thomas,

On 07/11/2020 10:40, Thomas Bogendoerfer wrote:
>> Linux doesn't own the memory immediately after the kernel image. On Octeon
>> bootloader places a shared structure right close after the kernel _end,
>> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.
>>
>> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
>> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
>> memory block overlapping with the above octeon_bootinfo structure, which
>> is being overwritten afterwards.
> as this special for Octeon how about added the memblock_reserve
> in octen specific code ?

while the shared structure which is being corrupted is indeed Octeon-specific,
the wrong assumption that the memory right after the kernel can be allocated by memblock
allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem().

I personally will be fine with repairing Octeon only as I don't have other MIPS
targets to care about, but maybe someone else in the MIPS community will find
this fix useful...
Alexander Sverdlin Nov. 9, 2020, 11:08 a.m. UTC | #3
Hi Thomas,

On 09/11/2020 11:34, Alexander Sverdlin wrote:
>>> Linux doesn't own the memory immediately after the kernel image. On Octeon
>>> bootloader places a shared structure right close after the kernel _end,
>>> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.
>>>
>>> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
>>> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
>>> memory block overlapping with the above octeon_bootinfo structure, which
>>> is being overwritten afterwards.
>> as this special for Octeon how about added the memblock_reserve
>> in octen specific code ?
> while the shared structure which is being corrupted is indeed Octeon-specific,
> the wrong assumption that the memory right after the kernel can be allocated by memblock
> allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem().
> 
> I personally will be fine with repairing Octeon only as I don't have other MIPS
> targets to care about, but maybe someone else in the MIPS community will find
> this fix useful...

another reason for not putting that to Octeon platform code was the fact, that one
would need to put the knowledge about wrong assumption of ARCH (MIPS) code into
different code area of Octeon platform.
If at some point in time check_kernel_sections_mem() will be altered/fixed, nobody
will understand why Octeon still has this workaround.
Thomas Bogendoerfer Nov. 10, 2020, 9:55 a.m. UTC | #4
On Mon, Nov 09, 2020 at 11:34:33AM +0100, Alexander Sverdlin wrote:
> Hello Thomas,
> 
> On 07/11/2020 10:40, Thomas Bogendoerfer wrote:
> >> Linux doesn't own the memory immediately after the kernel image. On Octeon
> >> bootloader places a shared structure right close after the kernel _end,
> >> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.
> >>
> >> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
> >> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
> >> memory block overlapping with the above octeon_bootinfo structure, which
> >> is being overwritten afterwards.
> > as this special for Octeon how about added the memblock_reserve
> > in octen specific code ?
> 
> while the shared structure which is being corrupted is indeed Octeon-specific,
> the wrong assumption that the memory right after the kernel can be allocated by memblock
> allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem().

ok, I see your point. IMHO this whole check_kernel_sections_mem() should
be removed. IMHO memory adding should only be done my memory detection code.

Could you send a patch, which removes check_kernel_section_mem completly ?

Thomas.
Alexander Sverdlin Nov. 10, 2020, 10:29 a.m. UTC | #5
Hello Thomas,

On 10/11/2020 10:55, Thomas Bogendoerfer wrote:
>>>> Linux doesn't own the memory immediately after the kernel image. On Octeon
>>>> bootloader places a shared structure right close after the kernel _end,
>>>> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.
>>>>
>>>> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
>>>> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
>>>> memory block overlapping with the above octeon_bootinfo structure, which
>>>> is being overwritten afterwards.
>>> as this special for Octeon how about added the memblock_reserve
>>> in octen specific code ?
>> while the shared structure which is being corrupted is indeed Octeon-specific,
>> the wrong assumption that the memory right after the kernel can be allocated by memblock
>> allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem().
> ok, I see your point. IMHO this whole check_kernel_sections_mem() should
> be removed. IMHO memory adding should only be done my memory detection code.
> 
> Could you send a patch, which removes check_kernel_section_mem completly ?

this will expose one issue:
platforms usually do it in a sane way, like it was done last 15 years, namely
add kernel image without non-complete pages on the boundaries.
This will lead to the situation, that request_resource() will fail at least
for .bss section of the kernel and it will not be properly displayed under
/proc/iomem (and probably same problem will appear, which initially motivated
the creation of check_kernel_section_mem()).

As I understood, the issue is that memblock API operates internally on the
page granularity (at least there are many ROUND_DOWN() inside for the size
or upper boundary), so for request_resource() to success one has to claim
the rest of the .bss last page. And with current memblock API
memblock_reserve() must appear somewhere, being this ARCH or platform code.
Serge Semin Nov. 11, 2020, 2:52 p.m. UTC | #6
Hello Alexander

On Tue, Nov 10, 2020 at 11:29:50AM +0100, Alexander Sverdlin wrote:
> Hello Thomas,
> 
> On 10/11/2020 10:55, Thomas Bogendoerfer wrote:
> >>>> Linux doesn't own the memory immediately after the kernel image. On Octeon
> >>>> bootloader places a shared structure right close after the kernel _end,
> >>>> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c.
> >>>>
> >>>> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc()
> >>>> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns
> >>>> memory block overlapping with the above octeon_bootinfo structure, which
> >>>> is being overwritten afterwards.
> >>> as this special for Octeon how about added the memblock_reserve
> >>> in octen specific code ?
> >> while the shared structure which is being corrupted is indeed Octeon-specific,
> >> the wrong assumption that the memory right after the kernel can be allocated by memblock
> >> allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem().
> > ok, I see your point. IMHO this whole check_kernel_sections_mem() should
> > be removed. IMHO memory adding should only be done my memory detection code.
> > 
> > Could you send a patch, which removes check_kernel_section_mem completly ?
> 

> this will expose one issue:
> platforms usually do it in a sane way, like it was done last 15 years, namely
> add kernel image without non-complete pages on the boundaries.
> This will lead to the situation, that request_resource() will fail at least
> for .bss section of the kernel and it will not be properly displayed under
> /proc/iomem (and probably same problem will appear, which initially motivated
> the creation of check_kernel_section_mem()).

Are you saying that some old platforms rely on the
check_kernel_section_mem() method adding the memory occupied by the
kernel to the system? If so, do you have an example of such?

Personally I also had my hand itching to remove that method years ago,
but I didn't dare to do so for the same reason in mind... On the other
hand if we detected all the platforms that needed that method, we could
have moved it to their prom_init() or something and got rid of that
atavism for good.

> 
> As I understood, the issue is that memblock API operates internally on the
> page granularity (at least there are many ROUND_DOWN() inside for the size
> or upper boundary),

Hm, I don't think so. Memblock doesn't work with the pages granularity,
but with memory ranges. round_down()/round_up() are used to find a memory
range with proper alignment. (See __memblock_find_range_top_{up,down}()
method implementation.)

Memblock allocates a memory region with exact size and alignment as
requested. That's the beauty of that allocator and one of the reasons
why the kernel platforms have been painfully converted to using it instead
of the old bootmem allocator. BTW the later one has indeed operated
with page granularity.

Getting back to the memblock allocator. It works with pages only when
the kernel comes to starting the buddy allocator. So the kernel
invokes memblock_free_all(), which eventually gets to calling
free_low_memory_core_early()->__free_memory_core(). The later method indeed
sets the memory pages free, but as you can see it's done with correct
aligning PFN_UP(phys_start)/PFN_DOWN(end).

> so for request_resource() to success one has to claim
> the rest of the .bss last page. And with current memblock API
> memblock_reserve() must appear somewhere, being this ARCH or platform code.

After a short glance at the request_resource() code I didn't manage to
find a reason why the method would fail to request a page-unaligned
region. AFAICS it will fail only if the memory occupied by the kernel
hasn't been registered as system memory. The later case may happen
only for the systems which rely on the check_kernel_section_mem()
method being called in the generic arch_mem_init(). Of course we
shouldn't blindly have it removed, but instead move it to the
platforms, which have been unfortunate enough not to add the kernel
memory to the system memory pool.

So IMHO what could be the best conclusion in the framework of this patch:
1) As Thomas said any platform-specific reservation should be done in the
platform-specific code. That means if octeon needs some memory behind
the kernel being reserved, then it should be done for example in
prom_init().
2) The check_kernel_sections_mem() method can be removed. But it
should be done carefully. We at least need to try to find all the
platforms, which rely on its functionality.

-Sergey

> 
> -- 
> Best regards,
> Alexander Sverdlin.
Jiaxun Yang Nov. 13, 2020, 2:30 a.m. UTC | #7
Hi all,

在 2020/11/11 22:52, Serge Semin 写道:
> Hello Alexander
>
> On Tue, Nov 10, 2020 at 11:29:50AM +0100, Alexander Sverdlin wrote:
> 2) The check_kernel_sections_mem() method can be removed. But it
> should be done carefully. We at least need to try to find all the
> platforms, which rely on its functionality.
It have been more than 10 years since introducing this this check, but
I believe there must be a reason at the time.

Also currently we have some unmaintained platform and it's impossible
to test on them for us.

For Cavium's issue, I believe removing the page rounding can help.
I'd suggest keep this method but remove the rounding part, also
leave a warning or kernel taint when out of boundary kernel image
is detected.

Thanks.

- Jiaxun

>
> -Sergey
>
>> -- 
>> Best regards,
>> Alexander Sverdlin.
Jinyang He Nov. 13, 2020, 3:28 a.m. UTC | #8
Hi, Jiaxun,

On 11/13/2020 10:30 AM, Jiaxun Yang wrote:
> Hi all,
>
> 在 2020/11/11 22:52, Serge Semin 写道:
>> Hello Alexander
>>
>> On Tue, Nov 10, 2020 at 11:29:50AM +0100, Alexander Sverdlin wrote:
>> 2) The check_kernel_sections_mem() method can be removed. But it
>> should be done carefully. We at least need to try to find all the
>> platforms, which rely on its functionality.
> It have been more than 10 years since introducing this this check, but
> I believe there must be a reason at the time.
>
> Also currently we have some unmaintained platform and it's impossible
> to test on them for us.
>
> For Cavium's issue, I believe removing the page rounding can help.
> I'd suggest keep this method but remove the rounding part, also
> leave a warning or kernel taint when out of boundary kernel image
> is detected.
>
> Thanks.
>
> - Jiaxun
>
I found the first submission.
Commit: d3ff93380232 (mips: Make sure kernel memory is in iomem)
As I thought, this check is related to kdump. More directly, it is
related to the "mem=" parameter. Kexec-tools provide a "mem=" parameter
to limit the kernel location in kdump. But sometimes the memory may be not
enough and this check gives a way to ensure the capture kernel can
start rightly. Although "mem=" is not in kexec-tools now, I thought
it is also useful if someone use "mem=" to do other things.

Thanks,
Jinyang

>>
>> -Sergey
>>
>>> -- 
>>> Best regards,
>>> Alexander Sverdlin.
Maciej W. Rozycki Nov. 13, 2020, 3:49 a.m. UTC | #9
On Fri, 13 Nov 2020, Jinyang He wrote:

> Commit: d3ff93380232 (mips: Make sure kernel memory is in iomem)
> As I thought, this check is related to kdump. More directly, it is
> related to the "mem=" parameter. Kexec-tools provide a "mem=" parameter
> to limit the kernel location in kdump. But sometimes the memory may be not
> enough and this check gives a way to ensure the capture kernel can
> start rightly. Although "mem=" is not in kexec-tools now, I thought
> it is also useful if someone use "mem=" to do other things.

 I used to supply `mem=' by hand on many occasions across different MIPS 
systems and for various reasons over the last 20 years; surely before 
kexec was even thought of.  The option itself has been there in the MIPS 
port since I added it to Linux 2.4.0 back in Dec 2000.

  Maciej
Alexander Sverdlin Nov. 13, 2020, 9:17 a.m. UTC | #10
Hello Serge, Thomas,

On 11/11/2020 15:52, Serge Semin wrote:
>>> Could you send a patch, which removes check_kernel_section_mem completly ?

finally I think you are right and this would be the right way.

>> this will expose one issue:
>> platforms usually do it in a sane way, like it was done last 15 years, namely
>> add kernel image without non-complete pages on the boundaries.
>> This will lead to the situation, that request_resource() will fail at least
>> for .bss section of the kernel and it will not be properly displayed under
>> /proc/iomem (and probably same problem will appear, which initially motivated
>> the creation of check_kernel_section_mem()).
> 
> Are you saying that some old platforms rely on the
> check_kernel_section_mem() method adding the memory occupied by the
> kernel to the system? If so, do you have an example of such?

Initially I was confused why the below patch didn't solve the issue on Octeon:

@@ -532,8 +532,8 @@ static void __init request_crashkernel(struct resource *res)
 
 static void __init check_kernel_sections_mem(void)
 {
-       phys_addr_t start = PFN_PHYS(PFN_DOWN(__pa_symbol(&_text)));
-       phys_addr_t size = PFN_PHYS(PFN_UP(__pa_symbol(&_end))) - start;
+       phys_addr_t start = __pa_symbol(&_text);
+       phys_addr_t size = __pa_symbol(&_end) - start;

... and finally I understood, that the reason was in fact that I tested on Linux
v5.4, which still had this code to reserve RAM resources:

        for_each_memblock(memory, region) {                                                                                                                                                                        
                phys_addr_t start = PFN_PHYS(memblock_region_memory_base_pfn(region));                                                                                                                             
                phys_addr_t end = PFN_PHYS(memblock_region_memory_end_pfn(region)) - 1;                                                                                                                            
                struct resource *res;                                                                                                                                                                              
                                                                                                                                                                                                                   
                ...
	
                res->start = start;                                                                                                                                                                                
                res->end = end;                                                                                                                                                                                    
                res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;                                                                                                                                              
                res->name = "System RAM";                                                                                                                                                                          
                                                                                                                                                                                                                   
                request_resource(&iomem_resource, res);                                                                                                                                                            
 
so I suppose that's where this evil truncation happened. Nowdays this is different
and I believe we can try to remove check_kernel_sections_mem() completely and
this will solve the memory corruption on Octeon.

> So IMHO what could be the best conclusion in the framework of this patch:
> 1) As Thomas said any platform-specific reservation should be done in the
> platform-specific code. That means if octeon needs some memory behind
> the kernel being reserved, then it should be done for example in
> prom_init().
> 2) The check_kernel_sections_mem() method can be removed. But it
> should be done carefully. We at least need to try to find all the
> platforms, which rely on its functionality.

Thanks for looking into this! I agree with your analysis, I'll try to rework,
removing check_kernel_sections_mem().
Alexander Sverdlin Nov. 13, 2020, 12:29 p.m. UTC | #11
Hi!

On 13/11/2020 03:30, Jiaxun Yang wrote:
>> 2) The check_kernel_sections_mem() method can be removed. But it
>> should be done carefully. We at least need to try to find all the
>> platforms, which rely on its functionality.
> It have been more than 10 years since introducing this this check, but
> I believe there must be a reason at the time.

It doesn't look like 10 years to me :)

commit a94e4f24ec836c8984f839594bad7454184975b1
Author: Jiaxun Yang <jiaxun.yang@flygoat.com>
Date:   Mon Aug 19 22:23:13 2019 +0800

    MIPS: init: Drop boot_mem_map

at least not so long in public, that's why we only noticed recently, that
is breaks Octeon platform.
Alexander Sverdlin Nov. 13, 2020, 12:56 p.m. UTC | #12
Hi Jiaxun,

On 13/11/2020 03:30, Jiaxun Yang wrote:
>> 2) The check_kernel_sections_mem() method can be removed. But it
>> should be done carefully. We at least need to try to find all the
>> platforms, which rely on its functionality.
> It have been more than 10 years since introducing this this check, but
> I believe there must be a reason at the time.

could you maybe share the reason you've submitted it with us?

> Also currently we have some unmaintained platform and it's impossible
> to test on them for us.

Do you at least know the list of platforms this patch is required for?
Alexander Sverdlin Nov. 13, 2020, 1:09 p.m. UTC | #13
Hello Serge, Thomas,

On 13/11/2020 10:17, Alexander Sverdlin wrote:
>> So IMHO what could be the best conclusion in the framework of this patch:
>> 1) As Thomas said any platform-specific reservation should be done in the
>> platform-specific code. That means if octeon needs some memory behind
>> the kernel being reserved, then it should be done for example in
>> prom_init().
>> 2) The check_kernel_sections_mem() method can be removed. But it
>> should be done carefully. We at least need to try to find all the
>> platforms, which rely on its functionality.
> Thanks for looking into this! I agree with your analysis, I'll try to rework,
> removing check_kernel_sections_mem().

but now, after grepping inside arch/mips, I found that only Octeon does memblock_add()
of the area between _text and _and explicitly.

Therefore, maybe many other platforms indeed rely on check_kernel_sections_mem()?
Maybe the proper way would be really to remote the PFN_UP()/PFN_DOWN() from
check_kernel_sections_mem(), which is not necessary after commit b10d6bca8720
("arch, drivers: replace for_each_membock() with for_each_mem_range()")
which fixed the resource_init()?

As completely unrelated optimization I can remove the same memblock_add() of the
kernel sections from the Octeon platform code.
Serge Semin Nov. 16, 2020, 10:31 p.m. UTC | #14
On Fri, Nov 13, 2020 at 02:09:09PM +0100, Alexander Sverdlin wrote:
> Hello Serge, Thomas,
> 
> On 13/11/2020 10:17, Alexander Sverdlin wrote:
> >> So IMHO what could be the best conclusion in the framework of this patch:
> >> 1) As Thomas said any platform-specific reservation should be done in the
> >> platform-specific code. That means if octeon needs some memory behind
> >> the kernel being reserved, then it should be done for example in
> >> prom_init().
> >> 2) The check_kernel_sections_mem() method can be removed. But it
> >> should be done carefully. We at least need to try to find all the
> >> platforms, which rely on its functionality.
> > Thanks for looking into this! I agree with your analysis, I'll try to rework,
> > removing check_kernel_sections_mem().
> 

> but now, after grepping inside arch/mips, I found that only Octeon does memblock_add()
> of the area between _text and _and explicitly.
> 
> Therefore, maybe many other platforms indeed rely on check_kernel_sections_mem()?

Taking into account what Maciej said, now I am not sure it was a good
idea to discard the check_kernel_sections_mem() method. Indeed it is
useful for a custom memory layout passed via the kernel parameters.

> Maybe the proper way would be really to remote the PFN_UP()/PFN_DOWN() from
> check_kernel_sections_mem(), which is not necessary after commit b10d6bca8720
> ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
> which fixed the resource_init()?
> 

If you think they are redundant, why not?

> As completely unrelated optimization I can remove the same memblock_add() of the
> kernel sections from the Octeon platform code. 

Why not as long as it will work. AFAICS the octeon platform code does
some kernel start address adjustment while the generic MIPS code
doesn't. Are you sure using the generic version for octeon won't cause
any problem?

-Sergey

> 
> -- 
> Best regards,
> Alexander Sverdlin.
Alexander Sverdlin Nov. 17, 2020, 9:41 a.m. UTC | #15
Hello Serge,

On 16/11/2020 23:31, Serge Semin wrote:
>> As completely unrelated optimization I can remove the same memblock_add() of the
>> kernel sections from the Octeon platform code. 
> Why not as long as it will work. AFAICS the octeon platform code does
> some kernel start address adjustment while the generic MIPS code
> doesn't. Are you sure using the generic version for octeon won't cause
> any problem?

as I interpret this adjustment, this is open-coded virt_to_phys.
In my tests both code blocks reserve the same memory (well, generic code claims the rest of the last page,
and I'm going to fix this).
diff mbox series

Patch

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 0d42532..f6cf2f6 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -504,6 +504,12 @@  static void __init check_kernel_sections_mem(void)
 	if (!memblock_is_region_memory(start, size)) {
 		pr_info("Kernel sections are not in the memory maps\n");
 		memblock_add(start, size);
+		/*
+		 * Octeon bootloader places shared data structure right after
+		 * the kernel => make sure it will not be corrupted.
+		 */
+		memblock_reserve(__pa_symbol(&_end),
+				 start + size - __pa_symbol(&_end));
 	}
 }