Message ID | 20201106141001.57637-1-alexander.sverdlin@nokia.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | MIPS: reserve the memblock right after the kernel | expand |
On Fri, Nov 06, 2020 at 03:10:01PM +0100, Alexander A Sverdlin wrote: > From: Alexander Sverdlin <alexander.sverdlin@nokia.com> > > Linux doesn't own the memory immediately after the kernel image. On Octeon > bootloader places a shared structure right close after the kernel _end, > refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c. > > If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc() > inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns > memory block overlapping with the above octeon_bootinfo structure, which > is being overwritten afterwards. as this special for Octeon how about added the memblock_reserve in octen specific code ? Thomas.
Hello Thomas, On 07/11/2020 10:40, Thomas Bogendoerfer wrote: >> Linux doesn't own the memory immediately after the kernel image. On Octeon >> bootloader places a shared structure right close after the kernel _end, >> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c. >> >> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc() >> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns >> memory block overlapping with the above octeon_bootinfo structure, which >> is being overwritten afterwards. > as this special for Octeon how about added the memblock_reserve > in octen specific code ? while the shared structure which is being corrupted is indeed Octeon-specific, the wrong assumption that the memory right after the kernel can be allocated by memblock allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem(). I personally will be fine with repairing Octeon only as I don't have other MIPS targets to care about, but maybe someone else in the MIPS community will find this fix useful...
Hi Thomas, On 09/11/2020 11:34, Alexander Sverdlin wrote: >>> Linux doesn't own the memory immediately after the kernel image. On Octeon >>> bootloader places a shared structure right close after the kernel _end, >>> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c. >>> >>> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc() >>> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns >>> memory block overlapping with the above octeon_bootinfo structure, which >>> is being overwritten afterwards. >> as this special for Octeon how about added the memblock_reserve >> in octen specific code ? > while the shared structure which is being corrupted is indeed Octeon-specific, > the wrong assumption that the memory right after the kernel can be allocated by memblock > allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem(). > > I personally will be fine with repairing Octeon only as I don't have other MIPS > targets to care about, but maybe someone else in the MIPS community will find > this fix useful... another reason for not putting that to Octeon platform code was the fact, that one would need to put the knowledge about wrong assumption of ARCH (MIPS) code into different code area of Octeon platform. If at some point in time check_kernel_sections_mem() will be altered/fixed, nobody will understand why Octeon still has this workaround.
On Mon, Nov 09, 2020 at 11:34:33AM +0100, Alexander Sverdlin wrote: > Hello Thomas, > > On 07/11/2020 10:40, Thomas Bogendoerfer wrote: > >> Linux doesn't own the memory immediately after the kernel image. On Octeon > >> bootloader places a shared structure right close after the kernel _end, > >> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c. > >> > >> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc() > >> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns > >> memory block overlapping with the above octeon_bootinfo structure, which > >> is being overwritten afterwards. > > as this special for Octeon how about added the memblock_reserve > > in octen specific code ? > > while the shared structure which is being corrupted is indeed Octeon-specific, > the wrong assumption that the memory right after the kernel can be allocated by memblock > allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem(). ok, I see your point. IMHO this whole check_kernel_sections_mem() should be removed. IMHO memory adding should only be done my memory detection code. Could you send a patch, which removes check_kernel_section_mem completly ? Thomas.
Hello Thomas, On 10/11/2020 10:55, Thomas Bogendoerfer wrote: >>>> Linux doesn't own the memory immediately after the kernel image. On Octeon >>>> bootloader places a shared structure right close after the kernel _end, >>>> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c. >>>> >>>> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc() >>>> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns >>>> memory block overlapping with the above octeon_bootinfo structure, which >>>> is being overwritten afterwards. >>> as this special for Octeon how about added the memblock_reserve >>> in octen specific code ? >> while the shared structure which is being corrupted is indeed Octeon-specific, >> the wrong assumption that the memory right after the kernel can be allocated by memblock >> allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem(). > ok, I see your point. IMHO this whole check_kernel_sections_mem() should > be removed. IMHO memory adding should only be done my memory detection code. > > Could you send a patch, which removes check_kernel_section_mem completly ? this will expose one issue: platforms usually do it in a sane way, like it was done last 15 years, namely add kernel image without non-complete pages on the boundaries. This will lead to the situation, that request_resource() will fail at least for .bss section of the kernel and it will not be properly displayed under /proc/iomem (and probably same problem will appear, which initially motivated the creation of check_kernel_section_mem()). As I understood, the issue is that memblock API operates internally on the page granularity (at least there are many ROUND_DOWN() inside for the size or upper boundary), so for request_resource() to success one has to claim the rest of the .bss last page. And with current memblock API memblock_reserve() must appear somewhere, being this ARCH or platform code.
Hello Alexander On Tue, Nov 10, 2020 at 11:29:50AM +0100, Alexander Sverdlin wrote: > Hello Thomas, > > On 10/11/2020 10:55, Thomas Bogendoerfer wrote: > >>>> Linux doesn't own the memory immediately after the kernel image. On Octeon > >>>> bootloader places a shared structure right close after the kernel _end, > >>>> refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c. > >>>> > >>>> If check_kernel_sections_mem() rounds the PFNs up, first memblock_alloc() > >>>> inside early_init_dt_alloc_memory_arch() <= device_tree_init() returns > >>>> memory block overlapping with the above octeon_bootinfo structure, which > >>>> is being overwritten afterwards. > >>> as this special for Octeon how about added the memblock_reserve > >>> in octen specific code ? > >> while the shared structure which is being corrupted is indeed Octeon-specific, > >> the wrong assumption that the memory right after the kernel can be allocated by memblock > >> allocator and re-used somewhere in Linux is in MIPS-generic check_kernel_sections_mem(). > > ok, I see your point. IMHO this whole check_kernel_sections_mem() should > > be removed. IMHO memory adding should only be done my memory detection code. > > > > Could you send a patch, which removes check_kernel_section_mem completly ? > > this will expose one issue: > platforms usually do it in a sane way, like it was done last 15 years, namely > add kernel image without non-complete pages on the boundaries. > This will lead to the situation, that request_resource() will fail at least > for .bss section of the kernel and it will not be properly displayed under > /proc/iomem (and probably same problem will appear, which initially motivated > the creation of check_kernel_section_mem()). Are you saying that some old platforms rely on the check_kernel_section_mem() method adding the memory occupied by the kernel to the system? If so, do you have an example of such? Personally I also had my hand itching to remove that method years ago, but I didn't dare to do so for the same reason in mind... On the other hand if we detected all the platforms that needed that method, we could have moved it to their prom_init() or something and got rid of that atavism for good. > > As I understood, the issue is that memblock API operates internally on the > page granularity (at least there are many ROUND_DOWN() inside for the size > or upper boundary), Hm, I don't think so. Memblock doesn't work with the pages granularity, but with memory ranges. round_down()/round_up() are used to find a memory range with proper alignment. (See __memblock_find_range_top_{up,down}() method implementation.) Memblock allocates a memory region with exact size and alignment as requested. That's the beauty of that allocator and one of the reasons why the kernel platforms have been painfully converted to using it instead of the old bootmem allocator. BTW the later one has indeed operated with page granularity. Getting back to the memblock allocator. It works with pages only when the kernel comes to starting the buddy allocator. So the kernel invokes memblock_free_all(), which eventually gets to calling free_low_memory_core_early()->__free_memory_core(). The later method indeed sets the memory pages free, but as you can see it's done with correct aligning PFN_UP(phys_start)/PFN_DOWN(end). > so for request_resource() to success one has to claim > the rest of the .bss last page. And with current memblock API > memblock_reserve() must appear somewhere, being this ARCH or platform code. After a short glance at the request_resource() code I didn't manage to find a reason why the method would fail to request a page-unaligned region. AFAICS it will fail only if the memory occupied by the kernel hasn't been registered as system memory. The later case may happen only for the systems which rely on the check_kernel_section_mem() method being called in the generic arch_mem_init(). Of course we shouldn't blindly have it removed, but instead move it to the platforms, which have been unfortunate enough not to add the kernel memory to the system memory pool. So IMHO what could be the best conclusion in the framework of this patch: 1) As Thomas said any platform-specific reservation should be done in the platform-specific code. That means if octeon needs some memory behind the kernel being reserved, then it should be done for example in prom_init(). 2) The check_kernel_sections_mem() method can be removed. But it should be done carefully. We at least need to try to find all the platforms, which rely on its functionality. -Sergey > > -- > Best regards, > Alexander Sverdlin.
Hi all, 在 2020/11/11 22:52, Serge Semin 写道: > Hello Alexander > > On Tue, Nov 10, 2020 at 11:29:50AM +0100, Alexander Sverdlin wrote: > 2) The check_kernel_sections_mem() method can be removed. But it > should be done carefully. We at least need to try to find all the > platforms, which rely on its functionality. It have been more than 10 years since introducing this this check, but I believe there must be a reason at the time. Also currently we have some unmaintained platform and it's impossible to test on them for us. For Cavium's issue, I believe removing the page rounding can help. I'd suggest keep this method but remove the rounding part, also leave a warning or kernel taint when out of boundary kernel image is detected. Thanks. - Jiaxun > > -Sergey > >> -- >> Best regards, >> Alexander Sverdlin.
Hi, Jiaxun, On 11/13/2020 10:30 AM, Jiaxun Yang wrote: > Hi all, > > 在 2020/11/11 22:52, Serge Semin 写道: >> Hello Alexander >> >> On Tue, Nov 10, 2020 at 11:29:50AM +0100, Alexander Sverdlin wrote: >> 2) The check_kernel_sections_mem() method can be removed. But it >> should be done carefully. We at least need to try to find all the >> platforms, which rely on its functionality. > It have been more than 10 years since introducing this this check, but > I believe there must be a reason at the time. > > Also currently we have some unmaintained platform and it's impossible > to test on them for us. > > For Cavium's issue, I believe removing the page rounding can help. > I'd suggest keep this method but remove the rounding part, also > leave a warning or kernel taint when out of boundary kernel image > is detected. > > Thanks. > > - Jiaxun > I found the first submission. Commit: d3ff93380232 (mips: Make sure kernel memory is in iomem) As I thought, this check is related to kdump. More directly, it is related to the "mem=" parameter. Kexec-tools provide a "mem=" parameter to limit the kernel location in kdump. But sometimes the memory may be not enough and this check gives a way to ensure the capture kernel can start rightly. Although "mem=" is not in kexec-tools now, I thought it is also useful if someone use "mem=" to do other things. Thanks, Jinyang >> >> -Sergey >> >>> -- >>> Best regards, >>> Alexander Sverdlin.
On Fri, 13 Nov 2020, Jinyang He wrote: > Commit: d3ff93380232 (mips: Make sure kernel memory is in iomem) > As I thought, this check is related to kdump. More directly, it is > related to the "mem=" parameter. Kexec-tools provide a "mem=" parameter > to limit the kernel location in kdump. But sometimes the memory may be not > enough and this check gives a way to ensure the capture kernel can > start rightly. Although "mem=" is not in kexec-tools now, I thought > it is also useful if someone use "mem=" to do other things. I used to supply `mem=' by hand on many occasions across different MIPS systems and for various reasons over the last 20 years; surely before kexec was even thought of. The option itself has been there in the MIPS port since I added it to Linux 2.4.0 back in Dec 2000. Maciej
Hello Serge, Thomas, On 11/11/2020 15:52, Serge Semin wrote: >>> Could you send a patch, which removes check_kernel_section_mem completly ? finally I think you are right and this would be the right way. >> this will expose one issue: >> platforms usually do it in a sane way, like it was done last 15 years, namely >> add kernel image without non-complete pages on the boundaries. >> This will lead to the situation, that request_resource() will fail at least >> for .bss section of the kernel and it will not be properly displayed under >> /proc/iomem (and probably same problem will appear, which initially motivated >> the creation of check_kernel_section_mem()). > > Are you saying that some old platforms rely on the > check_kernel_section_mem() method adding the memory occupied by the > kernel to the system? If so, do you have an example of such? Initially I was confused why the below patch didn't solve the issue on Octeon: @@ -532,8 +532,8 @@ static void __init request_crashkernel(struct resource *res) static void __init check_kernel_sections_mem(void) { - phys_addr_t start = PFN_PHYS(PFN_DOWN(__pa_symbol(&_text))); - phys_addr_t size = PFN_PHYS(PFN_UP(__pa_symbol(&_end))) - start; + phys_addr_t start = __pa_symbol(&_text); + phys_addr_t size = __pa_symbol(&_end) - start; ... and finally I understood, that the reason was in fact that I tested on Linux v5.4, which still had this code to reserve RAM resources: for_each_memblock(memory, region) { phys_addr_t start = PFN_PHYS(memblock_region_memory_base_pfn(region)); phys_addr_t end = PFN_PHYS(memblock_region_memory_end_pfn(region)) - 1; struct resource *res; ... res->start = start; res->end = end; res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; res->name = "System RAM"; request_resource(&iomem_resource, res); so I suppose that's where this evil truncation happened. Nowdays this is different and I believe we can try to remove check_kernel_sections_mem() completely and this will solve the memory corruption on Octeon. > So IMHO what could be the best conclusion in the framework of this patch: > 1) As Thomas said any platform-specific reservation should be done in the > platform-specific code. That means if octeon needs some memory behind > the kernel being reserved, then it should be done for example in > prom_init(). > 2) The check_kernel_sections_mem() method can be removed. But it > should be done carefully. We at least need to try to find all the > platforms, which rely on its functionality. Thanks for looking into this! I agree with your analysis, I'll try to rework, removing check_kernel_sections_mem().
Hi! On 13/11/2020 03:30, Jiaxun Yang wrote: >> 2) The check_kernel_sections_mem() method can be removed. But it >> should be done carefully. We at least need to try to find all the >> platforms, which rely on its functionality. > It have been more than 10 years since introducing this this check, but > I believe there must be a reason at the time. It doesn't look like 10 years to me :) commit a94e4f24ec836c8984f839594bad7454184975b1 Author: Jiaxun Yang <jiaxun.yang@flygoat.com> Date: Mon Aug 19 22:23:13 2019 +0800 MIPS: init: Drop boot_mem_map at least not so long in public, that's why we only noticed recently, that is breaks Octeon platform.
Hi Jiaxun, On 13/11/2020 03:30, Jiaxun Yang wrote: >> 2) The check_kernel_sections_mem() method can be removed. But it >> should be done carefully. We at least need to try to find all the >> platforms, which rely on its functionality. > It have been more than 10 years since introducing this this check, but > I believe there must be a reason at the time. could you maybe share the reason you've submitted it with us? > Also currently we have some unmaintained platform and it's impossible > to test on them for us. Do you at least know the list of platforms this patch is required for?
Hello Serge, Thomas, On 13/11/2020 10:17, Alexander Sverdlin wrote: >> So IMHO what could be the best conclusion in the framework of this patch: >> 1) As Thomas said any platform-specific reservation should be done in the >> platform-specific code. That means if octeon needs some memory behind >> the kernel being reserved, then it should be done for example in >> prom_init(). >> 2) The check_kernel_sections_mem() method can be removed. But it >> should be done carefully. We at least need to try to find all the >> platforms, which rely on its functionality. > Thanks for looking into this! I agree with your analysis, I'll try to rework, > removing check_kernel_sections_mem(). but now, after grepping inside arch/mips, I found that only Octeon does memblock_add() of the area between _text and _and explicitly. Therefore, maybe many other platforms indeed rely on check_kernel_sections_mem()? Maybe the proper way would be really to remote the PFN_UP()/PFN_DOWN() from check_kernel_sections_mem(), which is not necessary after commit b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()") which fixed the resource_init()? As completely unrelated optimization I can remove the same memblock_add() of the kernel sections from the Octeon platform code.
On Fri, Nov 13, 2020 at 02:09:09PM +0100, Alexander Sverdlin wrote: > Hello Serge, Thomas, > > On 13/11/2020 10:17, Alexander Sverdlin wrote: > >> So IMHO what could be the best conclusion in the framework of this patch: > >> 1) As Thomas said any platform-specific reservation should be done in the > >> platform-specific code. That means if octeon needs some memory behind > >> the kernel being reserved, then it should be done for example in > >> prom_init(). > >> 2) The check_kernel_sections_mem() method can be removed. But it > >> should be done carefully. We at least need to try to find all the > >> platforms, which rely on its functionality. > > Thanks for looking into this! I agree with your analysis, I'll try to rework, > > removing check_kernel_sections_mem(). > > but now, after grepping inside arch/mips, I found that only Octeon does memblock_add() > of the area between _text and _and explicitly. > > Therefore, maybe many other platforms indeed rely on check_kernel_sections_mem()? Taking into account what Maciej said, now I am not sure it was a good idea to discard the check_kernel_sections_mem() method. Indeed it is useful for a custom memory layout passed via the kernel parameters. > Maybe the proper way would be really to remote the PFN_UP()/PFN_DOWN() from > check_kernel_sections_mem(), which is not necessary after commit b10d6bca8720 > ("arch, drivers: replace for_each_membock() with for_each_mem_range()") > which fixed the resource_init()? > If you think they are redundant, why not? > As completely unrelated optimization I can remove the same memblock_add() of the > kernel sections from the Octeon platform code. Why not as long as it will work. AFAICS the octeon platform code does some kernel start address adjustment while the generic MIPS code doesn't. Are you sure using the generic version for octeon won't cause any problem? -Sergey > > -- > Best regards, > Alexander Sverdlin.
Hello Serge, On 16/11/2020 23:31, Serge Semin wrote: >> As completely unrelated optimization I can remove the same memblock_add() of the >> kernel sections from the Octeon platform code. > Why not as long as it will work. AFAICS the octeon platform code does > some kernel start address adjustment while the generic MIPS code > doesn't. Are you sure using the generic version for octeon won't cause > any problem? as I interpret this adjustment, this is open-coded virt_to_phys. In my tests both code blocks reserve the same memory (well, generic code claims the rest of the last page, and I'm going to fix this).
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index 0d42532..f6cf2f6 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -504,6 +504,12 @@ static void __init check_kernel_sections_mem(void) if (!memblock_is_region_memory(start, size)) { pr_info("Kernel sections are not in the memory maps\n"); memblock_add(start, size); + /* + * Octeon bootloader places shared data structure right after + * the kernel => make sure it will not be corrupted. + */ + memblock_reserve(__pa_symbol(&_end), + start + size - __pa_symbol(&_end)); } }