
init: fix allocated page overlapping with PTR_ERR

Message ID 20240418102943.180510-1-namcao@linutronix.de (mailing list archive)
State Superseded
Series init: fix allocated page overlapping with PTR_ERR

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-1-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-1-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-1-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-1-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-1-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-1-test-6 warning .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-1-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-1-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-1-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-1-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-1-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-1-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Nam Cao April 18, 2024, 10:29 a.m. UTC
There is nothing preventing kernel memory allocators from allocating a
page that overlaps with PTR_ERR(), except for architecture-specific
code that sets up memblock.

It was discovered that the RISCV architecture doesn't set up memblock
correctly, leading to a page overlapping with PTR_ERR() being allocated,
and subsequently crashing the kernel (link in Closes:).

The reported crash has nothing to do with PTR_ERR(): the last page
(at address 0xfffff000) being allocated leads to an unexpected
arithmetic overflow in ext4; but still, this page shouldn't be
allocated in the first place.

Because PTR_ERR() is an architecture-independent thing, we shouldn't
ask every single architecture to set this up. There may be other
architectures besides RISCV that have the same problem.

Fix this once and for all by reserving the physical memory page that
may be mapped to the last virtual memory page as part of low memory.

Unfortunately, this means if there is actual memory at this reserved
location, that memory will become inaccessible. However, if this page
is not reserved, it can only be accessed as high memory, so this
doesn't matter if high memory is not supported. Even if high memory is
supported, it is still only one page.

Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong.to.us
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: <stable@vger.kernel.org> # all versions
---
 init/main.c | 1 +
 1 file changed, 1 insertion(+)
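
The address arithmetic behind the commit message can be sketched in
userspace C. This is only an illustration, assuming a 32-bit address
space with 4 KiB pages; the sketch_ names are made up for this note, and
SKETCH_MAX_ERRNO mirrors the kernel's MAX_ERRNO (4095). It shows that
every byte of the last page except the first falls in the range that
IS_ERR_VALUE() treats as an error code, and that the end address of that
page wraps to zero, which is presumably the kind of overflow the ext4
crash ran into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for this sketch: 32-bit addresses, 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_ERRNO 4095u	/* same value as the kernel's MAX_ERRNO */

/* Userspace model of IS_ERR_VALUE() for a 32-bit address. */
static inline bool sketch_is_err_value(uint32_t addr)
{
	return addr >= (uint32_t)-SKETCH_MAX_ERRNO;
}

/* Start of the last virtual page, i.e. (uint32_t)-PAGE_SIZE. */
static inline uint32_t sketch_last_page_start(void)
{
	return (uint32_t)-SKETCH_PAGE_SIZE;
}

/* One-past-the-end address of a page; wraps modulo 2^32. */
static inline uint32_t sketch_page_end(uint32_t page_start)
{
	return page_start + SKETCH_PAGE_SIZE;
}

/* The values discussed in the commit message, written as assertions. */
static inline void sketch_self_check(void)
{
	assert(sketch_last_page_start() == 0xfffff000u);
	assert(sketch_is_err_value(0xfffff001u));
	assert(sketch_page_end(0xfffff000u) == 0u);
}
```

A pointer into this page is therefore indistinguishable from an encoded
errno for any caller that checks IS_ERR(), which is why the page should
never be handed out by an allocator.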

Comments

Mike Rapoport April 18, 2024, 10:54 a.m. UTC | #1
On Thu, Apr 18, 2024 at 12:29:43PM +0200, Nam Cao wrote:
> ...

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  init/main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/init/main.c b/init/main.c
> index 881f6230ee59..f8d2793c4641 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -900,6 +900,7 @@ void start_kernel(void)
>  	page_address_init();
>  	pr_notice("%s", linux_banner);
>  	early_security_init();
> +	memblock_reserve(__pa(-PAGE_SIZE), PAGE_SIZE); /* reserve last page for ERR_PTR */
>  	setup_arch(&command_line);
>  	setup_boot_config();
>  	setup_command_line(command_line);
> -- 
> 2.39.2
>
Nam Cao April 18, 2024, 11:12 a.m. UTC | #2
On 2024-04-18 Nam Cao wrote:
> ...

Sorry, forgot to add:
Reported-by: Björn Töpel <bjorn@kernel.org>

Björn Töpel April 18, 2024, 12:41 p.m. UTC | #3
Nam Cao <namcao@linutronix.de> writes:

> On 2024-04-18 Nam Cao wrote:
>> ...
>
> Sorry, forgot to add:
> Reported-by: Björn Töpel <bjorn@kernel.org>

Hmm, can't we get rid of the whole check in arch/riscv/mm/init.c for
32b?

--8<--
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index fe8e159394d8..1e91d5728887 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -196,7 +196,6 @@ early_param("mem", early_mem);
 static void __init setup_bootmem(void)
 {
 	phys_addr_t vmlinux_end = __pa_symbol(&_end);
-	phys_addr_t max_mapped_addr;
 	phys_addr_t phys_ram_end, vmlinux_start;
 
 	if (IS_ENABLED(CONFIG_XIP_KERNEL))
@@ -234,21 +233,6 @@ static void __init setup_bootmem(void)
 	if (IS_ENABLED(CONFIG_64BIT))
 		kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
 
-	/*
-	 * memblock allocator is not aware of the fact that last 4K bytes of
-	 * the addressable memory can not be mapped because of IS_ERR_VALUE
-	 * macro. Make sure that last 4k bytes are not usable by memblock
-	 * if end of dram is equal to maximum addressable memory.  For 64-bit
-	 * kernel, this problem can't happen here as the end of the virtual
-	 * address space is occupied by the kernel mapping then this check must
-	 * be done as soon as the kernel mapping base address is determined.
-	 */
-	if (!IS_ENABLED(CONFIG_64BIT)) {
-		max_mapped_addr = __pa(~(ulong)0);
-		if (max_mapped_addr == (phys_ram_end - 1))
-			memblock_set_current_limit(max_mapped_addr - 4096);
-	}
-
 	min_low_pfn = PFN_UP(phys_ram_base);
 	max_low_pfn = max_pfn = PFN_DOWN(phys_ram_end);
 	high_memory = (void *)(__va(PFN_PHYS(max_low_pfn)));
--8<--

Mike hints that's *not* the case
(https://lore.kernel.org/linux-riscv/ZiAkRMUfiPDUGPdL@kernel.org/).
memblock_reserve() should disallow allocation as well, no?

Thanks, and FWIW:

Tested-by: Björn Töpel <bjorn@rivosinc.com>
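
The difference between limiting memblock and reserving a page can be
illustrated with a toy model. This is a sketch under stated assumptions,
not the real memblock code: every toy_ name is hypothetical, the limit is
modelled as the last allowed byte (inclusive), and the model assumes that
the current limit only constrains early memblock allocations, while any
RAM page that is not reserved is later released to the page allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE    4096u
#define TOY_MAX_RESERVED 8

/* Toy stand-in for memblock state; not the kernel's struct memblock. */
struct toy_memblock {
	uint32_t mem_start;	/* first byte of RAM */
	uint32_t mem_last;	/* last byte of RAM (inclusive) */
	uint32_t limit_last;	/* last byte usable by early allocations */
	uint32_t reserved[TOY_MAX_RESERVED];	/* reserved page addresses */
	size_t nr_reserved;
};

static inline bool toy_reserve_page(struct toy_memblock *mb, uint32_t page)
{
	if (mb->nr_reserved == TOY_MAX_RESERVED)
		return false;
	mb->reserved[mb->nr_reserved++] = page;
	return true;
}

static inline bool toy_is_reserved(const struct toy_memblock *mb, uint32_t page)
{
	for (size_t i = 0; i < mb->nr_reserved; i++)
		if (mb->reserved[i] == page)
			return true;
	return false;
}

static inline bool toy_in_ram(const struct toy_memblock *mb, uint32_t page)
{
	return page >= mb->mem_start &&
	       page <= mb->mem_last - (TOY_PAGE_SIZE - 1);
}

/* Early allocation: must be RAM, within the limit, and not reserved. */
static inline bool toy_early_alloc_allowed(const struct toy_memblock *mb,
					   uint32_t page)
{
	return toy_in_ram(mb, page) &&
	       page + (TOY_PAGE_SIZE - 1) <= mb->limit_last &&
	       !toy_is_reserved(mb, page);
}

/* Hand-over to the page allocator: the limit is ignored (assumption). */
static inline bool toy_released_to_page_allocator(const struct toy_memblock *mb,
						  uint32_t page)
{
	return toy_in_ram(mb, page) && !toy_is_reserved(mb, page);
}
```

In this model, setting only the limit keeps early allocations away from
the last page but still lets the page reach the page allocator, whereas
reserving the page excludes it from both.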
Nam Cao April 18, 2024, 1:01 p.m. UTC | #4
On 2024-04-18 Björn Töpel wrote:
> Nam Cao <namcao@linutronix.de> writes:
> 
> ...
> 
> Hmm, can't we get rid of the whole check in arch/riscv/mm/init.c for
> 32b?

We can, but that depends on this patch. So my intention is to wait for
this patch to be applied first, because I don't want to bother the
maintainers with dependencies.

> --8<--
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index fe8e159394d8..1e91d5728887 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -196,7 +196,6 @@ early_param("mem", early_mem);
>  static void __init setup_bootmem(void)
>  {
>  	phys_addr_t vmlinux_end = __pa_symbol(&_end);
> -	phys_addr_t max_mapped_addr;
>  	phys_addr_t phys_ram_end, vmlinux_start;
>  
>  	if (IS_ENABLED(CONFIG_XIP_KERNEL))
> @@ -234,21 +233,6 @@ static void __init setup_bootmem(void)
>  	if (IS_ENABLED(CONFIG_64BIT))
>  		kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
>  
> -	/*
> -	 * memblock allocator is not aware of the fact that last 4K bytes of
> -	 * the addressable memory can not be mapped because of IS_ERR_VALUE
> -	 * macro. Make sure that last 4k bytes are not usable by memblock
> -	 * if end of dram is equal to maximum addressable memory.  For 64-bit
> -	 * kernel, this problem can't happen here as the end of the virtual
> -	 * address space is occupied by the kernel mapping then this check must
> -	 * be done as soon as the kernel mapping base address is determined.
> -	 */
> -	if (!IS_ENABLED(CONFIG_64BIT)) {
> -		max_mapped_addr = __pa(~(ulong)0);
> -		if (max_mapped_addr == (phys_ram_end - 1))
> -			memblock_set_current_limit(max_mapped_addr - 4096);
> -	}
> -

If you are going to send this, you can add:
Reviewed-by: Nam Cao <namcao@linutronix.de>

>  	min_low_pfn = PFN_UP(phys_ram_base);
>  	max_low_pfn = max_pfn = PFN_DOWN(phys_ram_end);
>  	high_memory = (void *)(__va(PFN_PHYS(max_low_pfn)));
> --8<--
> 
> Mike hints that's *not* the case
> (https://lore.kernel.org/linux-riscv/ZiAkRMUfiPDUGPdL@kernel.org/).
> memblock_reserve() should disallow allocation as well, no?

He said it can't be removed if we set max_low_pfn instead of using
memblock_reserve()

If max_low_pfn() is used, then it can be removed:
https://lore.kernel.org/linux-riscv/Zh6n-nvnQbL-0xss@kernel.org

Best regards,
Nam
Nam Cao April 18, 2024, 1:07 p.m. UTC | #5
On 2024-04-18 Nam Cao wrote:
> > Mike hints that's *not* the case
> > (https://lore.kernel.org/linux-riscv/ZiAkRMUfiPDUGPdL@kernel.org/).
> > memblock_reserve() should disallow allocation as well, no?
> 
> He said it can't be removed if we set max_low_pfn instead of using
> memblock_reserve()
> 
> If max_low_pfn() is used, then it can be removed:
     ^ I mean memblock_reserve()

> https://lore.kernel.org/linux-riscv/Zh6n-nvnQbL-0xss@kernel.org
> 
> Best regards,
> Nam
>
Joel Granados April 29, 2024, 12:52 p.m. UTC | #6
On Thu, Apr 18, 2024 at 12:29:43PM +0200, Nam Cao wrote:
> ...

I received a similar(ish) report recently
https://lore.kernel.org/oe-kbuild-all/202404211031.J6l2AfJk-lkp@intel.com/
regarding RISC-V in init/main.c. Here is the meat of the report in case
you want to avoid going to the actual link:
"
...
   riscv64-linux-ld: section .data LMA [000000000099b000,0000000001424de7] overlaps section .text LMA [0000000000104040,000000000213c543]
   riscv64-linux-ld: section .data..percpu LMA [00000000024e2000,00000000026b46e7] overlaps section .rodata LMA [000000000213c580,000000000292d0dd]
   riscv64-linux-ld: section .rodata VMA [ffffffff8213c580,ffffffff8292d0dd] overlaps section .data VMA [ffffffff82000000,ffffffff82a89de7]
   init/main.o: in function `rdinit_setup':
>> init/main.c:613:(.init.text+0x358): relocation truncated to fit: R_RISCV_GPREL_I against symbol `__setup_start' defined in .init.rodata section in .tmp_vmlinux.kallsyms1
   net/ipv4/ipconfig.o: in function `ic_dhcp_init_options':
   net/ipv4/ipconfig.c:682:(.init.text+0x9b4): relocation truncated to fit: R_RISCV_GPREL_I against `ic_bootp_cookie'
   net/sunrpc/auth_gss/gss_krb5_mech.o: in function `gss_krb5_prepare_enctype_priority_list':
>> net/sunrpc/auth_gss/gss_krb5_mech.c:213:(.text.gss_krb5_prepare_enctype_priority_list+0x9c): relocation truncated to fit: R_RISCV_GPREL_I against `gss_krb5_enctypes.0'
   lib/maple_tree.o: in function `mas_leaf_max_gap':
>> lib/maple_tree.c:1512:(.text.mas_leaf_max_gap+0x2b8): relocation truncated to fit: R_RISCV_GPREL_I against `mt_pivots'
   lib/maple_tree.o: in function `ma_dead_node':
>> lib/maple_tree.c:560:(.text.mas_data_end+0x110): relocation truncated to fit: R_RISCV_GPREL_I against `mt_pivots'
   lib/maple_tree.o: in function `mas_extend_spanning_null':
>> lib/maple_tree.c:3662:(.text.mas_extend_spanning_null+0x69c): relocation truncated to fit: R_RISCV_GPREL_I against `mt_pivots'
   lib/maple_tree.o: in function `mas_mab_cp':
>> lib/maple_tree.c:1943:(.text.mas_mab_cp+0x248): relocation truncated to fit: R_RISCV_GPREL_I against `mt_pivots'
   lib/maple_tree.o: in function `mab_mas_cp':
>> lib/maple_tree.c:2000:(.text.mab_mas_cp+0x15c): relocation truncated to fit: R_RISCV_GPREL_I against `mt_pivots'
   lib/maple_tree.o: in function `mas_reuse_node':
>> lib/maple_tree.c:3416:(.text.mas_reuse_node+0x17c): relocation truncated to fit: R_RISCV_GPREL_I against `mt_slots'
   lib/maple_tree.o: in function `mt_free_walk':
>> lib/maple_tree.c:5238:(.text.mt_free_walk+0x15c): relocation truncated to fit: R_RISCV_GPREL_I against `mt_slots'
   lib/maple_tree.o: in function `mtree_lookup_walk':
   lib/maple_tree.c:3700:(.text.mtree_lookup_walk+0x94): additional relocation overflows omitted from the output
...

"

Could the fix that you have posted here be related to that report?
Comments are greatly appreciated.

Best
--

Joel Granados
Nam Cao April 30, 2024, 7:31 a.m. UTC | #7
On Mon, Apr 29, 2024 at 02:52:30PM +0200, Joel Granados wrote:
> ...
> 
> I received a similar(ish) report recently
> https://lore.kernel.org/oe-kbuild-all/202404211031.J6l2AfJk-lkp@intel.com/
> regarding RISC-V in init/main.c. Here is the meat of the report in case
> you want to avoid going to the actual link:

This issue doesn't look like it has anything to do with this patch: this
patch is about overlapping of dynamically allocated memory, while I think
the issue is about overlapping sections during linking (maybe something
wrong with riscv linker script?)

Also, FWIW, this patch is not going to be in mainline because of a
regression.

Nonetheless, I will have a look at this later.

Best regards,
Nam

Alexandre Ghiti April 30, 2024, 8:37 a.m. UTC | #8
Hi Joel, Nam,

On 30/04/2024 09:31, Nam Cao wrote:
> On Mon, Apr 29, 2024 at 02:52:30PM +0200, Joel Granados wrote:
>> ...
>> I received a similar(ish) report recently
>> https://lore.kernel.org/oe-kbuild-all/202404211031.J6l2AfJk-lkp@intel.com/
>> regarding RISC-V in init/main.c. Here is the meat of the report in case
>> you want to avoid going to the actual link:
> This issue doesn't look like it has anything to do with this patch: this
> patch is about overlapping of dynamically allocated memory, while I think
> the issue is about overlapping sections during linking (maybe something
> wrong with riscv linker script?)
>
> Also, FWIW, this patch is not going to be in mainline because of a
> regression.
>
> Nonetheless, I will have a look at this later.


The config shows that it is a XIP kernel that comes with its own 
limitations (text is limited to 32MB for example), so I'm not surprised 
to see those overlaps.

We already discussed the removal of randconfig builds on XIP configs, 
but IIRC it is not possible.

Alex
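
The overlap reported by the linker can be checked by hand with a small
helper. The following is a minimal sketch, assuming the ranges printed by
ld are inclusive [start,end] pairs; the sketch_ names are invented for
this note and the two VMA ranges are copied from the report quoted in
this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, as printed by the linker: [start,end]. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/* True if the two inclusive ranges share at least one address. */
static inline bool sketch_ranges_overlap(struct sketch_range a,
					 struct sketch_range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/* Number of bytes shared by two inclusive ranges, 0 if disjoint. */
static inline uint64_t sketch_overlap_bytes(struct sketch_range a,
					    struct sketch_range b)
{
	uint64_t lo = a.start > b.start ? a.start : b.start;
	uint64_t hi = a.end < b.end ? a.end : b.end;

	return lo <= hi ? hi - lo + 1 : 0;
}

/* VMA ranges taken from the quoted riscv64-linux-ld output. */
static const struct sketch_range sketch_rodata_vma = {
	0xffffffff8213c580ULL, 0xffffffff8292d0ddULL
};
static const struct sketch_range sketch_data_vma = {
	0xffffffff82000000ULL, 0xffffffff82a89de7ULL
};
```

With these inputs the whole .rodata range (0x7f0b5e bytes) lies inside
the .data range. The .data VMA starts at an address whose low bits are
0x02000000, that is 32 MiB; if the image base is 0xffffffff80000000
(an assumption, not stated in the report), this would be consistent with
text and read-only data having grown past the 32 MB XIP limit mentioned
above.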


Joel Granados April 30, 2024, 1:35 p.m. UTC | #9
On Tue, Apr 30, 2024 at 10:37:59AM +0200, Alexandre Ghiti wrote:
...
> > the issue is about overlapping sections during linking (maybe something
> > wrong with riscv linker script?)
> >
> > Also, FWIW, this patch is not going to be in mainline because of a
> > regression.
> >
> > Nonetheless, I will have a look at this later.
> 
> 
> The config shows that it is a XIP kernel that comes with its own 
> limitations (text is limited to 32MB for example), so I'm not surprised 
> to see those overlaps.
> 
> We already discussed the removal of randconfig builds on XIP configs, 
> but IIRC it is not possible.
Were those discussions public? Do you have a link to them?
Maybe there is something there that will tell me what to do with this
report.

Best
Joel Granados April 30, 2024, 3:42 p.m. UTC | #10
On Tue, Apr 30, 2024 at 10:37:59AM +0200, Alexandre Ghiti wrote:
> Hi Joel, Nam,
> 
> On 30/04/2024 09:31, Nam Cao wrote:
> > On Mon, Apr 29, 2024 at 02:52:30PM +0200, Joel Granados wrote:
> >> On Thu, Apr 18, 2024 at 12:29:43PM +0200, Nam Cao wrote:
> >>> There is nothing preventing kernel memory allocators from allocating a
> >>> page that overlaps with PTR_ERR(), except for architecture-specific
> >>> code that sets up memblock.
> >>>
> >>> It was discovered that the RISC-V architecture doesn't set up memblock
> >>> correctly, leading to a page overlapping with PTR_ERR() being allocated,
> >>> and subsequently crashing the kernel (link in Closes:).
> >>>
> >>> The reported crash has nothing to do with PTR_ERR(): the last page
> >>> (at address 0xfffff000) being allocated leads to an unexpected
> >>> arithmetic overflow in ext4; but still, this page shouldn't be
> >>> allocated in the first place.
> >>>
> >>> Because PTR_ERR() is an architecture-independent thing, we shouldn't
> >>> ask every single architecture to set this up. There may be other
> >>> architectures besides RISC-V that have the same problem.
> >>>
> >>> Fix this once and for all by reserving the physical memory page that
> >>> may be mapped to the last virtual memory page as part of low memory.
> >>>
> >>> Unfortunately, this means if there is actual memory at this reserved
> >>> location, that memory will become inaccessible. However, if this page
> >>> is not reserved, it can only be accessed as high memory, so this
> >>> doesn't matter if high memory is not supported. Even if high memory is
> >>> supported, it is still only one page.
> >>>
> >>> Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong.to.us
> >>> Signed-off-by: Nam Cao <namcao@linutronix.de>
> >>> Cc: <stable@vger.kernel.org> # all versions
> >>> ---
> >>>   init/main.c | 1 +
> >>>   1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/init/main.c b/init/main.c
> >>> index 881f6230ee59..f8d2793c4641 100644
> >>> --- a/init/main.c
> >>> +++ b/init/main.c
> >>> @@ -900,6 +900,7 @@ void start_kernel(void)
> >>>   	page_address_init();
> >>>   	pr_notice("%s", linux_banner);
> >>>   	early_security_init();
> >>> +	memblock_reserve(__pa(-PAGE_SIZE), PAGE_SIZE); /* reserve last page for ERR_PTR */
> >>>   	setup_arch(&command_line);
> >>>   	setup_boot_config();
> >>>   	setup_command_line(command_line);
> >>> -- 
> >>> 2.39.2
> >>>
> >> I received a similar(ish) report recently
> >> https://lore.kernel.org/oe-kbuild-all/202404211031.J6l2AfJk-lkp@intel.com/
> >> regarding RISC-V in init/main.c. Here is the meat of the report in case
> >> you want to avoid going to the actual link:
> > This issue doesn't look like it has anything to do with this patch: this
> > patch is about overlapping of dynamically allocated memory, while I think
> > the issue is about overlapping sections during linking (maybe something
> > wrong with riscv linker script?)
> >
> > Also, FWIW, this patch is not going to be in mainline because of a
> > regression.
> >
> > Nonetheless, I will have a look at this later.
> 
> 
> The config shows that it is a XIP kernel that comes with its own 
> limitations (text is limited to 32MB for example), so I'm not surprised 
> to see those overlaps.
> 
> We already discussed the removal of randconfig builds on XIP configs, 
> but IIRC it is not possible.

I just tested this going back as far as "2023-09-20 602bf1830798 (HEAD)
Merge branch 'for-6.7' into for-next  [Petr Mladek]" and I still saw the
overlapping errors.

Does this always happen?

Best
Nam Cao May 10, 2024, 6:35 a.m. UTC | #11
On Tue, Apr 30, 2024 at 05:42:38PM +0200, Joel Granados wrote:
> On Tue, Apr 30, 2024 at 10:37:59AM +0200, Alexandre Ghiti wrote:
> > The config shows that it is a XIP kernel that comes with its own 
> > limitations (text is limited to 32MB for example), so I'm not surprised 
> > to see those overlaps.
> > 
> > We already discussed the removal of randconfig builds on XIP configs, 
> > but IIRC it is not possible.
> 
> I just tested this going back until "2023-09-20 602bf1830798 (HEAD)
> Merge branch 'for-6.7' into for-next  [Petr Mladek]" and I still saw the
> overlapping errors.
> 
> Is this just something that happens always?

Alex is right that this is due to the 32MB size limit on XIP kernels. The
build fails when too many configuration options are enabled and the
kernel gets too large.

I just sent a series to lift the size restriction and fix this build failure:
https://lore.kernel.org/lkml/cover.1715286093.git.namcao@linutronix.de/

Best regards,
Nam

Patch

diff --git a/init/main.c b/init/main.c
index 881f6230ee59..f8d2793c4641 100644
--- a/init/main.c
+++ b/init/main.c
@@ -900,6 +900,7 @@  void start_kernel(void)
 	page_address_init();
 	pr_notice("%s", linux_banner);
 	early_security_init();
+	memblock_reserve(__pa(-PAGE_SIZE), PAGE_SIZE); /* reserve last page for ERR_PTR */
 	setup_arch(&command_line);
 	setup_boot_config();
 	setup_command_line(command_line);
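
For illustration, the overlap described in the commit message can be
modelled in userspace. This is a minimal sketch, not kernel code. It
assumes a 32-bit address space and 4 KiB pages; MODEL_MAX_ERRNO mirrors
the value of the kernel's MAX_ERRNO (4095), and every model_* name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */
#define MODEL_MAX_ERRNO 4095u /* same value as the kernel's MAX_ERRNO */

/* Start of the last page of a 32-bit address space, i.e. (u32)-PAGE_SIZE. */
uint32_t model_last_page(void)
{
	return UINT32_MAX - MODEL_PAGE_SIZE + 1u;
}

/* Model of IS_ERR_VALUE(): nonzero for the top MAX_ERRNO addresses. */
int model_is_err_value(uint32_t addr)
{
	return addr >= UINT32_MAX - MODEL_MAX_ERRNO + 1u;
}

/* One-past-the-end address of the page starting at addr (wraps modulo 2^32). */
uint32_t model_page_end(uint32_t addr)
{
	return (uint32_t)(addr + MODEL_PAGE_SIZE);
}

/* Sanity checks for the model; call from a test harness or main(). */
void model_self_check(void)
{
	/* The last page starts at 0xfffff000. */
	assert(model_last_page() == 0xfffff000u);
	/* A valid object inside that page looks like an error pointer. */
	assert(model_is_err_value(model_last_page() + 0x10u));
	/* The end address of the last page wraps around to 0. */
	assert(model_page_end(model_last_page()) == 0u);
}
```

In this model, any address in the last page other than its first byte
falls in the range that IS_ERR() treats as an error, and computing the
end of that page wraps to 0, which is the kind of arithmetic overflow
the commit message attributes to the ext4 crash.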