diff mbox series

binfmt_elf_fdpic: fix /proc/<pid>/auxv

Message ID 20240322195418.2160164-1-jcmvbkbc@gmail.com (mailing list archive)
State New
Headers show
Series binfmt_elf_fdpic: fix /proc/<pid>/auxv | expand

Commit Message

Max Filippov March 22, 2024, 7:54 p.m. UTC
Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are
empty because there's no code that initializes mm->saved_auxv in the
FDPIC ELF loader.

Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry
aux vector copying to userspace with initialization of mm->saved_auxv
first and then copying it to userspace as a whole.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 fs/binfmt_elf_fdpic.c | 88 +++++++++++++++++++++----------------------
 1 file changed, 42 insertions(+), 46 deletions(-)

Comments

Kees Cook April 24, 2024, 10:55 p.m. UTC | #1
On Fri, 22 Mar 2024 12:54:18 -0700, Max Filippov wrote:
> Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are
> empty because there's no code that initializes mm->saved_auxv in the
> FDPIC ELF loader.
> 
> Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry
> aux vector copying to userspace with initialization of mm->saved_auxv
> first and then copying it to userspace as a whole.
> 
> [...]

Applied to for-next/execve, thanks!

[1/1] binfmt_elf_fdpic: fix /proc/<pid>/auxv
      https://git.kernel.org/kees/c/10e29251be0e

Take care,
Greg Ungerer Aug. 12, 2024, 2:26 a.m. UTC | #2
Hi Max,

On 23/3/24 05:54, Max Filippov wrote:
> Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are
> empty because there's no code that initializes mm->saved_auxv in the
> FDPIC ELF loader.
> 
> Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry
> aux vector copying to userspace with initialization of mm->saved_auxv
> first and then copying it to userspace as a whole.
> 
> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

This is breaking ARM nommu builds supporting fdpic and elf binaries for me.

Tests I have for m68k and riscv nommu setups running elf binaries
don't show any problems - I am only seeing this on ARM.


...
Freeing unused kernel image (initmem) memory: 472K
This architecture does not have kernel memory protection.
Run /init as init process
Internal error: Oops - undefined instruction: 0 [#1] ARM
Modules linked in:
CPU: 0 PID: 1 Comm: init Not tainted 6.10.0 #1
Hardware name: ARM-Versatile (Device Tree Support)
PC is at load_elf_fdpic_binary+0xb34/0xb80
LR is at 0x0
pc : [<00109ce8>]    lr : [<00000000>]    psr: 80000153
sp : 00823e40  ip : 00000000  fp : 00b8fee4
r10: 009c9b80  r9 : 00b8ff80  r8 : 009ee800
r7 : 00000000  r6 : 009f7e80  r5 : 00b8fedc  r4 : 00b87000
r3 : 00b8fed8  r2 : 00b8fee0  r1 : 00b87128  r0 : 00b8fef0
Flags: Nzcv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
Control: 00091176  Table: 00000000  DAC: 00000000
Register r0 information: non-slab/vmalloc memory
Register r1 information: slab/vmalloc mm_struct start 00b87000 pointer offset 296 size 428
Register r2 information: non-slab/vmalloc memory
Register r3 information: non-slab/vmalloc memory
Register r4 information: slab/vmalloc mm_struct start 00b87000 pointer offset 0 size 428
Register r5 information: non-slab/vmalloc memory
Register r6 information: slab/vmalloc kmalloc-32 start 009f7e80 pointer offset 0 size 32
Register r7 information: non-slab/vmalloc memory
Register r8 information: slab/vmalloc kmalloc-512 start 009ee800 pointer offset 0 size 512
Register r9 information: non-slab/vmalloc memory
Register r10 information: slab/vmalloc kmalloc-128 start 009c9b80 pointer offset 0 size 128
Register r11 information: non-slab/vmalloc memory
Register r12 information: non-slab/vmalloc memory
Process init (pid: 1, stack limit = 0x(ptrval))
Stack: (0x00823e40 to 0x00824000)
3e40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3e60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3e80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3ea0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3ec0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3ee0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3f00: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3f20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3f40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3f60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3f80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3fa0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3fe0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
Call trace:
  load_elf_fdpic_binary from bprm_execve+0x1b4/0x488
  bprm_execve from kernel_execve+0x154/0x1e4
  kernel_execve from kernel_init+0x4c/0x108
  kernel_init from ret_from_fork+0x14/0x38
Exception stack(0x00823fb0 to 0x00823ff8)
3fa0:                                     ???????? ???????? ???????? ????????
3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
3fe0: ???????? ???????? ???????? ???????? ???????? ????????
Code: bad PC value
---[ end trace 0000000000000000 ]---
note: init[1] exited with irqs disabled
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---


The code around that PC is:

   109cd0:       e2833ff1        add     r3, r3, #964    @ 0x3c4
   109cd4:       e5933000        ldr     r3, [r3]
   109cd8:       e5933328        ldr     r3, [r3, #808]  @ 0x328
   109cdc:       e5933084        ldr     r3, [r3, #132]  @ 0x84
   109ce0:       e5843034        str     r3, [r4, #52]   @ 0x34
   109ce4:       eafffdbc        b       1093dc <load_elf_fdpic_binary+0x228>
   109ce8:       e7f001f2        .word   0xe7f001f2
   109cec:       eb09471d        bl      35b968 <__stack_chk_fail>
   109cf0:       e59f0038        ldr     r0, [pc, #56]   @ 109d30 <load_elf_fdpic_binary+0xb7c>
   109cf4:       eb092f03        bl      355908 <_printk>
   109cf8:       eafffdb7        b       1093dc <load_elf_fdpic_binary+0x228>


Reverting just this change gets it working again.

Any idea what might be going on?

Regards
Greg


> ---
>   fs/binfmt_elf_fdpic.c | 88 +++++++++++++++++++++----------------------
>   1 file changed, 42 insertions(+), 46 deletions(-)
> 
> diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
> index fefc642541cb..7b4542a0cbe3 100644
> --- a/fs/binfmt_elf_fdpic.c
> +++ b/fs/binfmt_elf_fdpic.c
> @@ -505,8 +505,9 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
>   	char *k_platform, *k_base_platform;
>   	char __user *u_platform, *u_base_platform, *p;
>   	int loop;
> -	int nr;	/* reset for each csp adjustment */
>   	unsigned long flags = 0;
> +	int ei_index;
> +	elf_addr_t *elf_info;
>   
>   #ifdef CONFIG_MMU
>   	/* In some cases (e.g. Hyper-Threading), we want to avoid L1 evictions
> @@ -601,44 +602,24 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
>   	csp -= sp & 15UL;
>   	sp -= sp & 15UL;
>   
> -	/* put the ELF interpreter info on the stack */
> -#define NEW_AUX_ENT(id, val)						\
> -	do {								\
> -		struct { unsigned long _id, _val; } __user *ent, v;	\
> -									\
> -		ent = (void __user *) csp;				\
> -		v._id = (id);						\
> -		v._val = (val);						\
> -		if (copy_to_user(ent + nr, &v, sizeof(v)))		\
> -			return -EFAULT;					\
> -		nr++;							\
> +	/* Create the ELF interpreter info */
> +	elf_info = (elf_addr_t *)mm->saved_auxv;
> +	/* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */
> +#define NEW_AUX_ENT(id, val) \
> +	do { \
> +		*elf_info++ = id; \
> +		*elf_info++ = val; \
>   	} while (0)
>   
> -	nr = 0;
> -	csp -= 2 * sizeof(unsigned long);
> -	NEW_AUX_ENT(AT_NULL, 0);
> -	if (k_platform) {
> -		nr = 0;
> -		csp -= 2 * sizeof(unsigned long);
> -		NEW_AUX_ENT(AT_PLATFORM,
> -			    (elf_addr_t) (unsigned long) u_platform);
> -	}
> -
> -	if (k_base_platform) {
> -		nr = 0;
> -		csp -= 2 * sizeof(unsigned long);
> -		NEW_AUX_ENT(AT_BASE_PLATFORM,
> -			    (elf_addr_t) (unsigned long) u_base_platform);
> -	}
> -
> -	if (bprm->have_execfd) {
> -		nr = 0;
> -		csp -= 2 * sizeof(unsigned long);
> -		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
> -	}
> -
> -	nr = 0;
> -	csp -= DLINFO_ITEMS * 2 * sizeof(unsigned long);
> +#ifdef ARCH_DLINFO
> +	/*
> +	 * ARCH_DLINFO must come first so PPC can do its special alignment of
> +	 * AUXV.
> +	 * update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT() in
> +	 * ARCH_DLINFO changes
> +	 */
> +	ARCH_DLINFO;
> +#endif
>   	NEW_AUX_ENT(AT_HWCAP,	ELF_HWCAP);
>   #ifdef ELF_HWCAP2
>   	NEW_AUX_ENT(AT_HWCAP2,	ELF_HWCAP2);
> @@ -659,17 +640,32 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
>   	NEW_AUX_ENT(AT_EGID,	(elf_addr_t) from_kgid_munged(cred->user_ns, cred->egid));
>   	NEW_AUX_ENT(AT_SECURE,	bprm->secureexec);
>   	NEW_AUX_ENT(AT_EXECFN,	bprm->exec);
> +	if (k_platform) {
> +		NEW_AUX_ENT(AT_PLATFORM,
> +			    (elf_addr_t)(unsigned long)u_platform);
> +	}
> +	if (k_base_platform) {
> +		NEW_AUX_ENT(AT_BASE_PLATFORM,
> +			    (elf_addr_t)(unsigned long)u_base_platform);
> +	}
> +	if (bprm->have_execfd) {
> +		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
> +	}
> +#undef NEW_AUX_ENT
> +	/* AT_NULL is zero; clear the rest too */
> +	memset(elf_info, 0, (char *)mm->saved_auxv +
> +	       sizeof(mm->saved_auxv) - (char *)elf_info);
>   
> -#ifdef ARCH_DLINFO
> -	nr = 0;
> -	csp -= AT_VECTOR_SIZE_ARCH * 2 * sizeof(unsigned long);
> +	/* And advance past the AT_NULL entry.  */
> +	elf_info += 2;
>   
> -	/* ARCH_DLINFO must come last so platform specific code can enforce
> -	 * special alignment requirements on the AUXV if necessary (eg. PPC).
> -	 */
> -	ARCH_DLINFO;
> -#endif
> -#undef NEW_AUX_ENT
> +	ei_index = elf_info - (elf_addr_t *)mm->saved_auxv;
> +	csp -= ei_index * sizeof(elf_addr_t);
> +
> +	/* Put the elf_info on the stack in the right place.  */
> +	if (copy_to_user((void __user *)csp, mm->saved_auxv,
> +			 ei_index * sizeof(elf_addr_t)))
> +		return -EFAULT;
>   
>   	/* allocate room for argv[] and envv[] */
>   	csp -= (bprm->envc + 1) * sizeof(elf_caddr_t);
Max Filippov Aug. 12, 2024, 6:02 p.m. UTC | #3
Hi Greg,

On Sun, Aug 11, 2024 at 7:26 PM Greg Ungerer <gregungerer@westnet.com.au> wrote:
> On 23/3/24 05:54, Max Filippov wrote:
> > Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are
> > empty because there's no code that initializes mm->saved_auxv in the
> > FDPIC ELF loader.
> >
> > Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry
> > aux vector copying to userspace with initialization of mm->saved_auxv
> > first and then copying it to userspace as a whole.
> >
> > Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
>
> This is breaking ARM nommu builds supporting fdpic and elf binaries for me.
>
> Tests I have for m68k and riscv nommu setups running elf binaries
> don't show any problems - I am only seeing this on ARM.
>
>
> ...
> Freeing unused kernel image (initmem) memory: 472K
> This architecture does not have kernel memory protection.
> Run /init as init process
> Internal error: Oops - undefined instruction: 0 [#1] ARM
> Modules linked in:
> CPU: 0 PID: 1 Comm: init Not tainted 6.10.0 #1
> Hardware name: ARM-Versatile (Device Tree Support)
> PC is at load_elf_fdpic_binary+0xb34/0xb80
> LR is at 0x0
> pc : [<00109ce8>]    lr : [<00000000>]    psr: 80000153
> sp : 00823e40  ip : 00000000  fp : 00b8fee4
> r10: 009c9b80  r9 : 00b8ff80  r8 : 009ee800
> r7 : 00000000  r6 : 009f7e80  r5 : 00b8fedc  r4 : 00b87000
> r3 : 00b8fed8  r2 : 00b8fee0  r1 : 00b87128  r0 : 00b8fef0
> Flags: Nzcv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
> Control: 00091176  Table: 00000000  DAC: 00000000
> Register r0 information: non-slab/vmalloc memory
> Register r1 information: slab/vmalloc mm_struct start 00b87000 pointer offset 296 size 428
> Register r2 information: non-slab/vmalloc memory
> Register r3 information: non-slab/vmalloc memory
> Register r4 information: slab/vmalloc mm_struct start 00b87000 pointer offset 0 size 428
> Register r5 information: non-slab/vmalloc memory
> Register r6 information: slab/vmalloc kmalloc-32 start 009f7e80 pointer offset 0 size 32
> Register r7 information: non-slab/vmalloc memory
> Register r8 information: slab/vmalloc kmalloc-512 start 009ee800 pointer offset 0 size 512
> Register r9 information: non-slab/vmalloc memory
> Register r10 information: slab/vmalloc kmalloc-128 start 009c9b80 pointer offset 0 size 128
> Register r11 information: non-slab/vmalloc memory
> Register r12 information: non-slab/vmalloc memory
> Process init (pid: 1, stack limit = 0x(ptrval))
> Stack: (0x00823e40 to 0x00824000)
> 3e40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3e60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3e80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3ea0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3ec0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3ee0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3f00: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3f20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3f40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3f60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3f80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3fa0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> Call trace:
>   load_elf_fdpic_binary from bprm_execve+0x1b4/0x488
>   bprm_execve from kernel_execve+0x154/0x1e4
>   kernel_execve from kernel_init+0x4c/0x108
>   kernel_init from ret_from_fork+0x14/0x38
> Exception stack(0x00823fb0 to 0x00823ff8)
> 3fa0:                                     ???????? ???????? ???????? ????????
> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> 3fe0: ???????? ???????? ???????? ???????? ???????? ????????
> Code: bad PC value
> ---[ end trace 0000000000000000 ]---
> note: init[1] exited with irqs disabled
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
>
>
> The code around that PC is:
>
>    109cd0:       e2833ff1        add     r3, r3, #964    @ 0x3c4
>    109cd4:       e5933000        ldr     r3, [r3]
>    109cd8:       e5933328        ldr     r3, [r3, #808]  @ 0x328
>    109cdc:       e5933084        ldr     r3, [r3, #132]  @ 0x84
>    109ce0:       e5843034        str     r3, [r4, #52]   @ 0x34
>    109ce4:       eafffdbc        b       1093dc <load_elf_fdpic_binary+0x228>
>    109ce8:       e7f001f2        .word   0xe7f001f2
>    109cec:       eb09471d        bl      35b968 <__stack_chk_fail>
>    109cf0:       e59f0038        ldr     r0, [pc, #56]   @ 109d30 <load_elf_fdpic_binary+0xb7c>
>    109cf4:       eb092f03        bl      355908 <_printk>
>    109cf8:       eafffdb7        b       1093dc <load_elf_fdpic_binary+0x228>
>
>
> Reverting just this change gets it working again.
>
> Any idea what might be going on?

Other than that the layout of the aux vector has slightly changed I don't
see anything. I can take a look at what's going on if you can share the
reproducer.
Greg Ungerer Aug. 13, 2024, 4:53 a.m. UTC | #4
Hi Max,

On 13/8/24 04:02, Max Filippov wrote:
> Hi Greg,
> 
> On Sun, Aug 11, 2024 at 7:26 PM Greg Ungerer <gregungerer@westnet.com.au> wrote:
>> On 23/3/24 05:54, Max Filippov wrote:
>>> Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are
>>> empty because there's no code that initializes mm->saved_auxv in the
>>> FDPIC ELF loader.
>>>
>>> Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry
>>> aux vector copying to userspace with initialization of mm->saved_auxv
>>> first and then copying it to userspace as a whole.
>>>
>>> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
>>
>> This is breaking ARM nommu builds supporting fdpic and elf binaries for me.
>>
>> Tests I have for m68k and riscv nommu setups running elf binaries
>> don't show any problems - I am only seeing this on ARM.
>>
>>
>> ...
>> Freeing unused kernel image (initmem) memory: 472K
>> This architecture does not have kernel memory protection.
>> Run /init as init process
>> Internal error: Oops - undefined instruction: 0 [#1] ARM
>> Modules linked in:
>> CPU: 0 PID: 1 Comm: init Not tainted 6.10.0 #1
>> Hardware name: ARM-Versatile (Device Tree Support)
>> PC is at load_elf_fdpic_binary+0xb34/0xb80
>> LR is at 0x0
>> pc : [<00109ce8>]    lr : [<00000000>]    psr: 80000153
>> sp : 00823e40  ip : 00000000  fp : 00b8fee4
>> r10: 009c9b80  r9 : 00b8ff80  r8 : 009ee800
>> r7 : 00000000  r6 : 009f7e80  r5 : 00b8fedc  r4 : 00b87000
>> r3 : 00b8fed8  r2 : 00b8fee0  r1 : 00b87128  r0 : 00b8fef0
>> Flags: Nzcv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
>> Control: 00091176  Table: 00000000  DAC: 00000000
>> Register r0 information: non-slab/vmalloc memory
>> Register r1 information: slab/vmalloc mm_struct start 00b87000 pointer offset 296 size 428
>> Register r2 information: non-slab/vmalloc memory
>> Register r3 information: non-slab/vmalloc memory
>> Register r4 information: slab/vmalloc mm_struct start 00b87000 pointer offset 0 size 428
>> Register r5 information: non-slab/vmalloc memory
>> Register r6 information: slab/vmalloc kmalloc-32 start 009f7e80 pointer offset 0 size 32
>> Register r7 information: non-slab/vmalloc memory
>> Register r8 information: slab/vmalloc kmalloc-512 start 009ee800 pointer offset 0 size 512
>> Register r9 information: non-slab/vmalloc memory
>> Register r10 information: slab/vmalloc kmalloc-128 start 009c9b80 pointer offset 0 size 128
>> Register r11 information: non-slab/vmalloc memory
>> Register r12 information: non-slab/vmalloc memory
>> Process init (pid: 1, stack limit = 0x(ptrval))
>> Stack: (0x00823e40 to 0x00824000)
>> 3e40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3e60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3e80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3ea0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3ec0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3ee0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3f00: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3f20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3f40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3f60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3f80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3fa0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> Call trace:
>>    load_elf_fdpic_binary from bprm_execve+0x1b4/0x488
>>    bprm_execve from kernel_execve+0x154/0x1e4
>>    kernel_execve from kernel_init+0x4c/0x108
>>    kernel_init from ret_from_fork+0x14/0x38
>> Exception stack(0x00823fb0 to 0x00823ff8)
>> 3fa0:                                     ???????? ???????? ???????? ????????
>> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
>> 3fe0: ???????? ???????? ???????? ???????? ???????? ????????
>> Code: bad PC value
>> ---[ end trace 0000000000000000 ]---
>> note: init[1] exited with irqs disabled
>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
>>
>>
>> The code around that PC is:
>>
>>     109cd0:       e2833ff1        add     r3, r3, #964    @ 0x3c4
>>     109cd4:       e5933000        ldr     r3, [r3]
>>     109cd8:       e5933328        ldr     r3, [r3, #808]  @ 0x328
>>     109cdc:       e5933084        ldr     r3, [r3, #132]  @ 0x84
>>     109ce0:       e5843034        str     r3, [r4, #52]   @ 0x34
>>     109ce4:       eafffdbc        b       1093dc <load_elf_fdpic_binary+0x228>
>>     109ce8:       e7f001f2        .word   0xe7f001f2
>>     109cec:       eb09471d        bl      35b968 <__stack_chk_fail>
>>     109cf0:       e59f0038        ldr     r0, [pc, #56]   @ 109d30 <load_elf_fdpic_binary+0xb7c>
>>     109cf4:       eb092f03        bl      355908 <_printk>
>>     109cf8:       eafffdb7        b       1093dc <load_elf_fdpic_binary+0x228>
>>
>>
>> Reverting just this change gets it working again.
>>
>> Any idea what might be going on?
> 
> Other than that the layout of the aux vector has slightly changed I don't
> see anything. I can take a look at what's going on if you can share the
> reproducer.

I build the test binary using a simple script:

   https://github.com/gregungerer/simple-linux/blob/master/build-armnommu-linux-uclibc-fdpic.sh

Run the resulting vmlinux and devicetree under qemu as per comments at top of that script.

Note that that script has a revert of this change (at line 191), so you would need
to take that out to reproduce the problem. This script will look for a couple of
configs and a versatile patch:

   https://github.com/gregungerer/simple-linux/blob/master/configs/linux-6.10-armnommu-versatile.config
   https://github.com/gregungerer/simple-linux/blob/master/configs/uClibc-ng-1.0.49-armnommu-fdpic.config
   https://github.com/gregungerer/simple-linux/blob/master/configs/busybox-1.36.1.config
   https://github.com/gregungerer/simple-linux/blob/master/configs/rootfs.dev
   https://github.com/gregungerer/simple-linux/blob/master/patches/linux-6.10-armnommu-versatile.patch
   https://github.com/gregungerer/simple-linux/blob/master/patches/linux-6.10-binfmt_elf_fdpic-fix-proc-pid-auxv.patch

Or you could just clone the whole small set in one step from:

   https://github.com/gregungerer/simple-linux.git
   
If you need to me to run something I can do that easily too.

Regards
Greg
Max Filippov Aug. 13, 2024, 6:28 p.m. UTC | #5
On Mon, Aug 12, 2024 at 9:53 PM Greg Ungerer <gregungerer@westnet.com.au> wrote:
> On 13/8/24 04:02, Max Filippov wrote:
> > On Sun, Aug 11, 2024 at 7:26 PM Greg Ungerer <gregungerer@westnet.com.au> wrote:
> >> On 23/3/24 05:54, Max Filippov wrote:
> >>> Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are
> >>> empty because there's no code that initializes mm->saved_auxv in the
> >>> FDPIC ELF loader.
> >>>
> >>> Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry
> >>> aux vector copying to userspace with initialization of mm->saved_auxv
> >>> first and then copying it to userspace as a whole.
> >>>
> >>> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
> >>
> >> This is breaking ARM nommu builds supporting fdpic and elf binaries for me.
> >>
> >> Tests I have for m68k and riscv nommu setups running elf binaries
> >> don't show any problems - I am only seeing this on ARM.
> >>
> >>
> >> ...
> >> Freeing unused kernel image (initmem) memory: 472K
> >> This architecture does not have kernel memory protection.
> >> Run /init as init process
> >> Internal error: Oops - undefined instruction: 0 [#1] ARM
> >> Modules linked in:
> >> CPU: 0 PID: 1 Comm: init Not tainted 6.10.0 #1
> >> Hardware name: ARM-Versatile (Device Tree Support)
> >> PC is at load_elf_fdpic_binary+0xb34/0xb80
> >> LR is at 0x0
> >> pc : [<00109ce8>]    lr : [<00000000>]    psr: 80000153
> >> sp : 00823e40  ip : 00000000  fp : 00b8fee4
> >> r10: 009c9b80  r9 : 00b8ff80  r8 : 009ee800
> >> r7 : 00000000  r6 : 009f7e80  r5 : 00b8fedc  r4 : 00b87000
> >> r3 : 00b8fed8  r2 : 00b8fee0  r1 : 00b87128  r0 : 00b8fef0
> >> Flags: Nzcv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
> >> Control: 00091176  Table: 00000000  DAC: 00000000
> >> Register r0 information: non-slab/vmalloc memory
> >> Register r1 information: slab/vmalloc mm_struct start 00b87000 pointer offset 296 size 428
> >> Register r2 information: non-slab/vmalloc memory
> >> Register r3 information: non-slab/vmalloc memory
> >> Register r4 information: slab/vmalloc mm_struct start 00b87000 pointer offset 0 size 428
> >> Register r5 information: non-slab/vmalloc memory
> >> Register r6 information: slab/vmalloc kmalloc-32 start 009f7e80 pointer offset 0 size 32
> >> Register r7 information: non-slab/vmalloc memory
> >> Register r8 information: slab/vmalloc kmalloc-512 start 009ee800 pointer offset 0 size 512
> >> Register r9 information: non-slab/vmalloc memory
> >> Register r10 information: slab/vmalloc kmalloc-128 start 009c9b80 pointer offset 0 size 128
> >> Register r11 information: non-slab/vmalloc memory
> >> Register r12 information: non-slab/vmalloc memory
> >> Process init (pid: 1, stack limit = 0x(ptrval))
> >> Stack: (0x00823e40 to 0x00824000)
> >> 3e40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3e60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3e80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3ea0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3ec0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3ee0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3f00: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3f20: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3f40: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3f60: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3f80: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3fa0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3fe0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> Call trace:
> >>    load_elf_fdpic_binary from bprm_execve+0x1b4/0x488
> >>    bprm_execve from kernel_execve+0x154/0x1e4
> >>    kernel_execve from kernel_init+0x4c/0x108
> >>    kernel_init from ret_from_fork+0x14/0x38
> >> Exception stack(0x00823fb0 to 0x00823ff8)
> >> 3fa0:                                     ???????? ???????? ???????? ????????
> >> 3fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ????????
> >> 3fe0: ???????? ???????? ???????? ???????? ???????? ????????
> >> Code: bad PC value
> >> ---[ end trace 0000000000000000 ]---
> >> note: init[1] exited with irqs disabled
> >> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> >> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> >>
> >>
> >> The code around that PC is:
> >>
> >>     109cd0:       e2833ff1        add     r3, r3, #964    @ 0x3c4
> >>     109cd4:       e5933000        ldr     r3, [r3]
> >>     109cd8:       e5933328        ldr     r3, [r3, #808]  @ 0x328
> >>     109cdc:       e5933084        ldr     r3, [r3, #132]  @ 0x84
> >>     109ce0:       e5843034        str     r3, [r4, #52]   @ 0x34
> >>     109ce4:       eafffdbc        b       1093dc <load_elf_fdpic_binary+0x228>
> >>     109ce8:       e7f001f2        .word   0xe7f001f2
> >>     109cec:       eb09471d        bl      35b968 <__stack_chk_fail>
> >>     109cf0:       e59f0038        ldr     r0, [pc, #56]   @ 109d30 <load_elf_fdpic_binary+0xb7c>
> >>     109cf4:       eb092f03        bl      355908 <_printk>
> >>     109cf8:       eafffdb7        b       1093dc <load_elf_fdpic_binary+0x228>
> >>
> >>
> >> Reverting just this change gets it working again.
> >>
> >> Any idea what might be going on?
> >
> > Other than that the layout of the aux vector has slightly changed I don't
> > see anything. I can take a look at what's going on if you can share the
> > reproducer.
>
> I build the test binary using a simple script:
>
>    https://github.com/gregungerer/simple-linux/blob/master/build-armnommu-linux-uclibc-fdpic.sh
>
> Run the resulting vmlinux and devicetree under qemu as per comments at top of that script.

Cool scripts, thanks. I was able to reproduce the issue, it's the
BUG_ON(csp != sp);
from the create_elf_fdpic_tables() that causes it. I'm still trying
to figure out how that happens.
Max Filippov Aug. 26, 2024, 3:28 a.m. UTC | #6
Hi Greg,

On Tue, Aug 13, 2024 at 11:28 AM Max Filippov <jcmvbkbc@gmail.com> wrote:
> On Mon, Aug 12, 2024 at 9:53 PM Greg Ungerer <gregungerer@westnet.com.au> wrote:
> > On 13/8/24 04:02, Max Filippov wrote:
> > > On Sun, Aug 11, 2024 at 7:26 PM Greg Ungerer <gregungerer@westnet.com.au> wrote:
> > >> On 23/3/24 05:54, Max Filippov wrote:
> > >>> Althought FDPIC linux kernel provides /proc/<pid>/auxv files they are
> > >>> empty because there's no code that initializes mm->saved_auxv in the
> > >>> FDPIC ELF loader.
> > >>>
> > >>> Synchronize FDPIC ELF aux vector setup with ELF. Replace entry-by-entry
> > >>> aux vector copying to userspace with initialization of mm->saved_auxv
> > >>> first and then copying it to userspace as a whole.
> > >>>
> > >>> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
> > >>
> > >> This is breaking ARM nommu builds supporting fdpic and elf binaries for me.
> > >>
> > >> Tests I have for m68k and riscv nommu setups running elf binaries
> > >> don't show any problems - I am only seeing this on ARM.

I see the following:
- the issue with the change is caused by unaccouncounted AUX vector
  entry AT_HWCAP2 that is defined for ARM, but not for any other
  architecture that you tested.
- in the original code this off-by-one error resulted in the last entry of the
  AUX vector being set to zero. Below are the stack dumps from the ARM
  kernels built by your script, one with my change (left) and the other where
  this change is reverted (right):

argc:
00000001  00000001

argv:
00b8ffde  00b8ffde
00000000  00000000

envp:
00b8ffe4  00b8ffe4
00b8ffeb  00b8ffeb
00000000  00000000

auxv entries:
00000010  00000010
000001d7  000001d7
0000001a  0000001a
00000000  00000000
00000006  00000006
00001000  00001000
00000011  00000011
00000064  00000064
00000003  00000003
00980034  00a00034
00000004  00000004
00000020  00000020
00000005  00000005
00000007  00000007
00000007  00000007
00a40000  00a40000
00000008  00000008
00000000  00000000
00000009  00000009
00984040  00a04040
0000000b  0000000b
00000000  00000000
0000000c  0000000c
00000000  00000000
0000000d  0000000d
00000000  00000000
0000000e  0000000e
00000000  00000000
00000017  00000017
00000000  00000000
0000001f  0000001f
00b8fff6  00b8fff6
0000000f  00000000
00b8ffcc  00000000
00000000  00000000
00000000  00000000

The fix is in correct accounting of space for the AT_HWCAP2 entry.
diff mbox series

Patch

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index fefc642541cb..7b4542a0cbe3 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -505,8 +505,9 @@  static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 	char *k_platform, *k_base_platform;
 	char __user *u_platform, *u_base_platform, *p;
 	int loop;
-	int nr;	/* reset for each csp adjustment */
 	unsigned long flags = 0;
+	int ei_index;
+	elf_addr_t *elf_info;
 
 #ifdef CONFIG_MMU
 	/* In some cases (e.g. Hyper-Threading), we want to avoid L1 evictions
@@ -601,44 +602,24 @@  static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 	csp -= sp & 15UL;
 	sp -= sp & 15UL;
 
-	/* put the ELF interpreter info on the stack */
-#define NEW_AUX_ENT(id, val)						\
-	do {								\
-		struct { unsigned long _id, _val; } __user *ent, v;	\
-									\
-		ent = (void __user *) csp;				\
-		v._id = (id);						\
-		v._val = (val);						\
-		if (copy_to_user(ent + nr, &v, sizeof(v)))		\
-			return -EFAULT;					\
-		nr++;							\
+	/* Create the ELF interpreter info */
+	elf_info = (elf_addr_t *)mm->saved_auxv;
+	/* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */
+#define NEW_AUX_ENT(id, val) \
+	do { \
+		*elf_info++ = id; \
+		*elf_info++ = val; \
 	} while (0)
 
-	nr = 0;
-	csp -= 2 * sizeof(unsigned long);
-	NEW_AUX_ENT(AT_NULL, 0);
-	if (k_platform) {
-		nr = 0;
-		csp -= 2 * sizeof(unsigned long);
-		NEW_AUX_ENT(AT_PLATFORM,
-			    (elf_addr_t) (unsigned long) u_platform);
-	}
-
-	if (k_base_platform) {
-		nr = 0;
-		csp -= 2 * sizeof(unsigned long);
-		NEW_AUX_ENT(AT_BASE_PLATFORM,
-			    (elf_addr_t) (unsigned long) u_base_platform);
-	}
-
-	if (bprm->have_execfd) {
-		nr = 0;
-		csp -= 2 * sizeof(unsigned long);
-		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
-	}
-
-	nr = 0;
-	csp -= DLINFO_ITEMS * 2 * sizeof(unsigned long);
+#ifdef ARCH_DLINFO
+	/*
+	 * ARCH_DLINFO must come first so PPC can do its special alignment of
+	 * AUXV.
+	 * update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT() in
+	 * ARCH_DLINFO changes
+	 */
+	ARCH_DLINFO;
+#endif
 	NEW_AUX_ENT(AT_HWCAP,	ELF_HWCAP);
 #ifdef ELF_HWCAP2
 	NEW_AUX_ENT(AT_HWCAP2,	ELF_HWCAP2);
@@ -659,17 +640,32 @@  static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 	NEW_AUX_ENT(AT_EGID,	(elf_addr_t) from_kgid_munged(cred->user_ns, cred->egid));
 	NEW_AUX_ENT(AT_SECURE,	bprm->secureexec);
 	NEW_AUX_ENT(AT_EXECFN,	bprm->exec);
+	if (k_platform) {
+		NEW_AUX_ENT(AT_PLATFORM,
+			    (elf_addr_t)(unsigned long)u_platform);
+	}
+	if (k_base_platform) {
+		NEW_AUX_ENT(AT_BASE_PLATFORM,
+			    (elf_addr_t)(unsigned long)u_base_platform);
+	}
+	if (bprm->have_execfd) {
+		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
+	}
+#undef NEW_AUX_ENT
+	/* AT_NULL is zero; clear the rest too */
+	memset(elf_info, 0, (char *)mm->saved_auxv +
+	       sizeof(mm->saved_auxv) - (char *)elf_info);
 
-#ifdef ARCH_DLINFO
-	nr = 0;
-	csp -= AT_VECTOR_SIZE_ARCH * 2 * sizeof(unsigned long);
+	/* And advance past the AT_NULL entry.  */
+	elf_info += 2;
 
-	/* ARCH_DLINFO must come last so platform specific code can enforce
-	 * special alignment requirements on the AUXV if necessary (eg. PPC).
-	 */
-	ARCH_DLINFO;
-#endif
-#undef NEW_AUX_ENT
+	ei_index = elf_info - (elf_addr_t *)mm->saved_auxv;
+	csp -= ei_index * sizeof(elf_addr_t);
+
+	/* Put the elf_info on the stack in the right place.  */
+	if (copy_to_user((void __user *)csp, mm->saved_auxv,
+			 ei_index * sizeof(elf_addr_t)))
+		return -EFAULT;
 
 	/* allocate room for argv[] and envv[] */
 	csp -= (bprm->envc + 1) * sizeof(elf_caddr_t);