fs/binfmt_elf: Fix memsz > filesz handling

Message ID	20221106021657.1145519-1-pedro.falcato@gmail.com (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: Pedro Falcato <pedro.falcato@gmail.com> To: linux-kernel@vger.kernel.org, Kees Cook <keescook@chromium.org>, linux-mm@kvack.org Cc: sam@gentoo.org, Alexander Viro <viro@zeniv.linux.org.uk>, Eric Biederman <ebiederm@xmission.com>, linux-fsdevel@vger.kernel.org, Pedro Falcato <pedro.falcato@gmail.com>, Rich Felker <dalias@libc.org> Subject: [PATCH] fs/binfmt_elf: Fix memsz > filesz handling Date: Sun, 6 Nov 2022 02:16:57 +0000 Message-Id: <20221106021657.1145519-1-pedro.falcato@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	fs/binfmt_elf: Fix memsz > filesz handling \| expand fs/binfmt_elf: Fix memsz > filesz handling

Pedro Falcato Nov. 6, 2022, 2:16 a.m. UTC

The old code for ELF interpreter loading could only handle
1 memsz > filesz segment. This is incorrect, as evidenced
by the elf program loading code, which could handle multiple
such segments.

This patch fixes memsz > filesz handling for elf interpreters
and refactors interpreter/program BSS clearing into a common
codepath.

This bug was uncovered on builds of ppc64le musl libc with
llvm lld 15.0.0, since ppc64 does not allocate file space
for its .plt.

Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
 fs/binfmt_elf.c | 170 ++++++++++++++++--------------------------------
 1 file changed, 56 insertions(+), 114 deletions(-)

Fangrui Song Nov. 6, 2022, 6:44 a.m. UTC | #1

On 2022-11-06, Pedro Falcato wrote:
>The old code for ELF interpreter loading could only handle
>1 memsz > filesz segment. This is incorrect, as evidenced
>by the elf program loading code, which could handle multiple
>such segments.
>
>This patch fixes memsz > filesz handling for elf interpreters
>and refactors interpreter/program BSS clearing into a common
>codepath.
>
>This bug was uncovered on builds of ppc64le musl libc with
>llvm lld 15.0.0, since ppc64 does not allocate file space
>for its .plt.
>
>Cc: Rich Felker <dalias@libc.org>
>Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
>---
> fs/binfmt_elf.c | 170 ++++++++++++++++--------------------------------
> 1 file changed, 56 insertions(+), 114 deletions(-)
>
>diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>index 6a11025e585..ca2961d80fa 100644
>--- a/fs/binfmt_elf.c
>+++ b/fs/binfmt_elf.c
>@@ -109,25 +109,6 @@ static struct linux_binfmt elf_format = {
>
> #define BAD_ADDR(x) (unlikely((unsigned long)(x) >= TASK_SIZE))
>
>-static int set_brk(unsigned long start, unsigned long end, int prot)
>-{
>-	start = ELF_PAGEALIGN(start);
>-	end = ELF_PAGEALIGN(end);
>-	if (end > start) {
>-		/*
>-		 * Map the last of the bss segment.
>-		 * If the header is requesting these pages to be
>-		 * executable, honour that (ppc32 needs this).
>-		 */
>-		int error = vm_brk_flags(start, end - start,
>-				prot & PROT_EXEC ? VM_EXEC : 0);
>-		if (error)
>-			return error;
>-	}
>-	current->mm->start_brk = current->mm->brk = end;
>-	return 0;
>-}
>-
> /* We need to explicitly zero any fractional pages
>    after the data section (i.e. bss).  This would
>    contain the junk from the file that should not
>@@ -584,6 +565,41 @@ static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
> 	return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
> }
>
>+static int zero_bss(unsigned long start, unsigned long end, int prot)
>+{
>+	/*
>+	 * First pad the last page from the file up to
>+	 * the page boundary, and zero it from elf_bss up to the end of the page.
>+	 */
>+	if (padzero(start))
>+		return -EFAULT;
>+
>+	/*
>+	 * Next, align both the file and mem bss up to the page size,
>+	 * since this is where elf_bss was just zeroed up to, and where
>+	 * last_bss will end after the vm_brk_flags() below.
>+	 */
>+
>+	start = ELF_PAGEALIGN(start);
>+	end = ELF_PAGEALIGN(end);
>+
>+	/* Finally, if there is still more bss to allocate, do it. */
>+
>+	return (end > start ? vm_brk_flags(start, end - start,
>+		prot & PROT_EXEC ? VM_EXEC : 0) : 0);
>+}
>+
>+static int set_brk(unsigned long start, unsigned long end, int prot)
>+{
>+	int error = zero_bss(start, end, prot);
>+
>+	if (error < 0)
>+		return error;
>+
>+	current->mm->start_brk = current->mm->brk = ELF_PAGEALIGN(end);
>+	return 0;
>+}
>+
> /* This is much more generalized than the library routine read function,
>    so we keep this separate.  Technically the library read function
>    is only provided so that we can read a.out libraries that have
>@@ -597,8 +613,6 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
> 	struct elf_phdr *eppnt;
> 	unsigned long load_addr = 0;
> 	int load_addr_set = 0;
>-	unsigned long last_bss = 0, elf_bss = 0;
>-	int bss_prot = 0;
> 	unsigned long error = ~0UL;
> 	unsigned long total_size;
> 	int i;
>@@ -662,50 +676,21 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
> 				goto out;
> 			}
>
>-			/*
>-			 * Find the end of the file mapping for this phdr, and
>-			 * keep track of the largest address we see for this.
>-			 */
>-			k = load_addr + eppnt->p_vaddr + eppnt->p_filesz;
>-			if (k > elf_bss)
>-				elf_bss = k;
>+			if (eppnt->p_memsz > eppnt->p_filesz) {
>+				/*
>+				 * Handle BSS zeroing and mapping
>+				 */
>+				unsigned long start = load_addr + vaddr + eppnt->p_filesz;
>+				unsigned long end = load_addr + vaddr + eppnt->p_memsz;
>
>-			/*
>-			 * Do the same thing for the memory mapping - between
>-			 * elf_bss and last_bss is the bss section.
>-			 */
>-			k = load_addr + eppnt->p_vaddr + eppnt->p_memsz;
>-			if (k > last_bss) {
>-				last_bss = k;
>-				bss_prot = elf_prot;
>+				error = zero_bss(start, end, elf_prot);
>+
>+				if (error < 0)
>+					goto out;
> 			}
> 		}
> 	}
>
>-	/*
>-	 * Now fill out the bss section: first pad the last page from
>-	 * the file up to the page boundary, and zero it from elf_bss
>-	 * up to the end of the page.
>-	 */
>-	if (padzero(elf_bss)) {
>-		error = -EFAULT;
>-		goto out;
>-	}
>-	/*
>-	 * Next, align both the file and mem bss up to the page size,
>-	 * since this is where elf_bss was just zeroed up to, and where
>-	 * last_bss will end after the vm_brk_flags() below.
>-	 */
>-	elf_bss = ELF_PAGEALIGN(elf_bss);
>-	last_bss = ELF_PAGEALIGN(last_bss);
>-	/* Finally, if there is still more bss to allocate, do it. */
>-	if (last_bss > elf_bss) {
>-		error = vm_brk_flags(elf_bss, last_bss - elf_bss,
>-				bss_prot & PROT_EXEC ? VM_EXEC : 0);
>-		if (error)
>-			goto out;
>-	}
>-
> 	error = load_addr;
> out:
> 	return error;
>@@ -829,8 +814,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
> 	unsigned long error;
> 	struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
> 	struct elf_phdr *elf_property_phdata = NULL;
>-	unsigned long elf_bss, elf_brk;
>-	int bss_prot = 0;
> 	int retval, i;
> 	unsigned long elf_entry;
> 	unsigned long e_entry;
>@@ -1020,9 +1003,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
> 				 executable_stack);
> 	if (retval < 0)
> 		goto out_free_dentry;
>-	
>-	elf_bss = 0;
>-	elf_brk = 0;
>
> 	start_code = ~0UL;
> 	end_code = 0;
>@@ -1041,33 +1021,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
> 		if (elf_ppnt->p_type != PT_LOAD)
> 			continue;
>
>-		if (unlikely (elf_brk > elf_bss)) {
>-			unsigned long nbyte;
>-	
>-			/* There was a PT_LOAD segment with p_memsz > p_filesz
>-			   before this one. Map anonymous pages, if needed,
>-			   and clear the area.  */
>-			retval = set_brk(elf_bss + load_bias,
>-					 elf_brk + load_bias,
>-					 bss_prot);
>-			if (retval)
>-				goto out_free_dentry;
>-			nbyte = ELF_PAGEOFFSET(elf_bss);
>-			if (nbyte) {
>-				nbyte = ELF_MIN_ALIGN - nbyte;
>-				if (nbyte > elf_brk - elf_bss)
>-					nbyte = elf_brk - elf_bss;
>-				if (clear_user((void __user *)elf_bss +
>-							load_bias, nbyte)) {
>-					/*
>-					 * This bss-zeroing can fail if the ELF
>-					 * file specifies odd protections. So
>-					 * we don't check the return value
>-					 */
>-				}
>-			}
>-		}
>-
> 		elf_prot = make_prot(elf_ppnt->p_flags, &arch_state,
> 				     !!interpreter, false);
>
>@@ -1211,41 +1164,30 @@ static int load_elf_binary(struct linux_binprm *bprm)
>
> 		k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;
>
>-		if (k > elf_bss)
>-			elf_bss = k;
>+
>+		if (elf_ppnt->p_memsz > elf_ppnt->p_filesz) {
>+			unsigned long seg_end = elf_ppnt->p_vaddr +
>+					 elf_ppnt->p_memsz + load_bias;
>+			retval = set_brk(k + load_bias,
>+					 seg_end,
>+					 elf_prot);
>+			if (retval)
>+				goto out_free_dentry;
>+		}
>+
> 		if ((elf_ppnt->p_flags & PF_X) && end_code < k)
> 			end_code = k;
> 		if (end_data < k)
> 			end_data = k;
>-		k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
>-		if (k > elf_brk) {
>-			bss_prot = elf_prot;
>-			elf_brk = k;
>-		}
> 	}
>
> 	e_entry = elf_ex->e_entry + load_bias;
> 	phdr_addr += load_bias;
>-	elf_bss += load_bias;
>-	elf_brk += load_bias;
> 	start_code += load_bias;
> 	end_code += load_bias;
> 	start_data += load_bias;
> 	end_data += load_bias;
>
>-	/* Calling set_brk effectively mmaps the pages that we need
>-	 * for the bss and break sections.  We must do this before
>-	 * mapping in the interpreter, to make sure it doesn't wind
>-	 * up getting placed where the bss needs to go.
>-	 */
>-	retval = set_brk(elf_bss, elf_brk, bss_prot);
>-	if (retval)
>-		goto out_free_dentry;
>-	if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
>-		retval = -EFAULT; /* Nobody gets to see this, but.. */
>-		goto out_free_dentry;
>-	}
>-
> 	if (interpreter) {
> 		elf_entry = load_elf_interp(interp_elf_ex,
> 					    interpreter,
>-- 
>2.38.1

I have a write-up about the issue: https://maskray.me/blog/2022-11-05-lld-musl-powerpc64
and used a `.section .toc,"aw",@nobits` trick to verify that this patch
makes two RW PT_LOAD (p_filesz < p_memsz) work.

Reviewed-by: Fangrui Song <i@maskray.me>
Tested-by: Fangrui Song <i@maskray.me>

Kees Cook Nov. 7, 2022, 3:59 a.m. UTC | #2

On Sun, Nov 06, 2022 at 02:16:57AM +0000, Pedro Falcato wrote:
> The old code for ELF interpreter loading could only handle
> 1 memsz > filesz segment. This is incorrect, as evidenced
> by the elf program loading code, which could handle multiple
> such segments.
> 
> This patch fixes memsz > filesz handling for elf interpreters
> and refactors interpreter/program BSS clearing into a common
> codepath.
> 
> This bug was uncovered on builds of ppc64le musl libc with
> llvm lld 15.0.0, since ppc64 does not allocate file space
> for its .plt.
> 
> Cc: Rich Felker <dalias@libc.org>
> Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>

Thanks for the patch! I need to triple-check this logic, as there have
been some overlapping (or out-of-order) LOAD bugs in the past too, and I
want to make sure we don't accidentally zero things that already got
loaded, etc.

David, has there been any work on adding a way to instantiate
userspace VMAs in a KUnit test? I tried to write this myself, but I
couldn't figure out how to make the userspace memory mappings appear.
Here's my fumbling attempt:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=devel/kunit/usercopy

I really wish KUnit had userspace mapping support -- I have a bunch of
unit tests that need to get built up around checking for regressions
here, etc.

Anyway, I'll test this patch and get it applied and likely backported
to earlier kernels in the next few days.

-Kees

Pedro Falcato Nov. 8, 2022, 6:13 a.m. UTC | #3

On Mon, Nov 7, 2022 at 3:59 AM Kees Cook <keescook@chromium.org> wrote:
> Thanks for the patch! I need to triple-check this logic, as there have
> been some overlapping (or out-of-order) LOAD bugs in the past too, and I
> want to make sure we don't accidentally zero things that already got
> loaded, etc.
Hi Kees,

Thanks for looking at my patch. I've submitted an (unprompted) v2 with
an additional fix for
a loading bug that could load segments after a .bss on top of .bss
itself, which would
overwrite any bss zeroing efforts. Note that this bug was already
present in load_elf_binary.

See a repro at https://github.com/heatd/elf-bug-questionmark, and the
comments on the patch/repro for more info.
Most ELF loading operating systems out there seem to fail on this one.

As for overlapping/out-of-order LOAD segments, what kind of handling
do we want here?
The ELF spec (http://www.sco.com/developers/gabi/latest/ch5.pheader.html)
says that
"Loadable segment entries in the program header table appear in
ascending order, sorted on the p_vaddr member.",
so do we really want to support that? My -v2 was substantially
simplified based on assuming ELF-compliant executables.

> David, has there been any work on adding a way to instantiate
> userspace VMAs in a KUnit test? I tried to write this myself, but I
> couldn't figure out how to make the userspace memory mappings appear.
> Here's my fumbling attempt:
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=devel/kunit/usercopy
>
> I really wish KUnit had userspace mapping support -- I have a bunch of
> unit tests that need to get built up around checking for regressions
> here, etc.
+1 on getting this unit-tested, this is a bit of a minefield

Thanks,
Pedro

David Gow Nov. 16, 2022, 4:34 a.m. UTC | #4

On Mon, Nov 7, 2022 at 11:59 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Sun, Nov 06, 2022 at 02:16:57AM +0000, Pedro Falcato wrote:
> David, has there been any work on adding a way to instantiate
> userspace VMAs in a KUnit test? I tried to write this myself, but I
> couldn't figure out how to make the userspace memory mappings appear.
> Here's my fumbling attempt:
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=devel/kunit/usercopy
>
> I really wish KUnit had userspace mapping support -- I have a bunch of
> unit tests that need to get built up around checking for regressions
> here, etc.

Hi Kees,

Sorry the the delayed response!

Alas, my attempts to get this to work haven't been much more
successful than yours. It's definitely something we'd like to support,
but I confess to not knowing enough about the mm code to know exactly
what would be involved.

The workaround is to load tests as modules, and use something like
Vitor's original patch here:
https://lore.kernel.org/all/20200721174036.71072-1-vitor@massaru.org/

Basically, using the existing mm of the module loader. Adapting those
changes to your branch (and fixing a couple of back-to-front KUnit
assertions) does work for me when built as a module, in an x86_64 vm:

root@slicestar:~# modprobe usercopy_kunit
[   52.986290]     # Subtest: usercopy
[   52.986701]     1..1
[   53.246058]     ok 1 - usercopy_test
[   53.246628] ok 1 - usercopy

But getting it to work with built-in tests hasn't been successful so
far. I wondered if we could just piggy-back on init_mm or similar, but
that doesn't seem to work either.

So, in the short-term, this is only possible for modules. If that's
useful enough, we can get Vitor's support patch (or something similar)
in, and just mark any tests module-only (or have them skip if there's
no mm). Because kunit.py only runs built-in tests, though, it's
definitely less convenient.

Cheers,
-- David

Kees Cook Nov. 17, 2022, 10:06 p.m. UTC | #5

Hi,

This has diverged from the original topic a bit, so I've changed the
Subject to hopefully gain visibility. :)

For KUnit, it would be REALLY nice to have a way to attach a userspace
VM to a kernel thread so we can do userspace memory mapping
manipulation, etc. Neither David nor I have been able to figure out the
right set of steps to make this happen. What are we missing?

Details below...

On Wed, Nov 16, 2022 at 12:34:40PM +0800, David Gow wrote:
> On Mon, Nov 7, 2022 at 11:59 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Sun, Nov 06, 2022 at 02:16:57AM +0000, Pedro Falcato wrote:
> > David, has there been any work on adding a way to instantiate
> > userspace VMAs in a KUnit test? I tried to write this myself, but I
> > couldn't figure out how to make the userspace memory mappings appear.
> > Here's my fumbling attempt:
> > https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=devel/kunit/usercopy
> >
> > I really wish KUnit had userspace mapping support -- I have a bunch of
> > unit tests that need to get built up around checking for regressions
> > here, etc.
> 
> Hi Kees,
> 
> Sorry the the delayed response!
> 
> Alas, my attempts to get this to work haven't been much more
> successful than yours. It's definitely something we'd like to support,
> but I confess to not knowing enough about the mm code to know exactly
> what would be involved.
> 
> The workaround is to load tests as modules, and use something like
> Vitor's original patch here:
> https://lore.kernel.org/all/20200721174036.71072-1-vitor@massaru.org/
> 
> Basically, using the existing mm of the module loader. Adapting those
> changes to your branch (and fixing a couple of back-to-front KUnit
> assertions) does work for me when built as a module, in an x86_64 vm:
> 
> root@slicestar:~# modprobe usercopy_kunit
> [   52.986290]     # Subtest: usercopy
> [   52.986701]     1..1
> [   53.246058]     ok 1 - usercopy_test
> [   53.246628] ok 1 - usercopy
> 
> But getting it to work with built-in tests hasn't been successful so
> far. I wondered if we could just piggy-back on init_mm or similar, but
> that doesn't seem to work either.
> 
> So, in the short-term, this is only possible for modules. If that's
> useful enough, we can get Vitor's support patch (or something similar)
> in, and just mark any tests module-only (or have them skip if there's
> no mm). Because kunit.py only runs built-in tests, though, it's
> definitely less convenient.

Thanks for any pointers! :)

-Kees

David Gow Nov. 18, 2022, 7:29 a.m. UTC | #6

On Fri, Nov 18, 2022 at 6:06 AM Kees Cook <keescook@chromium.org> wrote:
>
> Hi,
>
> This has diverged from the original topic a bit, so I've changed the
> Subject to hopefully gain visibility. :)
>
> For KUnit, it would be REALLY nice to have a way to attach a userspace
> VM to a kernel thread so we can do userspace memory mapping
> manipulation, etc. Neither David nor I have been able to figure out the
> right set of steps to make this happen. What are we missing?
>
> Details below...
>
> On Wed, Nov 16, 2022 at 12:34:40PM +0800, David Gow wrote:
> > On Mon, Nov 7, 2022 at 11:59 AM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Sun, Nov 06, 2022 at 02:16:57AM +0000, Pedro Falcato wrote:
> > > David, has there been any work on adding a way to instantiate
> > > userspace VMAs in a KUnit test? I tried to write this myself, but I
> > > couldn't figure out how to make the userspace memory mappings appear.
> > > Here's my fumbling attempt:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=devel/kunit/usercopy
> > >
> > > I really wish KUnit had userspace mapping support -- I have a bunch of
> > > unit tests that need to get built up around checking for regressions
> > > here, etc.
> >
> > Hi Kees,
> >
> > Sorry the the delayed response!
> >
> > Alas, my attempts to get this to work haven't been much more
> > successful than yours. It's definitely something we'd like to support,
> > but I confess to not knowing enough about the mm code to know exactly
> > what would be involved.
> >
> > The workaround is to load tests as modules, and use something like
> > Vitor's original patch here:
> > https://lore.kernel.org/all/20200721174036.71072-1-vitor@massaru.org/
> >
> > Basically, using the existing mm of the module loader. Adapting those
> > changes to your branch (and fixing a couple of back-to-front KUnit
> > assertions) does work for me when built as a module, in an x86_64 vm:
> >
> > root@slicestar:~# modprobe usercopy_kunit
> > [   52.986290]     # Subtest: usercopy
> > [   52.986701]     1..1
> > [   53.246058]     ok 1 - usercopy_test
> > [   53.246628] ok 1 - usercopy
> >
> > But getting it to work with built-in tests hasn't been successful so
> > far. I wondered if we could just piggy-back on init_mm or similar, but
> > that doesn't seem to work either.
> >
> > So, in the short-term, this is only possible for modules. If that's
> > useful enough, we can get Vitor's support patch (or something similar)
> > in, and just mark any tests module-only (or have them skip if there's
> > no mm). Because kunit.py only runs built-in tests, though, it's
> > definitely less convenient.
>
> Thanks for any pointers! :)
>
> -Kees
>
> --
> Kees Cook

FWIW, I had another quick look at this yesterday, and I suspect that
(at least one of) the problem(s) is that function pointers like
mm->get_unmapped_area are only setup as part of exec(), so a
newly-created mm isn't actually useful. Looking at, e.g.,
arch_pick_mm_layout(), there's a whole bunch of architecture-dependent
stuff here to handle, e.g., 32-bit compat.

Cheers,
-- David

fs/binfmt_elf: Fix memsz > filesz handling

Commit Message

Comments

Patch