[RFC,11/11] x86/boot: Move "boot heap" out of .bss

Message ID 20200205223950.1212394-12-kristen@linux.intel.com
State New
Series
  • Finer grained kernel address space randomization

Commit Message

Kristen Carlson Accardi Feb. 5, 2020, 10:39 p.m. UTC
From: Kees Cook <keescook@chromium.org>

Currently the on-disk decompression image includes the "dynamic" heap
region that is used for malloc() during kernel extraction, relocation,
and decompression ("boot_heap" of BOOT_HEAP_SIZE bytes in the .bss
section). It makes no sense to balloon the bzImage with "boot_heap"
as it is zeroed at boot, and acts much more like a "brk" region.

This seems to be a trivial change because head_{64,32}.S already only
copies up to the start of the .bss section, so any growth in the .bss
area was already not meaningful when placing the image in memory. The
.bss size is, however, reflected in the boot params "init_size", so the
memory range calculations included the "boot_heap" region. Instead of
wasting the on-disk image size bytes, just account for this heap area
when identifying the mem_avoid ranges, and leave it out of the .bss
section entirely. For good measure, also zero initialize it, as this
was already happening when the entire .bss section was zeroed.
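
A minimal sketch of the accounting described above, assuming the avoided
range for the decompressor starts at the input address and runs to
output + init_size, with the heap now appended past that. The names
sketch_range, sketch_zo_avoid_range and SKETCH_BOOT_HEAP_SIZE are
hypothetical stand-ins, and 0x10000 is only an example value; this is not
the kaslr.c code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for BOOT_HEAP_SIZE; 64KB is only an example value. */
#define SKETCH_BOOT_HEAP_SIZE 0x10000ULL

struct sketch_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Range that placement must avoid for the decompressor itself: from the
 * compressed input up to the end of the init_size window, extended by the
 * heap that sits just past the end of the image instead of inside .bss.
 */
static struct sketch_range sketch_zo_avoid_range(uint64_t input,
						 uint64_t output,
						 uint64_t init_size)
{
	uint64_t end = output + init_size + SKETCH_BOOT_HEAP_SIZE;
	struct sketch_range r;

	/* The sketch assumes the input lies inside the window. */
	assert(end >= input);

	r.start = input;
	r.size = end - input;
	return r;
}
```

With input = 0x1400000, output = 0x1000000 and init_size = 0x800000, the
sketch yields a range of 0x410000 bytes, 0x10000 of which is the heap.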

While the bzImage size is dominated by the compressed vmlinux, this
change removes 64KB for all compressors except bzip2, where it removes
4MB. For example, this is less than 1% under CONFIG_KERNEL_XZ:

-rw-r--r-- 1 kees kees 7813168 Feb  2 23:39 arch/x86/boot/bzImage.stock-xz
-rw-r--r-- 1 kees kees 7747632 Feb  2 23:42 arch/x86/boot/bzImage.brk-xz

but much more pronounced under CONFIG_KERNEL_BZIP2 (~27%):

-rw-r--r-- 1 kees kees 15231024 Feb  2 23:44 arch/x86/boot/bzImage.stock-bzip2
-rw-r--r-- 1 kees kees 11036720 Feb  2 23:47 arch/x86/boot/bzImage.brk-bzip2
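
The savings can be checked from the four file sizes listed above. The
helpers below are hypothetical and only redo the arithmetic: the
difference in bytes, and the saving in hundredths of a percent of the
stock size (integer math, rounded down).

```c
#include <assert.h>
#include <stdint.h>

/* Bytes saved by going from the stock image to the patched one. */
static uint64_t saved_bytes(uint64_t stock, uint64_t patched)
{
	assert(stock >= patched);
	return stock - patched;
}

/* Saving in hundredths of a percent of the stock size, rounded down. */
static uint64_t saved_hundredths_of_percent(uint64_t stock, uint64_t patched)
{
	assert(stock > 0);
	return saved_bytes(stock, patched) * 10000 / stock;
}
```

For xz, 7813168 - 7747632 = 65536 bytes (64KB), about 0.83% of the stock
image. For bzip2, 15231024 - 11036720 = 4194304 bytes (4MB), about 27.53%.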

For the future fine-grain KASLR work, this will avoid significant pain,
as the ELF section parser will use much more memory during boot and
filling the bzImage with megabytes of zeros seemed like a poor idea. :)

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/boot/compressed/head_32.S     | 5 ++---
 arch/x86/boot/compressed/head_64.S     | 7 +++----
 arch/x86/boot/compressed/kaslr.c       | 2 +-
 arch/x86/boot/compressed/misc.c        | 3 +++
 arch/x86/boot/compressed/vmlinux.lds.S | 1 +
 5 files changed, 10 insertions(+), 8 deletions(-)

Comments

Arvind Sankar Feb. 6, 2020, 12:11 a.m. UTC | #1
On Wed, Feb 05, 2020 at 02:39:50PM -0800, Kristen Carlson Accardi wrote:
> From: Kees Cook <keescook@chromium.org>
> 
> Currently the on-disk decompression image includes the "dynamic" heap
> region that is used for malloc() during kernel extraction, relocation,
> and decompression ("boot_heap" of BOOT_HEAP_SIZE bytes in the .bss
> section). It makes no sense to balloon the bzImage with "boot_heap"
> as it is zeroed at boot, and acts much more like a "brk" region.
> 
> This seems to be a trivial change because head_{64,32}.S already only
> copies up to the start of the .bss section, so any growth in the .bss
> area was already not meaningful when placing the image in memory. The
> .bss size is, however, reflected in the boot params "init_size", so the
> memory range calculations included the "boot_heap" region. Instead of
> wasting the on-disk image size bytes, just account for this heap area
> when identifying the mem_avoid ranges, and leave it out of the .bss
> section entirely. For good measure, also zero initialize it, as this
> was already happening for when zeroing the entire .bss section.
> 
> While the bzImage size is dominated by the compressed vmlinux, the
> difference removes 64KB for all compressors except bzip2, which removes
> 4MB. For example, this is less than 1% under CONFIG_KERNEL_XZ:
> 
> -rw-r--r-- 1 kees kees 7813168 Feb  2 23:39 arch/x86/boot/bzImage.stock-xz
> -rw-r--r-- 1 kees kees 7747632 Feb  2 23:42 arch/x86/boot/bzImage.brk-xz
> 
> but much more pronounced under CONFIG_KERNEL_BZIP2 (~27%):
> 
> -rw-r--r-- 1 kees kees 15231024 Feb  2 23:44 arch/x86/boot/bzImage.stock-bzip2
> -rw-r--r-- 1 kees kees 11036720 Feb  2 23:47 arch/x86/boot/bzImage.brk-bzip2
> 
> For the future fine-grain KASLR work, this will avoid significant pain,
> as the ELF section parser will use much more memory during boot and
> filling the bzImage with megabytes of zeros seemed like a poor idea. :)
> 

I'm not sure I follow this: the reason the bzImage currently contains
.bss, and a fix for it, are in a patch I have out for review at
https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu

This alone shouldn't make much of a difference across compressors. The
entire .bss is just stored uncompressed as 0's in bzImage currently.
The only thing that gets compressed is the original kernel ELF file. Is
the difference above just from this patch, or is it including the
overhead of function-sections?

It is not necessary for it to contain .bss to get the correct init_size.
The latter is calculated (in x86/boot/header.S) based on the offset of
the _end symbol in the compressed vmlinux, so storing the .bss is just a
bug.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559
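
To illustrate the point, here is a simplified model of that calculation.
It assumes init_size is the larger of (a) the offset of _end in the
compressed vmlinux plus the extract offset and (b) the size of the
decompressed kernel; the function and parameter names are hypothetical.
Nothing in it depends on how many bytes of .bss are stored in the file.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model: only symbol offsets and sizes go in, never the number
 * of bytes actually written to the on-disk image.
 */
static uint64_t sketch_init_size(uint64_t zo_end_offset,
				 uint64_t zo_extract_offset,
				 uint64_t vo_size)
{
	uint64_t zo_init_size;

	/* Guard against wraparound in the sum below. */
	assert(zo_end_offset <= UINT64_MAX - zo_extract_offset);
	zo_init_size = zo_end_offset + zo_extract_offset;

	return zo_init_size > vo_size ? zo_init_size : vo_size;
}
```

Since _end lies past .bss, growing .bss moves _end and so grows the
result, whether or not the zeros are ever written to disk.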

From the cover letter:
> Image Size
> ----------
> Adding additional section headers as a result of compiling with
> -ffunction-sections will increase the size of the vmlinux ELF file. In
> addition, the vmlinux.bin file generated in arch/x86/boot/compressed by
> objcopy grows significantly with the current POC implementation. This is
> because the boot heap size must be dramatically increased to support shuffling
> the sections and re-sorting kallsyms. With a sample kernel compilation using a
> stock Fedora config, bzImage grew about 7.5X when CONFIG_FG_KASLR was enabled.
> This is because the boot heap area is included in the image itself.
> 
> It is possible to mitigate this issue by moving the boot heap out of .bss.
> Kees Cook has a prototype of this working, and it is included in this
> patchset.

I am also confused by this -- the boot heap is not part of the
vmlinux.bin in arch/x86/boot/compressed: that's a stripped copy of the
decompressed kernel, just before we apply the selected compression to it
and vmlinux.relocs.

Do you mean arch/x86/boot/vmlinux.bin? That is an objcopy of
compressed/vmlinux, and it grows in size with increasing .bss for the
same reason as above (or rather, it is the cause of the bzImage growing).
Kristen Carlson Accardi Feb. 6, 2020, 12:33 a.m. UTC | #2
On Wed, 2020-02-05 at 19:11 -0500, Arvind Sankar wrote:
> On Wed, Feb 05, 2020 at 02:39:50PM -0800, Kristen Carlson Accardi
> wrote:
> > From: Kees Cook <keescook@chromium.org>
> > 
> > Currently the on-disk decompression image includes the "dynamic"
> > heap
> > region that is used for malloc() during kernel extraction,
> > relocation,
> > and decompression ("boot_heap" of BOOT_HEAP_SIZE bytes in the .bss
> > section). It makes no sense to balloon the bzImage with "boot_heap"
> > as it is zeroed at boot, and acts much more like a "brk" region.
> > 
> > This seems to be a trivial change because head_{64,32}.S already
> > only
> > copies up to the start of the .bss section, so any growth in the
> > .bss
> > area was already not meaningful when placing the image in memory.
> > The
> > .bss size is, however, reflected in the boot params "init_size", so
> > the
> > memory range calculations included the "boot_heap" region. Instead
> > of
> > wasting the on-disk image size bytes, just account for this heap
> > area
> > when identifying the mem_avoid ranges, and leave it out of the .bss
> > section entirely. For good measure, also zero initialize it, as
> > this
> > was already happening for when zeroing the entire .bss section.
> > 
> > While the bzImage size is dominated by the compressed vmlinux, the
> > difference removes 64KB for all compressors except bzip2, which
> > removes
> > 4MB. For example, this is less than 1% under CONFIG_KERNEL_XZ:
> > 
> > -rw-r--r-- 1 kees kees 7813168 Feb  2 23:39
> > arch/x86/boot/bzImage.stock-xz
> > -rw-r--r-- 1 kees kees 7747632 Feb  2 23:42
> > arch/x86/boot/bzImage.brk-xz
> > 
> > but much more pronounced under CONFIG_KERNEL_BZIP2 (~27%):
> > 
> > -rw-r--r-- 1 kees kees 15231024 Feb  2 23:44
> > arch/x86/boot/bzImage.stock-bzip2
> > -rw-r--r-- 1 kees kees 11036720 Feb  2 23:47
> > arch/x86/boot/bzImage.brk-bzip2
> > 
> > For the future fine-grain KASLR work, this will avoid significant
> > pain,
> > as the ELF section parser will use much more memory during boot and
> > filling the bzImage with megabytes of zeros seemed like a poor
> > idea. :)
> > 
> 
> I'm not sure I follow this: the reason the bzImage currently contains
> .bss and a fix for it is in a patch I have out for review at
> https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu
> 
> This alone shouldn't make much of a difference across compressors.
> The
> entire .bss is just stored uncompressed as 0's in bzImage currently.
> The only thing that gets compressed is the original kernel ELF file.
> Is
> the difference above just from this patch, or is it including the
> overhead of function-sections?
> 
> It is not necessary for it to contain .bss to get the correct
> init_size.
> The latter is calculated (in x86/boot/header.S) based on the offset
> of
> the _end symbol in the compressed vmlinux, so storing the .bss is
> just a
> bug.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559
> 
> From the cover letter:
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file.
> > In
> > addition, the vmlinux.bin file generated in
> > arch/x86/boot/compressed by
> > objcopy grows significantly with the current POC implementation.
> > This is
> > because the boot heap size must be dramatically increased to
> > support shuffling
> > the sections and re-sorting kallsyms. With a sample kernel
> > compilation using a
> > stock Fedora config, bzImage grew about 7.5X when CONFIG_FG_KASLR
> > was enabled.
> > This is because the boot heap area is included in the image itself.
> > 
> > It is possible to mitigate this issue by moving the boot heap out
> > of .bss.
> > Kees Cook has a prototype of this working, and it is included in
> > this
> > patchset.
> 
> I am also confused by this -- the boot heap is not part of the
> vmlinux.bin in arch/x86/boot/compressed: that's a stripped copy of
> the
> decompressed kernel, just before we apply the selected compression to
> it
> and vmlinux.relocs.
> 
> Do you mean arch/x86/boot/vmlinux.bin? That is an objcopy of
> compressed/vmlinux, and it grows in size with increasing .bss for the
> same reason as above (rather it's the cause of bzImage growing).

Right, sorry for the confusion - I see now that I could have worded
that better. The cover letter should say "In addition, the vmlinux.bin
file generated by the objcopy in arch/x86/boot/compressed/Makefile
grows significantly with the current POC implementation."
Kees Cook Feb. 6, 2020, 11:13 a.m. UTC | #3
On Wed, Feb 05, 2020 at 07:11:05PM -0500, Arvind Sankar wrote:
> From: Kees Cook <keescook@chromium.org>
> > This seems to be a trivial change because head_{64,32}.S already only
> > copies up to the start of the .bss section, so any growth in the .bss
> > area was already not meaningful when placing the image in memory. The
> > .bss size is, however, reflected in the boot params "init_size", so the
> > memory range calculations included the "boot_heap" region. Instead of
> > wasting the on-disk image size bytes, just account for this heap area
> > when identifying the mem_avoid ranges, and leave it out of the .bss
> > section entirely. For good measure, also zero initialize it, as this
> > was already happening for when zeroing the entire .bss section.
> 
> I'm not sure I follow this: the reason the bzImage currently contains
> .bss and a fix for it is in a patch I have out for review at
> https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu

Ah! Thank you. Yes, that's _much_ cleaner. I could not figure out why
the linker was actually keeping the .bss section allocated in the
on-disk image. :) We've only had this bug for 10 years. ;)

> This alone shouldn't make much of a difference across compressors. The
> entire .bss is just stored uncompressed as 0's in bzImage currently.
> The only thing that gets compressed is the original kernel ELF file. Is
> the difference above just from this patch, or is it including the
> overhead of function-sections?

With bzip2, it's a 4MB heap in .bss. Other compressors are 64KB. With
fg-kaslr, the heap is 64MB in .bss. It made the bzImage huge. ;) Another
thought I had to deal with the memory utilization in the fg-kaslr shuffle
was to actually choose _two_ kernel locations in memory (via a refactoring
of choose_random_location()). One to decompress into and the other to
write out during the shuffle. Though the symbol table still needs to be
reconstructed, etc, so probably just best to leave it all in the regular
heap (or improve the ZO heap allocator which doesn't really implement
free()).
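
For readers unfamiliar with that allocator, here is a minimal sketch of a
bump allocator of the kind I mean, where free() only reclaims memory once
every outstanding allocation has been released. The bump_* names and the
struct are hypothetical; this shows the technique and is not the
decompressor's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bump_heap {
	uintptr_t start;	/* first usable byte */
	uintptr_t end;		/* one past the last usable byte */
	uintptr_t next;		/* next free byte */
	int live;		/* outstanding allocations */
};

static void bump_init(struct bump_heap *h, void *buf, size_t size)
{
	h->start = (uintptr_t)buf;
	h->end = h->start + size;
	h->next = h->start;
	h->live = 0;
}

static void *bump_alloc(struct bump_heap *h, size_t size)
{
	uintptr_t p = (h->next + 3) & ~(uintptr_t)3;	/* 4-byte align */

	if (p > h->end || size > h->end - p)
		return NULL;	/* out of heap; state is left unchanged */

	h->next = p + size;
	h->live++;
	return (void *)p;
}

/* Individual blocks are never reused; the heap resets when all are freed. */
static void bump_free(struct bump_heap *h, void *ptr)
{
	(void)ptr;
	assert(h->live > 0);
	if (--h->live == 0)
		h->next = h->start;
}
```

Freeing one of two live blocks reclaims nothing, which is why a workload
with many short-lived allocations needs a heap sized for the sum of all
of them.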

> It is not necessary for it to contain .bss to get the correct init_size.
> The latter is calculated (in x86/boot/header.S) based on the offset of
> the _end symbol in the compressed vmlinux, so storing the .bss is just a
> bug.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559

Yes, thank you for the reminder. I couldn't find the ZO_INIT_SIZE when I
was staring at this, since I only looked around the compressed/ directory.
:)

I should add this:

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d7408af55738..346e36ae163e 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -346,7 +346,7 @@ static void handle_mem_options(void)
  * in header.S, and the memory diagram is based on the one found in
  * misc.c.
  *
  * The following conditions are already enforced by the image layouts and
- * associated code:
+ * associated code (see ../boot/header.S):
  *  - input + input_size >= output + output_size
  *  - kernel_total_size <= init_size
  *  - kernel_total_size <= output_size (see Note below)
Arvind Sankar Feb. 6, 2020, 2:25 p.m. UTC | #4
On Thu, Feb 06, 2020 at 03:13:12AM -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 07:11:05PM -0500, Arvind Sankar wrote:
> > From: Kees Cook <keescook@chromium.org>
> > > This seems to be a trivial change because head_{64,32}.S already only
> > > copies up to the start of the .bss section, so any growth in the .bss
> > > area was already not meaningful when placing the image in memory. The
> > > .bss size is, however, reflected in the boot params "init_size", so the
> > > memory range calculations included the "boot_heap" region. Instead of
> > > wasting the on-disk image size bytes, just account for this heap area
> > > when identifying the mem_avoid ranges, and leave it out of the .bss
> > > section entirely. For good measure, also zero initialize it, as this
> > > was already happening for when zeroing the entire .bss section.
> > 
> > I'm not sure I follow this: the reason the bzImage currently contains
> > .bss and a fix for it is in a patch I have out for review at
> > https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu
> 
> Ah! Thank you. Yes, that's _much_ cleaner. I could not figure out why
> the linker was actually keeping the .bss section allocated in the
> on-disk image. :) We've only had this bug for 10 years. ;)
> 
> > This alone shouldn't make much of a difference across compressors. The
> > entire .bss is just stored uncompressed as 0's in bzImage currently.
> > The only thing that gets compressed is the original kernel ELF file. Is
> > the difference above just from this patch, or is it including the
> > overhead of function-sections?
> 
> With bzip2, it's a 4MB heap in .bss. Other compressors are 64KB. With
> fg-kaslr, the heap is 64MB in .bss. It made the bzImage huge. ;) Another

Ah, I just saw that. Makes more sense now -- so my patch actually saves
~4MiB even now for bz2-compressed bzImages.

> thought I had to deal with the memory utilization in the fg-kaslr shuffle
> was to actually choose _two_ kernel locations in memory (via a refactoring
> of choose_random_location()). One to decompress into and the other to
> write out during the shuffle. Though the symbol table still needs to be
> reconstructed, etc, so probably just best to leave it all in the regular
> heap (or improve the ZO heap allocator which doesn't really implement
> free()).
> 
> > It is not necessary for it to contain .bss to get the correct init_size.
> > The latter is calculated (in x86/boot/header.S) based on the offset of
> > the _end symbol in the compressed vmlinux, so storing the .bss is just a
> > bug.
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559
> 
> Yes, thank you for the reminder. I couldn't find the ZO_INIT_SIZE when I
> was staring at this, since I only looked around the compressed/ directory.
> :)
> 

There's another thing I noticed -- you would need to ensure that the
init_size in the header covers your boot heap even if you did split it
out. The reason is that the bootloader will only know to reserve enough
memory for init_size: it's possible it might put the initrd or something
else following the kernel, or theoretically there might be reserved
memory regions or the end of physical RAM immediately following, so you
can't assume that area will be available when you get to extract_kernel.
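
A small sketch of the failure mode, assuming the worst case where the
heap begins exactly where the init_size reservation ends. The
sketch_region type and helper names are hypothetical, and the addresses
in the note below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_region {
	uint64_t start;
	uint64_t size;
};

/* True if the half-open ranges [start, start + size) intersect. */
static bool sketch_overlaps(struct sketch_region a, struct sketch_region b)
{
	return a.start < b.start + b.size && b.start < a.start + a.size;
}

/* Heap placed immediately past the area the bootloader reserved. */
static struct sketch_region sketch_heap_past_init_size(uint64_t load_addr,
							uint64_t init_size,
							uint64_t heap_size)
{
	struct sketch_region r;

	/* Guard against wraparound when computing the heap start. */
	assert(load_addr <= UINT64_MAX - init_size);
	r.start = load_addr + init_size;
	r.size = heap_size;
	return r;
}
```

With a load address of 0x1000000, an init_size of 0x2000000 and a 64MB
heap, an initrd placed at 0x3000000 overlaps the heap, while one placed at
0x8000000 does not. The bootloader has no way to tell these two cases
apart unless init_size covers the heap.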
Kees Cook Feb. 6, 2020, 9:32 p.m. UTC | #5
On Thu, Feb 06, 2020 at 09:25:59AM -0500, Arvind Sankar wrote:
> On Thu, Feb 06, 2020 at 03:13:12AM -0800, Kees Cook wrote:
> > Yes, thank you for the reminder. I couldn't find the ZO_INIT_SIZE when I
> > was staring at this, since I only looked around the compressed/ directory.
> > :)
> > 
> 
> There's another thing I noticed -- you would need to ensure that the
> init_size in the header covers your boot heap even if you did split it
> out. The reason is that the bootloader will only know to reserve enough
> memory for init_size: it's possible it might put the initrd or something
> else following the kernel, or theoretically there might be reserved
> memory regions or the end of physical RAM immediately following, so you
> can't assume that area will be available when you get to extract_kernel.

Yeah, that's what I was worrying about after I wrote that patch. Yours
is the correct solution. :) (I've Acked both of those now.)

Patch

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index f2dfd6d083ef..1f3de8efd40e 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -59,6 +59,7 @@ 
 	.hidden _ebss
 	.hidden _got
 	.hidden _egot
+	.hidden _brk
 
 	__HEAD
 SYM_FUNC_START(startup_32)
@@ -249,7 +250,7 @@  SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
-	leal	boot_heap(%ebx), %eax
+	leal	_brk(%ebx), %eax
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	extract_kernel	/* returns kernel location in %eax */
@@ -276,8 +277,6 @@  efi32_config:
  */
 	.bss
 	.balign 4
-boot_heap:
-	.fill BOOT_HEAP_SIZE, 1, 0
 boot_stack:
 	.fill BOOT_STACK_SIZE, 1, 0
 boot_stack_end:
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index ee60b81944a7..850bc5220a8d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -42,6 +42,7 @@ 
 	.hidden _ebss
 	.hidden _got
 	.hidden _egot
+	.hidden _brk
 
 	__HEAD
 	.code32
@@ -534,7 +535,7 @@  SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
  */
 	pushq	%rsi			/* Save the real mode argument */
 	movq	%rsi, %rdi		/* real mode address */
-	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
+	leaq	_brk(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
 	movl	$z_input_len, %ecx	/* input_len */
 	movq	%rbp, %r8		/* output target address */
@@ -701,12 +702,10 @@  SYM_DATA_END(efi64_config)
 #endif /* CONFIG_EFI_STUB */
 
 /*
- * Stack and heap for uncompression
+ * Stack for placement and uncompression
  */
 	.bss
 	.balign 4
-SYM_DATA_LOCAL(boot_heap,	.fill BOOT_HEAP_SIZE, 1, 0)
-
 SYM_DATA_START_LOCAL(boot_stack)
 	.fill BOOT_STACK_SIZE, 1, 0
 SYM_DATA_END_LABEL(boot_stack, SYM_L_LOCAL, boot_stack_end)
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index ae4dce76a9bd..da64d2cdbb42 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -397,7 +397,7 @@  static void handle_mem_options(void)
 static void mem_avoid_init(unsigned long input, unsigned long input_size,
 			   unsigned long output)
 {
-	unsigned long init_size = boot_params->hdr.init_size;
+	unsigned long init_size = boot_params->hdr.init_size + BOOT_HEAP_SIZE;
 	u64 initrd_start, initrd_size;
 	u64 cmd_line, cmd_line_size;
 	char *ptr;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 977da0911ce7..cb12da264b59 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -463,6 +463,9 @@  asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 
 	debug_putstr("early console in extract_kernel\n");
 
+	/* Zero what is effectively our .brk section. */
+	memset((void *)heap, 0, BOOT_HEAP_SIZE);
+	debug_putaddr(heap);
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 508cfa6828c5..3ce690474940 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -73,4 +73,5 @@  SECTIONS
 #endif
 	. = ALIGN(PAGE_SIZE);	/* keep ZO size page aligned */
 	_end = .;
+	_brk = .;
 }