Message ID | 20200205223950.1212394-12-kristen@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Finer grained kernel address space randomization |
On Wed, Feb 05, 2020 at 02:39:50PM -0800, Kristen Carlson Accardi wrote:
> From: Kees Cook <keescook@chromium.org>
>
> Currently the on-disk decompression image includes the "dynamic" heap
> region that is used for malloc() during kernel extraction, relocation,
> and decompression ("boot_heap" of BOOT_HEAP_SIZE bytes in the .bss
> section). It makes no sense to balloon the bzImage with "boot_heap"
> as it is zeroed at boot, and acts much more like a "brk" region.
>
> This seems to be a trivial change because head_{64,32}.S already only
> copies up to the start of the .bss section, so any growth in the .bss
> area was already not meaningful when placing the image in memory. The
> .bss size is, however, reflected in the boot params "init_size", so the
> memory range calculations included the "boot_heap" region. Instead of
> wasting the on-disk image size bytes, just account for this heap area
> when identifying the mem_avoid ranges, and leave it out of the .bss
> section entirely. For good measure, also zero initialize it, as this
> was already happening for when zeroing the entire .bss section.
>
> While the bzImage size is dominated by the compressed vmlinux, the
> difference removes 64KB for all compressors except bzip2, which removes
> 4MB. For example, this is less than 1% under CONFIG_KERNEL_XZ:
>
> -rw-r--r-- 1 kees kees 7813168 Feb 2 23:39 arch/x86/boot/bzImage.stock-xz
> -rw-r--r-- 1 kees kees 7747632 Feb 2 23:42 arch/x86/boot/bzImage.brk-xz
>
> but much more pronounced under CONFIG_KERNEL_BZIP2 (~27%):
>
> -rw-r--r-- 1 kees kees 15231024 Feb 2 23:44 arch/x86/boot/bzImage.stock-bzip2
> -rw-r--r-- 1 kees kees 11036720 Feb 2 23:47 arch/x86/boot/bzImage.brk-bzip2
>
> For the future fine-grain KASLR work, this will avoid significant pain,
> as the ELF section parser will use much more memory during boot and
> filling the bzImage with megabytes of zeros seemed like a poor idea. :)
>

I'm not sure I follow this: the reason the bzImage currently contains
.bss and a fix for it is in a patch I have out for review at
https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu

This alone shouldn't make much of a difference across compressors. The
entire .bss is just stored uncompressed as 0's in bzImage currently.
The only thing that gets compressed is the original kernel ELF file. Is
the difference above just from this patch, or is it including the
overhead of function-sections?

It is not necessary for it to contain .bss to get the correct init_size.
The latter is calculated (in x86/boot/header.S) based on the offset of
the _end symbol in the compressed vmlinux, so storing the .bss is just a
bug.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559

From the cover letter:
> Image Size
> ----------
> Adding additional section headers as a result of compiling with
> -ffunction-sections will increase the size of the vmlinux ELF file. In
> addition, the vmlinux.bin file generated in arch/x86/boot/compressed by
> objcopy grows significantly with the current POC implementation. This is
> because the boot heap size must be dramatically increased to support shuffling
> the sections and re-sorting kallsyms. With a sample kernel compilation using a
> stock Fedora config, bzImage grew about 7.5X when CONFIG_FG_KASLR was enabled.
> This is because the boot heap area is included in the image itself.
>
> It is possible to mitigate this issue by moving the boot heap out of .bss.
> Kees Cook has a prototype of this working, and it is included in this
> patchset.

I am also confused by this -- the boot heap is not part of the
vmlinux.bin in arch/x86/boot/compressed: that's a stripped copy of the
decompressed kernel, just before we apply the selected compression to it
and vmlinux.relocs.

Do you mean arch/x86/boot/vmlinux.bin? That is an objcopy of
compressed/vmlinux, and it grows in size with increasing .bss for the
same reason as above (rather it's the cause of bzImage growing).
On Wed, 2020-02-05 at 19:11 -0500, Arvind Sankar wrote:
> On Wed, Feb 05, 2020 at 02:39:50PM -0800, Kristen Carlson Accardi wrote:
> > From: Kees Cook <keescook@chromium.org>
> >
> > Currently the on-disk decompression image includes the "dynamic" heap
> > region that is used for malloc() during kernel extraction, relocation,
> > and decompression ("boot_heap" of BOOT_HEAP_SIZE bytes in the .bss
> > section). It makes no sense to balloon the bzImage with "boot_heap"
> > as it is zeroed at boot, and acts much more like a "brk" region.
> >
> > This seems to be a trivial change because head_{64,32}.S already only
> > copies up to the start of the .bss section, so any growth in the .bss
> > area was already not meaningful when placing the image in memory. The
> > .bss size is, however, reflected in the boot params "init_size", so the
> > memory range calculations included the "boot_heap" region. Instead of
> > wasting the on-disk image size bytes, just account for this heap area
> > when identifying the mem_avoid ranges, and leave it out of the .bss
> > section entirely. For good measure, also zero initialize it, as this
> > was already happening for when zeroing the entire .bss section.
> >
> > While the bzImage size is dominated by the compressed vmlinux, the
> > difference removes 64KB for all compressors except bzip2, which removes
> > 4MB. For example, this is less than 1% under CONFIG_KERNEL_XZ:
> >
> > -rw-r--r-- 1 kees kees 7813168 Feb 2 23:39 arch/x86/boot/bzImage.stock-xz
> > -rw-r--r-- 1 kees kees 7747632 Feb 2 23:42 arch/x86/boot/bzImage.brk-xz
> >
> > but much more pronounced under CONFIG_KERNEL_BZIP2 (~27%):
> >
> > -rw-r--r-- 1 kees kees 15231024 Feb 2 23:44 arch/x86/boot/bzImage.stock-bzip2
> > -rw-r--r-- 1 kees kees 11036720 Feb 2 23:47 arch/x86/boot/bzImage.brk-bzip2
> >
> > For the future fine-grain KASLR work, this will avoid significant pain,
> > as the ELF section parser will use much more memory during boot and
> > filling the bzImage with megabytes of zeros seemed like a poor idea. :)
> >
>
> I'm not sure I follow this: the reason the bzImage currently contains
> .bss and a fix for it is in a patch I have out for review at
> https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu
>
> This alone shouldn't make much of a difference across compressors. The
> entire .bss is just stored uncompressed as 0's in bzImage currently.
> The only thing that gets compressed is the original kernel ELF file. Is
> the difference above just from this patch, or is it including the
> overhead of function-sections?
>
> It is not necessary for it to contain .bss to get the correct init_size.
> The latter is calculated (in x86/boot/header.S) based on the offset of
> the _end symbol in the compressed vmlinux, so storing the .bss is just a
> bug.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559
>
> From the cover letter:
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file. In
> > addition, the vmlinux.bin file generated in arch/x86/boot/compressed by
> > objcopy grows significantly with the current POC implementation. This is
> > because the boot heap size must be dramatically increased to support shuffling
> > the sections and re-sorting kallsyms. With a sample kernel compilation using a
> > stock Fedora config, bzImage grew about 7.5X when CONFIG_FG_KASLR was enabled.
> > This is because the boot heap area is included in the image itself.
> >
> > It is possible to mitigate this issue by moving the boot heap out of .bss.
> > Kees Cook has a prototype of this working, and it is included in this
> > patchset.
>
> I am also confused by this -- the boot heap is not part of the
> vmlinux.bin in arch/x86/boot/compressed: that's a stripped copy of the
> decompressed kernel, just before we apply the selected compression to it
> and vmlinux.relocs.
>
> Do you mean arch/x86/boot/vmlinux.bin? That is an objcopy of
> compressed/vmlinux, and it grows in size with increasing .bss for the
> same reason as above (rather it's the cause of bzImage growing).

Right, sorry for the confusion - I see now that I could have worded that
better. The cover letter should say "In addition, the vmlinux.bin file
generated by the objcopy in arch/x86/boot/compressed/Makefile grows
significantly with the current POC implementation."
On Wed, Feb 05, 2020 at 07:11:05PM -0500, Arvind Sankar wrote:
> From: Kees Cook <keescook@chromium.org>
> > This seems to be a trivial change because head_{64,32}.S already only
> > copies up to the start of the .bss section, so any growth in the .bss
> > area was already not meaningful when placing the image in memory. The
> > .bss size is, however, reflected in the boot params "init_size", so the
> > memory range calculations included the "boot_heap" region. Instead of
> > wasting the on-disk image size bytes, just account for this heap area
> > when identifying the mem_avoid ranges, and leave it out of the .bss
> > section entirely. For good measure, also zero initialize it, as this
> > was already happening for when zeroing the entire .bss section.
>
> I'm not sure I follow this: the reason the bzImage currently contains
> .bss and a fix for it is in a patch I have out for review at
> https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu

Ah! Thank you. Yes, that's _much_ cleaner. I could not figure out why
the linker was actually keeping the .bss section allocated in the
on-disk image. :) We've only had this bug for 10 years. ;)

> This alone shouldn't make much of a difference across compressors. The
> entire .bss is just stored uncompressed as 0's in bzImage currently.
> The only thing that gets compressed is the original kernel ELF file. Is
> the difference above just from this patch, or is it including the
> overhead of function-sections?

With bzip2, it's a 4MB heap in .bss. Other compressors are 64KB. With
fg-kaslr, the heap is 64MB in .bss. It made the bzImage huge. ;) Another
thought I had to deal with the memory utilization in the fg-kaslr shuffle
was to actually choose _two_ kernel locations in memory (via a refactoring
of choose_random_location()). One to decompress into and the other to
write out during the shuffle. Though the symbol table still needs to be
reconstructed, etc, so probably just best to leave it all in the regular
heap (or improve the ZO heap allocator which doesn't really implement
free()).

> It is not necessary for it to contain .bss to get the correct init_size.
> The latter is calculated (in x86/boot/header.S) based on the offset of
> the _end symbol in the compressed vmlinux, so storing the .bss is just a
> bug.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559

Yes, thank you for the reminder. I couldn't find the ZO_INIT_SIZE when I
was staring at this, since I only looked around the compressed/ directory.
:)

I should add this:

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d7408af55738..346e36ae163e 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -346,7 +346,7 @@ static void handle_mem_options(void)
  * in header.S, and the memory diagram is based on the one found in
  * misc.c.
  *
  * The following conditions are already enforced by the image layouts
  * and
- * associated code:
+ * associated code (see ../boot/header.S):
  * - input + input_size >= output + output_size
  * - kernel_total_size <= init_size
  * - kernel_total_size <= output_size (see Note below)
On Thu, Feb 06, 2020 at 03:13:12AM -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 07:11:05PM -0500, Arvind Sankar wrote:
> > From: Kees Cook <keescook@chromium.org>
> > > This seems to be a trivial change because head_{64,32}.S already only
> > > copies up to the start of the .bss section, so any growth in the .bss
> > > area was already not meaningful when placing the image in memory. The
> > > .bss size is, however, reflected in the boot params "init_size", so the
> > > memory range calculations included the "boot_heap" region. Instead of
> > > wasting the on-disk image size bytes, just account for this heap area
> > > when identifying the mem_avoid ranges, and leave it out of the .bss
> > > section entirely. For good measure, also zero initialize it, as this
> > > was already happening for when zeroing the entire .bss section.
> >
> > I'm not sure I follow this: the reason the bzImage currently contains
> > .bss and a fix for it is in a patch I have out for review at
> > https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu
>
> Ah! Thank you. Yes, that's _much_ cleaner. I could not figure out why
> the linker was actually keeping the .bss section allocated in the
> on-disk image. :) We've only had this bug for 10 years. ;)
>
> > This alone shouldn't make much of a difference across compressors. The
> > entire .bss is just stored uncompressed as 0's in bzImage currently.
> > The only thing that gets compressed is the original kernel ELF file. Is
> > the difference above just from this patch, or is it including the
> > overhead of function-sections?
>
> With bzip2, it's a 4MB heap in .bss. Other compressors are 64KB. With
> fg-kaslr, the heap is 64MB in .bss. It made the bzImage huge. ;) Another

Ah, I just saw that. Makes more sense now -- so my patch actually saves
~4MiB even now for bz2-compressed bzImages.

> thought I had to deal with the memory utilization in the fg-kaslr shuffle
> was to actually choose _two_ kernel locations in memory (via a refactoring
> of choose_random_location()). One to decompress into and the other to
> write out during the shuffle. Though the symbol table still needs to be
> reconstructed, etc, so probably just best to leave it all in the regular
> heap (or improve the ZO heap allocator which doesn't really implement
> free()).
>
> > It is not necessary for it to contain .bss to get the correct init_size.
> > The latter is calculated (in x86/boot/header.S) based on the offset of
> > the _end symbol in the compressed vmlinux, so storing the .bss is just a
> > bug.
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559
>
> Yes, thank you for the reminder. I couldn't find the ZO_INIT_SIZE when I
> was staring at this, since I only looked around the compressed/ directory.
> :)
>

There's another thing I noticed -- you would need to ensure that the
init_size in the header covers your boot heap even if you did split it
out. The reason is that the bootloader will only know to reserve enough
memory for init_size: it's possible it might put the initrd or something
else following the kernel, or theoretically there might be reserved
memory regions or the end of physical RAM immediately following, so you
can't assume that area will be available when you get to extract_kernel.
On Thu, Feb 06, 2020 at 09:25:59AM -0500, Arvind Sankar wrote:
> On Thu, Feb 06, 2020 at 03:13:12AM -0800, Kees Cook wrote:
> > Yes, thank you for the reminder. I couldn't find the ZO_INIT_SIZE when I
> > was staring at this, since I only looked around the compressed/ directory.
> > :)
> >
>
> There's another thing I noticed -- you would need to ensure that the
> init_size in the header covers your boot heap even if you did split it
> out. The reason is that the bootloader will only know to reserve enough
> memory for init_size: it's possible it might put the initrd or something
> else following the kernel, or theoretically there might be reserved
> memory regions or the end of physical RAM immediately following, so you
> can't assume that area will be available when you get to extract_kernel.

Yeah, that's what I was worrying about after I wrote that patch. Yours
is the correct solution. :) (I Acked both of those now).
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index f2dfd6d083ef..1f3de8efd40e 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -59,6 +59,7 @@
 	.hidden _ebss
 	.hidden _got
 	.hidden _egot
+	.hidden _brk
 
 	__HEAD
 SYM_FUNC_START(startup_32)
@@ -249,7 +250,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
-	leal	boot_heap(%ebx), %eax
+	leal	_brk(%ebx), %eax
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	extract_kernel	/* returns kernel location in %eax */
@@ -276,8 +277,6 @@ efi32_config:
  */
 	.bss
 	.balign 4
-boot_heap:
-	.fill BOOT_HEAP_SIZE, 1, 0
 boot_stack:
 	.fill BOOT_STACK_SIZE, 1, 0
 boot_stack_end:
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index ee60b81944a7..850bc5220a8d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -42,6 +42,7 @@
 	.hidden _ebss
 	.hidden _got
 	.hidden _egot
+	.hidden _brk
 
 	__HEAD
 	.code32
@@ -534,7 +535,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
  */
 	pushq	%rsi			/* Save the real mode argument */
 	movq	%rsi, %rdi		/* real mode address */
-	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
+	leaq	_brk(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx	/* input_data */
 	movl	$z_input_len, %ecx	/* input_len */
 	movq	%rbp, %r8		/* output target address */
@@ -701,12 +702,10 @@ SYM_DATA_END(efi64_config)
 #endif /* CONFIG_EFI_STUB */
 
 /*
- * Stack and heap for uncompression
+ * Stack for placement and uncompression
  */
 	.bss
 	.balign 4
-SYM_DATA_LOCAL(boot_heap,	.fill BOOT_HEAP_SIZE, 1, 0)
-
 SYM_DATA_START_LOCAL(boot_stack)
 	.fill BOOT_STACK_SIZE, 1, 0
 SYM_DATA_END_LABEL(boot_stack, SYM_L_LOCAL, boot_stack_end)
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index ae4dce76a9bd..da64d2cdbb42 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -397,7 +397,7 @@ static void handle_mem_options(void)
 static void mem_avoid_init(unsigned long input, unsigned long input_size,
 			   unsigned long output)
 {
-	unsigned long init_size = boot_params->hdr.init_size;
+	unsigned long init_size = boot_params->hdr.init_size + BOOT_HEAP_SIZE;
 	u64 initrd_start, initrd_size;
 	u64 cmd_line, cmd_line_size;
 	char *ptr;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 977da0911ce7..cb12da264b59 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -463,6 +463,9 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 
 	debug_putstr("early console in extract_kernel\n");
 
+	/* Zero what is effectively our .brk section. */
+	memset((void *)heap, 0, BOOT_HEAP_SIZE);
+	debug_putaddr(heap);
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 508cfa6828c5..3ce690474940 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -73,4 +73,5 @@ SECTIONS
 #endif
 	. = ALIGN(PAGE_SIZE);	/* keep ZO size page aligned */
 	_end = .;
+	_brk = .;
 }