Message ID | 20131126203727.GA352@www.outflux.net (mailing list archive) |
---|---|
State | New, archived |
* Kees Cook <keescook@chromium.org> wrote:

> On a defconfig x86_64 build (with CONFIG_CC_STACKPROTECTOR enabled), the
> delta in size is just under 9% larger:
>
> -rwxrwxr-x 1 kees kees 22134340 Nov 26 10:28 vmlinux.gcc-4.8
> -rwxrwxr-x 1 kees kees 22123870 Nov 26 10:40 vmlinux.gcc-4.9
> -rwxrwxr-x 1 kees kees 24225118 Nov 26 10:42 vmlinux.gcc-4.9+strong

Please run it through 'size' so that we know the real text size
increases.

If the cost of -fstack-protector-strong is really +9% in kernel text
size then that's rather significant!

If this option blows up our performance critical codepaths as well
then this will likely cause a runtime slowdown as well, in addition to
the increase in I$ footprint. That needs to be measured.

CONFIG_CC_STACKPROTECTOR=y is relatively cheap today. For example on
x86-64 defconfig:

      text     data      bss       dec filename
  11378972  1455056  1191936  14025964 vmlinux  # CONFIG_CC_STACKPROTECTOR is not set
  11420243  1455056  1191936  14067235 vmlinux  CONFIG_CC_STACKPROTECTOR=y

that's a +0.3% cost currently.

Thanks,

	Ingo

On Wed, Nov 27, 2013 at 3:27 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kees Cook <keescook@chromium.org> wrote:
>
>> On a defconfig x86_64 build (with CONFIG_CC_STACKPROTECTOR enabled), the
>> delta in size is just under 9% larger:
>>
>> -rwxrwxr-x 1 kees kees 22134340 Nov 26 10:28 vmlinux.gcc-4.8
>> -rwxrwxr-x 1 kees kees 22123870 Nov 26 10:40 vmlinux.gcc-4.9
>> -rwxrwxr-x 1 kees kees 24225118 Nov 26 10:42 vmlinux.gcc-4.9+strong
>
> Please run it through 'size' so that we know the real text size
> increases.

    text    data     bss      dec    hex filename
11407474 1453792 1191936 14053202 d66f52 vmlinux.gcc-4.8
11458837 1457504 1191936 14108277 d74675 vmlinux.gcc-4.9
11682929 1457504 1191936 14332369 dab1d1 vmlinux.gcc-4.9+strong

Looks to be 2% for defconfig. That's way better. Shall I send a v3?

> If the cost of -fstack-protector-strong is really +9% in kernel text
> size then that's rather significant!
>
> If this option blows up our performance critical codepaths as well
> then this will likely cause a runtime slowdown as well, in addition to
> the increase in I$ footprint. That needs to be measured.
>
> CONFIG_CC_STACKPROTECTOR=y is relatively cheap today. For example on
> x86-64 defconfig:
>
>       text     data      bss       dec filename
>   11378972  1455056  1191936  14025964 vmlinux  # CONFIG_CC_STACKPROTECTOR is not set
>   11420243  1455056  1191936  14067235 vmlinux  CONFIG_CC_STACKPROTECTOR=y
>
> that's a +0.3% cost currently.

Yeah -- not a lot of functions have char arrays. :)

>
> Thanks,
>
>         Ingo

-Kees

* Kees Cook <keescook@chromium.org> wrote:

> On Wed, Nov 27, 2013 at 3:27 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Kees Cook <keescook@chromium.org> wrote:
> >
> >> On a defconfig x86_64 build (with CONFIG_CC_STACKPROTECTOR enabled), the
> >> delta in size is just under 9% larger:
> >>
> >> -rwxrwxr-x 1 kees kees 22134340 Nov 26 10:28 vmlinux.gcc-4.8
> >> -rwxrwxr-x 1 kees kees 22123870 Nov 26 10:40 vmlinux.gcc-4.9
> >> -rwxrwxr-x 1 kees kees 24225118 Nov 26 10:42 vmlinux.gcc-4.9+strong
> >
> > Please run it through 'size' so that we know the real text size
> > increases.
>
>     text    data     bss      dec    hex filename
> 11407474 1453792 1191936 14053202 d66f52 vmlinux.gcc-4.8
> 11458837 1457504 1191936 14108277 d74675 vmlinux.gcc-4.9
> 11682929 1457504 1191936 14332369 dab1d1 vmlinux.gcc-4.9+strong
>
> Looks to be 2% for defconfig. That's way better. Shall I send a v3?

Well, it's better than 9%, but still almost an order of magnitude
higher than the cost is today, and a lot of distros have
CONFIG_CC_STACKPROTECTOR=y.

So it would be nice to measure how much the instruction count goes up
in some realistic system-bound test. How much does something like
kernel/built-in.o increase, as per 'size' output?

Thanks,

	Ingo

On 11/27/2013 09:54 AM, Ingo Molnar wrote:
>>
>> Looks to be 2% for defconfig. That's way better. Shall I send a v3?
>
> Well, it's better than 9%, but still almost an order of magnitude
> higher than the cost is today, and a lot of distros have
> CONFIG_CC_STACKPROTECTOR=y.
>
> So it would be nice to measure how much the instruction count goes up
> in some realistic system-bound test. How much does something like
> kernel/built-in.o increase, as per 'size' output?
>

Do we need CONFIG_CC_STACKPROTECTOR_STRONG?

	-hpa

On Wed, Nov 27, 2013 at 9:55 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 11/27/2013 09:54 AM, Ingo Molnar wrote:
>>>
>>> Looks to be 2% for defconfig. That's way better. Shall I send a v3?
>>
>> Well, it's better than 9%, but still almost an order of magnitude
>> higher than the cost is today, and a lot of distros have
>> CONFIG_CC_STACKPROTECTOR=y.
>>
>> So it would be nice to measure how much the instruction count goes up
>> in some realistic system-bound test. How much does something like
>> kernel/built-in.o increase, as per 'size' output?

  text  data    bss     dec    hex filename
929611 90851 594496 1614958 18a46e built-in.o-gcc-4.9
954648 90851 594496 1639995 19063b built-in.o-gcc-4.9+strong

Looks like 3% for defconfig + CONFIG_CC_STACKPROTECTOR

>
> Do we need CONFIG_CC_STACKPROTECTOR_STRONG?

I'm hoping to avoid this since nearly anyone using CC_STACKPROTECTOR
would want strong added, but as a fallback, I'm happy to implement it
as a separate config item.

-Kees

On Wed, Nov 27, 2013 at 10:11 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Nov 27, 2013 at 9:55 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 11/27/2013 09:54 AM, Ingo Molnar wrote:
>>>>
>>>> Looks to be 2% for defconfig. That's way better. Shall I send a v3?
>>>
>>> Well, it's better than 9%, but still almost an order of magnitude
>>> higher than the cost is today, and a lot of distros have
>>> CONFIG_CC_STACKPROTECTOR=y.
>>>
>>> So it would be nice to measure how much the instruction count goes up
>>> in some realistic system-bound test. How much does something like
>>> kernel/built-in.o increase, as per 'size' output?
>
>   text  data    bss     dec    hex filename
> 929611 90851 594496 1614958 18a46e built-in.o-gcc-4.9
> 954648 90851 594496 1639995 19063b built-in.o-gcc-4.9+strong
>
> Looks like 3% for defconfig + CONFIG_CC_STACKPROTECTOR
>
>>
>> Do we need CONFIG_CC_STACKPROTECTOR_STRONG?
>
> I'm hoping to avoid this since nearly anyone using CC_STACKPROTECTOR
> would want strong added, but as a fallback, I'm happy to implement it
> as a separate config item.

Any verdict on this? Should I go with adding ..._STRONG like we used
to have for ..._ALL, or is defaulting to -strong best?

-Kees

* Kees Cook <keescook@chromium.org> wrote:

> On Wed, Nov 27, 2013 at 10:11 AM, Kees Cook <keescook@chromium.org> wrote:
> > On Wed, Nov 27, 2013 at 9:55 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> >> On 11/27/2013 09:54 AM, Ingo Molnar wrote:
> >>>>
> >>>> Looks to be 2% for defconfig. That's way better. Shall I send a v3?
> >>>
> >>> Well, it's better than 9%, but still almost an order of magnitude
> >>> higher than the cost is today, and a lot of distros have
> >>> CONFIG_CC_STACKPROTECTOR=y.
> >>>
> >>> So it would be nice to measure how much the instruction count goes up
> >>> in some realistic system-bound test. How much does something like
> >>> kernel/built-in.o increase, as per 'size' output?
> >
> >   text  data    bss     dec    hex filename
> > 929611 90851 594496 1614958 18a46e built-in.o-gcc-4.9
> > 954648 90851 594496 1639995 19063b built-in.o-gcc-4.9+strong
> >
> > Looks like 3% for defconfig + CONFIG_CC_STACKPROTECTOR
> >
> >>
> >> Do we need CONFIG_CC_STACKPROTECTOR_STRONG?
> >
> > I'm hoping to avoid this since nearly anyone using
> > CC_STACKPROTECTOR would want strong added, but as a fallback, I'm
> > happy to implement it as a separate config item.
>
> Any verdict on this? Should I go with adding ..._STRONG like we used
> to have for ..._ALL, or is defaulting to -strong best?

I'm not opposed to the feature itself, just to the specific structure
you presented - as outlined in my review feedback.

The cost of the feature itself appears to be significant (this cost
should be outlined in the help text btw), while I think the cost of
adding this as a new _STRONG option is minimal.

So I'd go forward with addressing two issues:

1)

I'd add the new STACKPROTECTOR_STRONG option and maybe rename the old
one to STACKPROTECTOR_WEAK.

If in a year or two most distros have switched over to the _STRONG
variant, despite its costs, then we can drop the weak variant.

2)

It would also be nice to see a head to head comparison of the 3
variants:

	!STACKPROTECTOR
	STACKPROTECTOR_LIGHT
	STACKPROTECTOR_STRONG

of defconfig vmlinux size and estimated number of checks inserted in
each case - so people/distros can make an informed decision about the
relative quality differences between these variants and whether they
want to carry the costs of that.

Thanks,

	Ingo

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index c99b1086d83d..c6d3ea1c063e 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -41,7 +41,8 @@ KBUILD_CFLAGS +=-fno-omit-frame-pointer -mapcs -mno-sched-prolog
 endif
 
 ifeq ($(CONFIG_CC_STACKPROTECTOR),y)
-KBUILD_CFLAGS +=-fstack-protector
+KBUILD_CFLAGS += $(call cc-option,-fstack-protector-strong,-fstack-protector)
+
 endif
 
 ifeq ($(CONFIG_CPU_BIG_ENDIAN),y)
diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c
index 31bd43b82095..d4f891f56996 100644
--- a/arch/arm/boot/compressed/misc.c
+++ b/arch/arm/boot/compressed/misc.c
@@ -127,6 +127,18 @@ asmlinkage void __div0(void)
 	error("Attempting division by 0!");
 }
 
+unsigned long __stack_chk_guard;
+
+void __stack_chk_guard_setup(void)
+{
+	__stack_chk_guard = 0x000a0dff;
+}
+
+void __stack_chk_fail(void)
+{
+	error("stack-protector: Kernel stack is corrupted\n");
+}
+
 extern int do_decompress(u8 *input, int len, u8 *output,
 		void (*error)(char *x));
 
@@ -137,6 +149,8 @@ decompress_kernel(unsigned long output_start, unsigned long free_mem_ptr_p,
 {
 	int ret;
 
+	__stack_chk_guard_setup();
+
 	output_data = (unsigned char *)output_start;
 	free_mem_ptr = free_mem_ptr_p;
 	free_mem_end_ptr = free_mem_ptr_end_p;
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 41250fb33985..4ebb054cc323 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -86,7 +86,7 @@ endif
 ifdef CONFIG_CC_STACKPROTECTOR
 	cc_has_sp := $(srctree)/scripts/gcc-x86_$(BITS)-has-stack-protector.sh
         ifeq ($(shell $(CONFIG_SHELL) $(cc_has_sp) $(CC) $(KBUILD_CPPFLAGS) $(biarch)),y)
-                stackp-y := -fstack-protector
+                stackp-y := $(call cc-option,-fstack-protector-strong,-fstack-protector)
                 KBUILD_CFLAGS += $(stackp-y)
         else
                 $(warning stack protector enabled but no compiler support)

Build the kernel with -fstack-protector-strong when it is available
(gcc 4.9 and later). This increases the coverage of the stack protector
without the heavy performance hit of -fstack-protector-all.

The stack protector options available in gcc are:

-fstack-protector-all:
  Adds the stack-canary saving prefix and stack-canary checking suffix to
  _all_ function entry and exit. Results in substantial use of stack space
  for saving the canary for deep stack users (e.g. historically xfs), and
  measurable (though shockingly still low) performance hit due to all the
  saving/checking. Really not suitable for sane systems, and was entirely
  removed as an option from the kernel many years ago.

-fstack-protector:
  Adds the canary save/check to functions that define an 8
  (--param=ssp-buffer-size=N, N=8 by default) or more byte local char
  array. Traditionally, stack overflows happened with string-based
  manipulations, so this was a way to find those functions. Very few
  total functions actually get the canary; no measurable performance or
  size overhead.

-fstack-protector-strong:
  Adds the canary for a wider set of functions, since history has shown
  that it's not just those with strings that have ultimately been
  vulnerable to stack-busting attacks. With this superset, more functions
  end up with a canary, but it still remains small compared to all
  functions with no measurable change in performance. Based on the
  original design document, a function gets the canary when it contains
  any of:
    - local variable's address used as part of the RHS of an assignment
      or function argument
    - local variable is an array (or union containing an array),
      regardless of array type or length
    - uses register local variables
  https://docs.google.com/a/google.com/document/d/1xXBH6rRZue4f296vGt9YQcuLVQHeE516stHwt8M9xyU

Chrome OS x86_64 build is less than 0.16% larger:
-rwxr-xr-x 1 kees kees 118219343 Apr 17 12:26 vmlinux.orig
-rwxr-xr-x 1 kees kees 118407919 Apr 19 15:00 vmlinux

Ubuntu x86_64 build, using 14.04's config is less than 0.14% larger:
-rwxrwxr-x 1 kees kees 174384144 Nov 26 11:00 vmlinux.ubuntu-gcc-4.9
-rwxrwxr-x 1 kees kees 174627120 Nov 26 11:09 vmlinux.ubuntu-gcc-4.9+strong

On a defconfig x86_64 build (with CONFIG_CC_STACKPROTECTOR enabled), the
delta in size is just under 9% larger:
-rwxrwxr-x 1 kees kees 22134340 Nov 26 10:28 vmlinux.gcc-4.8
-rwxrwxr-x 1 kees kees 22123870 Nov 26 10:40 vmlinux.gcc-4.9
-rwxrwxr-x 1 kees kees 24225118 Nov 26 10:42 vmlinux.gcc-4.9+strong

ARM's compressed boot code now triggers stack protection, so a static
guard was added. Since this is only used during decompression and was
never protected before, the exposure here is very small. Once it switches
to the full kernel, the stack guard is back to normal.

Chrome OS has been using -fstack-protector-strong for its kernel builds
for the last 8 months with no problems.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
v2:
 - added description of all stack protector options
 - added size comparisons for Ubuntu and defconfig
---
 arch/arm/Makefile               |  3 ++-
 arch/arm/boot/compressed/misc.c | 14 ++++++++++++++
 arch/x86/Makefile               |  2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)