Message ID | 85dfdd23d98412a183546e2e7659a6a2bed1fca8.1430230786.git.daniel@iogearbox.net (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Herbert Xu |
Headers | show |
Am Dienstag, 28. April 2015, 17:22:20 schrieb Daniel Borkmann: Hi Daniel, >In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead >of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in >case LTO would decide to inline memzero_explicit() and eventually >find out it could be elimiated as dead store. > >While using barrier() works well for the case of gcc, recent efforts >from LLVMLinux people suggest to use llvm as an alternative to gcc, >and there, Stephan found in a simple stand-alone user space example >that llvm could nevertheless optimize and thus elimitate the memset(). >A similar issue has been observed in the referenced llvm bug report, >which is regarded as not-a-bug. > >The fix in this patch now works for both compilers (also tested with >more aggressive optimization levels). Arguably, in the current kernel >tree it's more of a theoretical issue, but imho, it's better to be >pedantic about it. > >It's clearly visible though, with the below code: if we would have >used barrier()-only here, llvm would have omitted clearing, not so >with barrier_data() variant: > > static inline void memzero_explicit(void *s, size_t count) > { > memset(s, 0, count); > barrier_data(s); > } > > int main(void) > { > char buff[20]; > memzero_explicit(buff, sizeof(buff)); > return 0; > } > > $ gcc -O2 test.c > $ gdb a.out > (gdb) disassemble main > Dump of assembler code for function main: > 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax > 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp) > 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp) > 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp) > 0x000000000040041f <+31>: xor %eax,%eax > 0x0000000000400421 <+33>: retq > End of assembler dump. > > $ clang -O2 test.c > $ gdb a.out > (gdb) disassemble main > Dump of assembler code for function main: > 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0 > 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp) > 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp) > 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax > 0x0000000000400505 <+21>: xor %eax,%eax > 0x0000000000400507 <+23>: retq > End of assembler dump. > >As clang (but also icc) defines __GNUC__, it's sufficient to define this >in compiler-gcc.h only. > >Reference: https://llvm.org/bugs/show_bug.cgi?id=15495 >Reported-by: Stephan Mueller <smueller@chronox.de> >Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> >Cc: Theodore Ts'o <tytso@mit.edu> >Cc: Stephan Mueller <smueller@chronox.de> >Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> >Cc: mancha security <mancha1@zoho.com> >Cc: Mark Charlebois <charlebm@gmail.com> >Cc: Behan Webster <behanw@converseincode.com> Using a user space test app: tested clang -O3, clang -O2, gcc -O3, gcc -O2. Tested-by: Stephan Mueller <smueller@chronox.de> Ciao Stephan -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Daniel et al. Looks good from here. By the way, has anyone been able to verify that __memory_barrier provides DSE protection under various optimizations? Unfortunately, I don't have ready access to ICC at the moment or I'd test it myself. --mancha PS It would be nice if memset_s were universally adopted/implemented so we could stop worrying about these things. On Tue, Apr 28, 2015 at 05:22:20PM +0200, Daniel Borkmann wrote: > In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead > of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in > case LTO would decide to inline memzero_explicit() and eventually > find out it could be elimiated as dead store. > > While using barrier() works well for the case of gcc, recent efforts > from LLVMLinux people suggest to use llvm as an alternative to gcc, > and there, Stephan found in a simple stand-alone user space example > that llvm could nevertheless optimize and thus elimitate the memset(). > A similar issue has been observed in the referenced llvm bug report, > which is regarded as not-a-bug. > > The fix in this patch now works for both compilers (also tested with > more aggressive optimization levels). Arguably, in the current kernel > tree it's more of a theoretical issue, but imho, it's better to be > pedantic about it. > > It's clearly visible though, with the below code: if we would have > used barrier()-only here, llvm would have omitted clearing, not so > with barrier_data() variant: > > static inline void memzero_explicit(void *s, size_t count) > { > memset(s, 0, count); > barrier_data(s); > } > > int main(void) > { > char buff[20]; > memzero_explicit(buff, sizeof(buff)); > return 0; > } > > $ gcc -O2 test.c > $ gdb a.out > (gdb) disassemble main > Dump of assembler code for function main: > 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax > 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp) > 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp) > 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp) > 0x000000000040041f <+31>: xor %eax,%eax > 0x0000000000400421 <+33>: retq > End of assembler dump. > > $ clang -O2 test.c > $ gdb a.out > (gdb) disassemble main > Dump of assembler code for function main: > 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0 > 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp) > 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp) > 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax > 0x0000000000400505 <+21>: xor %eax,%eax > 0x0000000000400507 <+23>: retq > End of assembler dump. > > As clang (but also icc) defines __GNUC__, it's sufficient to define this > in compiler-gcc.h only. > > Reference: https://llvm.org/bugs/show_bug.cgi?id=15495 > Reported-by: Stephan Mueller <smueller@chronox.de> > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > Cc: Theodore Ts'o <tytso@mit.edu> > Cc: Stephan Mueller <smueller@chronox.de> > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> > Cc: mancha security <mancha1@zoho.com> > Cc: Mark Charlebois <charlebm@gmail.com> > Cc: Behan Webster <behanw@converseincode.com> > --- > include/linux/compiler-gcc.h | 16 +++++++++++++++- > include/linux/compiler.h | 4 ++++ > lib/string.c | 2 +- > 3 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h > index cdf13ca..371e560 100644 > --- a/include/linux/compiler-gcc.h > +++ b/include/linux/compiler-gcc.h > @@ -9,10 +9,24 @@ > + __GNUC_MINOR__ * 100 \ > + __GNUC_PATCHLEVEL__) > > - > /* Optimization barrier */ > + > /* The "volatile" is due to gcc bugs */ > #define barrier() __asm__ __volatile__("": : :"memory") > +/* > + * This version is i.e. to prevent dead stores elimination on @ptr > + * where gcc and llvm may behave differently when otherwise using > + * normal barrier(): while gcc behavior gets along with a normal > + * barrier(), llvm needs an explicit input variable to be assumed > + * clobbered. The issue is as follows: while the inline asm might > + * access any memory it wants, the compiler could have fit all of > + * @ptr into memory registers instead, and since @ptr never escaped > + * from that, it proofed that the inline asm wasn't touching any of > + * it. This version works well with both compilers, i.e. we're telling > + * the compiler that the inline asm absolutely may see the contents > + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495 > + */ > +#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory") > > /* > * This macro obfuscates arithmetic on a variable address so that gcc > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index 0e41ca0..8677225 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -169,6 +169,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect); > # define barrier() __memory_barrier() > #endif > > +#ifndef barrier_data > +# define barrier_data(ptr) barrier() > +#endif > + > /* Unreachable code */ > #ifndef unreachable > # define unreachable() do { } while (1) > diff --git a/lib/string.c b/lib/string.c > index a579201..bb3d4b6 100644 > --- a/lib/string.c > +++ b/lib/string.c > @@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset); > void memzero_explicit(void *s, size_t count) > { > memset(s, 0, count); > - barrier(); > + barrier_data(s); > } > EXPORT_SYMBOL(memzero_explicit); > > -- > 1.9.3 >
On 04/29/2015 03:08 PM, mancha security wrote: ... > By the way, has anyone been able to verify that __memory_barrier > provides DSE protection under various optimizations? Unfortunately, I > don't have ready access to ICC at the moment or I'd test it myself. Never used icc, but it looks like it's free for open source projects; I can give it a try, but in case you're faster than I am, feel free to post results here. From what I see based on the code, i.e. after that buggy cleanup commit ... commit 73679e50820123ebdedc67ebcda4562d1d6e4aba Author: Pranith Kumar <bobby.prani@gmail.com> Date: Tue Apr 15 12:05:22 2014 -0400 compiler-intel.h: Remove duplicate definition barrier is already defined as __memory_barrier in compiler.h Remove this unnecessary redefinition. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Link: http://lkml.kernel.org/r/CAJhHMCAnYPy0%2BqD-1KBnJPLt3XgAjdR12j%2BySSnPgmZcpbE7HQ@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> ... it looks like it's currently using the _same_ gcc inline asm for the barrier on icc instead of what that commit intended to do. So funny enough, we don't actually use __memory_barrier() at the moment. ;) Nonetheless, having a look might be good. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 29, 2015 at 04:01:19PM +0200, Daniel Borkmann wrote: > On 04/29/2015 03:08 PM, mancha security wrote: > ... > >By the way, has anyone been able to verify that __memory_barrier > >provides DSE protection under various optimizations? Unfortunately, I > >don't have ready access to ICC at the moment or I'd test it myself. > > Never used icc, but it looks like it's free for open source projects; > I can give it a try, but in case you're faster than I am, feel free > to post results here. Time permitting, I'll try setting this up and post my results. > > From what I see based on the code, i.e. after that buggy cleanup > commit ... > > commit 73679e50820123ebdedc67ebcda4562d1d6e4aba > Author: Pranith Kumar <bobby.prani@gmail.com> > Date: Tue Apr 15 12:05:22 2014 -0400 > > compiler-intel.h: Remove duplicate definition > > barrier is already defined as __memory_barrier in compiler.h > Remove this unnecessary redefinition. > > Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> > Link: http://lkml.kernel.org/r/CAJhHMCAnYPy0%2BqD-1KBnJPLt3XgAjdR12j%2BySSnPgmZcpbE7HQ@mail.gmail.com > Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> > > ... it looks like it's currently using the _same_ gcc inline asm > for the barrier on icc instead of what that commit intended to do. > > So funny enough, we don't actually use __memory_barrier() at the > moment. ;) > > Nonetheless, having a look might be good. Nice catch, 73679e50820 is indeed buggy because ICC defines __GNUC__ (unless -no-gcc is used). That should be reverted. Bug aside, according to [1], ICC does support GNU-style inline asm so for the purposes of barrier_data(), it would be interesting to see if it affords better/worse DSE protection compared to __memory_barrier(). --mancha [1] https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-5100C4FC-BC2F-4E36-943A-120CFFFB4285.htm
On 04/29/2015 04:01 PM, Daniel Borkmann wrote: ... > So funny enough, we don't actually use __memory_barrier() at the > moment. ;) Anyway, I will send a v2 of the patch since we need to undef the gcc definition in case someone really uses an ecc compiler that doesn't support inline asm. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/29/2015 04:54 PM, mancha security wrote: > On Wed, Apr 29, 2015 at 04:01:19PM +0200, Daniel Borkmann wrote: >> On 04/29/2015 03:08 PM, mancha security wrote: >> ... >>> By the way, has anyone been able to verify that __memory_barrier >>> provides DSE protection under various optimizations? Unfortunately, I >>> don't have ready access to ICC at the moment or I'd test it myself. >> >> Never used icc, but it looks like it's free for open source projects; >> I can give it a try, but in case you're faster than I am, feel free >> to post results here. > > Time permitting, I'll try setting this up and post my results. So I finally got the download link and an eval license for icc, and after needing to download gigbytes of bloat for the suite, I could finally start to experiment a bit. So __GNUC__ and __INTEL_COMPILER is definitely defined by icc, __ECC not in my case, so that part is as expected for the kernel header includes. With barrier_data(), I could observe insns for an inlined memset() being emitted in the disassembly, same with barrier(), same with __memory_barrier(). In fact, even if I only use ... static inline void memzero_explicit(void *s, size_t count) { memset(s, 0, count); } int main(void) { char buff[20]; memzero_explicit(buff, sizeof(buff)); return 0; } ... icc will emit memset instrinsic insns (did you notice that as well?) when using various optimization levels. Using f.e. -Ofast -ffreestanding resp. -fno-builtin-memset will emit a function call, presumably, icc is then not allowed to make any assumptions, so given the previous result, then would then be expected. So, crafting a stupid example: static inline void dumb_memset(char *s, unsigned char c, size_t n) { int i; for (i = 0; i < n; i++) s[i] = c; } static inline void memzero_explicit(void *s, size_t count) { dumb_memset(s, 0, count); <barrier-variant> } int main(void) { char buff[20]; memzero_explicit(buff, sizeof(buff)); return 0; } With no barrier at all, icc optimizes all that away (using -Ofast), with barrier_data() it inlines and emits additional mov* insns! Just using barrier() or __memory_barrier(), we end up with the same case as with clang, that is, it gets optimized away. So, barrier_data() seems to be better here as well. Cheers, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 30, 2015 at 01:43:07AM +0200, Daniel Borkmann wrote: > On 04/29/2015 04:54 PM, mancha security wrote: > >On Wed, Apr 29, 2015 at 04:01:19PM +0200, Daniel Borkmann wrote: > >>On 04/29/2015 03:08 PM, mancha security wrote: > >>... > >>>By the way, has anyone been able to verify that __memory_barrier > >>>provides DSE protection under various optimizations? Unfortunately, > >>>I don't have ready access to ICC at the moment or I'd test it > >>>myself. > >> > >>Never used icc, but it looks like it's free for open source > >>projects; I can give it a try, but in case you're faster than I am, > >>feel free to post results here. > > > >Time permitting, I'll try setting this up and post my results. > > So I finally got the download link and an eval license for icc, and > after needing to download gigbytes of bloat for the suite, I could > finally start to experiment a bit. Ugh. > So __GNUC__ and __INTEL_COMPILER is definitely defined by icc, __ECC > not in my case, so that part is as expected for the kernel header > includes. > > With barrier_data(), I could observe insns for an inlined memset() > being emitted in the disassembly, same with barrier(), same with > __memory_barrier(). In fact, even if I only use ... > > static inline void memzero_explicit(void *s, size_t count) > { > memset(s, 0, count); > } > > int main(void) > { > char buff[20]; > memzero_explicit(buff, sizeof(buff)); > return 0; > } > > ... icc will emit memset instrinsic insns (did you notice that as > well?) when using various optimization levels. Using f.e. -Ofast > -ffreestanding resp. -fno-builtin-memset will emit a function call, > presumably, icc is then not allowed to make any assumptions, so given > the previous result, then would then be expected. I didn't get around to installing ICC so thanks for sharing the very interesting results. > So, crafting a stupid example: > > static inline void > dumb_memset(char *s, unsigned char c, size_t n) > { > int i; > > for (i = 0; i < n; i++) > s[i] = c; > } > > static inline void memzero_explicit(void *s, size_t count) > { > dumb_memset(s, 0, count); > <barrier-variant> > } > > int main(void) > { > char buff[20]; > memzero_explicit(buff, sizeof(buff)); > return 0; > } > > With no barrier at all, icc optimizes all that away (using -Ofast), > with barrier_data() it inlines and emits additional mov* insns! Just > using barrier() or __memory_barrier(), we end up with the same case as > with clang, that is, it gets optimized away. So, barrier_data() seems > to be better here as well. For now, seems we're good with barrier_data should things like the LTO initiative pick up steam, etc. Cheers.
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index cdf13ca..371e560 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -9,10 +9,24 @@ + __GNUC_MINOR__ * 100 \ + __GNUC_PATCHLEVEL__) - /* Optimization barrier */ + /* The "volatile" is due to gcc bugs */ #define barrier() __asm__ __volatile__("": : :"memory") +/* + * This version is i.e. to prevent dead stores elimination on @ptr + * where gcc and llvm may behave differently when otherwise using + * normal barrier(): while gcc behavior gets along with a normal + * barrier(), llvm needs an explicit input variable to be assumed + * clobbered. The issue is as follows: while the inline asm might + * access any memory it wants, the compiler could have fit all of + * @ptr into memory registers instead, and since @ptr never escaped + * from that, it proofed that the inline asm wasn't touching any of + * it. This version works well with both compilers, i.e. we're telling + * the compiler that the inline asm absolutely may see the contents + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495 + */ +#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory") /* * This macro obfuscates arithmetic on a variable address so that gcc diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 0e41ca0..8677225 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -169,6 +169,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect); # define barrier() __memory_barrier() #endif +#ifndef barrier_data +# define barrier_data(ptr) barrier() +#endif + /* Unreachable code */ #ifndef unreachable # define unreachable() do { } while (1) diff --git a/lib/string.c b/lib/string.c index a579201..bb3d4b6 100644 --- a/lib/string.c +++ b/lib/string.c @@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset); void memzero_explicit(void *s, size_t count) { memset(s, 0, count); - barrier(); + barrier_data(s); } EXPORT_SYMBOL(memzero_explicit);
In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in case LTO would decide to inline memzero_explicit() and eventually find out it could be elimiated as dead store. While using barrier() works well for the case of gcc, recent efforts from LLVMLinux people suggest to use llvm as an alternative to gcc, and there, Stephan found in a simple stand-alone user space example that llvm could nevertheless optimize and thus elimitate the memset(). A similar issue has been observed in the referenced llvm bug report, which is regarded as not-a-bug. The fix in this patch now works for both compilers (also tested with more aggressive optimization levels). Arguably, in the current kernel tree it's more of a theoretical issue, but imho, it's better to be pedantic about it. It's clearly visible though, with the below code: if we would have used barrier()-only here, llvm would have omitted clearing, not so with barrier_data() variant: static inline void memzero_explicit(void *s, size_t count) { memset(s, 0, count); barrier_data(s); } int main(void) { char buff[20]; memzero_explicit(buff, sizeof(buff)); return 0; } $ gcc -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp) 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp) 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp) 0x000000000040041f <+31>: xor %eax,%eax 0x0000000000400421 <+33>: retq End of assembler dump. $ clang -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp) 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp) 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax 0x0000000000400505 <+21>: xor %eax,%eax 0x0000000000400507 <+23>: retq End of assembler dump. As clang (but also icc) defines __GNUC__, it's sufficient to define this in compiler-gcc.h only. Reference: https://llvm.org/bugs/show_bug.cgi?id=15495 Reported-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Stephan Mueller <smueller@chronox.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: mancha security <mancha1@zoho.com> Cc: Mark Charlebois <charlebm@gmail.com> Cc: Behan Webster <behanw@converseincode.com> --- include/linux/compiler-gcc.h | 16 +++++++++++++++- include/linux/compiler.h | 4 ++++ lib/string.c | 2 +- 3 files changed, 20 insertions(+), 2 deletions(-)