diff mbox

[BUG/PATCH] kernel RNG and its secrets

Message ID 20150318171402.GB24195@zoho.com (mailing list archive)
State Superseded
Headers show

Commit Message

mancha March 18, 2015, 5:14 p.m. UTC
On Wed, Mar 18, 2015 at 05:02:01PM +0100, Stephan Mueller wrote:
> Am Mittwoch, 18. März 2015, 16:09:34 schrieb Hannes Frederic Sowa:
> 
> Hi Hannes,
> 
> >On Wed, Mar 18, 2015, at 13:42, Daniel Borkmann wrote:
> >> On 03/18/2015 01:20 PM, Stephan Mueller wrote:
> >> > Am Mittwoch, 18. März 2015, 13:19:07 schrieb Hannes Frederic Sowa:
> >> >>>> My proposal would be to add a
> >> >>>> 
> >> >>>> #define OPTIMIZER_HIDE_MEM(ptr, len) __asm__ __volatile__ ("" :
> >> >>>> :
> >> >>>> "m"(
> >> >>>> ({ struct { u8 b[len]; } *p = (void *)ptr ; *p; }) )
> >> >>>> 
> >> >>>> and use this in the code function.
> >> >>>> 
> >> >>>> This is documented in gcc manual 6.43.2.5.
> >> >>> 
> >> >>> That one adds the zeroization instructuctions. But now there are
> >> >>> much
> >> >>> more than with the barrier.
> >> >>> 
> >> >>>    400469:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
> >> >>>    400470:       00
> >> >>>    400471:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
> >> >>>    400478:       00 00
> >> >>>    40047a:       c7 44 24 10 00 00 00    movl   $0x0,0x10(%rsp)
> >> >>>    400481:       00
> >> >>>    400482:       48 c7 44 24 20 00 00    movq   $0x0,0x20(%rsp)
> >> >>>    400489:       00 00
> >> >>>    40048b:       48 c7 44 24 28 00 00    movq   $0x0,0x28(%rsp)
> >> >>>    400492:       00 00
> >> >>>    400494:       c7 44 24 30 00 00 00    movl   $0x0,0x30(%rsp)
> >> >>>    40049b:       00
> >> >>> 
> >> >>> Any ideas?
> >> >> 
> >> >> Hmm, correct definition of u8?
> >> > 
> >> > I use unsigned char
> >> > 
> >> >> Which version of gcc do you use? I can't see any difference if I
> >> >> compile your example at -O2.
> >> > 
> >> > gcc-Version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)
> >
> >Well, was an error on my side, I see the same behavior.
> >
> >> I can see the same with the gcc version I previously posted. So
> >> it clears the 20 bytes from your example (movq, movq, movl) at
> >> two locations, presumably buf[] and b[].
> >
> >Yes, it looks like that. The reservation on the stack changes, too.
> >
> >Seems like just using barrier() is the best and easiest option.
> 
> Would you prepare a patch for that?
> >
> >Thanks,
> >Hannes
> 
> 
> Ciao
> Stephan

Hi.

Patch 0001 fixes the dead store issue in memzero_explicit().

However, if the idea is to use barrier() instead of OPTIMIZER_HIDE_VAR()
in crypto_memneq() as well, then patch 0002 is the one to use. Please
review and keep in mind my analysis was limited to memzero_explicit().

Cesar, were there reasons you didn't use the gcc version of barrier()
for crypto_memneq()?

Please let me know if I can be of any further assistance.

--mancha
From cd9e882cd1d684f50c52471d83f9ecf55427c360 Mon Sep 17 00:00:00 2001
From: mancha security <mancha1@zoho.com>
Date: Wed, 18 Mar 2015 16:14:34 +0000
Subject: [PATCH] lib/string.c: use barrier() to protect memzero_explicit()

OPTIMIZER_HIDE_VAR(), as defined when using gcc, is insufficient
to ensure protection from dead store optimization.
---
 lib/string.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Daniel Borkmann March 18, 2015, 5:49 p.m. UTC | #1
On 03/18/2015 06:14 PM, mancha wrote:
...
> Patch 0001 fixes the dead store issue in memzero_explicit().

Thanks! I have issued the fix for the memzero bug to Herbert in
your authorship as discussed, also giving some more context.

For the 2nd issue, lets wait for Cesar.

Thanks again!
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
mancha March 18, 2015, 7:09 p.m. UTC | #2
On Wed, Mar 18, 2015 at 06:49:55PM +0100, Daniel Borkmann wrote:
> On 03/18/2015 06:14 PM, mancha wrote:
> ...
> >Patch 0001 fixes the dead store issue in memzero_explicit().
> 
> Thanks! I have issued the fix for the memzero bug to Herbert in
> your authorship as discussed, also giving some more context.
> 
> For the 2nd issue, lets wait for Cesar.
> 
> Thanks again!

Excellent! 

Cheers.

--mancha
Cesar Eduardo Barros March 18, 2015, 11:53 p.m. UTC | #3
On 18-03-2015 14:14, mancha wrote:
> On Wed, Mar 18, 2015 at 05:02:01PM +0100, Stephan Mueller wrote:
>> Am Mittwoch, 18. März 2015, 16:09:34 schrieb Hannes Frederic Sowa:
>>> Seems like just using barrier() is the best and easiest option.
>
> However, if the idea is to use barrier() instead of OPTIMIZER_HIDE_VAR()
> in crypto_memneq() as well, then patch 0002 is the one to use. Please
> review and keep in mind my analysis was limited to memzero_explicit().
>
> Cesar, were there reasons you didn't use the gcc version of barrier()
> for crypto_memneq()?

Yes. Two reasons.

Take a look at how barrier() is defined:

#define barrier() __asm__ __volatile__("": : :"memory")

It tells gcc that the dummy assembly "instruction" touches memory (so 
the compiler can't assume anything about the memory), and that nothing 
should be moved from before to after the barrier and vice versa.

It mentions nothing about registers. Therefore, as far as I know gcc can 
assume that the dummy "instruction" touches no integer registers (or 
restores their values). I can imagine a sufficiently perverse compiler 
using that fact to introduce timing-dependent computations. For 
instance, it could load the values using more than one register and 
combine them at the end, after the barriers; there, it could exit early 
in case one of the registers is all-ones. My definition of 
OPTIMIZER_HIDE_VAR introduces a data dependency to prevent that:

#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

The second reason is that barrier() is too strong. For crypto_memneq, 
only the or-chain is critical; the order or width of the loads makes no 
difference. The compiler could, if it wishes, do all the loads and xors 
first and do the or-chain at the end, or whenever it can see a pipeline 
bubble; it doesn't matter as long as it does *all* the "or" operations, 
in sequence.

I would be comfortable with a stronger OPTIMIZER_HIDE_VAR (adding 
"memory" or volatile), even though it could limit optimization 
opportunities, but I wouldn't be comfortable with a weaker 
OPTIMIZER_HIDE_VAR (removing the data dependency), unless the gcc and 
clang guys promise that our definition of barrier() will always prevent 
undesired optimization of register-only operations.

There was a third reason for the exact definition of OPTIMIZER_HIDE_VAR: 
it was copied from RELOC_HIDE, which is a longstanding "hide this 
variable from gcc" operation, and thus known to work as expected.
diff mbox

Patch

diff --git a/lib/string.c b/lib/string.c
index ce81aae..a579201 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -607,7 +607,7 @@  EXPORT_SYMBOL(memset);
 void memzero_explicit(void *s, size_t count)
 {
 	memset(s, 0, count);
-	OPTIMIZER_HIDE_VAR(s);
+	barrier();
 }
 EXPORT_SYMBOL(memzero_explicit);