Message ID | 20150318171402.GB24195@zoho.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On 03/18/2015 06:14 PM, mancha wrote:
...
> Patch 0001 fixes the dead store issue in memzero_explicit().
Thanks! I have issued the fix for the memzero bug to Herbert in
your authorship as discussed, also giving some more context.
For the 2nd issue, lets wait for Cesar.
Thanks again!
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Mar 18, 2015 at 06:49:55PM +0100, Daniel Borkmann wrote: > On 03/18/2015 06:14 PM, mancha wrote: > ... > >Patch 0001 fixes the dead store issue in memzero_explicit(). > > Thanks! I have issued the fix for the memzero bug to Herbert in > your authorship as discussed, also giving some more context. > > For the 2nd issue, lets wait for Cesar. > > Thanks again! Excellent! Cheers. --mancha
On 18-03-2015 14:14, mancha wrote: > On Wed, Mar 18, 2015 at 05:02:01PM +0100, Stephan Mueller wrote: >> Am Mittwoch, 18. März 2015, 16:09:34 schrieb Hannes Frederic Sowa: >>> Seems like just using barrier() is the best and easiest option. > > However, if the idea is to use barrier() instead of OPTIMIZER_HIDE_VAR() > in crypto_memneq() as well, then patch 0002 is the one to use. Please > review and keep in mind my analysis was limited to memzero_explicit(). > > Cesar, were there reasons you didn't use the gcc version of barrier() > for crypto_memneq()? Yes. Two reasons. Take a look at how barrier() is defined: #define barrier() __asm__ __volatile__("": : :"memory") It tells gcc that the dummy assembly "instruction" touches memory (so the compiler can't assume anything about the memory), and that nothing should be moved from before to after the barrier and vice versa. It mentions nothing about registers. Therefore, as far as I know gcc can assume that the dummy "instruction" touches no integer registers (or restores their values). I can imagine a sufficiently perverse compiler using that fact to introduce timing-dependent computations. For instance, it could load the values using more than one register and combine them at the end, after the barriers; there, it could exit early in case one of the registers is all-ones. My definition of OPTIMIZER_HIDE_VAR introduces a data dependency to prevent that: #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var)) The second reason is that barrier() is too strong. For crypto_memneq, only the or-chain is critical; the order or width of the loads makes no difference. The compiler could, if it wishes, do all the loads and xors first and do the or-chain at the end, or whenever it can see a pipeline bubble; it doesn't matter as long as it does *all* the "or" operations, in sequence. I would be comfortable with a stronger OPTIMIZER_HIDE_VAR (adding "memory" or volatile), even though it could limit optimization opportunities, but I wouldn't be comfortable with a weaker OPTIMIZER_HIDE_VAR (removing the data dependency), unless the gcc and clang guys promise that our definition of barrier() will always prevent undesired optimization of register-only operations. There was a third reason for the exact definition of OPTIMIZER_HIDE_VAR: it was copied from RELOC_HIDE, which is a longstanding "hide this variable from gcc" operation, and thus known to work as expected.
diff --git a/lib/string.c b/lib/string.c index ce81aae..a579201 100644 --- a/lib/string.c +++ b/lib/string.c @@ -607,7 +607,7 @@ EXPORT_SYMBOL(memset); void memzero_explicit(void *s, size_t count) { memset(s, 0, count); - OPTIMIZER_HIDE_VAR(s); + barrier(); } EXPORT_SYMBOL(memzero_explicit);