Message ID | alpine.LRH.2.02.2204241648270.17244@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Series | hex2bin: make the function hex_to_bin constant-time |
On Sun, 2022-04-24 at 16:54 -0400, Mikulas Patocka wrote:
> This patch changes the function hex_to_bin so that it contains no branches
> and no memory accesses.
[]
> +++ linux-2.6/lib/hexdump.c	2022-04-24 18:51:20.000000000 +0200
[]
> + * the next line is similar to the previous one, but we need to decode both
> + * uppercase and lowercase letters, so we use (ch & 0xdf), which converts
> + * lowercase to uppercase
>   */
>  int hex_to_bin(char ch)
>  {
> -	if ((ch >= '0') && (ch <= '9'))
> -		return ch - '0';
> -	ch = tolower(ch);
> -	if ((ch >= 'a') && (ch <= 'f'))
> -		return ch - 'a' + 10;
> -	return -1;
> +	return -1 +
> +		((ch - '0' + 1) & (((ch - '9' - 1) & ('0' - 1 - ch)) >> 8)) +
> +		(((ch & 0xdf) - 'A' + 11) & ((((ch & 0xdf) - 'F' - 1) & ('A' - 1 - (ch & 0xdf))) >> 8));

probably easier to read using a temporary for ch & 0xdf

	int CH = ch & 0xdf;

	return -1 +
		((ch - '0' + 1) & (((ch - '9' - 1) & ('0' - 1 - ch)) >> 8)) +
		((CH - 'A' + 11) & (((CH - 'F' - 1) & ('A' - 1 - CH)) >> 8));
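A quick way to gain confidence in the variant with the temporary is to compare it against the old branchy code for every byte value. The sketch below is a userspace test, not kernel code: the names hex_to_bin_ref, hex_to_bin_tmp and check_all_bytes are made up for illustration, tolower() is spelled out as explicit ranges, and the ">> 8" on a negative int is assumed to sign-extend (implementation-defined, as in the patch as posted).

```c
#include <assert.h>

/* Reference: the branchy version that the patch removes
 * (tolower() replaced by explicit ranges, an assumption of this sketch). */
static int hex_to_bin_ref(char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	if (ch >= 'A' && ch <= 'F')
		return ch - 'A' + 10;
	return -1;
}

/* Branch-free version using the suggested temporary for (ch & 0xdf). */
static int hex_to_bin_tmp(char ch)
{
	int CH = ch & 0xdf;

	return -1 +
		((ch - '0' + 1) & (((ch - '9' - 1) & ('0' - 1 - ch)) >> 8)) +
		((CH - 'A' + 11) & (((CH - 'F' - 1) & ('A' - 1 - CH)) >> 8));
}

/* Compare both versions on all 256 byte values. */
static void check_all_bytes(void)
{
	for (int i = 0; i < 256; i++)
		assert(hex_to_bin_tmp((char)i) == hex_to_bin_ref((char)i));
}
```

Calling check_all_bytes() from a small test program exercises both the signed-char and unsigned-char cases, depending on the target's plain 'char'.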
On Sun, Apr 24, 2022 at 1:54 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> + *
> + * Explanation of the logic:
> + * (ch - '9' - 1) is negative if ch <= '9'
> + * ('0' - 1 - ch) is negative if ch >= '0'

True, but...

Please, just to make me happier, make the sign of 'ch' be something
very explicit. Right now that code uses 'char ch', which could be
signed or unsigned.

It doesn't really matter in this case, since the arithmetic will be
done in 'int', and as long as 'int' is larger than 'char' as a type
(to be really nit-picky), it all ends up working ok regardless.

But just to make me happier, and to make the algorithm actually do the
_same_ thing on every architecture, please use an explicit signedness
for that 'ch' type.

Because then that 'ch >= X' is well-defined.

Again - your code _works_. That's not what I worry about. But when
playing these kinds of tricks, please make it have the same behavior
across architectures, not just "the end result will be the same
regardless".

Yes, a 'ch' with the high bit set will always be either >= '0' or <=
'9', but note how *which* one it is depends on the exact type, and
'char' is simply not well-defined.

Finally, for the same reason - please don't use ">> 8". Because I do
not believe that bit 8 is well-defined in your arithmetic. The *sign*
bit will be, but I'm not convinced bit 8 is.

So use ">> 31" or similar.

Also, I do worry that this is *exactly* the kind of trick that a
compiler could easily turn back into a conditional. Usually compilers
tend to go the other way (ie turn conditionals into arithmetic if
possible), but..

                Linus
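To make the signedness point concrete, here is a small sketch that evaluates the two range tests for the byte 0xE9, once as a signed char and once as an unsigned char. The helper names are hypothetical; only the two expressions come from the patch.

```c
#include <assert.h>

/* The two range tests from the patch, evaluated in int after promotion. */
static int is_le_9(int ch) { return (ch - '9' - 1) < 0; }
static int is_ge_0(int ch) { return ('0' - 1 - ch) < 0; }

static void check_signedness(void)
{
	signed char sc = -23;    /* byte 0xE9 seen as a signed char */
	unsigned char uc = 0xE9; /* the same byte seen as unsigned: 233 */

	/* Signed: the byte counts as "<= '9'" but not ">= '0'". */
	assert(is_le_9(sc) && !is_ge_0(sc));

	/* Unsigned: the byte counts as ">= '0'" but not "<= '9'". */
	assert(!is_le_9(uc) && is_ge_0(uc));
}
```

In both cases exactly one of the two tests fails, so the byte is rejected either way, but which test fails depends on the type. Declaring the parameter as 'unsigned char' (or 'signed char') would pin that down on every architecture.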
On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Finally, for the same reason - please don't use ">> 8". Because I do
> not believe that bit 8 is well-defined in your arithmetic. The *sign*
> bit will be, but I'm not convinced bit 8 is.

Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
sign in bit #7, but I suspect bit #8 is always fine.

Still, if you want to just extend the sign bit, ">> 31" _is_ the
obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
whatever, you get my drift).

                Linus
From: Linus Torvalds
> Sent: 24 April 2022 22:42
>
> On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Finally, for the same reason - please don't use ">> 8". Because I do
> > not believe that bit 8 is well-defined in your arithmetic. The *sign*
> > bit will be, but I'm not convinced bit 8 is.
>
> Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
> sign in bit #7, but I suspect bit #8 is always fine.
>
> Still, If you want to just extend the sign bit, ">> 31" _is_ the
> obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
> whatever, you get my drift).

Except that right shifts of signed values are UB.
In particular it has always been valid to do an unsigned
shift right on a 2's complement negative number.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Mon, 25 Apr 2022, David Laight wrote:

> From: Linus Torvalds
> > Sent: 24 April 2022 22:42
> >
> > On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Finally, for the same reason - please don't use ">> 8". Because I do
> > > not believe that bit 8 is well-defined in your arithmetic. The *sign*
> > > bit will be, but I'm not convinced bit 8 is.
> >
> > Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
> > sign in bit #7, but I suspect bit #8 is always fine.
> >
> > Still, If you want to just extend the sign bit, ">> 31" _is_ the
> > obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
> > whatever, you get my drift).
>
> Except that right shifts of signed values are UB.
> In particular it has always been valid to do an unsigned
> shift right on a 2's compliment negative number.
>
> 	David

Yes. All the standard versions (C89, C99, C11, C2X) say that right shift
of a negative value is implementation-defined.

So, we should cast it to "unsigned" before shifting it.

Mikulas
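One way the cast could look is sketched below. This is only an illustration of the idea, not necessarily the form the patch ends up taking: the function name, the 'unsigned char' parameter and the 'cu' temporary are assumptions of this sketch. After the cast, ">> 8" no longer yields -1 for an in-range character but a value whose low bits are all ones, which is enough because the value being masked is at most 16. The mask is converted back to int so that the final sum stays in signed arithmetic.

```c
#include <assert.h>

/* Sketch: the shifted operand is cast to unsigned, so the result of
 * ">> 8" is fully defined by the C standard. */
static int hex_to_bin_ushift(unsigned char ch)
{
	unsigned char cu = ch & 0xdf;	/* fold lowercase onto uppercase */

	/* All-ones in the low bits if ch is '0'..'9', 0 otherwise. */
	int dmask = (int)((unsigned int)((ch - '9' - 1) & ('0' - 1 - ch)) >> 8);
	/* All-ones in the low bits if cu is 'A'..'F', 0 otherwise. */
	int lmask = (int)((unsigned int)((cu - 'F' - 1) & ('A' - 1 - cu)) >> 8);

	return -1 + ((ch - '0' + 1) & dmask) + ((cu - 'A' + 11) & lmask);
}
```

Characters just outside the valid ranges ('/', ':', '@', 'G', '`', 'g') are the interesting cases to test, since they are where an off-by-one in the range expressions would show up.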
From: Mikulas Patocka
> Sent: 25 April 2022 12:04
>
> On Mon, 25 Apr 2022, David Laight wrote:
>
> > From: Linus Torvalds
> > > Sent: 24 April 2022 22:42
> > >
> > > On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > Finally, for the same reason - please don't use ">> 8". Because I do
> > > > not believe that bit 8 is well-defined in your arithmetic. The *sign*
> > > > bit will be, but I'm not convinced bit 8 is.
> > >
> > > Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
> > > sign in bit #7, but I suspect bit #8 is always fine.
> > >
> > > Still, If you want to just extend the sign bit, ">> 31" _is_ the
> > > obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
> > > whatever, you get my drift).
> >
> > Except that right shifts of signed values are UB.
> > In particular it has always been valid to do an unsigned
> > shift right on a 2's compliment negative number.
> >
> > 	David
>
> Yes. All the standard versions (C89, C99, C11, C2X) say that right shift
> of a negative value is implementation-defined.
>
> So, we should cast it to "unsigned" before shifting it.

Except that the intent appears to be to replicate the sign bit.

If it is 'implementation defined' (rather than suddenly being UB)
it might be that the linux kernel requires sign propagating
right shifts of negative values.
This is typically what happens on 2's complement systems.
But not all small cpu have the required shift instruction.
OTOH all the ones big enough to run Linux probably do.
(And gcc doesn't support '1's complement' or 'sign overpunch' cpus.)

The problem is that the compiler writers seem to be entering
a mindset where they are optimising code based on UB behaviour.
So given:
	void foo(int x)
	{
		if (x >> 1 < 0)
			return;
		do_something();
	}
they decide the test is UB, so can always be assumed to be true
and thus do_something() is compiled away.
	David
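If the goal is to replicate the sign bit without depending on how the implementation shifts negative values, one possibility is to extract the top bit as an unsigned value and negate it. The helper below is a sketch under the assumption that 'int' has no padding bits; the name sign_fill is made up here.

```c
#include <assert.h>
#include <limits.h>

/* Returns -1 if x is negative and 0 otherwise, using only an unsigned
 * right shift (fully defined) instead of a signed one. */
static int sign_fill(int x)
{
	unsigned int top = (unsigned int)x >> (sizeof(int) * CHAR_BIT - 1);

	return -(int)top;	/* top is 0 or 1, so this is 0 or -1 */
}
```

This corresponds to the ">> 31" (or "sizeof(int)*8-1") form mentioned earlier in the thread, with the cast added so the result does not depend on arithmetic-shift behaviour.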
On Mon, 25 Apr 2022, David Laight wrote:

> From: Mikulas Patocka
> > Sent: 25 April 2022 12:04
> >
> > On Mon, 25 Apr 2022, David Laight wrote:
> >
> > > From: Linus Torvalds
> > > > Sent: 24 April 2022 22:42
> > > >
> > > > On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
> > > > <torvalds@linux-foundation.org> wrote:
> > > > >
> > > > > Finally, for the same reason - please don't use ">> 8". Because I do
> > > > > not believe that bit 8 is well-defined in your arithmetic. The *sign*
> > > > > bit will be, but I'm not convinced bit 8 is.
> > > >
> > > > Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
> > > > sign in bit #7, but I suspect bit #8 is always fine.
> > > >
> > > > Still, If you want to just extend the sign bit, ">> 31" _is_ the
> > > > obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
> > > > whatever, you get my drift).
> > >
> > > Except that right shifts of signed values are UB.
> > > In particular it has always been valid to do an unsigned
> > > shift right on a 2's compliment negative number.
> > >
> > > 	David
> >
> > Yes. All the standard versions (C89, C99, C11, C2X) say that right shift
> > of a negative value is implementation-defined.
> >
> > So, we should cast it to "unsigned" before shifting it.
>
> Except that the intent appears to be to replicate the sign bit.
>
> If it is 'implementation defined' (rather than suddenly being UB)

The standard says "If E1 has a signed type and a negative value, the
resulting value is implementation-defined."

So, it's not undefined behavior.

> it might be that the linux kernel requires sign propagating
> right shifts of negative values.

It may be that some code in the Linux kernel already assumes that right
shifts keep the sign. It's hard to say if such code exists.

BTW. ubsan warns about left shift of negative values, but it doesn't warn
about right shift of negative values.

> This is typically what happens on 2's compliment systems.
> But not all small cpu have the required shift instruction.
> OTOH all the ones bit enough to run Linux probably do.
> (And gcc doesn't support '1's compliment' or 'sign overpunch' cpus.)
>
> The problem is that the compiler writers seem to be entering
> a mindset where they are optimising code based on UB behaviour.
> So given:
> 	void foo(int x)
> 	{
> 		if (x >> 1 < 0)
> 			return;
> 		do_something();
> 	}
> they decide the test is UB, so can always be assumed to be true
> and thus do_something() is compiled away.
>
> 	David

If it's implementation-defined (rather than undefined), the compiler
shouldn't do such optimization.

The linux kernel uses "-fno-strict-overflow" which disables some of these
UB optimizations.

Mikulas
Index: linux-2.6/lib/hexdump.c
===================================================================
--- linux-2.6.orig/lib/hexdump.c	2022-04-24 18:51:20.000000000 +0200
+++ linux-2.6/lib/hexdump.c	2022-04-24 18:51:20.000000000 +0200
@@ -22,15 +22,30 @@ EXPORT_SYMBOL(hex_asc_upper);
  *
  * hex_to_bin() converts one hex digit to its actual value or -1 in case of bad
  * input.
+ *
+ * This function is used to load cryptographic keys, so it is coded in such a
+ * way that there are no conditions or memory accesses that depend on data.
+ *
+ * Explanation of the logic:
+ * (ch - '9' - 1) is negative if ch <= '9'
+ * ('0' - 1 - ch) is negative if ch >= '0'
+ * we "and" these two values, so the result is negative if ch is in the range
+ * '0' ... '9'
+ * we are only interested in the sign, so we do a shift ">> 8" --- we have -1 if
+ * ch is in the range '0' ... '9', 0 otherwise
+ * we "and" this value with (ch - '0' + 1) --- we have a value 1 ... 10 if ch is
+ * in the range '0' ... '9', 0 otherwise
+ * we add this value to -1 --- we have a value 0 ... 9 if ch is in the range '0'
+ * ... '9', -1 otherwise
+ * the next line is similar to the previous one, but we need to decode both
+ * uppercase and lowercase letters, so we use (ch & 0xdf), which converts
+ * lowercase to uppercase
  */
 int hex_to_bin(char ch)
 {
-	if ((ch >= '0') && (ch <= '9'))
-		return ch - '0';
-	ch = tolower(ch);
-	if ((ch >= 'a') && (ch <= 'f'))
-		return ch - 'a' + 10;
-	return -1;
+	return -1 +
+		((ch - '0' + 1) & (((ch - '9' - 1) & ('0' - 1 - ch)) >> 8)) +
+		(((ch & 0xdf) - 'A' + 11) & ((((ch & 0xdf) - 'F' - 1) & ('A' - 1 - (ch & 0xdf))) >> 8));
 }
 EXPORT_SYMBOL(hex_to_bin);
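The steps listed in the comment can be followed one value at a time. The sketch below records the intermediate values of the digit half of the expression. The struct and function names are hypothetical, the parameter is 'unsigned char' for definiteness, and, like the patch as posted, it assumes that ">> 8" on a negative int sign-extends.

```c
#include <assert.h>

/* Intermediate values of the digit half of the expression. */
struct digit_trace {
	int le9;	/* (ch - '9' - 1): negative iff ch <= '9' */
	int ge0;	/* ('0' - 1 - ch): negative iff ch >= '0' */
	int mask;	/* -1 if both of the above are negative, else 0 */
	int result;	/* 0 ... 9 for a digit, -1 otherwise */
};

static struct digit_trace trace_digit(unsigned char ch)
{
	struct digit_trace t;

	t.le9 = ch - '9' - 1;
	t.ge0 = '0' - 1 - ch;
	/* Assumption: right shift of a negative int is arithmetic. */
	t.mask = (t.le9 & t.ge0) >> 8;
	t.result = -1 + ((ch - '0' + 1) & t.mask);
	return t;
}
```

For '7' the trace is le9 = -3, ge0 = -8, mask = -1, result = 7. For 'x' it is le9 = 62, ge0 = -73, mask = 0, result = -1, which is where the letter half of the expression would take over.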
The function hex2bin is used to load cryptographic keys into device mapper
targets dm-crypt and dm-integrity. It should take constant time independent
of the processed data, so that concurrently running unprivileged code can't
infer any information about the keys via microarchitectural covert channels.

This patch changes the function hex_to_bin so that it contains no branches
and no memory accesses.

Note that this shouldn't cause performance degradation because the size of
the new function is the same as the size of the old function (on x86-64) -
and the new function causes no branch misprediction penalties.

I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64
i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32
sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64
powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are no
branches in the generated code.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 lib/hexdump.c |   27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)