hex2bin: make the function hex_to_bin constant-time

Message ID	alpine.LRH.2.02.2204241648270.17244@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State	Not Applicable
Delegated to:	Herbert Xu
Headers
	Return-Path: <linux-crypto-owner@kernel.org>
	Date: Sun, 24 Apr 2022 16:54:18 -0400 (EDT)
	From: Mikulas Patocka <mpatocka@redhat.com>
	To: Linus Torvalds <torvalds@linux-foundation.org>, Andy Shevchenko <andy@kernel.org>
	cc: dm-devel@redhat.com, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org, Herbert Xu <herbert@gondor.apana.org.au>, "David S. Miller" <davem@davemloft.net>, Mike Snitzer <msnitzer@redhat.com>, Mimi Zohar <zohar@linux.ibm.com>, Milan Broz <gmazyland@gmail.com>
	Subject: [PATCH] hex2bin: make the function hex_to_bin constant-time
	Message-ID: <alpine.LRH.2.02.2204241648270.17244@file01.intranet.prod.int.rdu2.redhat.com>
	User-Agent: Alpine 2.02 (LRH 1266 2009-07-14)
	MIME-Version: 1.0
	Content-Type: TEXT/PLAIN; charset=US-ASCII
	Precedence: bulk

Mikulas Patocka April 24, 2022, 8:54 p.m. UTC

The function hex2bin is used to load cryptographic keys into device mapper
targets dm-crypt and dm-integrity. It should take constant time
independent of the processed data, so that concurrently running
unprivileged code can't infer any information about the keys via
microarchitectural covert channels.

This patch changes the function hex_to_bin so that it contains no branches
and no memory accesses.

Note that this shouldn't cause performance degradation because the size of
the new function is the same as the size of the old function (on x86-64) -
and the new function causes no branch misprediction penalties.

I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64
i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32
sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64
powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are no
branches in the generated code.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 lib/hexdump.c |   27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

Joe Perches April 24, 2022, 9:30 p.m. UTC | #1

On Sun, 2022-04-24 at 16:54 -0400, Mikulas Patocka wrote:
> This patch changes the function hex_to_bin so that it contains no branches
> and no memory accesses.
[]
> +++ linux-2.6/lib/hexdump.c	2022-04-24 18:51:20.000000000 +0200
[]
> + * the next line is similar to the previous one, but we need to decode both
> + *	uppercase and lowercase letters, so we use (ch & 0xdf), which converts
> + *	lowercase to uppercase
>   */
>  int hex_to_bin(char ch)
>  {
> -	if ((ch >= '0') && (ch <= '9'))
> -		return ch - '0';
> -	ch = tolower(ch);
> -	if ((ch >= 'a') && (ch <= 'f'))
> -		return ch - 'a' + 10;
> -	return -1;
> +	return -1 +
> +		((ch - '0' + 1) & (((ch - '9' - 1) & ('0' - 1 - ch)) >> 8)) +
> +		(((ch & 0xdf) - 'A' + 11) & ((((ch & 0xdf) - 'F' - 1) & ('A' - 1 - (ch & 0xdf))) >> 8));

probably easier to read using a temporary for ch & 0xdf

	int CH = ch & 0xdf;

	return -1 +
	       ((ch - '0' +  1) & (((ch - '9' - 1) & ('0' - 1 - ch)) >> 8)) +
	       ((CH - 'A' + 11) & (((CH - 'F' - 1) & ('A' - 1 - CH)) >> 8));
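
A minimal userspace sketch of the expression with the suggested temporary, checked against the pre-patch logic for all 256 byte values. The names hex_to_bin_old, hex_to_bin_ct and hex_to_bin_selftest are illustrative and not part of the patch. The sketch assumes an unsigned char parameter and, like the posted patch, relies on the compiler doing an arithmetic right shift on negative int values (as gcc and clang do).

```c
#include <assert.h>

/* Pre-patch logic; tolower() is open-coded so the sketch is self-contained. */
static int hex_to_bin_old(unsigned char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	return -1;
}

/* Branch-free form as posted, with the temporary (named cu here). */
static int hex_to_bin_ct(unsigned char ch)
{
	int cu = ch & 0xdf;	/* folds 'a'..'f' onto 'A'..'F' */

	return -1 +
	       ((ch - '0' +  1) & (((ch - '9' - 1) & ('0' - 1 - ch)) >> 8)) +
	       ((cu - 'A' + 11) & (((cu - 'F' - 1) & ('A' - 1 - cu)) >> 8));
}

/* Exhaustive comparison over every possible byte value. */
static void hex_to_bin_selftest(void)
{
	int c;

	for (c = 0; c < 256; c++)
		assert(hex_to_bin_ct((unsigned char)c) ==
		       hex_to_bin_old((unsigned char)c));
}
```

Since the input domain is only 256 values, an exhaustive check like this is cheap. It says nothing about whether a given compiler keeps the generated code branch-free.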

Linus Torvalds April 24, 2022, 9:37 p.m. UTC | #2

On Sun, Apr 24, 2022 at 1:54 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> + *
> + * Explanation of the logic:
> + * (ch - '9' - 1) is negative if ch <= '9'
> + * ('0' - 1 - ch) is negative if ch >= '0'

True, but...

Please, just to make me happier, make the sign of 'ch' be something
very explicit. Right now that code uses 'char ch', which could be
signed or unsigned.

It doesn't really matter in this case, since the arithmetic will be
done in 'int', and as long as 'int' is larger than 'char' as a type
(to be really nit-picky), it all ends up working ok regardless.

But just to make me happier, and to make the algorithm actually do the
_same_ thing on every architecture, please use an explicit signedness
for that 'ch' type.

Because then that 'ch >= X' is well-defined.

Again - your code _works_. That's not what I worry about. But when
playing these kinds of tricks, please make it have the same behavior
across architectures, not just "the end result will be the same
regardless".

Yes, a 'ch' with the high bit set will always be either >= '0' or <=
'9', but note how *which* one it is depends on the exact type, and
'char' is simply not well-defined.

Finally, for the same reason - please don't use ">> 8".  Because I do
not believe that bit 8 is well-defined in your arithmetic. The *sign*
bit will be, but I'm not convinced bit 8 is.

So use ">> 31" or similar.

Also, I do worry that this is *exactly* the kind of trick that a
compiler could easily turn back into a conditional. Usually compilers
tend to go the other way (ie turn conditionals into arithmetic if
possible), but..

                    Linus
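
The signedness point can be illustrated with a small sketch. A byte with the high bit set fails a different half of the range test depending on whether it is read as signed or unsigned, yet the resulting mask is zero either way. The names digit_mask and signedness_demo are made up for this illustration. An arithmetic right shift of negative int values is assumed (gcc/clang behaviour).

```c
#include <assert.h>

/* All ones when '0' <= c <= '9', zero otherwise (arithmetic shift assumed). */
static int digit_mask(int c)
{
	return ((c - '9' - 1) & ('0' - 1 - c)) >> 8;
}

static void signedness_demo(void)
{
	signed char sc = (signed char)-32;	/* byte 0xe0 where char is signed */
	unsigned char uc = 0xe0;		/* same byte where char is unsigned */

	/* Signed view: the value is below '0', so only "<= '9'" holds. */
	assert(sc <= '9' && !(sc >= '0'));
	/* Unsigned view: the value is above '9', so only ">= '0'" holds. */
	assert(uc >= '0' && !(uc <= '9'));

	/* Different intermediate behaviour, same end result. */
	assert(digit_mask(sc) == 0);
	assert(digit_mask(uc) == 0);
	assert(digit_mask('5') == -1);
}
```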

Linus Torvalds April 24, 2022, 9:42 p.m. UTC | #3

On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Finally, for the same reason - please don't use ">> 8".  Because I do
> not believe that bit 8 is well-defined in your arithmetic. The *sign*
> bit will be, but I'm not convinced bit 8 is.

Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
sign in bit #7, but I suspect bit #8 is always fine.

Still, If you want to just extend the sign bit, ">> 31" _is_ the
obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
whatever, you get my drift).

           Linus
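
The suspicion that bit 8 is always fine can be checked exhaustively. The sketch below is a hypothetical helper, not code from the patch, and assumes a 32-bit int. It walks every value a signed or unsigned char can take and confirms that bits 8 through 31 of the intermediate AND are either all clear or all set, so bit 8 always agrees with the sign bit.

```c
#include <assert.h>

/* Negative exactly when lo <= c <= hi; same shape as the patch's terms. */
static int range_and(int c, int lo, int hi)
{
	return (c - hi - 1) & (lo - 1 - c);
}

static void bit8_check(void)
{
	int c;

	/* -128..255 covers both signed and unsigned char inputs. */
	for (c = -128; c < 256; c++) {
		unsigned int d = (unsigned int)range_and(c, '0', '9');
		unsigned int l = (unsigned int)range_and(c & 0xdf, 'A', 'F');

		/* Bits 8..31 are uniform (32-bit int assumed). */
		assert((d >> 8) == 0 || (d >> 8) == 0x00ffffffu);
		assert((l >> 8) == 0 || (l >> 8) == 0x00ffffffu);

		/* Bit 8 matches the sign bit. */
		assert(((d >> 8) & 1) == (d >> 31));
		assert(((l >> 8) & 1) == (l >> 31));
	}
}
```

The reason it holds: the two subtractions are never both non-negative, and whichever one is non-negative is below 256, so the AND either stays below 256 or has every high bit set.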

David Laight April 25, 2022, 9:37 a.m. UTC | #4

From: Linus Torvalds
> Sent: 24 April 2022 22:42
> 
> On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Finally, for the same reason - please don't use ">> 8".  Because I do
> > not believe that bit 8 is well-defined in your arithmetic. The *sign*
> > bit will be, but I'm not convinced bit 8 is.
> 
> Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
> sign in bit #7, but I suspect bit #8 is always fine.
> 
> Still, If you want to just extend the sign bit, ">> 31" _is_ the
> obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
> whatever, you get my drift).

Except that right shifts of signed values are UB.
In particular it has always been valid to do an unsigned
shift right on a 2's complement negative number.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Mikulas Patocka April 25, 2022, 11:04 a.m. UTC | #5

On Mon, 25 Apr 2022, David Laight wrote:

> From: Linus Torvalds
> > Sent: 24 April 2022 22:42
> > 
> > On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Finally, for the same reason - please don't use ">> 8".  Because I do
> > > not believe that bit 8 is well-defined in your arithmetic. The *sign*
> > > bit will be, but I'm not convinced bit 8 is.
> > 
> > Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
> > sign in bit #7, but I suspect bit #8 is always fine.
> > 
> > Still, If you want to just extend the sign bit, ">> 31" _is_ the
> > obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
> > whatever, you get my drift).
> 
> Except that right shifts of signed values are UB.
> In particular it has always been valid to do an unsigned
> shift right on a 2's complement negative number.
> 
> 	David

Yes. All the standard versions (C89, C99, C11, C2X) say that right shift 
of a negative value is implementation-defined.

So, we should cast it to "unsigned" before shifting it.

Mikulas
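
One way the cast could look, as a sketch rather than the final patch: the parameter is made unsigned char, and each intermediate is converted to unsigned int before the shift, so no negative value is ever shifted right. The function and variable names are illustrative. A 32-bit int is assumed, so that the shifted mask 0x00ffffff converts back to int without loss.

```c
#include <assert.h>

static int hex_to_bin_u(unsigned char ch)
{
	unsigned char cu = ch & 0xdf;
	/*
	 * The unsigned cast makes ">> 8" a logical shift, so each mask
	 * is either 0 or 0x00ffffff (32-bit int assumed).
	 */
	int digit_mask =
		(int)((unsigned int)((ch - '9' - 1) & ('0' - 1 - ch)) >> 8);
	int letter_mask =
		(int)((unsigned int)((cu - 'F' - 1) & ('A' - 1 - cu)) >> 8);

	return -1 + ((ch - '0' + 1) & digit_mask) +
		    ((cu - 'A' + 11) & letter_mask);
}

static void hex_to_bin_u_selftest(void)
{
	static const char digits[] = "0123456789abcdef";
	int i, c, valid = 0;

	for (i = 0; i < 16; i++)
		assert(hex_to_bin_u((unsigned char)digits[i]) == i);
	for (i = 10; i < 16; i++)	/* uppercase letters */
		assert(hex_to_bin_u((unsigned char)(digits[i] - 32)) == i);
	for (c = 0; c < 256; c++)
		valid += hex_to_bin_u((unsigned char)c) >= 0;
	assert(valid == 22);	/* 10 digits + 6 lowercase + 6 uppercase */
}
```

The masks no longer have the top eight bits set, which is harmless here because the values being masked are at most 16.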

David Laight April 25, 2022, 12:59 p.m. UTC | #6

From: Mikulas Patocka
> Sent: 25 April 2022 12:04
> 
> On Mon, 25 Apr 2022, David Laight wrote:
> 
> > From: Linus Torvalds
> > > Sent: 24 April 2022 22:42
> > >
> > > On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > Finally, for the same reason - please don't use ">> 8".  Because I do
> > > > not believe that bit 8 is well-defined in your arithmetic. The *sign*
> > > > bit will be, but I'm not convinced bit 8 is.
> > >
> > > Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
> > > sign in bit #7, but I suspect bit #8 is always fine.
> > >
> > > Still, If you want to just extend the sign bit, ">> 31" _is_ the
> > > obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
> > > whatever, you get my drift).
> >
> > Except that right shifts of signed values are UB.
> > In particular it has always been valid to do an unsigned
> > shift right on a 2's complement negative number.
> >
> > 	David
> 
> Yes. All the standard versions (C89, C99, C11, C2X) say that right shift
> of a negative value is implementation-defined.
> 
> So, we should cast it to "unsigned" before shifting it.

Except that the intent appears to be to replicate the sign bit.

If it is 'implementation defined' (rather than suddenly being UB)
it might be that the linux kernel requires sign propagating
right shifts of negative values.
This is typically what happens on 2's complement systems.
But not all small CPUs have the required shift instruction.
OTOH all the ones big enough to run Linux probably do.
(And gcc doesn't support '1's complement' or 'sign overpunch' CPUs.)

The problem is that the compiler writers seem to be entering
a mindset where they are optimising code based on UB behaviour.
So given:
void foo(int x)
{
	if (x >> 1 < 0)
		return;
	do_something();
}
they decide the test is UB, so can always be assumed to be true
and thus do_something() is compiled away.

	David


Mikulas Patocka April 25, 2022, 1:33 p.m. UTC | #7

On Mon, 25 Apr 2022, David Laight wrote:

> From: Mikulas Patocka
> > Sent: 25 April 2022 12:04
> > 
> > On Mon, 25 Apr 2022, David Laight wrote:
> > 
> > > From: Linus Torvalds
> > > > Sent: 24 April 2022 22:42
> > > >
> > > > On Sun, Apr 24, 2022 at 2:37 PM Linus Torvalds
> > > > <torvalds@linux-foundation.org> wrote:
> > > > >
> > > > > Finally, for the same reason - please don't use ">> 8".  Because I do
> > > > > not believe that bit 8 is well-defined in your arithmetic. The *sign*
> > > > > bit will be, but I'm not convinced bit 8 is.
> > > >
> > > > Hmm.. I think it's ok. It can indeed overflow in 'char' and change the
> > > > sign in bit #7, but I suspect bit #8 is always fine.
> > > >
> > > > Still, If you want to just extend the sign bit, ">> 31" _is_ the
> > > > obvious thing to use (yeah, yeah, properly "sizeof(int)*8-1" or
> > > > whatever, you get my drift).
> > >
> > > Except that right shifts of signed values are UB.
> > > In particular it has always been valid to do an unsigned
> > > shift right on a 2's complement negative number.
> > >
> > > 	David
> > 
> > Yes. All the standard versions (C89, C99, C11, C2X) say that right shift
> > of a negative value is implementation-defined.
> > 
> > So, we should cast it to "unsigned" before shifting it.
> 
> Except that the intent appears to be to replicate the sign bit.
> 
> If it is 'implementation defined' (rather than suddenly being UB)

The standard says "If E1 has a signed type and a negative value, the 
resulting value is implementation-defined."

So, it's not undefined behavior.

> it might be that the linux kernel requires sign propagating
> right shifts of negative values.

It may be that some code in the Linux kernel already assumes that right 
shifts keep the sign. It's hard to say if such code exists.

BTW. ubsan warns about left shift of negative values, but it doesn't warn 
about right shift of negative values.

> This is typically what happens on 2's complement systems.
> But not all small CPUs have the required shift instruction.
> OTOH all the ones big enough to run Linux probably do.
> (And gcc doesn't support '1's complement' or 'sign overpunch' CPUs.)
> 
> The problem is that the compiler writers seem to be entering
> a mindset where they are optimising code based on UB behaviour.
> So given:
> void foo(int x)
> {
> 	if (x >> 1 < 0)
> 		return;
> 	do_something();
> }
> they decide the test is UB, so can always be assumed to be true
> and thus do_something() is compiled away.
> 
> 	David

If it's implementation-defined (rather than undefined), the compiler 
shouldn't do such optimization.

The linux kernel uses "-fno-strict-overflow" which disables some of these 
UB optimizations.

Mikulas
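
If the goal is to replicate the sign bit without depending on how the implementation shifts negative values, one possible sketch is to extract the top bit with an unsigned shift and negate it. The helper name sign_mask is hypothetical, and the sketch assumes int has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Returns 0 for x >= 0 and -1 (all ones) for x < 0. */
static int sign_mask(int x)
{
	/* Converting to unsigned is well-defined; the shift yields 0 or 1. */
	unsigned int top = (unsigned int)x >> (sizeof(int) * CHAR_BIT - 1);

	return -(int)top;
}
```

This gives the same 0 or -1 result that an arithmetic ">> 31" would give on gcc and clang, using only operations whose result the C standard fully defines under the stated assumption.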
