Message ID | 20161214035927.30004-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | New, archived |
Hello,

On 14.12.2016 04:59, Jason A. Donenfeld wrote:
> SipHash is a 64-bit keyed hash function that is actually a
> cryptographically secure PRF, like HMAC. Except SipHash is super fast,
> and is meant to be used as a hashtable keyed lookup function.

Can you show or cite benchmarks in comparison with jhash? Last time I
looked, especially for short inputs, siphash didn't beat jhash (also on
all the 32 bit devices etc.).

> SipHash isn't just some new trendy hash function. It's been around for a
> while, and there really isn't anything that comes remotely close to
> being useful in the way SipHash is. With that said, why do we need this?
>
> There are a variety of attacks known as "hashtable poisoning" in which an
> attacker forms some data such that the hash of that data will be the
> same, and then proceeds to fill up all entries of a hashbucket. This is
> a realistic and well-known denial-of-service vector.

This pretty much depends on the linearity of the hash function? I don't
think a crypto secure hash function is needed for a hash table. Albeit I
agree that siphash certainly looks good to be used here.

> Linux developers already seem to be aware that this is an issue, and
> various places that use hash tables in, say, a network context, use a
> non-cryptographically secure function (usually jhash) and then try to
> twiddle with the key on a time basis (or in many cases just do nothing
> and hope that nobody notices). While this is an admirable attempt at
> solving the problem, it doesn't actually fix it. SipHash fixes it.

I am pretty sure that SipHash still needs a random key per hash table
also. So far it was only the choice of hash function you are questioning.

> (It fixes it in such a sound way that you could even build a stream
> cipher out of SipHash that would resist modern cryptanalysis.)
>
> There are a modicum of places in the kernel that are vulnerable to
> hashtable poisoning attacks, either via userspace vectors or network
> vectors, and there's not a reliable mechanism inside the kernel at the
> moment to fix it. The first step toward fixing these issues is actually
> getting a secure primitive into the kernel for developers to use. Then
> we can, bit by bit, port things over to it as deemed appropriate.

Hmm, I tried to follow up with all the HashDoS work and so far didn't
see any HashDoS attacks against the Jenkins/SpookyHash family.

If this is an issue we might need to also put those changes into stable.

> Dozens of languages are already using this internally for their hash
> tables. Some of the BSDs already use this in their kernels. SipHash is
> a widely known high-speed solution to a widely known problem, and it's
> time we catch up.

Bye,
Hannes
Hi David,

On Wed, Dec 14, 2016 at 10:56 AM, David Laight <David.Laight@aculab.com> wrote:
> ...
>> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
> ...
>> +	u64 k0 = get_unaligned_le64(key);
>> +	u64 k1 = get_unaligned_le64(key + sizeof(u64));
> ...
>> +	m = get_unaligned_le64(data);
>
> All these unaligned accesses are going to get expensive on architectures
> like sparc64.

Yes, the unaligned accesses aren't pretty. Since in pretty much all
use cases thus far, the data can easily be made aligned, perhaps it
makes sense to create siphash24() and siphash24_unaligned(). Any
thoughts on doing something like that?

Jason
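For reference, such a split might look like the following sketch. Only
siphash24() exists in this patch; siphash24_unaligned() is just the name
proposed above, nothing that has actually been posted:

/* Sketch of the proposed pair: the aligned variant would require
 * 8-byte-aligned data and could use plain 64-bit loads; the _unaligned
 * variant would keep the get_unaligned_le64() path from this patch.
 */
u64 siphash24(const u8 *data, size_t len,
	      const u8 key[SIPHASH24_KEY_LEN]);           /* data 8-byte aligned */
u64 siphash24_unaligned(const u8 *data, size_t len,
			const u8 key[SIPHASH24_KEY_LEN]); /* any alignment */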
Hi Hannes,

On Wed, Dec 14, 2016 at 12:21 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Can you show or cite benchmarks in comparison with jhash? Last time I
> looked, especially for short inputs, siphash didn't beat jhash (also on
> all the 32 bit devices etc.).

I assume that jhash is likely faster than siphash, but I wouldn't be
surprised if with optimization we can make siphash at least pretty
close on 64-bit platforms. (I'll do some tests though; maybe I'm wrong
and jhash is already slower.)

With that said, siphash is here to replace uses of jhash where
hashtable poisoning vulnerabilities make it necessary. Where there's
no significant security improvement, if there's no speed improvement
either, then of course nothing's going to change.

I should have mentioned md5_transform in this first message too, as
two other patches in this series actually replace md5_transform usage
with siphash. I think in this case, siphash is a clear performance
winner (and security winner) over md5_transform. So if the push back
against replacing jhash usages is just too high, at the very least it
remains useful already for the md5_transform usage.

> This pretty much depends on the linearity of the hash function? I don't
> think a crypto secure hash function is needed for a hash table. Albeit I
> agree that siphash certainly looks good to be used here.

In order to prevent the aforementioned poisoning attacks, a PRF with
perfect linearity is required, which is what's achieved when it's a
cryptographically secure one. Check out section 7 of
https://131002.net/siphash/siphash.pdf .

> I am pretty sure that SipHash still needs a random key per hash table
> also. So far it was only the choice of hash function you are questioning.

Siphash needs a random secret key, yes. The point is that the hash
function remains secure so long as the secret key is kept secret.
Other functions can't make the same guarantee, and so nervous periodic
key rotation is necessary, but in most cases nothing is done, and so
things just leak over time.

> Hmm, I tried to follow up with all the HashDoS work and so far didn't
> see any HashDoS attacks against the Jenkins/SpookyHash family.
>
> If this is an issue we might need to also put those changes into stable.

jhash just isn't secure; it's not a cryptographically secure PRF. If
there hasn't already been an academic paper put out there about it
this year, let's make this thread 1000 messages long to garner
attention, and next year perhaps we'll see one. No doubt that
motivated government organizations, defense contractors, criminals,
and other netizens have already done research in private. Replacing
insecure functions with secure functions is usually a good thing.

Jason
Hello,

On 14.12.2016 14:10, Jason A. Donenfeld wrote:
> On Wed, Dec 14, 2016 at 12:21 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> Can you show or cite benchmarks in comparison with jhash? Last time I
>> looked, especially for short inputs, siphash didn't beat jhash (also on
>> all the 32 bit devices etc.).
>
> I assume that jhash is likely faster than siphash, but I wouldn't be
> surprised if with optimization we can make siphash at least pretty
> close on 64-bit platforms. (I'll do some tests though; maybe I'm wrong
> and jhash is already slower.)

Yes, numbers would be very useful here. I am mostly concerned about
small plastic router cases. E.g. assume you double packet processing
time with a change of the hashing function: at what point is the actual
packet processing more of an attack vector than the hashtable?

> With that said, siphash is here to replace uses of jhash where
> hashtable poisoning vulnerabilities make it necessary. Where there's
> no significant security improvement, if there's no speed improvement
> either, then of course nothing's going to change.

It still changes currently well working source. ;-)

> I should have mentioned md5_transform in this first message too, as
> two other patches in this series actually replace md5_transform usage
> with siphash. I think in this case, siphash is a clear performance
> winner (and security winner) over md5_transform. So if the push back
> against replacing jhash usages is just too high, at the very least it
> remains useful already for the md5_transform usage.

MD5 is considered broken because its collision resistance is broken?
SipHash doesn't even claim to have collision resistance (which we don't
need here)?

But I agree, certainly it could be a nice speed-up!

>> This pretty much depends on the linearity of the hash function? I don't
>> think a crypto secure hash function is needed for a hash table. Albeit I
>> agree that siphash certainly looks good to be used here.
>
> In order to prevent the aforementioned poisoning attacks, a PRF with
> perfect linearity is required, which is what's achieved when it's a
> cryptographically secure one. Check out section 7 of
> https://131002.net/siphash/siphash.pdf .

I think you mean non-linearity. Otherwise I agree that siphash is
certainly a better suited hashing algorithm as far as I know. But it
would be really interesting to compare some performance numbers. Hard
to say anything without them.

>> I am pretty sure that SipHash still needs a random key per hash table
>> also. So far it was only the choice of hash function you are questioning.
>
> Siphash needs a random secret key, yes. The point is that the hash
> function remains secure so long as the secret key is kept secret.
> Other functions can't make the same guarantee, and so nervous periodic
> key rotation is necessary, but in most cases nothing is done, and so
> things just leak over time.
>
>> Hmm, I tried to follow up with all the HashDoS work and so far didn't
>> see any HashDoS attacks against the Jenkins/SpookyHash family.
>>
>> If this is an issue we might need to also put those changes into stable.
>
> jhash just isn't secure; it's not a cryptographically secure PRF. If
> there hasn't already been an academic paper put out there about it
> this year, let's make this thread 1000 messages long to garner
> attention, and next year perhaps we'll see one. No doubt that
> motivated government organizations, defense contractors, criminals,
> and other netizens have already done research in private. Replacing
> insecure functions with secure functions is usually a good thing.

I think this is a weak argument.

In general I am in favor of switching to siphash, but it would be nice
to see some benchmarks with the specific kernel implementation also on
some smaller 32 bit CPUs and especially without using any SIMD
instructions (which might have been used in the paper comparison).

Bye,
Hannes
Hi Hannes,

On Wed, Dec 14, 2016 at 4:09 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Yes, numbers would be very useful here. I am mostly concerned about
> small plastic router cases. E.g. assume you double packet processing
> time with a change of the hashing function: at what point is the actual
> packet processing more of an attack vector than the hashtable?

I agree. Looks like Tom did some very quick benchmarks. I'll do some
more precise benchmarks myself when we graduate from looking at md5
replacement (the easy case) to looking at jhash replacement (the
harder case).

>> With that said, siphash is here to replace uses of jhash where
>> hashtable poisoning vulnerabilities make it necessary. Where there's
>> no significant security improvement, if there's no speed improvement
>> either, then of course nothing's going to change.
>
> It still changes currently well working source. ;-)

I mean if siphash doesn't make things better in some way, we'll just
continue using jhash, so no source change or anything. In other words:
an evolutionary, conservative approach rather than hasty "replace 'em
all!" tomfoolery.

> MD5 is considered broken because its collision resistance is broken?
> SipHash doesn't even claim to have collision resistance (which we don't
> need here)?

Not just that, but it's not immediately clear to me that using MD5 as
a PRF the way it is now with md5_transform is even a straightforwardly
good idea.

> But I agree, certainly it could be a nice speed-up!

The benchmarks for the secure sequence number generation and the rng
are indeed really promising.

> I think you mean non-linearity.

Yea of course, editing typo, sorry.

> In general I am in favor of switching to siphash, but it would be nice
> to see some benchmarks with the specific kernel implementation also on
> some smaller 32 bit CPUs and especially without using any SIMD
> instructions (which might have been used in the paper comparison).

Sure, agreed. Each proposed jhash replacement will need to be
benchmarked on little MIPS machines and x86 monsters alike, with
patches indicating PPS before and after.

Jason
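A benchmark along those lines could be as simple as the following test
module. This is only an illustration of what "numbers" might mean here,
not a patch from this series; the module name, buffer length, and
iteration count are arbitrary assumptions:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/timekeeping.h>
#include <linux/jhash.h>
#include <linux/siphash.h>

/* Hash a 16-byte buffer ITERS times with each function and print the
 * average ns per call. The volatile sink keeps the compiler from
 * optimizing the loops away.
 */
static int __init hash_bench_init(void)
{
	enum { LEN = 16, ITERS = 1000000 };
	static u8 data[LEN], key[SIPHASH24_KEY_LEN];
	volatile u64 sink;
	u64 t0, t1;
	int i;

	t0 = ktime_get_ns();
	for (i = 0; i < ITERS; ++i)
		sink = jhash(data, LEN, 0);
	t1 = ktime_get_ns();
	pr_info("jhash:     %llu ns/op\n", (t1 - t0) / ITERS);

	t0 = ktime_get_ns();
	for (i = 0; i < ITERS; ++i)
		sink = siphash24(data, LEN, key);
	t1 = ktime_get_ns();
	pr_info("siphash24: %llu ns/op\n", (t1 - t0) / ITERS);

	(void)sink;
	return 0;
}

static void __exit hash_bench_exit(void)
{
}

module_init(hash_bench_init);
module_exit(hash_bench_exit);
MODULE_LICENSE("Dual BSD/GPL");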
On 14.12.2016 13:46, Jason A. Donenfeld wrote:
> Hi David,
>
> On Wed, Dec 14, 2016 at 10:56 AM, David Laight <David.Laight@aculab.com> wrote:
>> ...
>>> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
>> ...
>>> +	u64 k0 = get_unaligned_le64(key);
>>> +	u64 k1 = get_unaligned_le64(key + sizeof(u64));
>> ...
>>> +	m = get_unaligned_le64(data);
>>
>> All these unaligned accesses are going to get expensive on architectures
>> like sparc64.
>
> Yes, the unaligned accesses aren't pretty. Since in pretty much all
> use cases thus far, the data can easily be made aligned, perhaps it
> makes sense to create siphash24() and siphash24_unaligned(). Any
> thoughts on doing something like that?

I fear that the alignment requirement will be a source of bugs on 32 bit
machines, where you cannot even simply take a well aligned struct on a
stack and put it into the normal siphash(aligned) function without
adding alignment annotations everywhere. Even blocks returned from
kmalloc on 32 bit are not aligned to 64 bit.

Can we do this as a runtime check and just have one function (siphash)
dealing with that?

Bye,
Hannes
Hi Hannes,

On Wed, Dec 14, 2016 at 11:03 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> I fear that the alignment requirement will be a source of bugs on 32 bit
> machines, where you cannot even simply take a well aligned struct on a
> stack and put it into the normal siphash(aligned) function without
> adding alignment annotations everywhere. Even blocks returned from
> kmalloc on 32 bit are not aligned to 64 bit.

That's what the "__aligned(SIPHASH24_ALIGNMENT)" attribute is for. The
aligned siphash function will be for structs explicitly made for
siphash consumption. For everything else there's siphash_unaligned.

> Can we do this as a runtime check and just have one function (siphash)
> dealing with that?

Seems like the runtime branching on the aligned function would be bad
for performance, when we likely know at compile time if it's going to
be aligned or not. I suppose we could add that check just to the
unaligned version, and rename it to "maybe_unaligned"? Is this what
you have in mind?

Jason
Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Siphash needs a random secret key, yes. The point is that the hash
> function remains secure so long as the secret key is kept secret.
> Other functions can't make the same guarantee, and so nervous periodic
> key rotation is necessary, but in most cases nothing is done, and so
> things just leak over time.

Actually those users that use rhashtable now have a much more
sophisticated defence against these attacks, dynamic rehashing
when bucket length exceeds a preset limit.

Cheers,
On Thu, 2016-12-15 at 15:57 +0800, Herbert Xu wrote:
> Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>
>> Siphash needs a random secret key, yes. The point is that the hash
>> function remains secure so long as the secret key is kept secret.
>> Other functions can't make the same guarantee, and so nervous
>> periodic
>> key rotation is necessary, but in most cases nothing is done, and so
>> things just leak over time.
>
> Actually those users that use rhashtable now have a much more
> sophisticated defence against these attacks, dynamic rehashing
> when bucket length exceeds a preset limit.
>
> Cheers,

Key-independent collisions won't be mitigated by picking a new secret.
A simple solution with clear security properties is ideal.
On 15.12.2016 00:29, Jason A. Donenfeld wrote:
> Hi Hannes,
>
> On Wed, Dec 14, 2016 at 11:03 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> I fear that the alignment requirement will be a source of bugs on 32 bit
>> machines, where you cannot even simply take a well aligned struct on a
>> stack and put it into the normal siphash(aligned) function without
>> adding alignment annotations everywhere. Even blocks returned from
>> kmalloc on 32 bit are not aligned to 64 bit.
>
> That's what the "__aligned(SIPHASH24_ALIGNMENT)" attribute is for. The
> aligned siphash function will be for structs explicitly made for
> siphash consumption. For everything else there's siphash_unaligned.

So in case you have a pointer from somewhere on 32 bit you can
essentially only guarantee it has natural alignment or max. native
alignment (based on the arch). gcc only fulfills your request for
alignment when you allocate on the stack (minus gcc bugs).

Let's say you get a pointer from somewhere, maybe embedded in a struct,
which came from kmalloc. kmalloc doesn't care about the aligned
attribute; it will align according to the architecture description. That
said, if you want to hash that, you would need to manually align the
memory returned from kmalloc or make sure the data is more than
naturally aligned on that architecture.

>> Can we do this as a runtime check and just have one function (siphash)
>> dealing with that?
>
> Seems like the runtime branching on the aligned function would be bad
> for performance, when we likely know at compile time if it's going to
> be aligned or not. I suppose we could add that check just to the
> unaligned version, and rename it to "maybe_unaligned"? Is this what
> you have in mind?

I argue that you mostly don't know at compile time if it is correctly
aligned if the alignment requirements are larger than the natural ones.

Also, we don't even have that for memcpy, even though we use it probably
much more than hashing, so I think this is overkill.

Bye,
Hannes
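For what it's worth, the runtime check under discussion could look
roughly like the sketch below. It is only an illustration of the idea;
siphash24_unaligned() is the hypothetical variant proposed earlier in
the thread, not an existing function:

#include <linux/kernel.h>
#include <linux/siphash.h>

/* Sketch of the single-entry-point idea: dispatch on the actual
 * pointer value at runtime instead of making callers choose a variant.
 */
static inline u64 siphash(const u8 *data, size_t len,
			  const u8 key[SIPHASH24_KEY_LEN])
{
	if (IS_ALIGNED((unsigned long)data, __alignof__(u64)))
		return siphash24(data, len, key);	/* fast aligned path */
	return siphash24_unaligned(data, len, key);	/* byte-safe path */
}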
From: Hannes Frederic Sowa
> Sent: 14 December 2016 22:03
> On 14.12.2016 13:46, Jason A. Donenfeld wrote:
>> Hi David,
>>
>> On Wed, Dec 14, 2016 at 10:56 AM, David Laight <David.Laight@aculab.com> wrote:
>>> ...
>>>> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
>>> ...
>>>> +	u64 k0 = get_unaligned_le64(key);
>>>> +	u64 k1 = get_unaligned_le64(key + sizeof(u64));
>>> ...
>>>> +	m = get_unaligned_le64(data);
>>>
>>> All these unaligned accesses are going to get expensive on architectures
>>> like sparc64.
>>
>> Yes, the unaligned accesses aren't pretty. Since in pretty much all
>> use cases thus far, the data can easily be made aligned, perhaps it
>> makes sense to create siphash24() and siphash24_unaligned(). Any
>> thoughts on doing something like that?
>
> I fear that the alignment requirement will be a source of bugs on 32 bit
> machines, where you cannot even simply take a well aligned struct on a
> stack and put it into the normal siphash(aligned) function without
> adding alignment annotations everywhere. Even blocks returned from
> kmalloc on 32 bit are not aligned to 64 bit.

Are you doing anything that will require 64bit alignment on 32bit
systems? It is unlikely that the kernel can use any simd registers that
have wider alignment requirements.

You also really don't want to request on-stack items have large
alignments. While gcc can generate code to do it, it isn't pretty.

	David
On 15.12.2016 12:04, David Laight wrote:
> From: Hannes Frederic Sowa
>> Sent: 14 December 2016 22:03
>> On 14.12.2016 13:46, Jason A. Donenfeld wrote:
>>> Hi David,
>>>
>>> On Wed, Dec 14, 2016 at 10:56 AM, David Laight <David.Laight@aculab.com> wrote:
>>>> ...
>>>>> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
>>>> ...
>>>>> +	u64 k0 = get_unaligned_le64(key);
>>>>> +	u64 k1 = get_unaligned_le64(key + sizeof(u64));
>>>> ...
>>>>> +	m = get_unaligned_le64(data);
>>>>
>>>> All these unaligned accesses are going to get expensive on architectures
>>>> like sparc64.
>>>
>>> Yes, the unaligned accesses aren't pretty. Since in pretty much all
>>> use cases thus far, the data can easily be made aligned, perhaps it
>>> makes sense to create siphash24() and siphash24_unaligned(). Any
>>> thoughts on doing something like that?
>>
>> I fear that the alignment requirement will be a source of bugs on 32 bit
>> machines, where you cannot even simply take a well aligned struct on a
>> stack and put it into the normal siphash(aligned) function without
>> adding alignment annotations everywhere. Even blocks returned from
>> kmalloc on 32 bit are not aligned to 64 bit.
>
> Are you doing anything that will require 64bit alignment on 32bit
> systems? It is unlikely that the kernel can use any simd registers that
> have wider alignment requirements.
>
> You also really don't want to request on-stack items have large
> alignments. While gcc can generate code to do it, it isn't pretty.

Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
bytes on 32 bit. Do you question that?
From: Hannes Frederic Sowa
> Sent: 15 December 2016 12:23
...
> Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
> bytes on 32 bit. Do you question that?

Yes.

The linux ABI for x86 (32 bit) only requires 32bit alignment for u64 (etc).

	David
On 15.12.2016 13:28, David Laight wrote:
> From: Hannes Frederic Sowa
>> Sent: 15 December 2016 12:23
> ...
>> Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
>> bytes on 32 bit. Do you question that?
>
> Yes.
>
> The linux ABI for x86 (32 bit) only requires 32bit alignment for u64 (etc).

Hmm, u64 on 32 bit is unsigned long long and not unsigned long. Thus I
am actually not sure if the ABI would say anything about that (sorry
also for my wrong statement above).

Alignment requirement of unsigned long long on gcc with -m32 actually
seems to be 8.
From: Hannes Frederic Sowa
> Sent: 15 December 2016 12:50
> On 15.12.2016 13:28, David Laight wrote:
>> From: Hannes Frederic Sowa
>>> Sent: 15 December 2016 12:23
>> ...
>>> Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
>>> bytes on 32 bit. Do you question that?
>>
>> Yes.
>>
>> The linux ABI for x86 (32 bit) only requires 32bit alignment for u64 (etc).
>
> Hmm, u64 on 32 bit is unsigned long long and not unsigned long. Thus I
> am actually not sure if the ABI would say anything about that (sorry
> also for my wrong statement above).
>
> Alignment requirement of unsigned long long on gcc with -m32 actually
> seems to be 8.

It depends on the architecture.
For x86 it is definitely 4.
It might be 8 for sparc, ppc and/or alpha.

	David
On 15.12.2016 14:56, David Laight wrote:
> From: Hannes Frederic Sowa
>> Sent: 15 December 2016 12:50
>> On 15.12.2016 13:28, David Laight wrote:
>>> From: Hannes Frederic Sowa
>>>> Sent: 15 December 2016 12:23
>>> ...
>>>> Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
>>>> bytes on 32 bit. Do you question that?
>>>
>>> Yes.
>>>
>>> The linux ABI for x86 (32 bit) only requires 32bit alignment for u64 (etc).
>>
>> Hmm, u64 on 32 bit is unsigned long long and not unsigned long. Thus I
>> am actually not sure if the ABI would say anything about that (sorry
>> also for my wrong statement above).
>>
>> Alignment requirement of unsigned long long on gcc with -m32 actually
>> seems to be 8.
>
> It depends on the architecture.
> For x86 it is definitely 4.

May I ask for a reference? I couldn't see unsigned long long being
mentioned in the ia32 abi spec that I found. I agree that those accesses
might be synthetically assembled by gcc and for me the alignment of 4
would have seemed natural. But my gcc at least in 32 bit mode disagrees
with that.

> It might be 8 for sparc, ppc and/or alpha.

This is something to find out...

Right now ipv6 addresses have an alignment of 4. So we couldn't even
naturally pass them to siphash but would need to copy them around, which
I feel is a source of bugs.

Bye,
Hannes
From: Hannes Frederic Sowa
> Sent: 15 December 2016 14:57
> On 15.12.2016 14:56, David Laight wrote:
>> From: Hannes Frederic Sowa
>>> Sent: 15 December 2016 12:50
>>> On 15.12.2016 13:28, David Laight wrote:
>>>> From: Hannes Frederic Sowa
>>>>> Sent: 15 December 2016 12:23
>>>> ...
>>>>> Hmm? Even the Intel ABI expects alignment of unsigned long long to be 8
>>>>> bytes on 32 bit. Do you question that?
>>>>
>>>> Yes.
>>>>
>>>> The linux ABI for x86 (32 bit) only requires 32bit alignment for u64 (etc).
>>>
>>> Hmm, u64 on 32 bit is unsigned long long and not unsigned long. Thus I
>>> am actually not sure if the ABI would say anything about that (sorry
>>> also for my wrong statement above).
>>>
>>> Alignment requirement of unsigned long long on gcc with -m32 actually
>>> seems to be 8.
>>
>> It depends on the architecture.
>> For x86 it is definitely 4.
>
> May I ask for a reference?

Ask anyone who has had to do compatibility layers to support 32bit
binaries on 64bit systems.

> I couldn't see unsigned long long being
> mentioned in the ia32 abi spec that I found. I agree that those accesses
> might be synthetically assembled by gcc and for me the alignment of 4
> would have seemed natural. But my gcc at least in 32 bit mode disagrees
> with that.

Try (retyped):

echo 'struct { long a; long long b; } s; int bar(void) { return sizeof s; }' >foo.c
gcc [-m32] -O2 -S foo.c; cat foo.s

And look at what is generated.

> Right now ipv6 addresses have an alignment of 4. So we couldn't even
> naturally pass them to siphash but would need to copy them around, which
> I feel is a source of bugs.

That is more of a problem on systems that don't support misaligned
accesses. Reading the 64bit values with two explicit 32bit reads would
work. I think you can get gcc to do that by adding an aligned(4)
attribute to the structure member.

	David
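The same test can also be run directly rather than read from the
generated assembly. The following standalone snippet is an illustration
(not from the thread); built with gcc -m32 on x86 it typically shows
both sides of the disagreement at once:

#include <stdio.h>
#include <stddef.h>

/* Build with: gcc -m32 -O2 align.c && ./a.out
 * On x86-32 this typically prints alignof = 8, offset = 4, size = 12:
 * gcc reports a "preferred" alignment of 8 for long long, while the
 * i386 ABI gives it only 4-byte alignment inside structs, which is
 * both sides of the disagreement above.
 */
struct s { long a; long long b; };

int main(void)
{
	printf("__alignof__(long long) = %zu\n", __alignof__(long long));
	printf("offsetof(struct s, b)  = %zu\n", offsetof(struct s, b));
	printf("sizeof(struct s)       = %zu\n", sizeof(struct s));
	return 0;
}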
On 15.12.2016 16:41, David Laight wrote:
> Try (retyped):
>
> echo 'struct { long a; long long b; } s; int bar(void) { return sizeof s; }' >foo.c
> gcc [-m32] -O2 -S foo.c; cat foo.s
>
> And look at what is generated.

I used __alignof__(unsigned long long) with -m32.

>> Right now ipv6 addresses have an alignment of 4. So we couldn't even
>> naturally pass them to siphash but would need to copy them around, which
>> I feel is a source of bugs.
>
> That is more of a problem on systems that don't support misaligned
> accesses. Reading the 64bit values with two explicit 32bit reads would
> work. I think you can get gcc to do that by adding an aligned(4)
> attribute to the structure member.

Yes, and that is actually my fear, because we support those
architectures. I can't comment on that as I don't understand enough of
this. If someone finds a way to cause misaligned reads on a small box,
this seems to be a much bigger issue than having a hash DoS (depending
on sysctls, they either get fixed up or panic).

Thanks,
Hannes
Hi David & Hannes,

This conversation is veering off course. I think this doesn't really
matter at all. Gcc converts u64 into essentially a pair of u32 on
32-bit platforms, so the alignment requirements for 32-bit is at a
maximum 32 bits. On 64-bit platforms the alignment requirements are
related at a maximum to the biggest register size, so 64-bit
alignment. For this reason, no matter the behavior of __aligned(8),
we're okay. Likewise, even without __aligned(8), if gcc aligns structs
by their biggest member, then we get 4 byte alignment on 32-bit and 8
byte alignment on 64-bit, which is fine. There's no 32-bit platform
that will trap on a 64-bit unaligned access because there's no such
thing as a 64-bit access there. In short, we're fine.

(The reason in6_addr aligns itself to 4 bytes on 64-bit platforms is
that it's defined as being u32 blah[4]. If we added a u64 blah[2], we'd
get 8 byte alignment, but that's not in the header. Feel free to start
a new thread about this issue if you feel this ought to be added for
whatever reason.)

One optimization that's been suggested on this list is that instead of
u8 key[16] and requiring the alignment attribute, I should just use
u64 key[2]. This seems reasonable to me, and it will also save the
endian conversion call. These keys generally aren't transmitted over a
network, so I don't think a byte-wise encoding is particularly
important.

The other suggestion I've seen is that I make the functions take a
const void * instead of a const u8 * for the data, in order to save
ugly casts. I'll do this too.

Meanwhile Linus has condemned our 4dwords/2qwords naming, and I'll
need to think of something different. The best I can think of right
now is siphash_4_u32/siphash_2_u64, but I don't find it especially
pretty. Open to suggestions.

Regards,
Jason
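Taken together, those two suggested changes would turn the prototype
from this patch into something like the following. This is a sketch
only; no revised signature was actually posted at this point in the
thread:

/* Current prototype from this patch: */
u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);

/* Hypothetical revision per the two suggestions above: u64 key halves
 * (no endian-conversion call, natural u64 alignment for the key) and
 * const void *data (no casts at call sites).
 */
u64 siphash24(const void *data, size_t len, const u64 key[2]);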
Hello,

On 15.12.2016 19:50, Jason A. Donenfeld wrote:
> Hi David & Hannes,
>
> This conversation is veering off course.

Why?

> I think this doesn't really
> matter at all. Gcc converts u64 into essentially a pair of u32 on
> 32-bit platforms, so the alignment requirements for 32-bit is at a
> maximum 32 bits. On 64-bit platforms the alignment requirements are
> related at a maximum to the biggest register size, so 64-bit
> alignment. For this reason, no matter the behavior of __aligned(8),
> we're okay. Likewise, even without __aligned(8), if gcc aligns structs
> by their biggest member, then we get 4 byte alignment on 32-bit and 8
> byte alignment on 64-bit, which is fine. There's no 32-bit platform
> that will trap on a 64-bit unaligned access because there's no such
> thing as a 64-bit access there. In short, we're fine.

ARM64 and x86-64 have memory operations that are not vector operations
that operate on 128 bit memory.

How do you know that the compiler for some architecture will not choose a
more optimized instruction to load a 64 bit memory value into two 32 bit
registers if you tell the compiler it is 8 byte aligned but it actually
isn't? I don't know the answer but telling the compiler some data is 8
byte aligned while it isn't really pretty much seems like a call for
trouble.

Why couldn't a compiler vectorize this code if it can prove that it
doesn't conflict with other register users?

Bye,
Hannes
On Thu, Dec 15, 2016 at 9:31 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> ARM64 and x86-64 have memory operations that are not vector operations
> that operate on 128 bit memory.

Fair enough. imull I guess.

> How do you know that the compiler for some architecture will not choose a
> more optimized instruction to load a 64 bit memory value into two 32 bit
> registers if you tell the compiler it is 8 byte aligned but it actually
> isn't? I don't know the answer but telling the compiler some data is 8
> byte aligned while it isn't really pretty much seems like a call for
> trouble.

If a compiler is in the business of using special 64-bit instructions
on 64-bit aligned data, then it is also the job of the compiler to
align structs to 64-bits when passed __aligned(8), which is what we've
done in this code. If the compiler were to hypothetically choose to
ignore that and internally convert it to a __aligned(4), then it would
only be able to do so with the knowledge that it will never use 64-bit
aligned data instructions. But so far as I can tell, gcc always
respects __aligned(8), which is why I use it in this patchset.

I think there might have been confusion here, because perhaps someone
was hoping that since in6_addr is 128-bits, that the __aligned
attribute would not be required and that the struct would just
automatically be aligned to at least 8 bytes. But in fact, as I
mentioned, in6_addr is actually composed of u32[4] and not u64[2], so
it will only be aligned to 4 bytes, making the __aligned(8) necessary.

I think for the purposes of this patchset, this is a solved problem.
There's the unaligned version of the function if you don't know about
the data, and there's the aligned version if you're using
__aligned(SIPHASH_ALIGNMENT) on your data. Plain and simple.

Jason
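For illustration, the intended aligned-path usage would presumably look
like the sketch below. The struct, its field names, and the assumption
that SIPHASH24_ALIGNMENT amounts to __alignof__(u64) are all
hypothetical here, not taken from the patchset:

#include <linux/siphash.h>
#include <linux/types.h>

/* Hypothetical caller-side struct declared for siphash consumption:
 * the attribute guarantees 8-byte alignment, so the aligned variant
 * is safe even on 32-bit, per the argument above.
 */
struct flow_key {
	__be32 saddr;
	__be32 daddr;
	__be16 sport;
	__be16 dport;
} __aligned(SIPHASH24_ALIGNMENT);

static u64 hash_flow(const struct flow_key *key,
		     const u8 secret[SIPHASH24_KEY_LEN])
{
	return siphash24((const u8 *)key, sizeof(*key), secret);
}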
On Thu, Dec 15, 2016 at 09:43:04PM +0100, Jason A. Donenfeld wrote:
> On Thu, Dec 15, 2016 at 9:31 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > ARM64 and x86-64 have memory operations that are not vector operations
> > that operate on 128 bit memory.
>
> Fair enough. imull I guess.

imull is into rdx:rax, not memory. I suspect he's talking about
cmpxchg16b.
On 15.12.2016 22:04, Peter Zijlstra wrote:
> On Thu, Dec 15, 2016 at 09:43:04PM +0100, Jason A. Donenfeld wrote:
>> On Thu, Dec 15, 2016 at 9:31 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> ARM64 and x86-64 have memory operations that are not vector operations
>>> that operate on 128 bit memory.
>>
>> Fair enough. imull I guess.
>
> imull is into rdx:rax, not memory. I suspect he's talking about
> cmpxchg16b.

Exactly, and I think I saw a 128 bit ll/sc on armv8 to atomically
manipulate linked lists.

Bye,
Hannes
On Thu, Dec 15, 2016 at 07:50:36PM +0100, Jason A. Donenfeld wrote:
> There's no 32-bit platform
> that will trap on a 64-bit unaligned access because there's no such
> thing as a 64-bit access there. In short, we're fine.

ARMv7 LPAE is a 32bit architecture that has 64bit load/stores IIRC.

x86 has cmpxchg8b that can do 64bit things and very much wants the u64
aligned.

Also, IIRC we have a few platforms where u64 doesn't carry 8 byte
alignment, m68k or something like that, but yes, you likely don't care.

Just to make life interesting...
On Thu, Dec 15, 2016 at 10:09 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Dec 15, 2016 at 07:50:36PM +0100, Jason A. Donenfeld wrote:
>> There's no 32-bit platform
>> that will trap on a 64-bit unaligned access because there's no such
>> thing as a 64-bit access there. In short, we're fine.
>
> ARMv7 LPAE is a 32bit architecture that has 64bit load/stores IIRC.
>
> x86 has cmpxchg8b that can do 64bit things and very much wants the u64
> aligned.
>
> Also, IIRC we have a few platforms where u64 doesn't carry 8 byte
> alignment, m68k or something like that, but yes, you likely don't care.

Indeed, I stand corrected. But in any case, the use of __aligned(8) in
the patchset ensures that things are fixed and that we don't have this
issue.
On Thu, Dec 15, 2016 at 1:11 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Indeed, I stand corrected. But in any case, the use of __aligned(8) in
> the patchset ensures that things are fixed and that we don't have this
> issue.

I think you can/should just use the natural alignment for "u64".

For architectures that need 8-byte alignment, u64 will already be
properly aligned. For architectures (like x86-32) that only need
4-byte alignment, you get it.

	Linus
On 15.12.2016 21:43, Jason A. Donenfeld wrote:
> On Thu, Dec 15, 2016 at 9:31 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> ARM64 and x86-64 have memory operations that are not vector operations
>> that operate on 128 bit memory.
>
> Fair enough. imull I guess.
>
>> How do you know that the compiler for some architecture will not choose a
>> more optimized instruction to load a 64 bit memory value into two 32 bit
>> registers if you tell the compiler it is 8 byte aligned but it actually
>> isn't? I don't know the answer but telling the compiler some data is 8
>> byte aligned while it isn't really pretty much seems like a call for
>> trouble.
>
> If a compiler is in the business of using special 64-bit instructions
> on 64-bit aligned data, then it is also the job of the compiler to
> align structs to 64-bits when passed __aligned(8), which is what we've
> done in this code. If the compiler were to hypothetically choose to
> ignore that and internally convert it to a __aligned(4), then it would
> only be able to do so with the knowledge that it will never use 64-bit
> aligned data instructions. But so far as I can tell, gcc always
> respects __aligned(8), which is why I use it in this patchset.
>
> I think there might have been confusion here, because perhaps someone
> was hoping that since in6_addr is 128-bits, that the __aligned
> attribute would not be required and that the struct would just
> automatically be aligned to at least 8 bytes. But in fact, as I
> mentioned, in6_addr is actually composed of u32[4] and not u64[2], so
> it will only be aligned to 4 bytes, making the __aligned(8) necessary.
>
> I think for the purposes of this patchset, this is a solved problem.
> There's the unaligned version of the function if you don't know about
> the data, and there's the aligned version if you're using
> __aligned(SIPHASH_ALIGNMENT) on your data. Plain and simple.

And I was exactly questioning this.

static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
				    const struct in6_addr *daddr)
{
	net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
	return jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
			    (__force u32)id, ip6_frags.rnd);
}

This function had a hash DoS (and kind of still has), but it has been
mitigated by explicit checks, I hope.

So you start looking for all the pointers where ipv6 addresses could
come from, and find some globally defined struct where I would need to
put the aligned(SIPHASH24_ALIGNMENT) into to make this work on 32 bit
code? Otherwise just the unaligned version is safe on 32 bit code. Who
knows this? It isn't even obvious by looking at the header!

I would be interested if the compiler can actually constant-fold the
address of the stack allocation with a simple if () or some
__builtin_constant_p fiddling, so we don't have this constant review
overhead of which function we pass which data to. This would also make
this whole discussion moot.

Bye,
Hannes
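To make that example concrete, a hypothetical conversion of the
function quoted above might look as follows. This is not part of the
patchset; the secret-key variable and the copy into an explicitly
aligned struct are assumptions made for illustration (the copy
sidesteps the 4-byte alignment of in6_addr that Hannes points out):

static u8 ip6_frag_secret[SIPHASH24_KEY_LEN] __read_mostly;

static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
				    const struct in6_addr *daddr)
{
	/* Copy into an explicitly aligned struct so the aligned siphash
	 * variant is safe; in6_addr itself is only 4-byte aligned.
	 */
	struct {
		struct in6_addr saddr;
		struct in6_addr daddr;
		__be32 id;
		u32 pad;	/* explicit tail padding, zeroed by the initializer */
	} __aligned(SIPHASH24_ALIGNMENT) combined = {
		.saddr = *saddr,
		.daddr = *daddr,
		.id = id,
	};

	net_get_random_once(ip6_frag_secret, sizeof(ip6_frag_secret));
	return (unsigned int)siphash24((const u8 *)&combined,
				       sizeof(combined), ip6_frag_secret);
}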
diff --git a/include/linux/siphash.h b/include/linux/siphash.h
new file mode 100644
index 000000000000..6623b3090645
--- /dev/null
+++ b/include/linux/siphash.h
@@ -0,0 +1,20 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#ifndef _LINUX_SIPHASH_H
+#define _LINUX_SIPHASH_H
+
+#include <linux/types.h>
+
+enum siphash_lengths {
+	SIPHASH24_KEY_LEN = 16
+};
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
+
+#endif /* _LINUX_SIPHASH_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e6327d102184..32bbf689fc46 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1843,9 +1843,9 @@ config TEST_HASH
 	tristate "Perform selftest on hash functions"
 	default n
 	help
-	  Enable this option to test the kernel's integer (<linux/hash,h>)
-	  and string (<linux/stringhash.h>) hash functions on boot
-	  (or module load).
+	  Enable this option to test the kernel's integer (<linux/hash.h>),
+	  string (<linux/stringhash.h>), and siphash (<linux/siphash.h>)
+	  hash functions on boot (or module load).
 
 	  This is intended to help people writing architecture-specific
 	  optimized versions.  If unsure, say N.
diff --git a/lib/Makefile b/lib/Makefile
index 50144a3aeebd..71d398b04a74 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
+	 earlycpio.o seq_buf.o siphash.o \
+	 nmi_backtrace.o nodemask.o win_minmax.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
@@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
-obj-$(CONFIG_TEST_HASH) += test_hash.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/siphash.c b/lib/siphash.c
new file mode 100644
index 000000000000..7b55ad3a7fe9
--- /dev/null
+++ b/lib/siphash.c
@@ -0,0 +1,76 @@
+/* Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ * Copyright (C) 2012-2014 Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
+ * Copyright (C) 2012-2014 Daniel J. Bernstein <djb@cr.yp.to>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <asm/unaligned.h>
+
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+#include <linux/dcache.h>
+#include <asm/word-at-a-time.h>
+#endif
+
+#define SIPROUND \
+	do { \
+	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
+	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
+	v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
+	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
+	} while(0)
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	u64 v0 = 0x736f6d6570736575ULL;
+	u64 v1 = 0x646f72616e646f6dULL;
+	u64 v2 = 0x6c7967656e657261ULL;
+	u64 v3 = 0x7465646279746573ULL;
+	u64 b = ((u64)len) << 56;
+	u64 k0 = get_unaligned_le64(key);
+	u64 k1 = get_unaligned_le64(key + sizeof(u64));
+	u64 m;
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	v3 ^= k1;
+	v2 ^= k0;
+	v1 ^= k1;
+	v0 ^= k0;
+	for (; data != end; data += sizeof(u64)) {
+		m = get_unaligned_le64(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+	if (left)
+		b |= le64_to_cpu(load_unaligned_zeropad(data) & bytemask_from_count(left));
+#else
+	switch (left) {
+	case 7: b |= ((u64)data[6]) << 48;
+	case 6: b |= ((u64)data[5]) << 40;
+	case 5: b |= ((u64)data[4]) << 32;
+	case 4: b |= get_unaligned_le32(data); break;
+	case 3: b |= ((u64)data[2]) << 16;
+	case 2: b |= get_unaligned_le16(data); break;
+	case 1: b |= data[0];
+	}
+#endif
+	v3 ^= b;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= b;
+	v2 ^= 0xff;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	return (v0 ^ v1) ^ (v2 ^ v3);
+}
+EXPORT_SYMBOL(siphash24);
diff --git a/lib/test_siphash.c b/lib/test_siphash.c
new file mode 100644
index 000000000000..336298aaa33b
--- /dev/null
+++ b/lib/test_siphash.c
@@ -0,0 +1,74 @@
+/* Test cases for siphash.c
+ *
+ * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+
+/* Test vectors taken from official reference source available at:
+ *     https://131002.net/siphash/siphash24.c
+ */
+static const u64 test_vectors[64] = {
+	0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
+	0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
+	0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
+	0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
+	0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
+	0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
+	0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
+	0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
+	0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
+	0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
+	0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
+	0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
+	0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
+	0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
+	0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
+	0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
+	0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
+	0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
+	0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
+	0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
+	0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
+	0x958a324ceb064572ULL
+};
+
+static int __init siphash_test_init(void)
+{
+	u8 in[64], k[16], i;
+	int ret = 0;
+
+	for (i = 0; i < 16; ++i)
+		k[i] = i;
+	for (i = 0; i < 64; ++i) {
+		in[i] = i;
+		if (siphash24(in, i, k) != test_vectors[i]) {
+			pr_info("self-test %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+	}
+	if (!ret)
+		pr_info("self-tests: pass\n");
+	return ret;
+}
+
+static void __exit siphash_test_exit(void)
+{
+}
+
+module_init(siphash_test_init);
+module_exit(siphash_test_exit);
+
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_LICENSE("Dual BSD/GPL");
SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function.

SipHash isn't just some new trendy hash function. It's been around for a
while, and there really isn't anything that comes remotely close to
being useful in the way SipHash is. With that said, why do we need this?

There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then proceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector.

Linux developers already seem to be aware that this is an issue, and
various places that use hash tables in, say, a network context, use a
non-cryptographically secure function (usually jhash) and then try to
twiddle with the key on a time basis (or in many cases just do nothing
and hope that nobody notices). While this is an admirable attempt at
solving the problem, it doesn't actually fix it. SipHash fixes it.

(It fixes it in such a sound way that you could even build a stream
cipher out of SipHash that would resist modern cryptanalysis.)

There are a modicum of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

Dozens of languages are already using this internally for their hash
tables. Some of the BSDs already use this in their kernels. SipHash is
a widely known high-speed solution to a widely known problem, and it's
time we catch up.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Daniel J. Bernstein <djb@cr.yp.to>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Biggers <ebiggers3@gmail.com>
---
Changes from v1->v2:
  - None in this patch, but see elsewhere in series.

 include/linux/siphash.h | 20 +++++++++++++
 lib/Kconfig.debug       |  6 ++--
 lib/Makefile            |  5 ++--
 lib/siphash.c           | 76 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/test_siphash.c      | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 176 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c