Message ID | 20161212034817.1773-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Herbert Xu |
On Sun, Dec 11, 2016 at 7:48 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> +	switch (left) {
> +	case 7: b |= ((u64)data[6]) << 48;
> +	case 6: b |= ((u64)data[5]) << 40;
> +	case 5: b |= ((u64)data[4]) << 32;
> +	case 4: b |= ((u64)data[3]) << 24;
> +	case 3: b |= ((u64)data[2]) << 16;
> +	case 2: b |= ((u64)data[1]) << 8;
> +	case 1: b |= ((u64)data[0]); break;
> +	case 0: break;
> +	}

The above is extremely inefficient. Considering that most kernel data
would be expected to be smallish, that matters (ie the usual benchmark
would not be about hashing megabytes of data, but instead millions of
hashes of small data).

I think this could be rewritten (at least for 64-bit architectures) as

  #ifdef CONFIG_DCACHE_WORD_ACCESS

	if (left)
		b |= le64_to_cpu(load_unaligned_zeropad(data) &
				 bytemask_from_count(left));

  #else

	.. do the duff's device thing with the switch() ..

  #endif

which should give you basically perfect code generation (ie a single
64-bit load and a byte mask).

Totally untested, just looking at the code and trying to make sense of it.

... and obviously, it requires an actual high-performance use-case to
make any difference.

                Linus
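For readers unfamiliar with the trick, here is a minimal userspace sketch of
the byte-masking idea, assuming a little-endian machine and a buffer with at
least 8 readable bytes; bytemask_from_count() is re-derived here from its
little-endian kernel definition rather than taken from the patch, and the
kernel's load_unaligned_zeropad() exists precisely to make the full-word load
safe across page boundaries, which this sketch does not attempt:

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	/* Little-endian byte mask covering the low `cnt` bytes of a 64-bit
	 * word; valid for cnt in 0..7. */
	static uint64_t bytemask_from_count(unsigned cnt)
	{
		return ~(~0ULL << (cnt * 8));
	}

	int main(void)
	{
		const uint8_t data[8] = { 0xaa, 0xbb, 0xcc, 0xdd,
					  0xee, 0xff, 0x11, 0x22 };
		uint64_t word, b = 0;
		unsigned left = 3;	/* pretend 3 tail bytes remain */

		/* One full 64-bit load (safe here because data[] is 8 bytes),
		 * then mask off everything past the `left` valid bytes. */
		memcpy(&word, data, sizeof(word));
		b |= word & bytemask_from_count(left);

		/* Prints b = 0x0000000000ccbbaa on little-endian machines. */
		printf("b = 0x%016llx\n", (unsigned long long)b);
		return 0;
	}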
On Mon, Dec 12, 2016 at 04:48:17AM +0100, Jason A. Donenfeld wrote:
>
> diff --git a/lib/Makefile b/lib/Makefile
> index 50144a3aeebd..71d398b04a74 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
>  	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
>  	 flex_proportions.o ratelimit.o show_mem.o \
>  	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
> -	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
> +	 earlycpio.o seq_buf.o siphash.o \
> +	 nmi_backtrace.o nodemask.o win_minmax.o
>
>  lib-$(CONFIG_MMU) += ioremap.o
>  lib-$(CONFIG_SMP) += cpumask.o
> @@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
>  obj-y += kstrtox.o
>  obj-$(CONFIG_TEST_BPF) += test_bpf.o
>  obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
> -obj-$(CONFIG_TEST_HASH) += test_hash.o
> +obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o

Maybe add to the help text for CONFIG_TEST_HASH that it now tests siphash too?

> +static inline u64 le64_to_cpuvp(const void *p)
> +{
> +	return le64_to_cpup(p);
> +}

This assumes the key and message buffers are aligned to __alignof__(u64).
Unless that's going to be a clearly documented requirement for callers, you
should use get_unaligned_le64() instead.  And you can pass a 'u8 *' directly
to get_unaligned_le64(), no need for a helper function.

> +	b = (v0 ^ v1) ^ (v2 ^ v3);
> +	return (__force u64)cpu_to_le64(b);
> +}

It makes sense for this to return a u64, but that means the cpu_to_le64() is
wrong, since u64 indicates CPU endianness.  It should just return 'b'.

> +++ b/lib/test_siphash.c
> @@ -0,0 +1,116 @@
> +/* Test cases for siphash.c
> + *
> + * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/siphash.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +
> +static const u8 test_vectors[64][8] = {
> +	{ 0x31, 0x0e, 0x0e, 0xdd, 0x47, 0xdb, 0x6f, 0x72 },

Can you mention in a comment where the test vectors came from?

> +		if (memcmp(&out, test_vectors[i], 8)) {
> +			pr_info("self-test %u: FAIL\n", i + 1);
> +			ret = -EINVAL;
> +		}

If you make the output really be CPU-endian like I'm suggesting then this
will need to be something like:

		if (out != get_unaligned_le64(test_vectors[i])) {

Or else make the test vectors be an array of u64.

- Eric
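To make the unaligned-access point concrete, the key loads might end up
looking something like the sketch below; the helper name is illustrative
only, since, as Eric notes, get_unaligned_le64() can simply be called inline
wherever le64_to_cpuvp() is used today:

	#include <linux/types.h>
	#include <linux/siphash.h>
	#include <asm/unaligned.h>

	/* Load the two 64-bit key halves from a little-endian byte buffer
	 * with no alignment requirement on 'key'. */
	static inline void siphash_load_key(const u8 key[SIPHASH24_KEY_LEN],
					    u64 *k0, u64 *k1)
	{
		*k0 = get_unaligned_le64(key);
		*k1 = get_unaligned_le64(key + sizeof(u64));
	}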
Hey Linus,

On Mon, Dec 12, 2016 at 5:01 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> The above is extremely inefficient. Considering that most kernel data
> would be expected to be smallish, that matters (ie the usual benchmark
> would not be about hashing megabytes of data, but instead millions of
> hashes of small data).
>
> I think this could be rewritten (at least for 64-bit architectures) as
>
>   #ifdef CONFIG_DCACHE_WORD_ACCESS
>
>	if (left)
>		b |= le64_to_cpu(load_unaligned_zeropad(data) &
>				 bytemask_from_count(left));
>
>   #else
>
>	.. do the duff's device thing with the switch() ..
>
>   #endif
>
> which should give you basically perfect code generation (ie a single
> 64-bit load and a byte mask).

I modified the test to hash data of size 0 through 7 repeatedly
100000000 times, and benchmarked that a few times on a Skylake laptop.
The `load_unaligned_zeropad & bytemask_from_count` version was
consistently 7% slower.

I then modified it again to simply hash a 4 byte constant repeatedly
1000000000 times. The `load_unaligned_zeropad & bytemask_from_count`
version was around 6% faster. I tried again with a 7 byte constant and
got more or less a similar result.

Then I tried with a 1 byte constant, and found that the
`load_unaligned_zeropad & bytemask_from_count` version was slower.

So, it would seem that between the `if (left)` and the `switch (left)`,
there's the same number of branches. But for small values of `left`,
the duff's device just has simpler arithmetic, whereas for large values
of `left`, the `load_unaligned_zeropad` prevails. If micro-optimization
is really appealing, one could imagine a hybrid of the two:

	switch (left) {
	case 7:
	case 6:
	case 5:
	case 4:
		b |= le64_to_cpu(load_unaligned_zeropad(data) &
				 bytemask_from_count(left));
		break;
	case 3: b |= ((u64)data[2]) << 16;
	case 2: b |= ((u64)data[1]) << 8;
	case 1: b |= ((u64)data[0]); break;
	case 0: break;
	}

But I'm not sure this complication is worth it, and it might be more
likely that the left-over size is 4 bytes most of the time, so we
should just use your trick on platforms that support it.

Jason
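For reference, a rough sketch of the kind of micro-benchmark described above,
as it might sit in the test module; the function name, iteration count and
the use of ktime_get() are assumptions for illustration, not the actual
benchmark code from this thread:

	#include <linux/kernel.h>
	#include <linux/ktime.h>
	#include <linux/siphash.h>

	/* Hash data of length 0..7 over and over; 'data' must have at least
	 * 7 readable bytes. */
	static void siphash_bench(const u8 *data,
				  const u8 key[SIPHASH24_KEY_LEN])
	{
		ktime_t start, end;
		u64 acc = 0;
		u32 i;
		u8 len;

		start = ktime_get();
		for (i = 0; i < 100000000U; ++i)
			for (len = 0; len < 8; ++len)
				acc ^= siphash24(data, len, key);
		end = ktime_get();

		/* Fold acc into the output so the calls can't be elided. */
		pr_info("siphash bench: %lld ns (acc=%llx)\n",
			ktime_to_ns(ktime_sub(end, start)), acc);
	}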
Hey Eric,

Lots of good points; thanks for the review. Responses are inline below.

On Mon, Dec 12, 2016 at 6:42 AM, Eric Biggers <ebiggers3@gmail.com> wrote:
> Maybe add to the help text for CONFIG_TEST_HASH that it now tests siphash too?

Good call. Will do.

> This assumes the key and message buffers are aligned to __alignof__(u64).
> Unless that's going to be a clearly documented requirement for callers, you
> should use get_unaligned_le64() instead.  And you can pass a 'u8 *' directly
> to get_unaligned_le64(), no need for a helper function.

I had thought about that briefly, but just sort of figured most people
were passing in aligned variables... but that's a pretty bad assumption
to make especially for 64-bit alignment. I'll switch to using the
get_unaligned functions.

[As a side note, I wonder if crypto/chacha20_generic.c should be using
the unaligned functions instead too, at least for the iv reading...]

> It makes sense for this to return a u64, but that means the cpu_to_le64() is
> wrong, since u64 indicates CPU endianness.  It should just return 'b'.

At first I was very opposed to making this change, since by returning a
value with an explicit byte order, you can cast to u8 and have uniform
indexed byte access across platforms. But of course this doesn't make
any sense, since it's returning a u64, and it makes all other bitwise
operations non-uniform anyway. I checked with JP (co-creator of siphash,
CC'd) and he confirmed your suspicion that it was just to make the test
vector comparison easier and for some byte-wise uniformity, but that
it's not strictly necessary. So, I've removed that last cpu_to_le64, and
I've also refactored those test vectors to be written as ULL literals,
so that a simple == integer comparison will work across platforms.

> Can you mention in a comment where the test vectors came from?

Sure, will do.

> If you make the output really be CPU-endian like I'm suggesting then this
> will need to be something like:
>
>		if (out != get_unaligned_le64(test_vectors[i])) {
>
> Or else make the test vectors be an array of u64.

Yep, I wound up doing that.

Thanks Eric! Will submit a v3 soon if nobody else has comments.

Jason
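A sketch of what that refactor might look like; only the first three u64
literals are shown, and they are simply the little-endian interpretation of
the corresponding byte vectors in the patch below, with the surrounding test
loop abbreviated:

	/* Test vectors as native u64 literals (first three shown). */
	static const u64 test_vectors_u64[64] = {
		0x726fdb47dd0e0e31ULL,
		0x74f839c593dc67fdULL,
		0x0d6c8009d9a94f5aULL,
		/* ... remaining 61 entries ... */
	};

	/* ... and the comparison in siphash_test_init() becomes a plain
	 * integer check against the CPU-endian return value: */
		if (siphash24(in, i, k) != test_vectors_u64[i]) {
			pr_info("self-test %u: FAIL\n", i + 1);
			ret = -EINVAL;
		}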
On Sun, Dec 11, 2016 at 9:48 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> I modified the test to hash data of size 0 through 7 repeatedly
> 100000000 times, and benchmarked that a few times on a Skylake laptop.
> The `load_unaligned_zeropad & bytemask_from_count` version was
> consistently 7% slower.
>
> I then modified it again to simply hash a 4 byte constant repeatedly
> 1000000000 times. The `load_unaligned_zeropad & bytemask_from_count`
> version was around 6% faster. I tried again with a 7 byte constant and
> got more or less a similar result.
>
> Then I tried with a 1 byte constant, and found that the
> `load_unaligned_zeropad & bytemask_from_count` version was slower.
>
> So, it would seem that between the `if (left)` and the `switch
> (left)`, there's the same number of branches.

Interesting.

For the dcache code (which is where that trick comes from), we used to
have a loop (rather than the duff's device thing), and it performed
badly due to the consistently badly predicted branch of the loop. But I
never compared it against the duff's device version.

I guess you could try to just remove the "if (left)" test entirely, if
it is at least partly the mispredict. It should do the right thing even
with a zero count, and it might schedule the code better. Code size
_should_ be better with the byte mask model (which won't matter in the
hot loop example, since it will all be cached, possibly even in the uop
cache for really tight benchmark loops).

                Linus
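A sketch of what dropping the test might look like; the helper name is
illustrative, and the header locations are taken from the dcache
word-at-a-time code, so treat both as assumptions rather than part of any
posted patch:

	#include <asm/word-at-a-time.h>	/* load_unaligned_zeropad() */
	#include <linux/dcache.h>	/* bytemask_from_count() */

	/* 64-bit only: fold the 0..7 trailing bytes into 'b' without a
	 * branch.  With left == 0 the mask is 0, so the OR contributes
	 * nothing, and the possibly out-of-bounds load is made safe by
	 * load_unaligned_zeropad(). */
	static inline u64 siphash_fold_tail(u64 b, const u8 *data,
					    unsigned int left)
	{
		return b | le64_to_cpu(load_unaligned_zeropad(data) &
				       bytemask_from_count(left));
	}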
diff --git a/include/linux/siphash.h b/include/linux/siphash.h
new file mode 100644
index 000000000000..6623b3090645
--- /dev/null
+++ b/include/linux/siphash.h
@@ -0,0 +1,20 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#ifndef _LINUX_SIPHASH_H
+#define _LINUX_SIPHASH_H
+
+#include <linux/types.h>
+
+enum siphash_lengths {
+	SIPHASH24_KEY_LEN = 16
+};
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
+
+#endif /* _LINUX_SIPHASH_H */
diff --git a/lib/Makefile b/lib/Makefile
index 50144a3aeebd..71d398b04a74 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
+	 earlycpio.o seq_buf.o siphash.o \
+	 nmi_backtrace.o nodemask.o win_minmax.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
@@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
-obj-$(CONFIG_TEST_HASH) += test_hash.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/siphash.c b/lib/siphash.c
new file mode 100644
index 000000000000..e78dc36d19b9
--- /dev/null
+++ b/lib/siphash.c
@@ -0,0 +1,72 @@
+/* Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ * Copyright (C) 2012-2014 Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
+ * Copyright (C) 2012-2014 Daniel J. Bernstein <djb@cr.yp.to>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+
+static inline u64 le64_to_cpuvp(const void *p)
+{
+	return le64_to_cpup(p);
+}
+
+#define SIPROUND \
+	do { \
+	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
+	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
+	v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
+	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
+	} while(0)
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	u64 v0 = 0x736f6d6570736575ULL;
+	u64 v1 = 0x646f72616e646f6dULL;
+	u64 v2 = 0x6c7967656e657261ULL;
+	u64 v3 = 0x7465646279746573ULL;
+	u64 b = ((u64)len) << 56;
+	u64 k0 = le64_to_cpuvp(key);
+	u64 k1 = le64_to_cpuvp(key + sizeof(u64));
+	u64 m;
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	v3 ^= k1;
+	v2 ^= k0;
+	v1 ^= k1;
+	v0 ^= k0;
+	for (; data != end; data += sizeof(u64)) {
+		m = le64_to_cpuvp(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+	switch (left) {
+	case 7: b |= ((u64)data[6]) << 48;
+	case 6: b |= ((u64)data[5]) << 40;
+	case 5: b |= ((u64)data[4]) << 32;
+	case 4: b |= ((u64)data[3]) << 24;
+	case 3: b |= ((u64)data[2]) << 16;
+	case 2: b |= ((u64)data[1]) << 8;
+	case 1: b |= ((u64)data[0]); break;
+	case 0: break;
+	}
+	v3 ^= b;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= b;
+	v2 ^= 0xff;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	b = (v0 ^ v1) ^ (v2 ^ v3);
+	return (__force u64)cpu_to_le64(b);
+}
+EXPORT_SYMBOL(siphash24);
diff --git a/lib/test_siphash.c b/lib/test_siphash.c
new file mode 100644
index 000000000000..45b5435540e9
--- /dev/null
+++ b/lib/test_siphash.c
@@ -0,0 +1,116 @@
+/* Test cases for siphash.c
+ *
+ * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+
+static const u8 test_vectors[64][8] = {
+	{ 0x31, 0x0e, 0x0e, 0xdd, 0x47, 0xdb, 0x6f, 0x72 },
+	{ 0xfd, 0x67, 0xdc, 0x93, 0xc5, 0x39, 0xf8, 0x74 },
+	{ 0x5a, 0x4f, 0xa9, 0xd9, 0x09, 0x80, 0x6c, 0x0d },
+	{ 0x2d, 0x7e, 0xfb, 0xd7, 0x96, 0x66, 0x67, 0x85 },
+	{ 0xb7, 0x87, 0x71, 0x27, 0xe0, 0x94, 0x27, 0xcf },
+	{ 0x8d, 0xa6, 0x99, 0xcd, 0x64, 0x55, 0x76, 0x18 },
+	{ 0xce, 0xe3, 0xfe, 0x58, 0x6e, 0x46, 0xc9, 0xcb },
+	{ 0x37, 0xd1, 0x01, 0x8b, 0xf5, 0x00, 0x02, 0xab },
+	{ 0x62, 0x24, 0x93, 0x9a, 0x79, 0xf5, 0xf5, 0x93 },
+	{ 0xb0, 0xe4, 0xa9, 0x0b, 0xdf, 0x82, 0x00, 0x9e },
+	{ 0xf3, 0xb9, 0xdd, 0x94, 0xc5, 0xbb, 0x5d, 0x7a },
+	{ 0xa7, 0xad, 0x6b, 0x22, 0x46, 0x2f, 0xb3, 0xf4 },
+	{ 0xfb, 0xe5, 0x0e, 0x86, 0xbc, 0x8f, 0x1e, 0x75 },
+	{ 0x90, 0x3d, 0x84, 0xc0, 0x27, 0x56, 0xea, 0x14 },
+	{ 0xee, 0xf2, 0x7a, 0x8e, 0x90, 0xca, 0x23, 0xf7 },
+	{ 0xe5, 0x45, 0xbe, 0x49, 0x61, 0xca, 0x29, 0xa1 },
+	{ 0xdb, 0x9b, 0xc2, 0x57, 0x7f, 0xcc, 0x2a, 0x3f },
+	{ 0x94, 0x47, 0xbe, 0x2c, 0xf5, 0xe9, 0x9a, 0x69 },
+	{ 0x9c, 0xd3, 0x8d, 0x96, 0xf0, 0xb3, 0xc1, 0x4b },
+	{ 0xbd, 0x61, 0x79, 0xa7, 0x1d, 0xc9, 0x6d, 0xbb },
+	{ 0x98, 0xee, 0xa2, 0x1a, 0xf2, 0x5c, 0xd6, 0xbe },
+	{ 0xc7, 0x67, 0x3b, 0x2e, 0xb0, 0xcb, 0xf2, 0xd0 },
+	{ 0x88, 0x3e, 0xa3, 0xe3, 0x95, 0x67, 0x53, 0x93 },
+	{ 0xc8, 0xce, 0x5c, 0xcd, 0x8c, 0x03, 0x0c, 0xa8 },
+	{ 0x94, 0xaf, 0x49, 0xf6, 0xc6, 0x50, 0xad, 0xb8 },
+	{ 0xea, 0xb8, 0x85, 0x8a, 0xde, 0x92, 0xe1, 0xbc },
+	{ 0xf3, 0x15, 0xbb, 0x5b, 0xb8, 0x35, 0xd8, 0x17 },
+	{ 0xad, 0xcf, 0x6b, 0x07, 0x63, 0x61, 0x2e, 0x2f },
+	{ 0xa5, 0xc9, 0x1d, 0xa7, 0xac, 0xaa, 0x4d, 0xde },
+	{ 0x71, 0x65, 0x95, 0x87, 0x66, 0x50, 0xa2, 0xa6 },
+	{ 0x28, 0xef, 0x49, 0x5c, 0x53, 0xa3, 0x87, 0xad },
+	{ 0x42, 0xc3, 0x41, 0xd8, 0xfa, 0x92, 0xd8, 0x32 },
+	{ 0xce, 0x7c, 0xf2, 0x72, 0x2f, 0x51, 0x27, 0x71 },
+	{ 0xe3, 0x78, 0x59, 0xf9, 0x46, 0x23, 0xf3, 0xa7 },
+	{ 0x38, 0x12, 0x05, 0xbb, 0x1a, 0xb0, 0xe0, 0x12 },
+	{ 0xae, 0x97, 0xa1, 0x0f, 0xd4, 0x34, 0xe0, 0x15 },
+	{ 0xb4, 0xa3, 0x15, 0x08, 0xbe, 0xff, 0x4d, 0x31 },
+	{ 0x81, 0x39, 0x62, 0x29, 0xf0, 0x90, 0x79, 0x02 },
+	{ 0x4d, 0x0c, 0xf4, 0x9e, 0xe5, 0xd4, 0xdc, 0xca },
+	{ 0x5c, 0x73, 0x33, 0x6a, 0x76, 0xd8, 0xbf, 0x9a },
+	{ 0xd0, 0xa7, 0x04, 0x53, 0x6b, 0xa9, 0x3e, 0x0e },
+	{ 0x92, 0x59, 0x58, 0xfc, 0xd6, 0x42, 0x0c, 0xad },
+	{ 0xa9, 0x15, 0xc2, 0x9b, 0xc8, 0x06, 0x73, 0x18 },
+	{ 0x95, 0x2b, 0x79, 0xf3, 0xbc, 0x0a, 0xa6, 0xd4 },
+	{ 0xf2, 0x1d, 0xf2, 0xe4, 0x1d, 0x45, 0x35, 0xf9 },
+	{ 0x87, 0x57, 0x75, 0x19, 0x04, 0x8f, 0x53, 0xa9 },
+	{ 0x10, 0xa5, 0x6c, 0xf5, 0xdf, 0xcd, 0x9a, 0xdb },
+	{ 0xeb, 0x75, 0x09, 0x5c, 0xcd, 0x98, 0x6c, 0xd0 },
+	{ 0x51, 0xa9, 0xcb, 0x9e, 0xcb, 0xa3, 0x12, 0xe6 },
+	{ 0x96, 0xaf, 0xad, 0xfc, 0x2c, 0xe6, 0x66, 0xc7 },
+	{ 0x72, 0xfe, 0x52, 0x97, 0x5a, 0x43, 0x64, 0xee },
+	{ 0x5a, 0x16, 0x45, 0xb2, 0x76, 0xd5, 0x92, 0xa1 },
+	{ 0xb2, 0x74, 0xcb, 0x8e, 0xbf, 0x87, 0x87, 0x0a },
+	{ 0x6f, 0x9b, 0xb4, 0x20, 0x3d, 0xe7, 0xb3, 0x81 },
+	{ 0xea, 0xec, 0xb2, 0xa3, 0x0b, 0x22, 0xa8, 0x7f },
+	{ 0x99, 0x24, 0xa4, 0x3c, 0xc1, 0x31, 0x57, 0x24 },
+	{ 0xbd, 0x83, 0x8d, 0x3a, 0xaf, 0xbf, 0x8d, 0xb7 },
+	{ 0x0b, 0x1a, 0x2a, 0x32, 0x65, 0xd5, 0x1a, 0xea },
+	{ 0x13, 0x50, 0x79, 0xa3, 0x23, 0x1c, 0xe6, 0x60 },
+	{ 0x93, 0x2b, 0x28, 0x46, 0xe4, 0xd7, 0x06, 0x66 },
+	{ 0xe1, 0x91, 0x5f, 0x5c, 0xb1, 0xec, 0xa4, 0x6c },
+	{ 0xf3, 0x25, 0x96, 0x5c, 0xa1, 0x6d, 0x62, 0x9f },
+	{ 0x57, 0x5f, 0xf2, 0x8e, 0x60, 0x38, 0x1b, 0xe5 },
+	{ 0x72, 0x45, 0x06, 0xeb, 0x4c, 0x32, 0x8a, 0x95 }
+};
+
+static int __init siphash_test_init(void)
+{
+	u8 in[64], k[16], i;
+	u64 out;
+	int ret = 0;
+
+	for (i = 0; i < 16; ++i)
+		k[i] = i;
+
+	for (i = 0; i < 64; ++i) {
+		in[i] = i;
+		out = siphash24(in, i, k);
+		if (memcmp(&out, test_vectors[i], 8)) {
+			pr_info("self-test %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+	}
+	if (!ret)
+		pr_info("self-tests: pass\n");
+	return ret;
+}
+
+static void __exit siphash_test_exit(void)
+{
+}
+
+module_init(siphash_test_init);
+module_exit(siphash_test_exit);
+
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_LICENSE("Dual BSD/GPL");
SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function.

SipHash isn't just some new trendy hash function. It's been around for a
while, and there really isn't anything that comes remotely close to
being useful in the way SipHash is. With that said, why do we need this?

There are a variety of attacks known as "hashtable poisoning" in which
an attacker forms some data such that the hash of that data will be the
same, and then proceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector.

Linux developers already seem to be aware that this is an issue, and
various places that use hash tables in, say, a network context, use a
non-cryptographically-secure function (usually jhash) and then try to
twiddle with the key on a time basis (or in many cases just do nothing
and hope that nobody notices). While this is an admirable attempt at
solving the problem, it doesn't actually fix it. SipHash fixes it. (It
fixes it in such a sound way that you could even build a stream cipher
out of SipHash that would resist modern cryptanalysis.)

There are quite a few places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's no reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

Dozens of languages already use SipHash internally for their hash
tables, and some of the BSDs already use it in their kernels. SipHash is
a widely known high-speed solution to a widely known problem, and it's
time we catch up.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Daniel J. Bernstein <djb@cr.yp.to>
---
 include/linux/siphash.h |  20 +++++++++
 lib/Makefile            |   5 ++-
 lib/siphash.c           |  72 ++++++++++++++++++++++++++++++
 lib/test_siphash.c      | 116 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 211 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c
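To make the intended use concrete, here is a minimal sketch of the keyed
hashtable-lookup pattern the commit message describes; the function and
variable names are illustrative rather than part of the patch, and nbuckets
is assumed to be a power of two so a mask can stand in for a 64-bit modulo:

	#include <linux/cache.h>
	#include <linux/random.h>
	#include <linux/siphash.h>

	/* Secret, per-boot key: an attacker who cannot read it cannot
	 * precompute colliding inputs. */
	static u8 bucket_key[SIPHASH24_KEY_LEN] __read_mostly;

	static void example_hash_init(void)
	{
		get_random_bytes(bucket_key, sizeof(bucket_key));
	}

	/* Map attacker-controlled bytes to a bucket index. */
	static u32 example_bucket(const u8 *data, size_t len, u32 nbuckets)
	{
		return (u32)siphash24(data, len, bucket_key) & (nbuckets - 1);
	}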