
[v2] siphash: add cryptographically secure hashtable function

Message ID 20161212034817.1773-1-Jason@zx2c4.com (mailing list archive)
State New, archived

Commit Message

Jason A. Donenfeld Dec. 12, 2016, 3:48 a.m. UTC
SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function.

SipHash isn't just some new trendy hash function. It's been around for a
while, and there really isn't anything that comes remotely close to
being useful in the way SipHash is. With that said, why do we need this?

There are a variety of attacks known as "hashtable poisoning" in which an
attacker crafts inputs that all hash to the same bucket, and then proceeds
to fill up all entries of that hashbucket. This is a realistic and
well-known denial-of-service vector.

Linux developers already seem to be aware that this is an issue, and
various places that use hash tables in, say, a network context, use a
non-cryptographically secure function (usually jhash) and then try to
twiddle with the key on a time basis (or in many cases just do nothing
and hope that nobody notices). While this is an admirable attempt at
solving the problem, it doesn't actually fix it. SipHash fixes it.

(It fixes it in such a sound way that you could even build a stream
cipher out of SipHash that would resist modern cryptanalysis.)
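
As a rough illustration of the intended use (a sketch only; the table,
its size, and the per-boot key are hypothetical, and get_random_bytes()
comes from <linux/random.h>, but siphash24() and SIPHASH24_KEY_LEN are
the interface added by this patch):

    static u8 bucket_key[SIPHASH24_KEY_LEN] __read_mostly;
    static struct hlist_head buckets[1024];     /* hypothetical table */

    static void example_init(void)
    {
            /* secret per-boot key; without it, collisions can't be forced */
            get_random_bytes(bucket_key, sizeof(bucket_key));
    }

    static struct hlist_head *example_bucket(const u8 *data, size_t len)
    {
            u64 h = siphash24(data, len, bucket_key);

            return &buckets[h & (ARRAY_SIZE(buckets) - 1)];
    }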

There are a number of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

Dozens of languages are already using this internally for their hash
tables. Some of the BSDs already use this in their kernels. SipHash is
a widely known high-speed solution to a widely known problem, and it's
time we caught up.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Daniel J. Bernstein <djb@cr.yp.to>
---
 include/linux/siphash.h |  20 +++++++++
 lib/Makefile            |   5 ++-
 lib/siphash.c           |  72 ++++++++++++++++++++++++++++++
 lib/test_siphash.c      | 116 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 211 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c

Comments

Linus Torvalds Dec. 12, 2016, 4:01 a.m. UTC | #1
On Sun, Dec 11, 2016 at 7:48 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> +       switch (left) {
> +               case 7: b |= ((u64)data[6]) << 48;
> +               case 6: b |= ((u64)data[5]) << 40;
> +               case 5: b |= ((u64)data[4]) << 32;
> +               case 4: b |= ((u64)data[3]) << 24;
> +               case 3: b |= ((u64)data[2]) << 16;
> +               case 2: b |= ((u64)data[1]) <<  8;
> +               case 1: b |= ((u64)data[0]); break;
> +               case 0: break;
> +       }

The above is extremely inefficient. Considering that most kernel data
would be expected to be smallish, that matters (ie the usual benchmark
would not be about hashing megabytes of data, but instead millions of
hashes of small data).

I think this could be rewritten (at least for 64-bit architectures) as

    #ifdef CONFIG_DCACHE_WORD_ACCESS

        if (left)
                b |= le64_to_cpu(load_unaligned_zeropad(data) &
                                 bytemask_from_count(left));

    #else

        .. do the duff's device thing with the switch() ..

    #endif

which should give you basically perfect code generation (ie a single
64-bit load and a byte mask).

Totally untested, just looking at the code and trying to make sense of it.

... and obviously, it requires an actual high-performance use-case to
make any difference.
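
To make the byte mask concrete (illustration only, assuming a 64-bit
little-endian build): with three trailing bytes, bytemask_from_count(3)
keeps just the low 24 bits, so whatever the word load pulls in past the
end of the buffer gets cleared:

        /* left == 3: one word load, then mask off the 5 excess bytes */
        u64 w = load_unaligned_zeropad(data);          /* may read past the tail */
        b |= le64_to_cpu(w & bytemask_from_count(3));  /* mask == 0x0000000000ffffff */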

                  Linus
Eric Biggers Dec. 12, 2016, 5:42 a.m. UTC | #2
On Mon, Dec 12, 2016 at 04:48:17AM +0100, Jason A. Donenfeld wrote:
>
> diff --git a/lib/Makefile b/lib/Makefile
> index 50144a3aeebd..71d398b04a74 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
>  	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
>  	 flex_proportions.o ratelimit.o show_mem.o \
>  	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
> -	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
> +	 earlycpio.o seq_buf.o siphash.o \
> +	 nmi_backtrace.o nodemask.o win_minmax.o
>  
>  lib-$(CONFIG_MMU) += ioremap.o
>  lib-$(CONFIG_SMP) += cpumask.o
> @@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
>  obj-y += kstrtox.o
>  obj-$(CONFIG_TEST_BPF) += test_bpf.o
>  obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
> -obj-$(CONFIG_TEST_HASH) += test_hash.o
> +obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o

Maybe add to the help text for CONFIG_TEST_HASH that it now tests siphash too?

> +static inline u64 le64_to_cpuvp(const void *p)
> +{
> +	return le64_to_cpup(p);
> +}

This assumes the key and message buffers are aligned to __alignof__(u64).
Unless that's going to be a clearly documented requirement for callers, you
should use get_unaligned_le64() instead.  And you can pass a 'u8 *' directly to
get_unaligned_le64(), no need for a helper function.
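
A minimal sketch of that change (get_unaligned_le64() lives in
<asm/unaligned.h> and handles both the alignment and the byte order):

	u64 k0 = get_unaligned_le64(key);
	u64 k1 = get_unaligned_le64(key + sizeof(u64));
	...
	for (; data != end; data += sizeof(u64)) {
		m = get_unaligned_le64(data);
		...
	}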

> +	b = (v0 ^ v1) ^ (v2 ^ v3);
> +	return (__force u64)cpu_to_le64(b);
> +}

It makes sense for this to return a u64, but that means the cpu_to_le64() is
wrong, since u64 indicates CPU endianness.  It should just return 'b'.
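
In other words, the ending would simply be:

	b = (v0 ^ v1) ^ (v2 ^ v3);
	return b;	/* plain u64 in CPU byte order */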

> +++ b/lib/test_siphash.c
> @@ -0,0 +1,116 @@
> +/* Test cases for siphash.c
> + *
> + * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/siphash.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +
> +static const u8 test_vectors[64][8] = {
> +	{ 0x31, 0x0e, 0x0e, 0xdd, 0x47, 0xdb, 0x6f, 0x72 },

Can you mention in a comment where the test vectors came from?

> +		if (memcmp(&out, test_vectors[i], 8)) {
> +			pr_info("self-test %u: FAIL\n", i + 1);
> +			ret = -EINVAL;
> +		}

If you make the output really be CPU-endian like I'm suggesting then this will
need to be something like:

	if (out != get_unaligned_le64(test_vectors[i])) {

Or else make the test vectors be an array of u64.

- Eric
Jason A. Donenfeld Dec. 12, 2016, 5:48 a.m. UTC | #3
Hey Linus,

On Mon, Dec 12, 2016 at 5:01 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> The above is extremely inefficient. Considering that most kernel data
> would be expected to be smallish, that matters (ie the usual benchmark
> would not be about hashing megabytes of data, but instead millions of
> hashes of small data).
>
> I think this could be rewritten (at least for 64-bit architectures) as
>
>     #ifdef CONFIG_DCACHE_WORD_ACCESS
>
>         if (left)
>                 b |= le64_to_cpu(load_unaligned_zeropad(data) &
>                                  bytemask_from_count(left));
>
>     #else
>
>         .. do the duff's device thing with the switch() ..
>
>     #endif
>
> which should give you basically perfect code generation (ie a single
> 64-bit load and a byte mask).

I modified the test to hash data of size 0 through 7 repeatedly
100000000 times, and benchmarked that a few times on a Skylake laptop.
The `load_unaligned_zeropad & bytemask_from_count` version was
consistently 7% slower.
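
(For reference, the measurement was essentially a tight loop in the test
module, roughly like the sketch below; the accumulator and cycle
counting are illustrative, not the exact harness:)

    unsigned long j;
    u64 acc = 0;
    cycles_t start = get_cycles();

    for (j = 0; j < 100000000UL; ++j)
            acc ^= siphash24(in, j % 8, k);     /* lengths 0 through 7 */

    pr_info("tail bench: %llu cycles (acc %llx)\n",
            (unsigned long long)(get_cycles() - start),
            (unsigned long long)acc);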

I then modified it again to simply hash a 4 byte constant repeatedly
1000000000 times. The `load_unaligned_zeropad & bytemask_from_count`
version was around 6% faster. I tried again with a 7 byte constant and
got more or less a similar result.

Then I tried with a 1 byte constant, and found that the
`load_unaligned_zeropad & bytemask_from_count` version was slower.

So, it would seem that between the `if (left)` and the `switch
(left)`, there's the same number of branches. But for small values of
`left`, the duff's device just has simpler arithmetic, whereas for
large values of `left`, the `load_unaligned_zeropad` prevails. If
micro-optimization is really appealing, one could imagine a hybrid of
the two:

    switch (left) {
    case 7:
    case 6:
    case 5:
    case 4:
        b |= le64_to_cpu(load_unaligned_zeropad(data) &
                         bytemask_from_count(left));
        break;
    case 3: b |= ((u64)data[2]) << 16;
    case 2: b |= ((u64)data[1]) <<  8;
    case 1: b |= ((u64)data[0]); break;
    case 0: break;
    }

But I'm not sure this complication is worth it, and it might be more
likely that the left-over size is 4 bytes most of the time, so we
should just use your trick on platforms that support it.

Jason
Jason A. Donenfeld Dec. 12, 2016, 9:17 p.m. UTC | #4
Hey Eric,

Lots of good points; thanks for the review. Responses are inline below.

On Mon, Dec 12, 2016 at 6:42 AM, Eric Biggers <ebiggers3@gmail.com> wrote:
> Maybe add to the help text for CONFIG_TEST_HASH that it now tests siphash too?

Good call. Will do.

> This assumes the key and message buffers are aligned to __alignof__(u64).
> Unless that's going to be a clearly documented requirement for callers, you
> should use get_unaligned_le64() instead.  And you can pass a 'u8 *' directly to
> get_unaligned_le64(), no need for a helper function.

I had thought about that briefly, but just sort of figured most people
were passing in aligned variables... but that's a pretty bad
assumption to make, especially for 64-bit alignment. I'll switch to
using the get_unaligned functions.

[As a side note, I wonder if crypto/chacha20_generic.c should be using
the unaligned functions instead too, at least for the iv reading...]

> It makes sense for this to return a u64, but that means the cpu_to_le64() is
> wrong, since u64 indicates CPU endianness.  It should just return 'b'.

At first I was very opposed to making this change, since by returning
a value with an explicit byte order, you can cast to u8 and have
uniform indexed byte access across platforms. But of course this
doesn't make any sense, since it's returning a u64, and it makes all
other bitwise operations non-uniform anyway.  I checked with JP
(co-creator of siphash, CC'd) and he confirmed your suspicion that it
was just to make the test vector comparison easier and for some
byte-wise uniformity, but that it's not strictly necessary. So, I've
removed that last cpu_to_le64, and I've also refactored those test
vectors to be written as ULL literals, so that a simple == integer
comparison will work across platforms.
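
Roughly, the refactored table looks like this (first two entries shown,
converted from the little-endian byte arrays in the patch below; the
remaining 62 elided):

	static const u64 test_vectors[64] = {
		0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL,
		/* ... */
	};
	...
	if (out != test_vectors[i]) {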

> Can you mention in a comment where the test vectors came from?

Sure, will do.


> If you make the output really be CPU-endian like I'm suggesting then this will
> need to be something like:
>
>         if (out != get_unaligned_le64(test_vectors[i])) {
>
> Or else make the test vectors be an array of u64.

Yep, I wound up doing that.

Thanks Eric! Will submit a v3 soon if nobody else has comments.

Jason
Linus Torvalds Dec. 12, 2016, 9:37 p.m. UTC | #5
On Sun, Dec 11, 2016 at 9:48 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> I modified the test to hash data of size 0 through 7 repeatedly
> 100000000 times, and benchmarked that a few times on a Skylake laptop.
> The `load_unaligned_zeropad & bytemask_from_count` version was
> consistently 7% slower.
>
> I then modified it again to simply hash a 4 byte constant repeatedly
> 1000000000 times. The `load_unaligned_zeropad & bytemask_from_count`
> version was around 6% faster. I tried again with a 7 byte constant and
> got more or less a similar result.
>
> Then I tried with a 1 byte constant, and found that the
> `load_unaligned_zeropad & bytemask_from_count` version was slower.
>
> So, it would seem that between the `if (left)` and the `switch
> (left)`, there's the same number of branches.

Interesting.

For the dcache code (which is where that trick comes from), we used to
have a loop (rather than the duff's device thing), and it performed
badly due to the consistently badly predicted branch of the loop. But
I never compared it against the duff's device version.

I guess you could try to just remove the "if (left)" test entirely, if
the slowdown is at least partly due to the mispredict. It should do the
right thing
even with a zero count, and it might schedule the code better. Code
size _should_ be better with the byte mask model (which won't matter
in the hot loop example, since it will all be cached, possibly even in
the uop cache for really tight benchmark loops).
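
Removing the test would look something like this (untested; it leans on
bytemask_from_count(0) being an all-zero mask, so the OR is a no-op for
an empty tail):

        /* no "if (left)": for left == 0 the mask is 0 and nothing is OR'd in */
        b |= le64_to_cpu(load_unaligned_zeropad(data) &
                         bytemask_from_count(left));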

             Linus

Patch

diff --git a/include/linux/siphash.h b/include/linux/siphash.h
new file mode 100644
index 000000000000..6623b3090645
--- /dev/null
+++ b/include/linux/siphash.h
@@ -0,0 +1,20 @@ 
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#ifndef _LINUX_SIPHASH_H
+#define _LINUX_SIPHASH_H
+
+#include <linux/types.h>
+
+enum siphash_lengths {
+	SIPHASH24_KEY_LEN = 16
+};
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
+
+#endif /* _LINUX_SIPHASH_H */
diff --git a/lib/Makefile b/lib/Makefile
index 50144a3aeebd..71d398b04a74 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,8 @@  lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
+	 earlycpio.o seq_buf.o siphash.o \
+	 nmi_backtrace.o nodemask.o win_minmax.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
@@ -44,7 +45,7 @@  obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
-obj-$(CONFIG_TEST_HASH) += test_hash.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/siphash.c b/lib/siphash.c
new file mode 100644
index 000000000000..e78dc36d19b9
--- /dev/null
+++ b/lib/siphash.c
@@ -0,0 +1,72 @@ 
+/* Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ * Copyright (C) 2012-2014 Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
+ * Copyright (C) 2012-2014 Daniel J. Bernstein <djb@cr.yp.to>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+
+static inline u64 le64_to_cpuvp(const void *p)
+{
+	return le64_to_cpup(p);
+}
+
+#define SIPROUND \
+	do { \
+	v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
+	v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
+	v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
+	v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
+	} while(0)
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+	u64 v0 = 0x736f6d6570736575ULL;
+	u64 v1 = 0x646f72616e646f6dULL;
+	u64 v2 = 0x6c7967656e657261ULL;
+	u64 v3 = 0x7465646279746573ULL;
+	u64 b = ((u64)len) << 56;
+	u64 k0 = le64_to_cpuvp(key);
+	u64 k1 = le64_to_cpuvp(key + sizeof(u64));
+	u64 m;
+	const u8 *end = data + len - (len % sizeof(u64));
+	const u8 left = len & (sizeof(u64) - 1);
+	v3 ^= k1;
+	v2 ^= k0;
+	v1 ^= k1;
+	v0 ^= k0;
+	for (; data != end; data += sizeof(u64)) {
+		m = le64_to_cpuvp(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+	switch (left) {
+		case 7: b |= ((u64)data[6]) << 48;
+		case 6: b |= ((u64)data[5]) << 40;
+		case 5: b |= ((u64)data[4]) << 32;
+		case 4: b |= ((u64)data[3]) << 24;
+		case 3: b |= ((u64)data[2]) << 16;
+		case 2: b |= ((u64)data[1]) <<  8;
+		case 1: b |= ((u64)data[0]); break;
+		case 0: break;
+	}
+	v3 ^= b;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= b;
+	v2 ^= 0xff;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	b = (v0 ^ v1) ^ (v2 ^ v3);
+	return (__force u64)cpu_to_le64(b);
+}
+EXPORT_SYMBOL(siphash24);
diff --git a/lib/test_siphash.c b/lib/test_siphash.c
new file mode 100644
index 000000000000..45b5435540e9
--- /dev/null
+++ b/lib/test_siphash.c
@@ -0,0 +1,116 @@ 
+/* Test cases for siphash.c
+ *
+ * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+
+static const u8 test_vectors[64][8] = {
+	{ 0x31, 0x0e, 0x0e, 0xdd, 0x47, 0xdb, 0x6f, 0x72 },
+	{ 0xfd, 0x67, 0xdc, 0x93, 0xc5, 0x39, 0xf8, 0x74 },
+	{ 0x5a, 0x4f, 0xa9, 0xd9, 0x09, 0x80, 0x6c, 0x0d },
+	{ 0x2d, 0x7e, 0xfb, 0xd7, 0x96, 0x66, 0x67, 0x85 },
+	{ 0xb7, 0x87, 0x71, 0x27, 0xe0, 0x94, 0x27, 0xcf },
+	{ 0x8d, 0xa6, 0x99, 0xcd, 0x64, 0x55, 0x76, 0x18 },
+	{ 0xce, 0xe3, 0xfe, 0x58, 0x6e, 0x46, 0xc9, 0xcb },
+	{ 0x37, 0xd1, 0x01, 0x8b, 0xf5, 0x00, 0x02, 0xab },
+	{ 0x62, 0x24, 0x93, 0x9a, 0x79, 0xf5, 0xf5, 0x93 },
+	{ 0xb0, 0xe4, 0xa9, 0x0b, 0xdf, 0x82, 0x00, 0x9e },
+	{ 0xf3, 0xb9, 0xdd, 0x94, 0xc5, 0xbb, 0x5d, 0x7a },
+	{ 0xa7, 0xad, 0x6b, 0x22, 0x46, 0x2f, 0xb3, 0xf4 },
+	{ 0xfb, 0xe5, 0x0e, 0x86, 0xbc, 0x8f, 0x1e, 0x75 },
+	{ 0x90, 0x3d, 0x84, 0xc0, 0x27, 0x56, 0xea, 0x14 },
+	{ 0xee, 0xf2, 0x7a, 0x8e, 0x90, 0xca, 0x23, 0xf7 },
+	{ 0xe5, 0x45, 0xbe, 0x49, 0x61, 0xca, 0x29, 0xa1 },
+	{ 0xdb, 0x9b, 0xc2, 0x57, 0x7f, 0xcc, 0x2a, 0x3f },
+	{ 0x94, 0x47, 0xbe, 0x2c, 0xf5, 0xe9, 0x9a, 0x69 },
+	{ 0x9c, 0xd3, 0x8d, 0x96, 0xf0, 0xb3, 0xc1, 0x4b },
+	{ 0xbd, 0x61, 0x79, 0xa7, 0x1d, 0xc9, 0x6d, 0xbb },
+	{ 0x98, 0xee, 0xa2, 0x1a, 0xf2, 0x5c, 0xd6, 0xbe },
+	{ 0xc7, 0x67, 0x3b, 0x2e, 0xb0, 0xcb, 0xf2, 0xd0 },
+	{ 0x88, 0x3e, 0xa3, 0xe3, 0x95, 0x67, 0x53, 0x93 },
+	{ 0xc8, 0xce, 0x5c, 0xcd, 0x8c, 0x03, 0x0c, 0xa8 },
+	{ 0x94, 0xaf, 0x49, 0xf6, 0xc6, 0x50, 0xad, 0xb8 },
+	{ 0xea, 0xb8, 0x85, 0x8a, 0xde, 0x92, 0xe1, 0xbc },
+	{ 0xf3, 0x15, 0xbb, 0x5b, 0xb8, 0x35, 0xd8, 0x17 },
+	{ 0xad, 0xcf, 0x6b, 0x07, 0x63, 0x61, 0x2e, 0x2f },
+	{ 0xa5, 0xc9, 0x1d, 0xa7, 0xac, 0xaa, 0x4d, 0xde },
+	{ 0x71, 0x65, 0x95, 0x87, 0x66, 0x50, 0xa2, 0xa6 },
+	{ 0x28, 0xef, 0x49, 0x5c, 0x53, 0xa3, 0x87, 0xad },
+	{ 0x42, 0xc3, 0x41, 0xd8, 0xfa, 0x92, 0xd8, 0x32 },
+	{ 0xce, 0x7c, 0xf2, 0x72, 0x2f, 0x51, 0x27, 0x71 },
+	{ 0xe3, 0x78, 0x59, 0xf9, 0x46, 0x23, 0xf3, 0xa7 },
+	{ 0x38, 0x12, 0x05, 0xbb, 0x1a, 0xb0, 0xe0, 0x12 },
+	{ 0xae, 0x97, 0xa1, 0x0f, 0xd4, 0x34, 0xe0, 0x15 },
+	{ 0xb4, 0xa3, 0x15, 0x08, 0xbe, 0xff, 0x4d, 0x31 },
+	{ 0x81, 0x39, 0x62, 0x29, 0xf0, 0x90, 0x79, 0x02 },
+	{ 0x4d, 0x0c, 0xf4, 0x9e, 0xe5, 0xd4, 0xdc, 0xca },
+	{ 0x5c, 0x73, 0x33, 0x6a, 0x76, 0xd8, 0xbf, 0x9a },
+	{ 0xd0, 0xa7, 0x04, 0x53, 0x6b, 0xa9, 0x3e, 0x0e },
+	{ 0x92, 0x59, 0x58, 0xfc, 0xd6, 0x42, 0x0c, 0xad },
+	{ 0xa9, 0x15, 0xc2, 0x9b, 0xc8, 0x06, 0x73, 0x18 },
+	{ 0x95, 0x2b, 0x79, 0xf3, 0xbc, 0x0a, 0xa6, 0xd4 },
+	{ 0xf2, 0x1d, 0xf2, 0xe4, 0x1d, 0x45, 0x35, 0xf9 },
+	{ 0x87, 0x57, 0x75, 0x19, 0x04, 0x8f, 0x53, 0xa9 },
+	{ 0x10, 0xa5, 0x6c, 0xf5, 0xdf, 0xcd, 0x9a, 0xdb },
+	{ 0xeb, 0x75, 0x09, 0x5c, 0xcd, 0x98, 0x6c, 0xd0 },
+	{ 0x51, 0xa9, 0xcb, 0x9e, 0xcb, 0xa3, 0x12, 0xe6 },
+	{ 0x96, 0xaf, 0xad, 0xfc, 0x2c, 0xe6, 0x66, 0xc7 },
+	{ 0x72, 0xfe, 0x52, 0x97, 0x5a, 0x43, 0x64, 0xee },
+	{ 0x5a, 0x16, 0x45, 0xb2, 0x76, 0xd5, 0x92, 0xa1 },
+	{ 0xb2, 0x74, 0xcb, 0x8e, 0xbf, 0x87, 0x87, 0x0a },
+	{ 0x6f, 0x9b, 0xb4, 0x20, 0x3d, 0xe7, 0xb3, 0x81 },
+	{ 0xea, 0xec, 0xb2, 0xa3, 0x0b, 0x22, 0xa8, 0x7f },
+	{ 0x99, 0x24, 0xa4, 0x3c, 0xc1, 0x31, 0x57, 0x24 },
+	{ 0xbd, 0x83, 0x8d, 0x3a, 0xaf, 0xbf, 0x8d, 0xb7 },
+	{ 0x0b, 0x1a, 0x2a, 0x32, 0x65, 0xd5, 0x1a, 0xea },
+	{ 0x13, 0x50, 0x79, 0xa3, 0x23, 0x1c, 0xe6, 0x60 },
+	{ 0x93, 0x2b, 0x28, 0x46, 0xe4, 0xd7, 0x06, 0x66 },
+	{ 0xe1, 0x91, 0x5f, 0x5c, 0xb1, 0xec, 0xa4, 0x6c },
+	{ 0xf3, 0x25, 0x96, 0x5c, 0xa1, 0x6d, 0x62, 0x9f },
+	{ 0x57, 0x5f, 0xf2, 0x8e, 0x60, 0x38, 0x1b, 0xe5 },
+	{ 0x72, 0x45, 0x06, 0xeb, 0x4c, 0x32, 0x8a, 0x95 }
+};
+
+static int __init siphash_test_init(void)
+{
+	u8 in[64], k[16], i;
+	u64 out;
+	int ret = 0;
+
+	for (i = 0; i < 16; ++i)
+		k[i] = i;
+
+	for (i = 0; i < 64; ++i) {
+		in[i] = i;
+		out = siphash24(in, i, k);
+		if (memcmp(&out, test_vectors[i], 8)) {
+			pr_info("self-test %u: FAIL\n", i + 1);
+			ret = -EINVAL;
+		}
+	}
+	if (!ret)
+		pr_info("self-tests: pass\n");
+	return ret;
+}
+
+static void __exit siphash_test_exit(void)
+{
+}
+
+module_init(siphash_test_init);
+module_exit(siphash_test_exit);
+
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_LICENSE("Dual BSD/GPL");