diff mbox series

[v4,02/39] dm vdo: add the MurmurHash3 fast hashing algorithm

Message ID 20231026214136.1067410-3-msakai@redhat.com (mailing list archive)
State New, archived
Headers show
Series dm vdo: add the dm-vdo deduplication and compression DM target | expand

Commit Message

Matthew Sakai Oct. 26, 2023, 9:40 p.m. UTC
MurmurHash3 is a fast, non-cryptographic, 128-bit hash. It was originally
written by Austin Appleby and placed in the public domain. This version has
been modified to produce the same result on both big endian and little
endian processors, making it suitable for use in portable persistent data.

Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
Co-developed-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Co-developed-by: John Wiele <jwiele@redhat.com>
Signed-off-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
---
 drivers/md/dm-vdo/murmurhash3.c | 175 ++++++++++++++++++++++++++++++++
 drivers/md/dm-vdo/murmurhash3.h |  15 +++
 2 files changed, 190 insertions(+)
 create mode 100644 drivers/md/dm-vdo/murmurhash3.c
 create mode 100644 drivers/md/dm-vdo/murmurhash3.h

Comments

Guenter Roeck March 14, 2024, 11:35 p.m. UTC | #1
On Thu, Oct 26, 2023 at 05:40:59PM -0400, Matthew Sakai wrote:
> MurmurHash3 is a fast, non-cryptographic, 128-bit hash. It was originally
> written by Austin Appleby and placed in the public domain. This version has
> been modified to produce the same result on both big endian and little
> endian processors, making it suitable for use in portable persistent data.
> 
> Co-developed-by: J. corwin Coburn <corwin@hurlbutnet.net>
> Signed-off-by: J. corwin Coburn <corwin@hurlbutnet.net>
> Co-developed-by: Ken Raeburn <raeburn@redhat.com>
> Signed-off-by: Ken Raeburn <raeburn@redhat.com>
> Co-developed-by: John Wiele <jwiele@redhat.com>
> Signed-off-by: John Wiele <jwiele@redhat.com>
> Signed-off-by: Matthew Sakai <msakai@redhat.com>
> ---
[ ... ]
> +
> +#define ROTL64(x, y) rotl64(x, y)
> +static __always_inline u64 getblock64(const u64 *p, int i)
> +{
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +	return p[i];
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> +	return __builtin_bswap64(p[i]);
> +#else
> +#error "can't figure out byte order"
> +#endif
> +}
> +
> +static __always_inline void putblock64(u64 *p, int i, u64 value)
> +{
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +	p[i] = value;
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> +	p[i] = __builtin_bswap64(value);
> +#else
> +#error "can't figure out byte order"
> +#endif
> +}

On sparc64, with gcc 11.4, the above code results in:

ERROR: modpost: "__bswapdi2" [drivers/md/dm-vdo/dm-vdo.ko] undefined!

Guenter
Kenneth Raeburn March 18, 2024, 8:37 p.m. UTC | #2
(resend because of accidental HTML lossage)

On Thu, Mar 14, 2024 at 7:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
> On sparc64, with gcc 11.4, the above code results in:
>
> ERROR: modpost: "__bswapdi2" [drivers/md/dm-vdo/dm-vdo.ko] undefined!
>
> Guenter

Thanks for catching that. I don't think our team has any sparc
machines readily available for testing.
This is an artifact of our having imported user-mode code to use in
the kernel. We should probably be using le64_to_cpup and friends, as
we do elsewhere, so it doesn't try to pull in libgcc support routines.

Ken
Guenter Roeck March 18, 2024, 8:54 p.m. UTC | #3
On 3/18/24 13:37, Kenneth Raeburn wrote:
> (resend because of accidental HTML lossage)
> 
> On Thu, Mar 14, 2024 at 7:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
>> On sparc64, with gcc 11.4, the above code results in:
>>
>> ERROR: modpost: "__bswapdi2" [drivers/md/dm-vdo/dm-vdo.ko] undefined!
>>
>> Guenter
> 
> Thanks for catching that. I don't think our team has any sparc
> machines readily available for testing.
> This is an artifact of our having imported user-mode code to use in
> the kernel. We should probably be using le64_to_cpup and friends, as
> we do elsewhere, so it doesn't try to pull in libgcc support routines.
> 

I am kind of getting wary about reporting such issues. Should I drop
building dm-vdo images for sparc ? Would it be possible to add
"depends on BROKEN if SPARC" configuration option to indicate that
the code isn't expected to be buildable on sparc, much less work ?

Thanks,
Guenter
Matthew Sakai March 19, 2024, 1:14 a.m. UTC | #4
On 3/18/24 16:54, Guenter Roeck wrote:
> On 3/18/24 13:37, Kenneth Raeburn wrote:
>> (resend because of accidental HTML lossage)
>>
>> On Thu, Mar 14, 2024 at 7:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>> On sparc64, with gcc 11.4, the above code results in:
>>>
>>> ERROR: modpost: "__bswapdi2" [drivers/md/dm-vdo/dm-vdo.ko] undefined!
>>>
>>> Guenter
>>
>> Thanks for catching that. I don't think our team has any sparc
>> machines readily available for testing.
>> This is an artifact of our having imported user-mode code to use in
>> the kernel. We should probably be using le64_to_cpup and friends, as
>> we do elsewhere, so it doesn't try to pull in libgcc support routines.
>>
> 
> I am kind of getting wary about reporting such issues. Should I drop
> building dm-vdo images for sparc ? Would it be possible to add
> "depends on BROKEN if SPARC" configuration option to indicate that
> the code isn't expected to be buildable on sparc, much less work ?

First off, sorry for the slow response. I was away for the last couple 
of days.

I would encourage you to report issues when you find them. 
Theoretically, VDO should work on sparc. However since we don't have 
that architecture readily available to us, we won't know what to fix 
unless it gets reported.

In this case, we really shouldn't be relying on these libgcc builtins 
anyway. I hope to have a patch in a couple of days that you can try out. 
If it turns out there are more extensive issues with VDO on sparc, I am 
open to disabling the config, but I hope it won't come to that.

Matt
Matthew Sakai March 20, 2024, 9:44 p.m. UTC | #5
On 3/18/24 16:54, Guenter Roeck wrote:
> On 3/18/24 13:37, Kenneth Raeburn wrote:
>> (resend because of accidental HTML lossage)
>>
>> On Thu, Mar 14, 2024 at 7:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>> On sparc64, with gcc 11.4, the above code results in:
>>>
>>> ERROR: modpost: "__bswapdi2" [drivers/md/dm-vdo/dm-vdo.ko] undefined!
>>>
>>> Guenter
>>
>> Thanks for catching that. I don't think our team has any sparc
>> machines readily available for testing.
>> This is an artifact of our having imported user-mode code to use in
>> the kernel. We should probably be using le64_to_cpup and friends, as
>> we do elsewhere, so it doesn't try to pull in libgcc support routines.
>>
> 
> I am kind of getting wary about reporting such issues. Should I drop
> building dm-vdo images for sparc ? Would it be possible to add
> "depends on BROKEN if SPARC" configuration option to indicate that
> the code isn't expected to be buildable on sparc, much less work ?
> 
> Thanks,
> Guenter
> 

Could you try out the patch I just sent to see if it fixes your build 
problem?

Matt
Guenter Roeck March 20, 2024, 10:59 p.m. UTC | #6
On 3/20/24 14:44, Matthew Sakai wrote:
> 
> On 3/18/24 16:54, Guenter Roeck wrote:
>> On 3/18/24 13:37, Kenneth Raeburn wrote:
>>> (resend because of accidental HTML lossage)
>>>
>>> On Thu, Mar 14, 2024 at 7:38 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>> On sparc64, with gcc 11.4, the above code results in:
>>>>
>>>> ERROR: modpost: "__bswapdi2" [drivers/md/dm-vdo/dm-vdo.ko] undefined!
>>>>
>>>> Guenter
>>>
>>> Thanks for catching that. I don't think our team has any sparc
>>> machines readily available for testing.
>>> This is an artifact of our having imported user-mode code to use in
>>> the kernel. We should probably be using le64_to_cpup and friends, as
>>> we do elsewhere, so it doesn't try to pull in libgcc support routines.
>>>
>>
>> I am kind of getting wary about reporting such issues. Should I drop
>> building dm-vdo images for sparc ? Would it be possible to add
>> "depends on BROKEN if SPARC" configuration option to indicate that
>> the code isn't expected to be buildable on sparc, much less work ?
>>
>> Thanks,
>> Guenter
>>
> 
> Could you try out the patch I just sent to see if it fixes your build problem?
> 
Sure, will do.

Guenter
diff mbox series

Patch

diff --git a/drivers/md/dm-vdo/murmurhash3.c b/drivers/md/dm-vdo/murmurhash3.c
new file mode 100644
index 000000000000..c68d0d153904
--- /dev/null
+++ b/drivers/md/dm-vdo/murmurhash3.c
@@ -0,0 +1,175 @@ 
+// SPDX-License-Identifier: LGPL-2.1+
+/*
+ * MurmurHash3 was written by Austin Appleby, and is placed in the public
+ * domain. The author hereby disclaims copyright to this source code.
+ *
+ * Adapted by John Wiele (jwiele@redhat.com).
+ */
+
+#include "murmurhash3.h"
+
+static inline u64 rotl64(u64 x, s8 r)
+{
+	return (x << r) | (x >> (64 - r));
+}
+
+#define ROTL64(x, y) rotl64(x, y)
+static __always_inline u64 getblock64(const u64 *p, int i)
+{
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+	return p[i];
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+	return __builtin_bswap64(p[i]);
+#else
+#error "can't figure out byte order"
+#endif
+}
+
+static __always_inline void putblock64(u64 *p, int i, u64 value)
+{
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+	p[i] = value;
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+	p[i] = __builtin_bswap64(value);
+#else
+#error "can't figure out byte order"
+#endif
+}
+
+/* Finalization mix - force all bits of a hash block to avalanche */
+
+static __always_inline u64 fmix64(u64 k)
+{
+	k ^= k >> 33;
+	k *= 0xff51afd7ed558ccdLLU;
+	k ^= k >> 33;
+	k *= 0xc4ceb9fe1a85ec53LLU;
+	k ^= k >> 33;
+
+	return k;
+}
+
+void murmurhash3_128(const void *key, const int len, const u32 seed, void *out)
+{
+	const u8 *data = (const u8 *)key;
+	const int nblocks = len / 16;
+
+	u64 h1 = seed;
+	u64 h2 = seed;
+
+	const u64 c1 = 0x87c37b91114253d5LLU;
+	const u64 c2 = 0x4cf5ad432745937fLLU;
+
+	/* body */
+
+	const u64 *blocks = (const u64 *)(data);
+
+	int i;
+
+	for (i = 0; i < nblocks; i++) {
+		u64 k1 = getblock64(blocks, i * 2 + 0);
+		u64 k2 = getblock64(blocks, i * 2 + 1);
+
+		k1 *= c1;
+		k1 = ROTL64(k1, 31);
+		k1 *= c2;
+		h1 ^= k1;
+
+		h1 = ROTL64(h1, 27);
+		h1 += h2;
+		h1 = h1 * 5 + 0x52dce729;
+
+		k2 *= c2;
+		k2 = ROTL64(k2, 33);
+		k2 *= c1;
+		h2 ^= k2;
+
+		h2 = ROTL64(h2, 31);
+		h2 += h1;
+		h2 = h2 * 5 + 0x38495ab5;
+	}
+
+	/* tail */
+
+	{
+		const u8 *tail = (const u8 *)(data + nblocks * 16);
+
+		u64 k1 = 0;
+		u64 k2 = 0;
+
+		switch (len & 15) {
+		case 15:
+			k2 ^= ((u64)tail[14]) << 48;
+			fallthrough;
+		case 14:
+			k2 ^= ((u64)tail[13]) << 40;
+			fallthrough;
+		case 13:
+			k2 ^= ((u64)tail[12]) << 32;
+			fallthrough;
+		case 12:
+			k2 ^= ((u64)tail[11]) << 24;
+			fallthrough;
+		case 11:
+			k2 ^= ((u64)tail[10]) << 16;
+			fallthrough;
+		case 10:
+			k2 ^= ((u64)tail[9]) << 8;
+			fallthrough;
+		case 9:
+			k2 ^= ((u64)tail[8]) << 0;
+			k2 *= c2;
+			k2 = ROTL64(k2, 33);
+			k2 *= c1;
+			h2 ^= k2;
+			fallthrough;
+
+		case 8:
+			k1 ^= ((u64)tail[7]) << 56;
+			fallthrough;
+		case 7:
+			k1 ^= ((u64)tail[6]) << 48;
+			fallthrough;
+		case 6:
+			k1 ^= ((u64)tail[5]) << 40;
+			fallthrough;
+		case 5:
+			k1 ^= ((u64)tail[4]) << 32;
+			fallthrough;
+		case 4:
+			k1 ^= ((u64)tail[3]) << 24;
+			fallthrough;
+		case 3:
+			k1 ^= ((u64)tail[2]) << 16;
+			fallthrough;
+		case 2:
+			k1 ^= ((u64)tail[1]) << 8;
+			fallthrough;
+		case 1:
+			k1 ^= ((u64)tail[0]) << 0;
+			k1 *= c1;
+			k1 = ROTL64(k1, 31);
+			k1 *= c2;
+			h1 ^= k1;
+			break;
+		default:
+			break;
+		};
+	}
+	/* finalization */
+
+	h1 ^= len;
+	h2 ^= len;
+
+	h1 += h2;
+	h2 += h1;
+
+	h1 = fmix64(h1);
+	h2 = fmix64(h2);
+
+	h1 += h2;
+	h2 += h1;
+
+	putblock64((u64 *)out, 0, h1);
+	putblock64((u64 *)out, 1, h2);
+}
diff --git a/drivers/md/dm-vdo/murmurhash3.h b/drivers/md/dm-vdo/murmurhash3.h
new file mode 100644
index 000000000000..d84711ddb659
--- /dev/null
+++ b/drivers/md/dm-vdo/murmurhash3.h
@@ -0,0 +1,15 @@ 
+/* SPDX-License-Identifier: LGPL-2.1+ */
+/*
+ * MurmurHash3 was written by Austin Appleby, and is placed in the public
+ * domain. The author hereby disclaims copyright to this source code.
+ */
+
+#ifndef _MURMURHASH3_H_
+#define _MURMURHASH3_H_
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+void murmurhash3_128(const void *key, int len, u32 seed, void *out);
+
+#endif /* _MURMURHASH3_H_ */