dwarf_loader: use a better hashing function

Message ID 20210210232327.1965876-1-morbo@google.com (mailing list archive)
State Superseded
Delegated to: BPF

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Bill Wendling Feb. 10, 2021, 11:23 p.m. UTC
This hashing function[1] produces better hash table bucket
distributions. The original hashing function always produced zeros in
the three least significant bits.

The new hashing function gives a modest performance boost.

      Original      New
       0:11.41       0:11.38
       0:11.36       0:11.34
       0:11.35       0:11.26
      -----------------------
  Avg: 0:11.373      0:11.327

for a performance improvement of 0.4%.

[1] From Numerical Recipes, 3rd Ed. 7.1.4 Random Hashes and Random Bytes

Signed-off-by: Bill Wendling <morbo@google.com>
---
 hash.h | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)
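
The averages and the 0.4% figure in the commit message can be reproduced
with a few lines of C. This is only a sketch: the helper names mean() and
improvement_pct() are made up for illustration, and the samples are the
three runs from the timing table, written as seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timing samples, in seconds. */
static double mean(const double *samples, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/* Relative speedup of new_avg over old_avg, as a percentage. */
static double improvement_pct(double old_avg, double new_avg)
{
	return (old_avg - new_avg) / old_avg * 100.0;
}

/* The three runs per column from the commit message's timing table. */
static const double original_runs[] = { 11.41, 11.36, 11.35 };
static const double new_runs[]      = { 11.38, 11.34, 11.26 };
```

Passing mean(original_runs, 3) and mean(new_runs, 3) to improvement_pct()
gives roughly 0.41%, which rounds to the 0.4% quoted in the commit message.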

Comments

Andrii Nakryiko Feb. 10, 2021, 11:59 p.m. UTC | #1
On Wed, Feb 10, 2021 at 3:25 PM Bill Wendling <morbo@google.com> wrote:
>
> This hashing function[1] produces better hash table bucket
> distributions. The original hashing function always produced zeros in
> the three least significant bits.
>
> The new hashing function gives a modest performance boost.
>
>       Original      New
>        0:11.41       0:11.38
>        0:11.36       0:11.34
>        0:11.35       0:11.26
>       -----------------------
>   Avg: 0:11.373      0:11.327
>
> for a performance improvement of 0.4%.
>
> [1] From Numerical Recipes, 3rd Ed. 7.1.4 Random Hashes and Random Bytes
>

Can you please also test with the one libbpf uses internally:

return (val * 11400714819323198485llu) >> (64 - bits);

?

Thanks!
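
For reference, the one-line expression above can be wrapped into a
stand-alone function with the same signature as hash_64(). This is a
sketch only: the name hash_64_mul() and the bits == 0 guard are
assumptions added here, not something taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Multiplicative hash: multiply by a 64-bit odd constant close to 2^64
 * divided by the golden ratio, then keep the top `bits` bits of the product.
 */
static inline uint64_t hash_64_mul(const uint64_t val, const unsigned int bits)
{
	assert(bits <= 64);

	/* Shifting a 64-bit value by 64 is undefined in C, so handle 0 here. */
	if (bits == 0)
		return 0;

	return (val * 11400714819323198485llu) >> (64 - bits);
}
```

With bits = 10 the result is always in the range 0..1023, so it can be
used directly as an index into a 1024-entry bucket array.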

> Signed-off-by: Bill Wendling <morbo@google.com>
> ---
>  hash.h | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/hash.h b/hash.h
> index d3aa416..ea201ab 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -33,22 +33,17 @@
>
>  static inline uint64_t hash_64(const uint64_t val, const unsigned int bits)
>  {
> -       uint64_t hash = val;
> +       uint64_t hash = val * 0x369DEA0F31A53F85UL + 0x255992D382208B61UL;
>
> -       /*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
> -       uint64_t n = hash;
> -       n <<= 18;
> -       hash -= n;
> -       n <<= 33;
> -       hash -= n;
> -       n <<= 3;
> -       hash += n;
> -       n <<= 3;
> -       hash -= n;
> -       n <<= 4;
> -       hash += n;
> -       n <<= 2;
> -       hash += n;
> +       hash ^= hash >> 21;
> +       hash ^= hash << 37;
> +       hash ^= hash >>  4;
> +
> +       hash *= 0x422E19E1D95D2F0DUL;
> +
> +       hash ^= hash << 20;
> +       hash ^= hash >> 41;
> +       hash ^= hash <<  5;
>
>         /* High bits are more random, so use them. */
>         return hash >> (64 - bits);
> --
> 2.30.0.478.g8a0d178c01-goog
>
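
The shift/add/subtract sequence removed by the diff quoted above is a
hand-expanded multiplication by one 64-bit constant,
2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1, which appears to be the value
older Linux kernels named GOLDEN_RATIO_PRIME_64. The sketch below checks
that equivalence; the function names are hypothetical and only the mixing
step is shown, not the final shift down to `bits` bits.

```c
#include <assert.h>
#include <stdint.h>

/* The mixing sequence that the patch removes, copied line for line. */
static inline uint64_t old_mix_shifts(uint64_t val)
{
	uint64_t hash = val;
	uint64_t n = hash;

	n <<= 18;
	hash -= n;
	n <<= 33;
	hash -= n;
	n <<= 3;
	hash += n;
	n <<= 3;
	hash -= n;
	n <<= 4;
	hash += n;
	n <<= 2;
	hash += n;
	return hash;
}

/* The same step written as a single multiplication modulo 2^64. */
static inline uint64_t old_mix_mul(uint64_t val)
{
	return val * UINT64_C(0x9E37FFFFFFFC0001);
}

/* Aborts if the two forms ever disagree for the given input. */
static inline void check_old_mix(uint64_t val)
{
	assert(old_mix_shifts(val) == old_mix_mul(val));
}
```

Seen this way, the libbpf expression suggested above has the same
multiply-then-shift shape as the original code, just with a different
multiplier.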
Bill Wendling Feb. 11, 2021, 1:24 a.m. UTC | #2
On Wed, Feb 10, 2021 at 4:00 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Feb 10, 2021 at 3:25 PM Bill Wendling <morbo@google.com> wrote:
> >
> > This hashing function[1] produces better hash table bucket
> > distributions. The original hashing function always produced zeros in
> > the three least significant bits.
> >
> > The new hashing function gives a modest performance boost.
> >
> >       Original      New
> >        0:11.41       0:11.38
> >        0:11.36       0:11.34
> >        0:11.35       0:11.26
> >       -----------------------
> >   Avg: 0:11.373      0:11.327
> >
> > for a performance improvement of 0.4%.
> >
> > [1] From Numerical Recipes, 3rd Ed. 7.1.4 Random Hashes and Random Bytes
> >
>
> Can you please also test with the one libbpf uses internally:
>
> return (val * 11400714819323198485llu) >> (64 - bits);
>
> ?
>
> Thanks!
>
It's giving me a running time of ~11.11s, which is even better. Would
you like me to submit a patch?

-bw
Andrii Nakryiko Feb. 11, 2021, 1:31 a.m. UTC | #3
On Wed, Feb 10, 2021 at 5:24 PM Bill Wendling <morbo@google.com> wrote:
>
> On Wed, Feb 10, 2021 at 4:00 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Feb 10, 2021 at 3:25 PM Bill Wendling <morbo@google.com> wrote:
> > >
> > > This hashing function[1] produces better hash table bucket
> > > distributions. The original hashing function always produced zeros in
> > > the three least significant bits.
> > >
> > > The new hashing function gives a modest performance boost.
> > >
> > >       Original      New
> > >        0:11.41       0:11.38
> > >        0:11.36       0:11.34
> > >        0:11.35       0:11.26
> > >       -----------------------
> > >   Avg: 0:11.373      0:11.327
> > >
> > > for a performance improvement of 0.4%.
> > >
> > > [1] From Numerical Recipes, 3rd Ed. 7.1.4 Random Hashes and Random Bytes
> > >
> >
> > Can you please also test with the one libbpf uses internally:
> >
> > return (val * 11400714819323198485llu) >> (64 - bits);
> >
> > ?
> >
> > Thanks!
> >
> It's giving me a running time of ~11.11s, which is even better. Would
> you like me to submit a patch?

faster is better, so yeah, why not? :)

>
> -bw
Arnaldo Carvalho de Melo Feb. 11, 2021, 1:01 p.m. UTC | #4
Em Wed, Feb 10, 2021 at 05:31:48PM -0800, Andrii Nakryiko escreveu:
> On Wed, Feb 10, 2021 at 5:24 PM Bill Wendling <morbo@google.com> wrote:
> > On Wed, Feb 10, 2021 at 4:00 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > On Wed, Feb 10, 2021 at 3:25 PM Bill Wendling <morbo@google.com> wrote:
> > > > This hashing function[1] produces better hash table bucket
> > > > distributions. The original hashing function always produced zeros in
> > > > the three least significant bits.

> > > > The new hashing function gives a modest performance boost.

> > > >       Original      New
> > > >        0:11.41       0:11.38
> > > >        0:11.36       0:11.34
> > > >        0:11.35       0:11.26
> > > >       -----------------------
> > > >   Avg: 0:11.373      0:11.327

> > > > for a performance improvement of 0.4%.

> > > > [1] From Numerical Recipes, 3rd Ed. 7.1.4 Random Hashes and Random Bytes

> > > Can you please also test with the one libbpf uses internally:

> > > return (val * 11400714819323198485llu) >> (64 - bits);

> > > ?

> > It's giving me a running time of ~11.11s, which is even better. Would
> > you like me to submit a patch?

> faster is better, so yeah, why not? :)

Yeah, I agree, faster is better, please make it so :-)

- Arnaldo
Bill Wendling Feb. 12, 2021, 6:55 a.m. UTC | #5
On Thu, Feb 11, 2021 at 5:01 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Wed, Feb 10, 2021 at 05:31:48PM -0800, Andrii Nakryiko escreveu:
> > On Wed, Feb 10, 2021 at 5:24 PM Bill Wendling <morbo@google.com> wrote:
> > > On Wed, Feb 10, 2021 at 4:00 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > > On Wed, Feb 10, 2021 at 3:25 PM Bill Wendling <morbo@google.com> wrote:
> > > > > This hashing function[1] produces better hash table bucket
> > > > > distributions. The original hashing function always produced zeros in
> > > > > the three least significant bits.
>
> > > > > > The new hashing function gives a modest performance boost.
>
> > > > >       Original      New
> > > > >        0:11.41       0:11.38
> > > > >        0:11.36       0:11.34
> > > > >        0:11.35       0:11.26
> > > > >       -----------------------
> > > > >   Avg: 0:11.373      0:11.327
>
> > > > > for a performance improvement of 0.4%.
>
> > > > > [1] From Numerical Recipes, 3rd Ed. 7.1.4 Random Hashes and Random Bytes
>
> > > > Can you please also test with the one libbpf uses internally:
>
> > > > return (val * 11400714819323198485llu) >> (64 - bits);
>
> > > > ?
>
> > > It's giving me a running time of ~11.11s, which is even better. Would
> > > you like me to submit a patch?
>
> > faster is better, so yeah, why not? :)
>
> Yeah, I agree, faster is better, please make it so :-)
>
Your wish is my command! :-) Done.

-bw
Arnaldo Carvalho de Melo Feb. 12, 2021, 12:35 p.m. UTC | #6
Em Thu, Feb 11, 2021 at 10:55:32PM -0800, Bill Wendling escreveu:
> On Thu, Feb 11, 2021 at 5:01 AM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > Em Wed, Feb 10, 2021 at 05:31:48PM -0800, Andrii Nakryiko escreveu:
> > > On Wed, Feb 10, 2021 at 5:24 PM Bill Wendling <morbo@google.com> wrote:
> > > > On Wed, Feb 10, 2021 at 4:00 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > > > On Wed, Feb 10, 2021 at 3:25 PM Bill Wendling <morbo@google.com> wrote:
> > > > > > This hashing function[1] produces better hash table bucket
> > > > > > distributions. The original hashing function always produced zeros in
> > > > > > the three least significant bits.
> >
> > > > > > > The new hashing function gives a modest performance boost.
> >
> > > > > >       Original      New
> > > > > >        0:11.41       0:11.38
> > > > > >        0:11.36       0:11.34
> > > > > >        0:11.35       0:11.26
> > > > > >       -----------------------
> > > > > >   Avg: 0:11.373      0:11.327
> >
> > > > > > for a performance improvement of 0.4%.
> >
> > > > > > [1] From Numerical Recipes, 3rd Ed. 7.1.4 Random Hashes and Random Bytes
> >
> > > > > Can you please also test with the one libbpf uses internally:
> >
> > > > > return (val * 11400714819323198485llu) >> (64 - bits);
> >
> > > > > ?
> >
> > > > It's giving me a running time of ~11.11s, which is even better. Would
> > > > you like me to submit a patch?
> >
> > > faster is better, so yeah, why not? :)
> >
> > Yeah, I agree, faster is better, please make it so :-)
> >
> Your wish is my command! :-) Done.

Thanks, looking for the patch and applying!

Now go think about something else to make it faster 8-)

- Arnaldo

Patch

diff --git a/hash.h b/hash.h
index d3aa416..ea201ab 100644
--- a/hash.h
+++ b/hash.h
@@ -33,22 +33,17 @@ 
 
 static inline uint64_t hash_64(const uint64_t val, const unsigned int bits)
 {
-	uint64_t hash = val;
+	uint64_t hash = val * 0x369DEA0F31A53F85UL + 0x255992D382208B61UL;
 
-	/*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
-	uint64_t n = hash;
-	n <<= 18;
-	hash -= n;
-	n <<= 33;
-	hash -= n;
-	n <<= 3;
-	hash += n;
-	n <<= 3;
-	hash -= n;
-	n <<= 4;
-	hash += n;
-	n <<= 2;
-	hash += n;
+	hash ^= hash >> 21;
+	hash ^= hash << 37;
+	hash ^= hash >>  4;
+
+	hash *= 0x422E19E1D95D2F0DUL;
+
+	hash ^= hash << 20;
+	hash ^= hash >> 41;
+	hash ^= hash <<  5;
 
 	/* High bits are more random, so use them. */
 	return hash >> (64 - bits);
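
One rough way to look at the bucket distribution claimed in the commit
message is to hash a run of evenly spaced keys and record how full the
fullest bucket gets. The sketch below copies the patched function under
the name hash_64_nr() so it builds on its own; max_bucket_load() and its
parameters are hypothetical helpers, and the key pattern (multiples of a
stride) is an assumption, not the workload measured in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The hash from the patch above, renamed so it can be built stand-alone. */
static inline uint64_t hash_64_nr(const uint64_t val, const unsigned int bits)
{
	uint64_t hash = val * UINT64_C(0x369DEA0F31A53F85) +
			UINT64_C(0x255992D382208B61);

	/* A shift by 64 would be undefined, so bits must be in 1..64. */
	assert(bits >= 1 && bits <= 64);

	hash ^= hash >> 21;
	hash ^= hash << 37;
	hash ^= hash >>  4;

	hash *= UINT64_C(0x422E19E1D95D2F0D);

	hash ^= hash << 20;
	hash ^= hash >> 41;
	hash ^= hash <<  5;

	/* High bits are more random, so use them. */
	return hash >> (64 - bits);
}

/*
 * Hash the keys 0, stride, 2 * stride, ... (nkeys of them) into 2^bits
 * buckets and return the size of the fullest bucket. counts must have
 * room for 2^bits entries, and bits must be smaller than the width of
 * size_t.
 */
static size_t max_bucket_load(unsigned int bits, uint64_t stride,
			      size_t nkeys, size_t *counts)
{
	const size_t nbuckets = (size_t)1 << bits;
	size_t max = 0;

	for (size_t i = 0; i < nbuckets; i++)
		counts[i] = 0;

	for (size_t i = 0; i < nkeys; i++) {
		size_t b = (size_t)hash_64_nr((uint64_t)i * stride, bits);

		if (++counts[b] > max)
			max = counts[b];
	}
	return max;
}
```

For 4096 keys in 256 buckets a perfectly even spread gives a maximum load
of 16, so the further the returned value is above 16, the more uneven the
distribution for that key pattern.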