Message ID | 1491959850-30756-11-git-send-email-cota@braap.org (mailing list archive)
---|---
State | New, archived
On 12/04/2017 09:17, Emilio G. Cota wrote:
> +
> +/* In user-mode we can get better hashing because we do not have a TLB */
> +static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
> +{
> +    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
> +}

What about multiplicative hashing?

    return (uint64_t) (pc * 2654435761) >> 32;

Paolo
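[For reference, Paolo's one-liner expanded into a complete replacement
function might look like the sketch below. The constant 2654435761
(0x9E3779B1) is a prime near 2^32/phi, as used in multiplicative
(Fibonacci) hashing. The 64-bit cast placed before the multiply (so that
a 32-bit target_ulong does not wrap before the shift) and the final mask
to the table size are assumptions here, though the mask matches what the
reply below describes testing.]

    /* Sketch of the multiplicative-hashing alternative; not the
     * committed code.  Casting pc to uint64_t *before* the multiply
     * preserves the high product bits when target_ulong is 32 bits. */
    static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
    {
        return (((uint64_t)pc * 2654435761u) >> 32) & (TB_JMP_CACHE_SIZE - 1);
    }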
On Wed, Apr 12, 2017 at 11:46:47 +0800, Paolo Bonzini wrote:
> On 12/04/2017 09:17, Emilio G. Cota wrote:
> > +
> > +/* In user-mode we can get better hashing because we do not have a TLB */
> > +static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
> > +{
> > +    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
> > +}
>
> What about multiplicative hashing?
>
>     return (uint64_t) (pc * 2654435761) >> 32;

I tested this one, masking the result with TB_JMP_CACHE_SIZE - 1 to keep
its lower bits: http://imgur.com/QIhm875

In terms of quality it's good (I profile hit rates and they're all
pretty good), but shift+xor is just so hard to beat: the shift and the
xor take 1 cycle each, whereas the multiplication takes 3 or 4 cycles on
my machine (source: Agner Fog's instruction tables).

Thanks,

		E.
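[To make the cycle-count argument concrete, the two candidates can be
written side by side as in the sketch below. The function names are
invented for the comparison, TB_JMP_CACHE_BITS/TB_JMP_CACHE_SIZE are
assumed to come from QEMU's headers as usual, and the cycle figures are
Emilio's, from Agner Fog's tables for this host.]

    /* Shift (1 cycle) + xor (1 cycle) + mask: the variant the patch adopts. */
    static inline unsigned int hash_xor_shift(target_ulong pc)
    {
        return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
    }

    /* 64-bit multiply (3-4 cycles) + shift + mask: Paolo's alternative. */
    static inline unsigned int hash_multiplicative(target_ulong pc)
    {
        return (((uint64_t)pc * 2654435761u) >> 32) & (TB_JMP_CACHE_SIZE - 1);
    }

Both index the same 2^TB_JMP_CACHE_BITS-entry table; the trade-off is
purely how well the bits get mixed versus how many cycles the mixing
costs on every lookup.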
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 2c27490..b1fe2d0 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -22,6 +22,8 @@
 
 #include "exec/tb-hash-xx.h"
 
+#ifdef CONFIG_SOFTMMU
+
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
    addresses on the same page.  The top bits are the same.  This allows
    TLB invalidation to quickly clear a subset of the hash table.  */
@@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
            | (tmp & TB_JMP_ADDR_MASK));
 }
 
+#else
+
+/* In user-mode we can get better hashing because we do not have a TLB */
+static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
+{
+    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+}
+
+#endif /* CONFIG_SOFTMMU */
+
 static inline uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
                                     uint32_t flags)
 {
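[For context, the hash indexes the direct-mapped, per-vCPU jump cache of
TB pointers. The sketch below is a simplified lookup path modeled on
QEMU's cpu-exec.c of this period, not the exact code; the real lookup
also matches cs_base and flags, not just pc.]

    /* Simplified sketch of a tb_jmp_cache lookup. */
    static TranslationBlock *jmp_cache_lookup(CPUState *cpu, target_ulong pc)
    {
        TranslationBlock *tb;

        tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
        if (tb == NULL || tb->pc != pc) {
            return NULL; /* miss: caller falls back to the global hash table */
        }
        return tb;
    }

A higher hit rate here means fewer trips to the much slower global
lookup, which is why the commit message below cares so much about the
quality of this hash.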
Optimizations to cross-page chaining and indirect jumps make performance
more sensitive to the hit rate of tb_jmp_cache. The constraint of
reserving some bits for the page number lowers the achievable quality of
the hashing function; however, user-mode does not have this requirement.
Thus, with this change user-mode gets a hashing function that is both
faster and of better quality than the previous one.

Measurements:

- specINT 2006 (test set), x86_64-linux-user.
  Host: Intel i7-4790K @ 4.00GHz

  [chart: speedup over 95b31d70 for jr, jr+xxhash and jr+hash+inline
   across astar, bzip2, gcc, gobmk, h264ref, hmmer, libquantum, mcf,
   omnetpp, perlbench, sjeng, xalancbmk and their hmean]
  png: http://imgur.com/RiaBuIi

  That is, a 6.45% hmean improvement for this commit. Note that this is
  the test set, so some benchmarks take almost no time (and therefore
  aren't that sensitive to changes here); see the "train" results below.
  Note also that hashing quality is not the only requirement: xxhash
  gives the highest hit rates on average, but the time spent computing
  the hash negates the performance gains coming from the increased hit
  rate. Given these results, I dropped xxhash from subsequent
  experiments.

- specINT 2006 (train set), x86_64-linux-user.
  Host: Intel i7-4790K @ 4.00GHz

  [chart: speedup over 95b31d70 for jr and jr+hash across the same
   benchmarks and their hmean]
  png: http://imgur.com/55iJJgD

  That is, a 10.19% hmean improvement for jr+hash (this commit).

- NBench, arm-linux-user.
  Host: Intel i7-4790K @ 4.00GHz

  [chart: speedup over 95b31d70 for jr, jr+inline and jr+inline+hash
   across the NBench tests and their hmean]
  png: http://imgur.com/i5e1gdY

  That is, an 11% hmean perf gain; it almost doubles the perf gain from
  implementing the jr optimization.

- NBench, x86_64-linux-user.
  Host: Intel i7-4790K @ 4.00GHz

  [chart: speedup over 95b31d70 for jr, jr+inline and jr+inline+hash
   across the NBench tests and their hmean]
  png: http://imgur.com/Xu0Owgu

  The fact that NBench is not very sensitive to changes here was
  mentioned in the previous commit's log. We get a very slight overall
  decrease in hmean performance, although some workloads improve as
  well. Note that there are no error bars: NBench re-runs itself until
  confidence in the stability of the average is >= 95%, and it does not
  report the resulting stddev.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/tb-hash.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)
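[As a standalone illustration, not part of the patch: assuming
TB_JMP_CACHE_BITS == 12 as in QEMU of this era, the snippet below shows
the new hash separating two PCs that share their low 12 bits and would
therefore collide under a plain mask.]

    /* Demo: with TB_JMP_CACHE_BITS == 12, two PCs sharing their low 12
     * bits collide under a plain mask but are separated once the high
     * bits are xor-folded in. */
    #include <stdio.h>
    #include <stdint.h>

    #define TB_JMP_CACHE_BITS 12
    #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)

    static unsigned int hash(uint64_t pc)
    {
        return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
    }

    int main(void)
    {
        printf("%u\n", hash(0x400234)); /* 0x234 ^ 0x400 = 0x634 -> bucket 1588 */
        printf("%u\n", hash(0x7f1234)); /* 0x234 ^ 0x7f1 = 0x5c5 -> bucket 1477 */
        return 0;
    }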