From patchwork Wed Apr 12 01:17:30 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Emilio Cota X-Patchwork-Id: 9676393 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 6615960382 for ; Wed, 12 Apr 2017 01:26:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 551F62852E for ; Wed, 12 Apr 2017 01:26:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4957A2857B; Wed, 12 Apr 2017 01:26:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 73B062852E for ; Wed, 12 Apr 2017 01:26:50 +0000 (UTC) Received: from localhost ([::1]:41802 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cy73Z-0005fs-8s for patchwork-qemu-devel@patchwork.kernel.org; Tue, 11 Apr 2017 21:26:49 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41275) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cy6uz-00085G-LY for qemu-devel@nongnu.org; Tue, 11 Apr 2017 21:17:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cy6ux-0006RZ-Jc for qemu-devel@nongnu.org; Tue, 11 Apr 2017 21:17:57 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:48652) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cy6us-0006ML-2p; Tue, 11 Apr 2017 21:17:50 -0400 
Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id A1E3520B77; Tue, 11 Apr 2017 21:17:48 -0400 (EDT) Received: from frontend2 ([10.202.2.161]) by compute4.internal (MEProxy); Tue, 11 Apr 2017 21:17:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=braap.org; h=cc :date:from:in-reply-to:message-id:references:subject:to :x-me-sender:x-me-sender:x-sasl-enc:x-sasl-enc; s=mesmtp; bh=Mh0 ya9PSTnUiWSVOAuPQqMNYf1lHJtq8eqF1Akv/HHI=; b=nomPzFv25frsswkv6Sy 1mvanbBfJ4w8t6Ab/BQigQLy5Yay0SmWxrmt2GaXiLlRCBqrfYHPIjpoGb55eFxB muvQbhM3oiRSLreZu9SrsikOAjj2k0mvw16qk4ZY/5gqDtr8NS/RPeadXE6uMx/g rjpwttw6Rws0siX4hRFP2wEg= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:date:from:in-reply-to:message-id :references:subject:to:x-me-sender:x-me-sender:x-sasl-enc :x-sasl-enc; s=fm1; bh=Mh0ya9PSTnUiWSVOAuPQqMNYf1lHJtq8eqF1Akv/H HI=; b=d+YFxccPSfZF0v6G8Ek13e64lS5s22jhCSFKmFHP6LfmEEX7dV9u0hc4H H0U5ahYB8hx3ZDrPAZAaI/rn3+5OHtHiz/UMxAMvxvOokfAUfzvmKhwUGeKgy+Sp L74mVSCrDQRIlUr0SbAXL8JDvyMmfsEln+RQzZOm108LvHPOR9HlzWTUph/JcNaB OiU2FkDLe23LdIAPlAoCPjE6l2VMI43zuPtFkiwDhRWJVlZ4fggNHcXHE0plido/ BWSWaH5ECn4szpJmxu9oJ5pagxRCvEpZyckFMhDinDbuMVKjEPVjvkjRXPmo4B6R rWXNfg8d4Kx8EvCzDLZxHUv4tYBlg== X-ME-Sender: X-Sasl-enc: JnWikoH95gmDlF90pyAA825Jx5d8cGmGMLLvPC+YIEX1 1491959868 Received: from localhost (flamenco.cs.columbia.edu [128.59.20.216]) by mail.messagingengine.com (Postfix) with ESMTPA id 5FAD6241ED; Tue, 11 Apr 2017 21:17:48 -0400 (EDT) From: "Emilio G. 
Cota"
To: qemu-devel@nongnu.org
Date: Tue, 11 Apr 2017 21:17:30 -0400
Message-Id: <1491959850-30756-11-git-send-email-cota@braap.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1491959850-30756-1-git-send-email-cota@braap.org>
References: <1491959850-30756-1-git-send-email-cota@braap.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy]
X-Received-From: 66.111.4.25
Subject: [Qemu-devel] [PATCH 10/10] tb-hash: improve tb_jmp_cache hash function in user mode
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
Cc: Peter Maydell, Eduardo Habkost, Peter Crosthwaite, Stefan Weil,
 Claudio Fontana, Alexander Graf, alex.bennee@linaro.org,
 qemu-arm@nongnu.org, Pranith Kumar, Paolo Bonzini, Aurelien Jarno,
 Richard Henderson
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
X-Virus-Scanned: ClamAV using ClamSMTP

Optimizations to cross-page chaining and indirect jumps make
performance more sensitive to the hit rate of tb_jmp_cache.
The constraint of reserving some bits for the page number
lowers the achievable quality of the hashing function.

However, user-mode does not have this requirement. Thus,
with this change user-mode uses a hashing function that is both
faster and of better quality than the previous one.

Measurements:

- specINT 2006 (test set), x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz
  Y axis: Speedup over 95b31d70

  [Bar chart. Series: jr, jr+xxhash, jr+hash+inline.
   X axis: individual benchmarks and hmean. Y axis: 0.9x to 1.3x.]
  png: http://imgur.com/RiaBuIi

That is, a 6.45% hmean improvement for this commit. Note that this is the
test set, so some benchmarks take almost no time (and therefore aren't that
sensitive to changes here). See "train" results below.

Note also that hashing quality is not the only requirement: xxhash
gives on average the highest hit rates. However, the time spent computing
the hash negates the performance gains coming from the increase in hit rate.
Given these results, I dropped xxhash from subsequent experiments.

- specINT 2006 (train set), x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz
  Y axis: Speedup over 95b31d70

  [Bar chart. Series: jr, jr+hash.
   X axis: individual benchmarks and hmean. Y axis: 0.8x to 1.4x.]
  png: http://imgur.com/55iJJgD

That is, a 10.19% hmean improvement for jr+hash (this commit).

- NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz
  Y axis: Speedup over 95b31d70

  [Bar chart. Series: jr, jr+inline, jr+inline+hash.
   X axis: individual NBench tests and hmean. Y axis: 1x to 1.35x.]
  png: http://imgur.com/i5e1gdY

That is, an 11% hmean perf gain; it almost doubles the perf gain
from implementing the jr optimization.

- NBench, x86_64-linux-user. Host: Intel i7-4790K @ 4.00GHz

  [Bar chart. Series: jr, jr+inline, jr+inline+hash.
   X axis: individual NBench tests and hmean. Y axis: 0.96x to 1.1x.]
  png: http://imgur.com/Xu0Owgu

The fact that NBench is not very sensitive to changes here was mentioned
in the previous commit's log. We get a very slight overall decrease in hmean
performance, although some workloads improve as well. Note that there are
no error bars: NBench re-runs itself until confidence in the stability of
the average is >= 95%, and it doesn't report the resulting stddev.

Signed-off-by: Emilio G. Cota
---
 include/exec/tb-hash.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 2c27490..b1fe2d0 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -22,6 +22,8 @@
 
 #include "exec/tb-hash-xx.h"
 
+#ifdef CONFIG_SOFTMMU
+
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
    addresses on the same page.  The top bits are the same.  This allows
    TLB invalidation to quickly clear a subset of the hash table.
    */
@@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
            | (tmp & TB_JMP_ADDR_MASK));
 }
 
+#else
+
+/* In user-mode we can get better hashing because we do not have a TLB */
+static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
+{
+    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+}
+
+#endif /* CONFIG_SOFTMMU */
+
 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
 {
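
For readers who want to try the user-mode hash outside the QEMU tree, here is a minimal standalone sketch. It copies the one-line hash from the hunk above. Everything around it is an assumption for illustration: TB_JMP_CACHE_BITS is set to 12, target_ulong is typedef'd to a 64-bit type, and low_bits_only() and tb_jmp_cache_hash_example() are hypothetical helpers that are not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; QEMU defines these elsewhere. */
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1u << TB_JMP_CACHE_BITS)

/* Assumption: a 64-bit guest. QEMU picks this type per target. */
typedef uint64_t target_ulong;

/* Same expression as the user-mode hash added by the patch:
   fold the next-higher bits of the PC onto the low index bits. */
static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
{
    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
}

/* Hypothetical baseline (not in the patch): index by the low bits only. */
static inline unsigned int low_bits_only(target_ulong pc)
{
    return pc & (TB_JMP_CACHE_SIZE - 1);
}

/* Two PCs that share their low 12 bits collide under the baseline,
   but land in different slots once the upper bits are folded in. */
static void tb_jmp_cache_hash_example(void)
{
    assert(low_bits_only(0x400123) == low_bits_only(0x401123));
    assert(tb_jmp_cache_hash_func(0x400123) == 0x523);
    assert(tb_jmp_cache_hash_func(0x401123) == 0x522);
}
```

The XOR-fold costs one shift, one XOR and one mask, which fits the commit message's point that the cost of computing the hash matters as much as its quality.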