@@ -67,37 +67,15 @@ typedef uint64_t target_ulong;
#define CPU_TLB_ENTRY_BITS 5
#endif
-/* TCG_TARGET_TLB_DISPLACEMENT_BITS is used in CPU_TLB_BITS to ensure that
- * the TLB is not unnecessarily small, but still small enough for the
- * TLB lookup instruction sequence used by the TCG target.
- *
- * TCG will have to generate an operand as large as the distance between
- * env and the tlb_table[NB_MMU_MODES - 1][0].addend. For simplicity,
- * the TCG targets just round everything up to the next power of two, and
- * count bits. This works because: 1) the size of each TLB is a largish
- * power of two, 2) and because the limit of the displacement is really close
- * to a power of two, 3) the offset of tlb_table[0][0] inside env is smaller
- * than the size of a TLB.
- *
- * For example, the maximum displacement 0xFFF0 on PPC and MIPS, but TCG
- * just says "the displacement is 16 bits". TCG_TARGET_TLB_DISPLACEMENT_BITS
- * then ensures that tlb_table at least 0x8000 bytes large ("not unnecessarily
- * small": 2^15). The operand then will come up smaller than 0xFFF0 without
- * any particular care, because the TLB for a single MMU mode is larger than
- * 0x10000-0xFFF0=16 bytes. In the end, the maximum value of the operand
- * could be something like 0xC000 (the offset of the last TLB table) plus
- * 0x18 (the offset of the addend field in each TLB entry) plus the offset
- * of tlb_table inside env (which is non-trivial but not huge).
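+/*
+ * Each per-MMU-index TLB starts out with 2**DEFAULT_CPU_TLB_BITS (256)
+ * entries and is never shrunk below 2**MIN_CPU_TLB_BITS (64) entries; with
+ * 4K pages these cover 1M and 256K of guest address space, respectively.
+ */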
+#define MIN_CPU_TLB_BITS 6
+#define DEFAULT_CPU_TLB_BITS 8
+/*
+ * Assuming TARGET_PAGE_BITS==12, with 2**22 entries we can cover 2**(22+12) ==
+ * 2**34 == 16G of address space. This is roughly what one would expect a
+ * TLB to cover in a modern (as of 2018) x86_64 CPU; for instance, Intel
+ * Skylake's Level-2 STLB has 16 entries for 1G pages.
*/
-#define CPU_TLB_BITS \
- MIN(8, \
- TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS - \
- (NB_MMU_MODES <= 1 ? 0 : \
- NB_MMU_MODES <= 2 ? 1 : \
- NB_MMU_MODES <= 4 ? 2 : \
- NB_MMU_MODES <= 8 ? 3 : 4))
-
-#define CPU_TLB_SIZE (1 << CPU_TLB_BITS)
+#define MAX_CPU_TLB_BITS 22
typedef struct CPUTLBEntry {
/* bit TARGET_LONG_BITS to TARGET_PAGE_BITS : virtual address
@@ -143,6 +121,7 @@ typedef struct CPUIOTLBEntry {
typedef struct CPUTLBDesc {
size_t n_used_entries;
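+    /* flushes that found the TLB sparsely used; drives the shrink heuristic */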
+ size_t n_flushes_low_rate;
} CPUTLBDesc;
#define CPU_COMMON_TLB \
@@ -80,12 +80,13 @@ void tlb_init(CPUState *cpu)
qemu_spin_init(&env->tlb_lock);
for (i = 0; i < NB_MMU_MODES; i++) {
- size_t n_entries = CPU_TLB_SIZE;
+ size_t n_entries = 1 << DEFAULT_CPU_TLB_BITS;
env->tlb_desc[i].n_used_entries = 0;
+ env->tlb_desc[i].n_flushes_low_rate = 0;
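+    /* tlb_mask is kept shifted by CPU_TLB_ENTRY_BITS, i.e. it masks byte
+     * offsets into tlb_table rather than entry indices.
+     */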
env->tlb_mask[i] = (n_entries - 1) << CPU_TLB_ENTRY_BITS;
env->tlb_table[i] = g_new(CPUTLBEntry, n_entries);
- env->iotlb[i] = g_new0(CPUIOTLBEntry, n_entries);
+ env->iotlb[i] = g_new(CPUIOTLBEntry, n_entries);
}
}
@@ -121,6 +122,40 @@ size_t tlb_flush_count(void)
return count;
}
+/* Call with tlb_lock held */
+static void tlb_mmu_resize_locked(CPUArchState *env, int mmu_idx)
+{
+ CPUTLBDesc *desc = &env->tlb_desc[mmu_idx];
+ size_t old_size = tlb_n_entries(env, mmu_idx);
+ size_t rate = desc->n_used_entries * 100 / old_size;
+ size_t new_size = old_size;
+
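+    /*
+     * Grow aggressively (4x) when the table is completely full, and more
+     * conservatively (2x) when more than 70% of entries are in use. Shrink
+     * by half only after 100 flushes have seen less than 30% use since the
+     * last resize, so that short-lived dips in usage do not cause thrashing.
+     */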
+ if (rate == 100) {
+ new_size = MIN(old_size << 2, 1 << MAX_CPU_TLB_BITS);
+ } else if (rate > 70) {
+ new_size = MIN(old_size << 1, 1 << MAX_CPU_TLB_BITS);
+ } else if (rate < 30) {
+ desc->n_flushes_low_rate++;
+ if (desc->n_flushes_low_rate == 100) {
+ new_size = MAX(old_size >> 1, 1 << MIN_CPU_TLB_BITS);
+ desc->n_flushes_low_rate = 0;
+ }
+ }
+
+ if (new_size == old_size) {
+ return;
+ }
+
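+    /* Callers wipe the whole table right after resizing, so the old
+     * contents can simply be dropped instead of being copied over.
+     */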
+ g_free(env->tlb_table[mmu_idx]);
+ g_free(env->iotlb[mmu_idx]);
+
+ /* desc->n_used_entries is cleared by the caller */
+ desc->n_flushes_low_rate = 0;
+ env->tlb_mask[mmu_idx] = (new_size - 1) << CPU_TLB_ENTRY_BITS;
+ env->tlb_table[mmu_idx] = g_new(CPUTLBEntry, new_size);
+ env->iotlb[mmu_idx] = g_new(CPUIOTLBEntry, new_size);
+}
+
/* This is OK because CPU architectures generally permit an
* implementation to drop entries from the TLB at any time, so
* flushing more entries than required is only an efficiency issue,
@@ -150,6 +185,7 @@ static void tlb_flush_nocheck(CPUState *cpu)
*/
qemu_spin_lock(&env->tlb_lock);
for (i = 0; i < NB_MMU_MODES; i++) {
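+        /* resize under tlb_lock before wiping the (possibly reallocated) table */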
+ tlb_mmu_resize_locked(env, i);
memset(env->tlb_table[i], -1, sizeof_tlb(env, i));
env->tlb_desc[i].n_used_entries = 0;
}
@@ -213,6 +249,7 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
tlb_debug("%d\n", mmu_idx);
+ tlb_mmu_resize_locked(env, mmu_idx);
memset(env->tlb_table[mmu_idx], -1, sizeof_tlb(env, mmu_idx));
memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
env->tlb_desc[mmu_idx].n_used_entries = 0;
@@ -1626,7 +1626,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
}
if (TCG_TYPE_PTR == TCG_TYPE_I64) {
hrexw = P_REXW;
- if (TARGET_PAGE_BITS + CPU_TLB_BITS > 32) {
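+    /* The TLB can be resized at run time, so decide whether 64-bit
+     * operations are needed based on the largest table we may ever use.
+     */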
+ if (TARGET_PAGE_BITS + MAX_CPU_TLB_BITS > 32) {
tlbtype = TCG_TYPE_I64;
tlbrexw = P_REXW;
}