From patchwork Fri Jul 1 17:04:45 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 9210257
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: pbonzini@redhat.com, serge.fdrv@gmail.com, cota@braap.org,
 alex.bennee@linaro.org, peter.maydell@linaro.org
Date: Fri, 1 Jul 2016 10:04:45 -0700
Message-Id: <1467392693-22715-20-git-send-email-rth@twiddle.net>
In-Reply-To: <1467392693-22715-1-git-send-email-rth@twiddle.net>
References: <1467392693-22715-1-git-send-email-rth@twiddle.net>
Subject: [Qemu-devel] [PATCH v2 19/27] target-i386: remove helper_lock()

From: "Emilio G. Cota"

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and
scalability improvement. Below is the result of running the
atomic_add-bench microbenchmark with:

 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n

where $n is the number of threads and $r is the allowed range for the
additions.

The scenarios measured are:
- atomic: implements x86's ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implements x86's ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results are sorted in ascending range, i.e. descending degree of contention.
The Y axis is throughput in Mops/s. Tests are run on an AMD machine with
64 Opteron 6376 cores.

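To make the roles of $r and $n concrete, here is a minimal sketch of the kind of work each benchmark thread could perform: pick a counter in [0, $r] at random and atomically increment it. This is an illustration under assumptions, not the source of tests/atomic_add-bench; the names `bench_thread`, `bench_worker`, `bench_total` and the xorshift PRNG are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-thread state; not taken from tests/atomic_add-bench. */
struct bench_thread {
    _Atomic uint32_t *counters; /* shared array of (range + 1) counters */
    uint32_t range;             /* indices are drawn from [0, range] */
    uint64_t n_ops;             /* number of additions this thread performs */
    uint32_t rng;               /* private xorshift32 state, must be non-zero */
};

/* xorshift32: a small PRNG, so threads share no random-number state. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Body each of the $n threads would run: n_ops atomic increments. */
static void bench_worker(struct bench_thread *t)
{
    assert(t->rng != 0);
    assert(t->range < UINT32_MAX); /* keeps range + 1 from wrapping to 0 */
    for (uint64_t i = 0; i < t->n_ops; i++) {
        uint32_t idx = xorshift32(&t->rng) % (t->range + 1);
        atomic_fetch_add(&t->counters[idx], 1);
    }
}

/* Sum of all counters; equals the total number of additions performed. */
static uint64_t bench_total(_Atomic uint32_t *counters, uint32_t range)
{
    uint64_t sum = 0;
    for (uint32_t i = 0; i <= range; i++) {
        sum += atomic_load(&counters[i]);
    }
    return sum;
}
```

With a small range, all threads hit the same few counters, so contention is high; a larger range spreads the additions out. Checking that `bench_total()` equals threads times ops per thread is a simple way to confirm that no increment was lost.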
[Five ASCII plots, not recoverable from this copy; see the hi-res link below.
 X axis: number of threads (0 to 60). Y axis: throughput in Mops/s.
 Series: atomic (E), cmpxchg (H), master (N).]

  atomic_add-bench: 5000000 ops/thread, [0,1] range     (Y axis 0 to 25)
  atomic_add-bench: 5000000 ops/thread, [0,2] range     (Y axis 0 to 25)
  atomic_add-bench: 5000000 ops/thread, [0,8] range     (Y axis 0 to 40)
  atomic_add-bench: 5000000 ops/thread, [0,128] range   (Y axis 0 to 160)
  atomic_add-bench: 5000000 ops/thread, [0,1024] range  (Y axis 0 to 350)

  hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Note that using atomic helpers instead of cmpxchg is not only more
performant (when the host natively implements the same atomics, as is
the case here); it also simplifies the necessary TCG/helper code of the
implementation.

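The three scenarios can be pictured at the host level with plain C11 atomics and a pthread mutex. The sketch below is only an analogy for what the generated code ends up doing, not QEMU code; the function names `add_fetch_atomic`, `add_fetch_cmpxchg` and `add_fetch_locked` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* "atomic": a single host read-modify-write operation. */
static uint32_t add_fetch_atomic(_Atomic uint32_t *p, uint32_t v)
{
    assert(p != NULL);
    return atomic_fetch_add(p, v) + v;
}

/* "cmpxchg": load, compute, then retry until no other thread interfered. */
static uint32_t add_fetch_cmpxchg(_Atomic uint32_t *p, uint32_t v)
{
    assert(p != NULL);
    uint32_t old = atomic_load(p);
    /* On failure, `old` is refreshed with the current value of *p. */
    while (!atomic_compare_exchange_weak(p, &old, old + v)) {
    }
    return old + v;
}

/* "master": every locked operation serializes on one global mutex. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

static uint32_t add_fetch_locked(uint32_t *p, uint32_t v)
{
    assert(p != NULL);
    pthread_mutex_lock(&global_lock);
    uint32_t res = *p + v;
    *p = res;
    pthread_mutex_unlock(&global_lock);
    return res;
}
```

The locked variant is atomic only with respect to other accesses that take the same mutex, and it makes unrelated memory locations contend on a single lock. The cmpxchg variant may retry under contention, whereas the fetch-add variant always completes in one operation.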
Compare the difference between the atomic and cmpxchg implementations:

             case OP_ADDL:
                 if (s1->prefix & PREFIX_LOCK) {
    -                gen_atomic_add_fetch(cpu_T0, cpu_A0, cpu_T1, ot);
    +                TCGv t0, t1, t2, a0;
    +                TCGLabel *retry;
    +
    +                t0 = tcg_temp_local_new();
    +                t1 = tcg_temp_local_new();
    +                t2 = tcg_temp_local_new();
    +                a0 = tcg_temp_local_new();
    +                retry = gen_new_label();
    +
    +                tcg_gen_mov_tl(a0, cpu_A0);
    +                tcg_gen_mov_tl(t1, cpu_T1);
    +                gen_set_label(retry);
    +                gen_op_ld_v(s1, ot, cpu_T0, a0);
    +                tcg_gen_add_tl(t2, cpu_T0, t1);
    +                gen_cmpxchg(t0, a0, cpu_T0, t2, ot);
    +                tcg_gen_brcond_tl(TCG_COND_NE, t0, cpu_T0, retry);
    +
    +                tcg_gen_mov_tl(cpu_T0, t2);
    +                tcg_gen_mov_tl(cpu_T1, t1);
    +
    +                tcg_temp_free(t0);
    +                tcg_temp_free(t1);
    +                tcg_temp_free(t2);
    +                tcg_temp_free(a0);
                 } else {

Signed-off-by: Emilio G. Cota
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
---
 target-i386/helper.h     |  2 --
 target-i386/mem_helper.c | 33 ---------------------------------
 target-i386/translate.c  | 15 ---------------
 3 files changed, 50 deletions(-)

diff --git a/target-i386/helper.h b/target-i386/helper.h
index 1320edc..d02357c 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -1,8 +1,6 @@
 DEF_HELPER_FLAGS_4(cc_compute_all, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 DEF_HELPER_FLAGS_4(cc_compute_c, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 
-DEF_HELPER_0(lock, void)
-DEF_HELPER_0(unlock, void)
 DEF_HELPER_3(write_eflags, void, env, tl, i32)
 DEF_HELPER_1(read_eflags, tl, env)
 DEF_HELPER_2(divb_AL, void, env, tl)
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index 5c0558f..8649115 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -25,39 +25,6 @@
 #include "qemu/int128.h"
 #include "tcg.h"
 
-/* broken thread support */
-
-#if defined(CONFIG_USER_ONLY)
-QemuMutex global_cpu_lock;
-
-void helper_lock(void)
-{
-    qemu_mutex_lock(&global_cpu_lock);
-}
-
-void helper_unlock(void)
-{
-    qemu_mutex_unlock(&global_cpu_lock);
-}
-
-void helper_lock_init(void)
-{
-    qemu_mutex_init(&global_cpu_lock);
-}
-#else
-void helper_lock(void)
-{
-}
-
-void helper_unlock(void)
-{
-}
-
-void helper_lock_init(void)
-{
-}
-#endif
-
 void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
 {
     uintptr_t ra = GETPC();
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 525c445..9468cd5 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4537,10 +4537,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     s->aflag = aflag;
     s->dflag = dflag;
 
-    /* lock generation */
-    if (prefixes & PREFIX_LOCK)
-        gen_helper_lock();
-
     /* now check op code */
  reswitch:
     switch(b) {
@@ -8195,20 +8191,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     default:
         goto unknown_op;
     }
-    /* lock generation */
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
     return s->pc;
  illegal_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_illegal_opcode(s);
     return s->pc;
  unknown_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_unknown_opcode(env, s);
     return s->pc;
 }
@@ -8300,8 +8287,6 @@ void tcg_x86_init(void)
                              offsetof(CPUX86State, bnd_regs[i].ub),
                              bnd_regu_names[i]);
     }
-
-    helper_lock_init();
 }
 
 /* generate intermediate code for basic block 'tb'.  */