From patchwork Sun Jul 16 20:04:28 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Emilio Cota X-Patchwork-Id: 9843671 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id BDF1A6037F for ; Sun, 16 Jul 2017 20:39:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9E54326D05 for ; Sun, 16 Jul 2017 20:39:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 90B5C27F98; Sun, 16 Jul 2017 20:39:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id B581826D05 for ; Sun, 16 Jul 2017 20:39:39 +0000 (UTC) Received: from localhost ([::1]:46861 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dWqKI-0004Bb-SU for patchwork-qemu-devel@patchwork.kernel.org; Sun, 16 Jul 2017 16:39:38 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46494) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dWpmj-0001HX-TW for qemu-devel@nongnu.org; Sun, 16 Jul 2017 16:05:00 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dWpmb-0008TW-LB for qemu-devel@nongnu.org; Sun, 16 Jul 2017 16:04:57 -0400 Received: from out1-smtp.messagingengine.com ([66.111.4.25]:47883) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dWpmb-0008S5-4b for qemu-devel@nongnu.org; Sun, 16 Jul 2017 16:04:49 -0400 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 8F12020A7E; Sun, 16 Jul 2017 16:04:48 -0400 (EDT) Received: from frontend1 ([10.202.2.160]) by compute4.internal (MEProxy); Sun, 16 Jul 2017 16:04:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=braap.org; h=cc :date:from:in-reply-to:message-id:references:subject:to :x-me-sender:x-me-sender:x-sasl-enc:x-sasl-enc; s=mesmtp; bh=hTv 8r7uFbTgN9alHPDjf5MOo8n5Ihao+PBXSzWnK9YA=; b=vxV/U+c/Gmebvrczlnd BzCITyKxWj1W5SxTur2d2OdwjY3epLGIGpgHouSYm4E1L7UFJaYuZl7C/v69WV/z Vw37fUV7KEFIJP/Jo8wx7YR/pTisz3ROHhdO+MIrQMX4srJlCB0CstX9yG2oxuAa WwgOYm/lPOnbC4zFoB9CICIk= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:date:from:in-reply-to:message-id :references:subject:to:x-me-sender:x-me-sender:x-sasl-enc :x-sasl-enc; s=fm1; bh=hTv8r7uFbTgN9alHPDjf5MOo8n5Ihao+PBXSzWnK9 YA=; b=JFcfMDem5GVz8O31hRKqhVq4S5/7YVHqCHnWOqPK1w/cHpya5cB+Luqa3 2diZRzTqR2tST4AeU4UkQoxmcQSAbuxiOhxieUjU5smcRrgiWCPN1WvjcLCkCbqV Wa8FPB7RrhAf5Er8gzMnghWbFa71V3D1pEaD9JzXzhhNvs16+KJuz4fQY4TVf72y ByaUJu43FlR6Uy+1HMKKaVrFR50Ej8HnHpO/DY0Jve8b48JyrVcfTCZ11gUd3Svp HhM0dXVKa/jnGl3h/8nKYbl+HDJXHpbeWxcCK10PyoCxM8GzX0Xp6370cOIFl2RV Nj4osthjCeOvZ6C9d1Zdq2uQ8AUug== X-ME-Sender: X-Sasl-enc: kW5hSF2BFBAzv6K/5JtQRzsv1KRnsBng808FTQRmE1Q6 1500235488 Received: from localhost (flamenco.cs.columbia.edu [128.59.20.216]) by mail.messagingengine.com (Postfix) with ESMTPA id 4CCF77E17E; Sun, 16 Jul 2017 16:04:48 -0400 (EDT) From: "Emilio G. Cota" To: qemu-devel@nongnu.org Date: Sun, 16 Jul 2017 16:04:28 -0400 Message-Id: <1500235468-15341-46-git-send-email-cota@braap.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1500235468-15341-1-git-send-email-cota@braap.org> References: <1500235468-15341-1-git-send-email-cota@braap.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.111.4.25 Subject: [Qemu-devel] [PATCH v2 45/45] tcg: enable multiple TCG contexts in softmmu X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Richard Henderson Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP This enables parallel TCG code generation. However, we do not take advantage of it yet since tb_lock is still held during tb_gen_code. In user-mode we use a single TCG context; see the documentation added to tcg_region_init for the rationale. Note that targets do not need any conversion: targets initialize a TCGContext (e.g. defining TCG globals), and after this initialization has finished, the context is cloned by the vCPU threads, each of them keeping a separate copy. TCG threads claim one entry in tcg_ctxs[] atomically using cmpxchg. They also increment n_tcg_ctxs atomically. Do not be too annoyed by the subsequent atomic_read's of that variable; they are there just to play nice with analysis tools such as thread sanitizer. Previous patches folded some TCG globals into TCGContext. The non-const globals remaining are only set at init time, i.e. before the TCG threads are spawned. Here is a list of these set-at-init-time globals under tcg/: Only written by tcg_context_init: - indirect_reg_alloc_order - tcg_op_defs Only written by tcg_target_init (called from tcg_context_init): - tcg_target_available_regs - tcg_target_call_clobber_regs - arm: arm_arch, use_idiv_instructions - i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt, have_movbe, have_popcnt - mips: use_movnz_instructions, use_mips32_instructions, use_mips32r2_instructions, got_sigill (tcg_target_detect_isa) - ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr - s390: tb_ret_addr, s390_facilities - sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines), use_vis3_instructions Only written by tcg_prologue_init: - 'struct jit_code_entry one_entry' - aarch64: tb_ret_addr - arm: tb_ret_addr - i386: tb_ret_addr, guest_base_flags - ia64: tb_ret_addr - mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr Signed-off-by: Emilio G. Cota --- tcg/tcg.h | 7 ++-- accel/tcg/translate-all.c | 2 +- cpus.c | 2 + linux-user/syscall.c | 1 + tcg/tcg.c | 103 ++++++++++++++++++++++++++++++++++++++++++---- 5 files changed, 104 insertions(+), 11 deletions(-) diff --git a/tcg/tcg.h b/tcg/tcg.h index 6f6720b..2f7661b 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -745,7 +745,7 @@ struct TCGContext { }; extern TCGContext tcg_init_ctx; -extern TCGContext *tcg_ctx; +extern __thread TCGContext *tcg_ctx; static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v) { @@ -767,7 +767,7 @@ static inline bool tcg_op_buf_full(void) /* pool based memory allocation */ -/* tb_lock must be held for tcg_malloc_internal. */ +/* user-mode: tb_lock must be held for tcg_malloc_internal. */ void *tcg_malloc_internal(TCGContext *s, int size); void tcg_pool_reset(TCGContext *s); TranslationBlock *tcg_tb_alloc(TCGContext *s); @@ -778,7 +778,7 @@ void tcg_region_reset_all(void); size_t tcg_code_size(void); size_t tcg_code_capacity(void); -/* Called with tb_lock held. */ +/* user-mode: Called with tb_lock held. */ static inline void *tcg_malloc(int size) { TCGContext *s = tcg_ctx; @@ -795,6 +795,7 @@ static inline void *tcg_malloc(int size) } void tcg_context_init(TCGContext *s); +void tcg_register_thread(void); void tcg_prologue_init(TCGContext *s); void tcg_func_start(TCGContext *s); diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index 98aa63e..78457a4 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -155,7 +155,7 @@ static void *l1_map[V_L1_MAX_SIZE]; /* code generation context */ TCGContext tcg_init_ctx; -TCGContext *tcg_ctx; +__thread TCGContext *tcg_ctx; TBContext tb_ctx; bool parallel_cpus; diff --git a/cpus.c b/cpus.c index 5455819..170071c 100644 --- a/cpus.c +++ b/cpus.c @@ -1307,6 +1307,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg) CPUState *cpu = arg; rcu_register_thread(); + tcg_register_thread(); qemu_mutex_lock_iothread(); qemu_thread_get_self(cpu->thread); @@ -1454,6 +1455,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg) g_assert(!use_icount); rcu_register_thread(); + tcg_register_thread(); qemu_mutex_lock_iothread(); qemu_thread_get_self(cpu->thread); diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 925ae11..1beb11c 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -6214,6 +6214,7 @@ static void *clone_func(void *arg) TaskState *ts; rcu_register_thread(); + tcg_register_thread(); env = info->env; cpu = ENV_GET_CPU(env); thread_cpu = cpu; diff --git a/tcg/tcg.c b/tcg/tcg.c index daec7d1..f56ab44 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -59,6 +59,7 @@ #include "elf.h" #include "exec/log.h" +#include "sysemu/sysemu.h" /* Forward declarations for functions declared in tcg-target.inc.c and used here. */ @@ -325,13 +326,14 @@ static inline bool tcg_region_initial_alloc__locked(TCGContext *s) /* Call from a safe-work context */ void tcg_region_reset_all(void) { + unsigned int n_ctxs = atomic_read(&n_tcg_ctxs); unsigned int i; qemu_mutex_lock(®ion.lock); region.current = 0; region.n_full = 0; - for (i = 0; i < n_tcg_ctxs; i++) { + for (i = 0; i < n_ctxs; i++) { if (unlikely(tcg_region_initial_alloc__locked(tcg_ctxs[i]))) { tcg_abort(); } @@ -365,6 +367,23 @@ static void tcg_region_set_guard_pages(void) * Region partitioning works by splitting code_gen_buffer into separate regions, * and then assigning regions to TCG threads so that the threads can translate * code in parallel without synchronization. + * + * In softmmu the number of TCG threads is bounded by smp_cpus, so + * tcg_region_init callers must ensure that @n_regions is set so that there will + * be at least as many regions as TCG threads. + * + * User-mode callers must set @n_regions to 0/1, thereby using a single region. + * Having multiple regions in user-mode is not supported, since the number of + * vCPU threads (recall that each thread spawned by the guest corresponds to + * a vCPU thread) is only bounded by the OS, and usually this number is huge + * (tens of thousands is not uncommon). Thus, given this large bound on the + * number of vCPU threads and the fact that code_gen_buffer is allocated at + * compile-time, we cannot guarantee that the availability of at least one + * region per vCPU thread. + * + * However, this user-mode limitation is unlikely to be a significant problem + * in practice. Multi-threaded guests share most if not all of their translated + * code, which makes parallel code generation less appealing than in softmmu. */ void tcg_region_init(size_t n_regions) { @@ -398,13 +417,71 @@ void tcg_region_init(size_t n_regions) region.buf = buf; tcg_region_set_guard_pages(); qemu_mutex_init(®ion.lock); - /* - * We do not yet support multiple TCG contexts, so do the initial - * allocation now. - */ +#ifdef CONFIG_USER_ONLY + /* In user-mode we support only one ctx, so do the initial allocation now */ if (unlikely(tcg_region_initial_alloc__locked(tcg_ctx))) { tcg_abort(); } +#endif +} + +/* + * All TCG threads except the parent (i.e. the one that called tcg_context_init + * and registered the target's TCG globals) must register with this function + * before initiating translation. + * + * In user-mode we just point tcg_ctx to tcg_init_ctx. See the documentation + * of tcg_region_init() for the reasoning behind this. + * + * In softmmu each caller registers its context in tcg_ctxs[]. Note that in + * softmmu tcg_ctxs[] does not track tcg_ctx_init, since the initial context + * is not used anymore for translation once this function is called. + * + * Not tracking tcg_init_ctx in tcg_ctxs[] in softmmu keeps code that iterates + * over the array (e.g. tcg_code_size() the same for both softmmu and user-mode. + */ +void tcg_register_thread(void) +{ +#ifdef CONFIG_USER_ONLY + tcg_ctx = &tcg_init_ctx; +#else + TCGContext *s = g_malloc(sizeof(*s)); + int i; + + memcpy(s, &tcg_init_ctx, sizeof(*s)); + /* tcg_optimize will allocate a new opt_temps array for this ctx */ + s->opt_temps = NULL; + + /* claim the first free pointer in tcg_ctxs and increment n_tcg_ctxs */ + for (i = 0; i < smp_cpus; i++) { + if (atomic_cmpxchg(&tcg_ctxs[i], NULL, s) == NULL) { + unsigned int n; + + n = atomic_fetch_inc(&n_tcg_ctxs); + /* + * Zero out s->prof in all contexts but the first. + * This ensures that we correctly account for the profiling info + * generated during initialization, since tcg_init_ctx is not + * tracked by the array. + */ + if (n != 0) { +#ifdef CONFIG_PROFILER + memset(&s->prof, 0, sizeof(s->prof)); +#endif + } + break; + } + } + /* Only vCPU threads can call this function */ + g_assert(i < smp_cpus); + + tcg_ctx = s; + qemu_mutex_lock(®ion.lock); + if (unlikely(tcg_region_initial_alloc__locked(tcg_ctx))) { + tcg_abort(); + } + qemu_mutex_unlock(®ion.lock); +#endif } /* @@ -416,12 +493,13 @@ void tcg_region_init(size_t n_regions) */ size_t tcg_code_size(void) { + unsigned int n_ctxs = atomic_read(&n_tcg_ctxs); unsigned int i; size_t total; qemu_mutex_lock(®ion.lock); total = region.n_full * (region.size - TCG_HIGHWATER); - for (i = 0; i < n_tcg_ctxs; i++) { + for (i = 0; i < n_ctxs; i++) { const TCGContext *s = tcg_ctxs[i]; size_t size; @@ -516,11 +594,21 @@ static GHashTable *helper_table; static int indirect_reg_alloc_order[ARRAY_SIZE(tcg_target_reg_alloc_order)]; static void process_op_defs(TCGContext *s); +/* + * In user-mode we simply share the init context among threads, since we + * use a single region. See the documentation tcg_region_init() for the + * reasoning behind this. + * In softmmu we will have at most smp_cpus TCG threads. + */ static void tcg_ctxs_init(TCGContext *s) { +#ifdef CONFIG_USER_ONLY tcg_ctxs = g_new(TCGContext *, 1); tcg_ctxs[0] = s; n_tcg_ctxs = 1; +#else + tcg_ctxs = g_new0(TCGContext *, smp_cpus); +#endif } void tcg_context_init(TCGContext *s) @@ -2734,9 +2822,10 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs, static inline void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table) { + unsigned int n_ctxs = atomic_read(&n_tcg_ctxs); unsigned int i; - for (i = 0; i < n_tcg_ctxs; i++) { + for (i = 0; i < n_ctxs; i++) { const TCGProfile *orig = &tcg_ctxs[i]->prof; if (counters) {