From patchwork Fri Sep 9 21:47:26 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Michael S. Tsirkin" X-Patchwork-Id: 9324477 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 90B1A60869 for ; Fri, 9 Sep 2016 21:52:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7F7ED2A023 for ; Fri, 9 Sep 2016 21:52:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 742DA2A024; Fri, 9 Sep 2016 21:52:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id C60F22A023 for ; Fri, 9 Sep 2016 21:52:38 +0000 (UTC) Received: from localhost ([::1]:60429 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1biTiw-0008D4-1x for patchwork-qemu-devel@patchwork.kernel.org; Fri, 09 Sep 2016 17:52:38 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40282) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1biTdy-00048K-Aq for qemu-devel@nongnu.org; Fri, 09 Sep 2016 17:47:31 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1biTdw-0004Nj-Nb for qemu-devel@nongnu.org; Fri, 09 Sep 2016 17:47:30 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52056) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1biTdw-0004NW-Fv for qemu-devel@nongnu.org; Fri, 09 Sep 2016 17:47:28 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0BCD74E4C3; Fri, 9 Sep 2016 21:47:28 +0000 (UTC) Received: from redhat.com (vpn-62-83.rdu2.redhat.com [10.10.62.83]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with SMTP id u89LlQ2p025973; Fri, 9 Sep 2016 17:47:27 -0400 Date: Sat, 10 Sep 2016 00:47:26 +0300 From: "Michael S. Tsirkin" To: qemu-devel@nongnu.org Message-ID: <1473456998-14863-5-git-send-email-mst@redhat.com> References: <1473456998-14863-1-git-send-email-mst@redhat.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1473456998-14863-1-git-send-email-mst@redhat.com> X-Mutt-Fcc: =sent X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Fri, 09 Sep 2016 21:47:28 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PULL v2 04/14] target-i386: present virtual L3 cache info for vcpus X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell , "Longpeng\(Mike\)" , Richard Henderson , Eduardo Habkost , Paolo Bonzini Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: "Longpeng(Mike)" Some software algorithms are based on the hardware's cache info, for example, for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will trigger a resched IPI and told cpu2 to do the wakeup if they don't share low level cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share llc. The relevant linux-kernel code as bellow: static void ttwu_queue(struct task_struct *p, int cpu) { struct rq *rq = cpu_rq(cpu); ...... if (... && !cpus_share_cache(smp_processor_id(), cpu)) { ...... ttwu_queue_remote(p, cpu); /* will trigger RES IPI */ return; } ...... ttwu_do_activate(rq, p, 0); /* access target's rq directly */ ...... } In real hardware, the cpus on the same socket share L3 cache, so one won't trigger a resched IPIs when wakeup a task on others. But QEMU doesn't present a virtual L3 cache info for VM, then the linux guest will trigger lots of RES IPIs under some workloads even if the virtual cpus belongs to the same virtual socket. For KVM, there will be lots of vmexit due to guest send IPIs. The workload is a SAP HANA's testsuite, we run it one round(about 40 minuates) and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering during the period: No-L3 With-L3(applied this patch) cpu0: 363890 44582 cpu1: 373405 43109 cpu2: 340783 43797 cpu3: 333854 43409 cpu4: 327170 40038 cpu5: 325491 39922 cpu6: 319129 42391 cpu7: 306480 41035 cpu8: 161139 32188 cpu9: 164649 31024 cpu10: 149823 30398 cpu11: 149823 32455 cpu12: 164830 35143 cpu13: 172269 35805 cpu14: 179979 33898 cpu15: 194505 32754 avg: 268963.6 40129.8 The VM's topology is "1*socket 8*cores 2*threads". After present virtual L3 cache info for VM, the amounts of RES IPIs in guest reduce 85%. For KVM, vcpus send IPIs will cause vmexit which is expensive, so it can cause severe performance degradation. We had tested the overall system performance if vcpus actually run on sparate physical socket. With L3 cache, the performance improves 7.2%~33.1%(avg:15.7%). Signed-off-by: Longpeng(Mike) Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- include/hw/i386/pc.h | 9 +++++++++ target-i386/cpu.h | 6 ++++++ target-i386/cpu.c | 49 ++++++++++++++++++++++++++++++++++++++++++++----- 3 files changed, 59 insertions(+), 5 deletions(-) diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 8ad6f15..ebba151 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -368,7 +368,16 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t); int e820_get_num_entries(void); bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *); +#define PC_COMPAT_2_8 \ + {\ + .driver = TYPE_X86_CPU,\ + .property = "l3-cache",\ + .value = "off",\ + }, + + #define PC_COMPAT_2_7 \ + PC_COMPAT_2_8 \ HW_COMPAT_2_7 #define PC_COMPAT_2_6 \ diff --git a/target-i386/cpu.h b/target-i386/cpu.h index cf14bcb..bb3ffda 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -1207,6 +1207,12 @@ struct X86CPU { */ bool enable_lmce; + /* Compatibility bits for old machine types. + * If true present virtual l3 cache for VM, the vcpus in the same virtual + * socket share an virtual l3 cache. + */ + bool enable_l3_cache; + /* Compatibility bits for old machine types: */ bool enable_cpuid_0xb; diff --git a/target-i386/cpu.c b/target-i386/cpu.c index ec674dc..5a5299a 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -57,6 +57,7 @@ #define CPUID_2_L1D_32KB_8WAY_64B 0x2c #define CPUID_2_L1I_32KB_8WAY_64B 0x30 #define CPUID_2_L2_2MB_8WAY_64B 0x7d +#define CPUID_2_L3_16MB_16WAY_64B 0x4d /* CPUID Leaf 4 constants: */ @@ -131,11 +132,18 @@ #define L2_LINES_PER_TAG 1 #define L2_SIZE_KB_AMD 512 -/* No L3 cache: */ +/* Level 3 unified cache: */ #define L3_SIZE_KB 0 /* disabled */ #define L3_ASSOCIATIVITY 0 /* disabled */ #define L3_LINES_PER_TAG 0 /* disabled */ #define L3_LINE_SIZE 0 /* disabled */ +#define L3_N_LINE_SIZE 64 +#define L3_N_ASSOCIATIVITY 16 +#define L3_N_SETS 16384 +#define L3_N_PARTITIONS 1 +#define L3_N_DESCRIPTOR CPUID_2_L3_16MB_16WAY_64B +#define L3_N_LINES_PER_TAG 1 +#define L3_N_SIZE_KB_AMD 16384 /* TLB definitions: */ @@ -2279,6 +2287,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, { X86CPU *cpu = x86_env_get_cpu(env); CPUState *cs = CPU(cpu); + uint32_t pkg_offset; /* test if maximum index reached */ if (index & 0x80000000) { @@ -2332,7 +2341,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, } *eax = 1; /* Number of CPUID[EAX=2] calls required */ *ebx = 0; - *ecx = 0; + if (!cpu->enable_l3_cache) { + *ecx = 0; + } else { + *ecx = L3_N_DESCRIPTOR; + } *edx = (L1D_DESCRIPTOR << 16) | \ (L1I_DESCRIPTOR << 8) | \ (L2_DESCRIPTOR); @@ -2378,6 +2391,25 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, *ecx = L2_SETS - 1; *edx = CPUID_4_NO_INVD_SHARING; break; + case 3: /* L3 cache info */ + if (!cpu->enable_l3_cache) { + *eax = 0; + *ebx = 0; + *ecx = 0; + *edx = 0; + break; + } + *eax |= CPUID_4_TYPE_UNIFIED | \ + CPUID_4_LEVEL(3) | \ + CPUID_4_SELF_INIT_LEVEL; + pkg_offset = apicid_pkg_offset(cs->nr_cores, cs->nr_threads); + *eax |= ((1 << pkg_offset) - 1) << 14; + *ebx = (L3_N_LINE_SIZE - 1) | \ + ((L3_N_PARTITIONS - 1) << 12) | \ + ((L3_N_ASSOCIATIVITY - 1) << 22); + *ecx = L3_N_SETS - 1; + *edx = CPUID_4_INCLUSIVE | CPUID_4_COMPLEX_IDX; + break; default: /* end of info */ *eax = 0; *ebx = 0; @@ -2589,9 +2621,15 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, *ecx = (L2_SIZE_KB_AMD << 16) | \ (AMD_ENC_ASSOC(L2_ASSOCIATIVITY) << 12) | \ (L2_LINES_PER_TAG << 8) | (L2_LINE_SIZE); - *edx = ((L3_SIZE_KB/512) << 18) | \ - (AMD_ENC_ASSOC(L3_ASSOCIATIVITY) << 12) | \ - (L3_LINES_PER_TAG << 8) | (L3_LINE_SIZE); + if (!cpu->enable_l3_cache) { + *edx = ((L3_SIZE_KB / 512) << 18) | \ + (AMD_ENC_ASSOC(L3_ASSOCIATIVITY) << 12) | \ + (L3_LINES_PER_TAG << 8) | (L3_LINE_SIZE); + } else { + *edx = ((L3_N_SIZE_KB_AMD / 512) << 18) | \ + (AMD_ENC_ASSOC(L3_N_ASSOCIATIVITY) << 12) | \ + (L3_N_LINES_PER_TAG << 8) | (L3_N_LINE_SIZE); + } break; case 0x80000007: *eax = 0; @@ -3368,6 +3406,7 @@ static Property x86_cpu_properties[] = { DEFINE_PROP_STRING("hv-vendor-id", X86CPU, hyperv_vendor_id), DEFINE_PROP_BOOL("cpuid-0xb", X86CPU, enable_cpuid_0xb, true), DEFINE_PROP_BOOL("lmce", X86CPU, enable_lmce, false), + DEFINE_PROP_BOOL("l3-cache", X86CPU, enable_l3_cache, true), DEFINE_PROP_END_OF_LIST() };