| Message ID | 1472782975-20056-1-git-send-email-longpeng2@huawei.com (mailing list archive) |
|---|---|
| State | New, archived |
> -----Original Message-----
> From: longpeng
> Sent: Friday, September 02, 2016 10:23 AM
> To: ehabkost@redhat.com; rth@twiddle.net; pbonzini@redhat.com; mst@redhat.com
> Cc: Zhaoshenglong; Gonglei (Arei); Huangpeng (Peter); Herongguang (Stephen); qemu-devel@nongnu.org; Longpeng(Mike)
> Subject: [PATCH v3] target-i386: present virtual L3 cache info for vcpus
>
> From: "Longpeng(Mike)" <longpeng@huawei2.com>
>
A typo in the email address, please resend the v3.

> Some software algorithms are based on the hardware's cache info. For example,
> in the x86 Linux kernel, when cpu1 wants to wake up a task on cpu2, cpu1
> triggers a resched IPI and tells cpu2 to do the wakeup if they don't share a
> low-level cache; conversely, cpu1 accesses cpu2's runqueue directly if they
> share the LLC. The relevant linux-kernel code is below:
>
> static void ttwu_queue(struct task_struct *p, int cpu)
> {
>     struct rq *rq = cpu_rq(cpu);
>     ......
>     if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
>         ......
>         ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
>         return;
>     }
>     ......
>     ttwu_do_activate(rq, p, 0); /* access target's rq directly */
>     ......
> }
>
> On real hardware, the cpus on the same socket share the L3 cache, so one cpu
> won't trigger resched IPIs when waking up a task on another. But QEMU doesn't
> present virtual L3 cache info to the VM, so the Linux guest triggers lots of
> RES IPIs under some workloads even when the virtual cpus belong to the same
> virtual socket.
>
> For KVM, this degrades performance, because there will be lots of vmexits due
> to the guest sending IPIs.
>
> The workload is a SAP HANA test suite; we ran it for one round (about 40
> minutes) and observed the (SUSE 11 SP3) guest's number of RES IPIs triggered
> during that period:
>
>              No-L3           With-L3 (this patch applied)
> cpu0:        363890          44582
> cpu1:        373405          43109
> cpu2:        340783          43797
> cpu3:        333854          43409
> cpu4:        327170          40038
> cpu5:        325491          39922
> cpu6:        319129          42391
> cpu7:        306480          41035
> cpu8:        161139          32188
> cpu9:        164649          31024
> cpu10:       149823          30398
> cpu11:       149823          32455
> cpu12:       164830          35143
> cpu13:       172269          35805
> cpu14:       179979          33898
> cpu15:       194505          32754
> avg:         268963.6        40129.8
>
> The VM's topology is "1*socket 8*cores 2*threads".
> After presenting virtual L3 cache info to the VM, the number of RES IPIs in
> the guest is reduced by 85%.
>
> What's more, for KVM, vcpus sending IPIs causes vmexits, which are expensive.
> We also tested overall system performance when the vcpus actually run on
> separate physical sockets. With the L3 cache, performance improves by
> 7.2%~33.1% (avg: 15.7%).
>
> Signed-off-by: Longpeng(Mike) <longpeng@huawei2.com>
>
Here as well.

Regards,
-Gonglei
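For readers who want to verify the effect from inside a guest, here is a minimal editorial sketch (not part of the patch) that reads CPUID.(EAX=4, ECX=3) and decodes the fields per the Intel SDM layout that the patch fills in below. It assumes GCC/clang's `<cpuid.h>` on an x86 guest whose CPUID supports leaf 4.

```c
/* Editorial sketch (not part of the patch): query CPUID leaf 4, subleaf 3
 * from inside the guest and decode the deterministic cache parameters.
 * Assumes GCC/clang <cpuid.h> on an x86 guest that supports leaf 4;
 * field layout follows the Intel SDM encoding used by cpu_x86_cpuid(). */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    __cpuid_count(4, 3, eax, ebx, ecx, edx);

    unsigned int type  = eax & 0x1f;        /* 0 = no cache at this subleaf */
    unsigned int level = (eax >> 5) & 0x7;
    if (type == 0) {
        printf("no L3 reported (subleaf 3 is empty)\n");
        return 0;
    }

    unsigned long line_size  = (ebx & 0xfff) + 1;
    unsigned long partitions = ((ebx >> 12) & 0x3ff) + 1;
    unsigned long ways       = ((ebx >> 22) & 0x3ff) + 1;
    unsigned long sets       = (unsigned long)ecx + 1;
    unsigned long sharing    = ((eax >> 14) & 0xfff) + 1;

    printf("L%u: %lu KB, %lu-way, %lu B lines, shared by up to %lu logical CPUs\n",
           level, ways * partitions * line_size * sets / 1024,
           ways, line_size, sharing);
    return 0;
}
```

With the patch applied and the topology from the commit message (1 socket, 8 cores, 2 threads), this should report a 16384 KB, 16-way, 64-byte-line L3 shared by up to 16 logical CPUs, which is exactly what makes the guest's cpus_share_cache() check succeed.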
On Fri, Sep 02, 2016 at 10:22:55AM +0800, Longpeng(Mike) wrote:
> From: "Longpeng(Mike)" <longpeng@huawei2.com>
>
[...]
>
> Signed-off-by: Longpeng(Mike) <longpeng@huawei2.com>

For PC bits:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> Changes since v2:
>   - add a more useful commit message.
>   - rename "compat-cache" to "l3-cache-shared".
>
> Changes since v1:
>   - fix the compat problem: set compat_props on PC_COMPAT_2_7.
>   - fix an "intentionally introduced" bug: make Intel's and AMD's handling consistent.
>   - fix the CPUID.(EAX=4, ECX=3):EAX[25:14].
>   - test the performance when vcpus run on separate sockets: with L3 cache,
>     the performance improves 7.2%~33.1% (avg: 15.7%).
> ---
>  include/hw/i386/pc.h |  8 ++++++++
>  target-i386/cpu.c    | 49 ++++++++++++++++++++++++++++++++++++++++++++-----
>  target-i386/cpu.h    |  5 +++++
>  3 files changed, 57 insertions(+), 5 deletions(-)
>
[...]
>
> +    /* Compatibility bits for old machine types.
> +     * If true present virtual l3 cache for VM.

"pretend that all CPUs share an l3 cache"?

> +     */
> +    bool enable_l3_cache_shared;
> +
>      /* Compatibility bits for old machine types: */
>      bool enable_cpuid_0xb;
>
> --
> 1.8.3.1
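The "fix the CPUID.(EAX=4, ECX=3):EAX[25:14]" item in the changelog above refers to the field that tells the guest how many logical processors share the cache. Below is a rough editorial sketch of what the patch computes there; QEMU's apicid_pkg_offset() works on APIC ID bit widths and is approximated here as ceil(log2(threads)) + ceil(log2(cores)), so treat this as an illustration rather than the exact implementation.

```c
/* Sketch of how the patch fills CPUID.(EAX=4,ECX=3):EAX[25:14].
 * apicid_pkg_offset() in QEMU returns the bit position of the package
 * (socket) ID inside the APIC ID; approximated here as
 * ceil(log2(threads)) + ceil(log2(cores)). The field holds "maximum
 * addressable logical CPUs sharing the cache" minus 1, so
 * ((1 << pkg_offset) - 1) makes the L3 span the whole virtual socket. */
#include <stdint.h>
#include <stdio.h>

static unsigned bitwidth_for(unsigned count)   /* ceil(log2(count)), 1 -> 0 */
{
    unsigned w = 0;
    while ((1u << w) < count) {
        w++;
    }
    return w;
}

int main(void)
{
    unsigned cores = 8, threads = 2;           /* the "1*socket 8*cores 2*threads" VM */
    unsigned pkg_offset = bitwidth_for(threads) + bitwidth_for(cores);
    uint32_t field = (1u << pkg_offset) - 1;   /* goes into EAX[25:14] */

    printf("pkg_offset = %u, EAX[25:14] = %u -> up to %u logical CPUs share the L3\n",
           pkg_offset, field, field + 1);
    return 0;
}
```

For the benchmark VM this yields pkg_offset = 4 and a sharing count of 16, i.e. all 16 vcpus of the virtual socket see one shared L3.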
Hi Michael,

On 2016/9/3 6:52, Michael S. Tsirkin wrote:
> On Fri, Sep 02, 2016 at 10:22:55AM +0800, Longpeng(Mike) wrote:
>> From: "Longpeng(Mike)" <longpeng@huawei2.com>
>>
[...]
>>
>> Signed-off-by: Longpeng(Mike) <longpeng@huawei2.com>
>
> For PC bits:
> Acked-by: Michael S. Tsirkin <mst@redhat.com>

Thanks!

[...]

>> +    /* Compatibility bits for old machine types.
>> +     * If true present virtual l3 cache for VM.
>
> "pretend that all CPUs share an l3 cache"?
>

The vcpus in the same virtual socket share a virtual L3 cache.
I will make the comment clearer later.

QEMU 2.7 has been released, so I will rework this patch for 2.8.

>
>> +     */
>> +    bool enable_l3_cache_shared;
>> +
>>      /* Compatibility bits for old machine types: */
>>      bool enable_cpuid_0xb;
>>
>> --
>> 1.8.3.1
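Besides the Intel-style leaves, the patch advertises the same 16 MB L3 through the AMD-style CPUID.0x80000006:EDX. Here is a small editorial sketch decoding that word using the field layout and the L3_N_* constants from the patch; the associativity code 0x8 corresponds to 16-way in AMD's encoding table, and the values are hard-coded rather than read from hardware.

```c
/* Sketch decoding the AMD-style L3 descriptor the patch builds for
 * CPUID.0x80000006:EDX (layout as used in the patch: size in 512 KB
 * units in bits 31:18, associativity code in 15:12, lines-per-tag in
 * 11:8, line size in 7:0). Values are the patch's L3_N_* constants. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t edx = ((16384u / 512) << 18) |   /* L3_N_SIZE_KB_AMD = 16 MB */
                   (0x8u << 12)           |   /* AMD code 0x8 = 16-way    */
                   (1u << 8)              |   /* L3_N_LINES_PER_TAG       */
                   64u;                       /* L3_N_LINE_SIZE           */

    printf("L3: %u KB, assoc code 0x%x, %u lines/tag, %u B lines\n",
           ((edx >> 18) & 0x3fff) * 512,
           (edx >> 12) & 0xf,
           (edx >> 8) & 0xf,
           edx & 0xff);
    return 0;
}
```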
On Fri, Sep 02, 2016 at 10:22:55AM +0800, Longpeng(Mike) wrote:
[...]
> ---
> Changes since v2:
>   - add a more useful commit message.
>   - rename "compat-cache" to "l3-cache-shared".

What exactly does "shared" mean here? All the property does is
enable/disable the L3 cache, as its own description says:

>
> +    /* Compatibility bits for old machine types.
> +     * If true present virtual l3 cache for VM.
> +     */
> +    bool enable_l3_cache_shared;
> +

Why not just "l3-cache" or "l3-cache-enabled"?
Hi Eduardo,

On 2016/9/6 2:53, Eduardo Habkost wrote:
> On Fri, Sep 02, 2016 at 10:22:55AM +0800, Longpeng(Mike) wrote:
> [...]
>> ---
>> Changes since v2:
>>   - add a more useful commit message.
>>   - rename "compat-cache" to "l3-cache-shared".
>
> What exactly does "shared" mean here? All the property does is
> enable/disable the L3 cache, as its own description says:
>
>>
>> +    /* Compatibility bits for old machine types.
>> +     * If true present virtual l3 cache for VM.
>> +     */
>> +    bool enable_l3_cache_shared;
>> +
>
> Why not just "l3-cache" or "l3-cache-enabled"?
>

Originally I wanted to fix the L1/L2 inconsistencies together as well, so I
named the property "compat-cache". But adding that many L1/L2 compat macros
seemed too ugly, so I gave up on that idea and renamed it to "l3-cache-shared"
instead.

Thanks for the good suggestion. I will use "l3-cache-enabled" in v5.
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 74c175c..c92c54e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -367,7 +367,15 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
 int e820_get_num_entries(void);
 bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 
+#define PC_COMPAT_2_7 \
+    {\
+        .driver   = TYPE_X86_CPU,\
+        .property = "l3-cache-shared",\
+        .value    = "off",\
+    },
+
 #define PC_COMPAT_2_6 \
+    PC_COMPAT_2_7 \
     HW_COMPAT_2_6 \
     {\
         .driver   = "fw_cfg_io",\
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 6a1afab..4f93922 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -57,6 +57,7 @@
 #define CPUID_2_L1D_32KB_8WAY_64B 0x2c
 #define CPUID_2_L1I_32KB_8WAY_64B 0x30
 #define CPUID_2_L2_2MB_8WAY_64B   0x7d
+#define CPUID_2_L3_16MB_16WAY_64B 0x4d
 
 
 /* CPUID Leaf 4 constants: */
@@ -131,11 +132,18 @@
 #define L2_LINES_PER_TAG       1
 #define L2_SIZE_KB_AMD       512
 
-/* No L3 cache: */
+/* Level 3 unified cache: */
 #define L3_SIZE_KB             0 /* disabled */
 #define L3_ASSOCIATIVITY       0 /* disabled */
 #define L3_LINES_PER_TAG       0 /* disabled */
 #define L3_LINE_SIZE           0 /* disabled */
+#define L3_N_LINE_SIZE        64
+#define L3_N_ASSOCIATIVITY    16
+#define L3_N_SETS          16384
+#define L3_N_PARTITIONS        1
+#define L3_N_DESCRIPTOR CPUID_2_L3_16MB_16WAY_64B
+#define L3_N_LINES_PER_TAG     1
+#define L3_N_SIZE_KB_AMD   16384
 
 /* TLB definitions: */
 
@@ -2275,6 +2283,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 {
     X86CPU *cpu = x86_env_get_cpu(env);
     CPUState *cs = CPU(cpu);
+    uint32_t pkg_offset;
 
     /* test if maximum index reached */
     if (index & 0x80000000) {
@@ -2328,7 +2337,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         }
         *eax = 1; /* Number of CPUID[EAX=2] calls required */
         *ebx = 0;
-        *ecx = 0;
+        if (!cpu->enable_l3_cache_shared) {
+            *ecx = 0;
+        } else {
+            *ecx = L3_N_DESCRIPTOR;
+        }
         *edx = (L1D_DESCRIPTOR << 16) | \
                (L1I_DESCRIPTOR <<  8) | \
                (L2_DESCRIPTOR);
@@ -2374,6 +2387,25 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             *ecx = L2_SETS - 1;
             *edx = CPUID_4_NO_INVD_SHARING;
             break;
+        case 3: /* L3 cache info */
+            if (!cpu->enable_l3_cache_shared) {
+                *eax = 0;
+                *ebx = 0;
+                *ecx = 0;
+                *edx = 0;
+                break;
+            }
+            *eax |= CPUID_4_TYPE_UNIFIED | \
+                    CPUID_4_LEVEL(3) | \
+                    CPUID_4_SELF_INIT_LEVEL;
+            pkg_offset = apicid_pkg_offset(cs->nr_cores, cs->nr_threads);
+            *eax |= ((1 << pkg_offset) - 1) << 14;
+            *ebx = (L3_N_LINE_SIZE - 1) | \
+                   ((L3_N_PARTITIONS - 1) << 12) | \
+                   ((L3_N_ASSOCIATIVITY - 1) << 22);
+            *ecx = L3_N_SETS - 1;
+            *edx = CPUID_4_INCLUSIVE | CPUID_4_COMPLEX_IDX;
+            break;
         default: /* end of info */
             *eax = 0;
             *ebx = 0;
@@ -2585,9 +2617,15 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         *ecx = (L2_SIZE_KB_AMD << 16) | \
                (AMD_ENC_ASSOC(L2_ASSOCIATIVITY) << 12) | \
                (L2_LINES_PER_TAG << 8) | (L2_LINE_SIZE);
-        *edx = ((L3_SIZE_KB/512) << 18) | \
-               (AMD_ENC_ASSOC(L3_ASSOCIATIVITY) << 12) | \
-               (L3_LINES_PER_TAG << 8) | (L3_LINE_SIZE);
+        if (!cpu->enable_l3_cache_shared) {
+            *edx = ((L3_SIZE_KB / 512) << 18) | \
+                   (AMD_ENC_ASSOC(L3_ASSOCIATIVITY) << 12) | \
+                   (L3_LINES_PER_TAG << 8) | (L3_LINE_SIZE);
+        } else {
+            *edx = ((L3_N_SIZE_KB_AMD / 512) << 18) | \
+                   (AMD_ENC_ASSOC(L3_N_ASSOCIATIVITY) << 12) | \
+                   (L3_N_LINES_PER_TAG << 8) | (L3_N_LINE_SIZE);
+        }
         break;
     case 0x80000007:
         *eax = 0;
@@ -3364,6 +3402,7 @@ static Property x86_cpu_properties[] = {
     DEFINE_PROP_STRING("hv-vendor-id", X86CPU, hyperv_vendor_id),
     DEFINE_PROP_BOOL("cpuid-0xb", X86CPU, enable_cpuid_0xb, true),
     DEFINE_PROP_BOOL("lmce", X86CPU, enable_lmce, false),
+    DEFINE_PROP_BOOL("l3-cache-shared", X86CPU, enable_l3_cache_shared, true),
     DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 65615c0..355bf47 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -1202,6 +1202,11 @@ struct X86CPU {
      */
     bool enable_lmce;
 
+    /* Compatibility bits for old machine types.
+     * If true present virtual l3 cache for VM.
+     */
+    bool enable_l3_cache_shared;
+
     /* Compatibility bits for old machine types: */
     bool enable_cpuid_0xb;
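As a quick sanity check on the constants above, the leaf-4 geometry (sets x ways x partitions x line size) should equal the 16 MB that both CPUID.2 descriptor 0x4D and L3_N_SIZE_KB_AMD describe. A tiny editorial sketch, using the patch's L3_N_* values directly:

```c
/* Editorial consistency check: the leaf-4 geometry in the patch
 * (L3_N_SETS x L3_N_ASSOCIATIVITY x L3_N_PARTITIONS x L3_N_LINE_SIZE)
 * must match the 16 MB advertised by CPUID.2 descriptor 0x4D and by
 * the AMD leaf's L3_N_SIZE_KB_AMD. */
#include <assert.h>
#include <stdio.h>

int main(void)
{
    unsigned long sets = 16384, ways = 16, partitions = 1, line = 64; /* L3_N_* */
    unsigned long size_kb = sets * ways * partitions * line / 1024;

    assert(size_kb == 16384);                 /* == L3_N_SIZE_KB_AMD */
    printf("leaf-4 geometry = %lu KB (descriptor 0x4D: 16 MB, 16-way, 64 B)\n",
           size_kb);
    return 0;
}
```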