diff mbox series

[v3,04/17] i386/cpu: Fix i/d-cache topology to core level for Intel CPU

Message ID 20230801103527.397756-5-zhao1.liu@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Support smp.clusters for x86 | expand

Commit Message

Zhao Liu Aug. 1, 2023, 10:35 a.m. UTC
From: Zhao Liu <zhao1.liu@intel.com>

For i-cache and d-cache, the maximum IDs for CPUs sharing cache (
CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]) are
both 0, and this means i-cache and d-cache are shared in the SMT level.
This is correct if there's single thread per core, but is wrong for the
hyper threading case (one core contains multiple threads) since the
i-cache and d-cache are shared in the core level other than SMT level.

For AMD CPU, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache
Information for cpuid 0x8000001D") has already introduced i/d cache
topology as core level by default.

Therefore, in order to be compatible with both multi-threaded and
single-threaded situations, we should set i-cache and d-cache be shared
at the core level by default.

This fix changes the default i/d cache topology from per-thread to
per-core. Potentially, this change in L1 cache topology may affect the
performance of the VM if the user does not specifically specify the
topology or bind the vCPU. However, the way to achieve optimal
performance should be to create a reasonable topology and set the
appropriate vCPU affinity without relying on QEMU's default topology
structure.

Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
Changes since v1:
 * Split this fix from the patch named "i386/cpu: Fix number of
   addressable IDs in CPUID.04H".
 * Add the explanation of the impact on performance. (Xiaoyao)
---
 target/i386/cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Xiaoyao Li Aug. 4, 2023, 9:56 a.m. UTC | #1
On 8/1/2023 6:35 PM, Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> For i-cache and d-cache, the maximum IDs for CPUs sharing cache (
> CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]) are
> both 0, 

This sounds like you are describing some architectural rules, which 
misleads me. I suggest changing the description to

For i-cache and d-cache, current QEMU hardcodes the maximum IDs for CPUs 
sharing cache (CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 
25:14]) to 0. ...

> and this means i-cache and d-cache are shared in the SMT level.
> This is correct if there's single thread per core, but is wrong for the
> hyper threading case (one core contains multiple threads) since the
> i-cache and d-cache are shared in the core level other than SMT level.
> 
> For AMD CPU, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache
> Information for cpuid 0x8000001D") has already introduced i/d cache
> topology as core level by default.
> 
> Therefore, in order to be compatible with both multi-threaded and
> single-threaded situations, we should set i-cache and d-cache be shared
> at the core level by default.
> 
> This fix changes the default i/d cache topology from per-thread to
> per-core. Potentially, this change in L1 cache topology may affect the
> performance of the VM if the user does not specifically specify the
> topology or bind the vCPU. However, the way to achieve optimal
> performance should be to create a reasonable topology and set the
> appropriate vCPU affinity without relying on QEMU's default topology
> structure.
> 
> Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
> Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
> Changes since v1:
>   * Split this fix from the patch named "i386/cpu: Fix number of
>     addressable IDs in CPUID.04H".
>   * Add the explanation of the impact on performance. (Xiaoyao)
> ---
>   target/i386/cpu.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 50613cd04612..b439a05244ee 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6104,12 +6104,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>               switch (count) {
>               case 0: /* L1 dcache info */
>                   encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
> -                                    1, cs->nr_cores,
> +                                    cs->nr_threads, cs->nr_cores,
>                                       eax, ebx, ecx, edx);
>                   break;
>               case 1: /* L1 icache info */
>                   encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
> -                                    1, cs->nr_cores,
> +                                    cs->nr_threads, cs->nr_cores,
>                                       eax, ebx, ecx, edx);
>                   break;
>               case 2: /* L2 cache info */
Zhao Liu Aug. 4, 2023, 12:43 p.m. UTC | #2
Hi Xiaoyao,

On Fri, Aug 04, 2023 at 05:56:47PM +0800, Xiaoyao Li wrote:
> Date: Fri, 4 Aug 2023 17:56:47 +0800
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> Subject: Re: [PATCH v3 04/17] i386/cpu: Fix i/d-cache topology to core
>  level for Intel CPU
> 
> On 8/1/2023 6:35 PM, Zhao Liu wrote:
> > From: Zhao Liu <zhao1.liu@intel.com>
> > 
> > For i-cache and d-cache, the maximum IDs for CPUs sharing cache (
> > CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]) are
> > both 0,
> 
> This sounds like you are describing some architectural rules, which misleads
> me. I suggest changing the description to
> 
> For i-cache and d-cache, current QEMU hardcodes the maximum IDs for CPUs
> sharing cache (CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits
> 25:14]) to 0. ...

Yeah, it's clearer. Will use your description. Thanks!

> 
> > and this means i-cache and d-cache are shared in the SMT level.
> > This is correct if there's single thread per core, but is wrong for the
> > hyper threading case (one core contains multiple threads) since the
> > i-cache and d-cache are shared in the core level other than SMT level.
> > 
> > For AMD CPU, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache
> > Information for cpuid 0x8000001D") has already introduced i/d cache
> > topology as core level by default.
> > 
> > Therefore, in order to be compatible with both multi-threaded and
> > single-threaded situations, we should set i-cache and d-cache be shared
> > at the core level by default.
> > 
> > This fix changes the default i/d cache topology from per-thread to
> > per-core. Potentially, this change in L1 cache topology may affect the
> > performance of the VM if the user does not specifically specify the
> > topology or bind the vCPU. However, the way to achieve optimal
> > performance should be to create a reasonable topology and set the
> > appropriate vCPU affinity without relying on QEMU's default topology
> > structure.
> > 
> > Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently")
> > Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
> > Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> 
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

Thanks!

-Zhao

> 
> > ---
> > Changes since v1:
> >   * Split this fix from the patch named "i386/cpu: Fix number of
> >     addressable IDs in CPUID.04H".
> >   * Add the explanation of the impact on performance. (Xiaoyao)
> > ---
> >   target/i386/cpu.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 50613cd04612..b439a05244ee 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -6104,12 +6104,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >               switch (count) {
> >               case 0: /* L1 dcache info */
> >                   encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
> > -                                    1, cs->nr_cores,
> > +                                    cs->nr_threads, cs->nr_cores,
> >                                       eax, ebx, ecx, edx);
> >                   break;
> >               case 1: /* L1 icache info */
> >                   encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
> > -                                    1, cs->nr_cores,
> > +                                    cs->nr_threads, cs->nr_cores,
> >                                       eax, ebx, ecx, edx);
> >                   break;
> >               case 2: /* L2 cache info */
>
diff mbox series

Patch

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 50613cd04612..b439a05244ee 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6104,12 +6104,12 @@  void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             switch (count) {
             case 0: /* L1 dcache info */
                 encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
-                                    1, cs->nr_cores,
+                                    cs->nr_threads, cs->nr_cores,
                                     eax, ebx, ecx, edx);
                 break;
             case 1: /* L1 icache info */
                 encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
-                                    1, cs->nr_cores,
+                                    cs->nr_threads, cs->nr_cores,
                                     eax, ebx, ecx, edx);
                 break;
             case 2: /* L2 cache info */