diff mbox series

[v3] i386/cpu: fixup number of addressable IDs for logical processors in the physical package

Message ID 20240918131815.8543-1-xuchuangxclwt@bytedance.com (mailing list archive)
State New, archived
Headers show
Series [v3] i386/cpu: fixup number of addressable IDs for logical processors in the physical package | expand

Commit Message

Chuang Xu Sept. 18, 2024, 1:18 p.m. UTC
When QEMU is started with:
-cpu host,migratable=on,host-cache-info=on,l3-cache=off
-smp 180,sockets=2,dies=1,cores=45,threads=2

Try to execute "cpuid -1 -l 1 -r" in guest, we'll obtain a value of 90 for
CPUID.01H.EBX[23:16], while the expected value is 128. And Try to
execute "cpuid -1 -l 4 -r" in guest, we'll obtain a value of 63 for
CPUID.04H.EAX[31:26] as expected.

As (1+CPUID.04H.EAX[31:26]) round up to the nearest power-of-2 integer,
we'd beter round up CPUID.01H.EBX[23:16] to the nearest power-of-2
integer too. Otherwise we may encounter unexpected results in guest.

For example, when QEMU is started with CLI above and xtopology is disabled,
guest kernel 5.15.120 uses CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to
calculate threads-per-core in detect_ht(). Then guest will get "90/(1+63)=1"
as the result, even though theads-per-core should actually be 2.

So let us round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer
to solve the unexpected result.

Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Igor Mammedov Sept. 19, 2024, 7:06 a.m. UTC | #1
On Wed, 18 Sep 2024 21:18:15 +0800
Chuang Xu <xuchuangxclwt@bytedance.com> wrote:

> When QEMU is started with:
> -cpu host,migratable=on,host-cache-info=on,l3-cache=off
> -smp 180,sockets=2,dies=1,cores=45,threads=2
> 
> Try to execute "cpuid -1 -l 1 -r" in guest, we'll obtain a value of 90 for
> CPUID.01H.EBX[23:16], while the expected value is 128. And Try to
> execute "cpuid -1 -l 4 -r" in guest, we'll obtain a value of 63 for
> CPUID.04H.EAX[31:26] as expected.
> 
> As (1+CPUID.04H.EAX[31:26]) round up to the nearest power-of-2 integer,
> we'd beter round up CPUID.01H.EBX[23:16] to the nearest power-of-2
> integer too. Otherwise we may encounter unexpected results in guest.
> 
> For example, when QEMU is started with CLI above and xtopology is disabled,
> guest kernel 5.15.120 uses CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to
> calculate threads-per-core in detect_ht(). Then guest will get "90/(1+63)=1"
> as the result, even though theads-per-core should actually be 2.
> 
> So let us round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer
> to solve the unexpected result.
> 
> Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
> Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
> Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  target/i386/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 4c2e6f3a71..3710ae5283 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6417,7 +6417,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>          }
>          *edx = env->features[FEAT_1_EDX];
>          if (threads_per_pkg > 1) {
> -            *ebx |= threads_per_pkg << 16;
> +            *ebx |= pow2ceil(threads_per_pkg) << 16;
>              *edx |= CPUID_HT;
>          }
>          if (!cpu->enable_pmu) {
Zhao Liu Sept. 19, 2024, 6:29 p.m. UTC | #2
Hi Chuang and Igor,

Sorry for late reply,

On Wed, Sep 18, 2024 at 09:18:15PM +0800, Chuang Xu wrote:
> Date: Wed, 18 Sep 2024 21:18:15 +0800
> From: Chuang Xu <xuchuangxclwt@bytedance.com>
> Subject: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical
>  processors in the physical package
> X-Mailer: git-send-email 2.24.3 (Apple Git-128)
> 
> When QEMU is started with:
> -cpu host,migratable=on,host-cache-info=on,l3-cache=off
> -smp 180,sockets=2,dies=1,cores=45,threads=2
> 
> Try to execute "cpuid -1 -l 1 -r" in guest, we'll obtain a value of 90 for
> CPUID.01H.EBX[23:16], while the expected value is 128. And Try to
> execute "cpuid -1 -l 4 -r" in guest, we'll obtain a value of 63 for
> CPUID.04H.EAX[31:26] as expected.
> 
> As (1+CPUID.04H.EAX[31:26]) round up to the nearest power-of-2 integer,
> we'd beter round up CPUID.01H.EBX[23:16] to the nearest power-of-2
> integer too. Otherwise we may encounter unexpected results in guest.
> 
> For example, when QEMU is started with CLI above and xtopology is disabled,
> guest kernel 5.15.120 uses CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to
> calculate threads-per-core in detect_ht(). Then guest will get "90/(1+63)=1"
> as the result, even though theads-per-core should actually be 2.
> 
> So let us round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer
> to solve the unexpected result.
> 
> Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
> Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
> Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
> ---
>  target/i386/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 4c2e6f3a71..3710ae5283 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6417,7 +6417,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>          }
>          *edx = env->features[FEAT_1_EDX];
>          if (threads_per_pkg > 1) {
> -            *ebx |= threads_per_pkg << 16;
> +            *ebx |= pow2ceil(threads_per_pkg) << 16;

Yes, the fix is right.

About the "Maximum number of addressable IDs", the commit 88dd4ca06c83
("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]")
introduced the new way to calculate.

The pow2ceil() works for current SMP topology, but may be wrong on
hybrid topology, as the reason I listed in the commit message:

> The nearest power-of-2 integer can be calculated by pow2ceil() or by
> using APIC ID offset/width (like L3 topology using 1 << die_offset [3]).

> But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> are associated with APIC ID. For example, in linux kernel, the field
> "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
> another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> matched with actual core numbers and it's calculated by:
> "(1 << (pkg_offset - core_offset)) - 1".

Using APIC ID offset to calculate is the hardware's approach, so I tried
to use APIC ID instead of pow2ceil() and replaced all pow2ceil() case.

Hi Igor, do you agree? :-)

Best Regards,
Zhao
Igor Mammedov Sept. 20, 2024, 11:08 a.m. UTC | #3
On Fri, 20 Sep 2024 02:29:46 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> Hi Chuang and Igor,
> 
> Sorry for late reply,
> 
> On Wed, Sep 18, 2024 at 09:18:15PM +0800, Chuang Xu wrote:
> > Date: Wed, 18 Sep 2024 21:18:15 +0800
> > From: Chuang Xu <xuchuangxclwt@bytedance.com>
> > Subject: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical
> >  processors in the physical package
> > X-Mailer: git-send-email 2.24.3 (Apple Git-128)
> > 
> > When QEMU is started with:
> > -cpu host,migratable=on,host-cache-info=on,l3-cache=off
> > -smp 180,sockets=2,dies=1,cores=45,threads=2
> > 
> > Try to execute "cpuid -1 -l 1 -r" in guest, we'll obtain a value of 90 for
> > CPUID.01H.EBX[23:16], while the expected value is 128. And Try to
> > execute "cpuid -1 -l 4 -r" in guest, we'll obtain a value of 63 for
> > CPUID.04H.EAX[31:26] as expected.
> > 
> > As (1+CPUID.04H.EAX[31:26]) round up to the nearest power-of-2 integer,
> > we'd beter round up CPUID.01H.EBX[23:16] to the nearest power-of-2
> > integer too. Otherwise we may encounter unexpected results in guest.
> > 
> > For example, when QEMU is started with CLI above and xtopology is disabled,
> > guest kernel 5.15.120 uses CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to
> > calculate threads-per-core in detect_ht(). Then guest will get "90/(1+63)=1"
> > as the result, even though theads-per-core should actually be 2.
> > 
> > So let us round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer
> > to solve the unexpected result.
> > 
> > Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
> > Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
> > Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
> > ---
> >  target/i386/cpu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 4c2e6f3a71..3710ae5283 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -6417,7 +6417,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> >          }
> >          *edx = env->features[FEAT_1_EDX];
> >          if (threads_per_pkg > 1) {
> > -            *ebx |= threads_per_pkg << 16;
> > +            *ebx |= pow2ceil(threads_per_pkg) << 16;  
> 
> Yes, the fix is right.
> 
> About the "Maximum number of addressable IDs", the commit 88dd4ca06c83
> ("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]")
> introduced the new way to calculate.
> 
> The pow2ceil() works for current SMP topology, but may be wrong on
> hybrid topology, as the reason I listed in the commit message:
> 
> > The nearest power-of-2 integer can be calculated by pow2ceil() or by
> > using APIC ID offset/width (like L3 topology using 1 << die_offset [3]).  
> 
> > But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> > are associated with APIC ID. For example, in linux kernel, the field
> > "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
> > another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> > matched with actual core numbers and it's calculated by:
> > "(1 << (pkg_offset - core_offset)) - 1".  
> 
> Using APIC ID offset to calculate is the hardware's approach, so I tried
> to use APIC ID instead of pow2ceil() and replaced all pow2ceil() case.

Well, hybrid case needs some more explanation then.

'pow2ceil(threads_per_pkg) << 16' - does exactly what SDM says for CPUID.01H.EBX[23:16]

Can you point to a spec that confirms that above is wrong and
explain in more details how hybrid case is supposed to work
and where it's documented?
 


> 
> Hi Igor, do you agree? :-)
> 
> Best Regards,
> Zhao
>
Zhao Liu Sept. 25, 2024, 8:49 a.m. UTC | #4
On Fri, Sep 20, 2024 at 01:08:27PM +0200, Igor Mammedov wrote:
> Date: Fri, 20 Sep 2024 13:08:27 +0200
> From: Igor Mammedov <imammedo@redhat.com>
> Subject: Re: [PATCH v3] i386/cpu: fixup number of addressable IDs for
>  logical processors in the physical package
> X-Mailer: Claws Mail 4.3.0 (GTK 3.24.43; x86_64-redhat-linux-gnu)
> 
> On Fri, 20 Sep 2024 02:29:46 +0800
> Zhao Liu <zhao1.liu@intel.com> wrote:
> 
> > Hi Chuang and Igor,
> > 
> > Sorry for late reply,
> > 
> > On Wed, Sep 18, 2024 at 09:18:15PM +0800, Chuang Xu wrote:
> > > Date: Wed, 18 Sep 2024 21:18:15 +0800
> > > From: Chuang Xu <xuchuangxclwt@bytedance.com>
> > > Subject: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical
> > >  processors in the physical package
> > > X-Mailer: git-send-email 2.24.3 (Apple Git-128)
> > > 
> > > When QEMU is started with:
> > > -cpu host,migratable=on,host-cache-info=on,l3-cache=off
> > > -smp 180,sockets=2,dies=1,cores=45,threads=2
> > > 
> > > Try to execute "cpuid -1 -l 1 -r" in guest, we'll obtain a value of 90 for
> > > CPUID.01H.EBX[23:16], while the expected value is 128. And Try to
> > > execute "cpuid -1 -l 4 -r" in guest, we'll obtain a value of 63 for
> > > CPUID.04H.EAX[31:26] as expected.
> > > 
> > > As (1+CPUID.04H.EAX[31:26]) round up to the nearest power-of-2 integer,
> > > we'd beter round up CPUID.01H.EBX[23:16] to the nearest power-of-2
> > > integer too. Otherwise we may encounter unexpected results in guest.
> > > 
> > > For example, when QEMU is started with CLI above and xtopology is disabled,
> > > guest kernel 5.15.120 uses CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to
> > > calculate threads-per-core in detect_ht(). Then guest will get "90/(1+63)=1"
> > > as the result, even though theads-per-core should actually be 2.
> > > 
> > > So let us round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer
> > > to solve the unexpected result.
> > > 
> > > Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
> > > Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
> > > Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
> > > ---
> > >  target/i386/cpu.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > index 4c2e6f3a71..3710ae5283 100644
> > > --- a/target/i386/cpu.c
> > > +++ b/target/i386/cpu.c
> > > @@ -6417,7 +6417,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > >          }
> > >          *edx = env->features[FEAT_1_EDX];
> > >          if (threads_per_pkg > 1) {
> > > -            *ebx |= threads_per_pkg << 16;
> > > +            *ebx |= pow2ceil(threads_per_pkg) << 16;  
> > 
> > Yes, the fix is right.
> > 
> > About the "Maximum number of addressable IDs", the commit 88dd4ca06c83
> > ("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]")
> > introduced the new way to calculate.
> > 
> > The pow2ceil() works for current SMP topology, but may be wrong on
> > hybrid topology, as the reason I listed in the commit message:
> > 
> > > The nearest power-of-2 integer can be calculated by pow2ceil() or by
> > > using APIC ID offset/width (like L3 topology using 1 << die_offset [3]).  
> > 
> > > But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> > > are associated with APIC ID. For example, in linux kernel, the field
> > > "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
> > > another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> > > matched with actual core numbers and it's calculated by:
> > > "(1 << (pkg_offset - core_offset)) - 1".  
> > 
> > Using APIC ID offset to calculate is the hardware's approach, so I tried
> > to use APIC ID instead of pow2ceil() and replaced all pow2ceil() case.
> 
> Well, hybrid case needs some more explanation then.
> 
> 'pow2ceil(threads_per_pkg) << 16' - does exactly what SDM says for CPUID.01H.EBX[23:16]
> 
> Can you point to a spec that confirms that above is wrong and
> explain in more details how hybrid case is supposed to work
> and where it's documented?
>  

This is mainly about the meaning of "addressable ID". There's a spec
"Intel 64 Architecture Processor Topology Enumeration" [1].

In the section 1.5.3, it mentions the "addressable ID" of CPUID.01H.
EBX[23:16] indicates the "initial APIC ID". And for CPUID.04H:EAX[bits
31:26], that field is "core_id", which is the part of initial APIC ID.

Therefore, "maximum number of addressable IDs" indicates the maximum
number of IDs that each subfield of the APIC ID (we create based on the
smp topology) can hold, which is related to width.

For example, if the width of the APIC sub ID for the thread level is 2,
then it will hold a maximum of 4 sub IDs. For the smp system, the
sub-topology fields of the APIC IDs are filled, so pow2ceil() is always
valid.

For hybrid, there will be holes in the sub fields, for example, 4 E
cores form a module and a P core is exclusive to a module, for the core
field of the E core's APIC ID, the width of the core field is at least 2
to accommodate the IDs of the 4 cores, while the P core, though only 1
bit width is needed, still uses the same width as the E core in order to
align with the E core. In this case, pow2ceil() fails.

And the example I mentioned before ("on Alder Lake P, the CPUID.04H:EAX[
bits 31:26] is not matched with actual core numbers and it's calculated
by: '(1 << (pkg_offset - core_offset)) - 1'."), has the same reason,
that's because the APIC ID layout has the holes, and the possible APIC
IDs are more than actual cores.

[1]: https://cdrdv2-public.intel.com/759067/intel-64-architecture-processor-topology-enumeration.pdfi

Regards,
Zhao
diff mbox series

Patch

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 4c2e6f3a71..3710ae5283 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6417,7 +6417,7 @@  void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         }
         *edx = env->features[FEAT_1_EDX];
         if (threads_per_pkg > 1) {
-            *ebx |= threads_per_pkg << 16;
+            *ebx |= pow2ceil(threads_per_pkg) << 16;
             *edx |= CPUID_HT;
         }
         if (!cpu->enable_pmu) {