diff mbox

[RFC,1/8] x86/hvm: set initial apicid to vcpu_id

Message ID 56D735B0.8040402@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Joao Martins March 2, 2016, 6:49 p.m. UTC
On 02/25/2016 05:03 PM, Jan Beulich wrote:
>>>> On 22.02.16 at 22:02, <joao.m.martins@oracle.com> wrote:
>> Currently the initial_apicid is set vcpu_id * 2 which makes it difficult
>> for the toolstack to manage how is the topology seen by the guest.
>> Instead of forcing procpkg and proccount to be VCPUID * 2, instead we
>> set it to max vcpuid on proccount to max_vcpu_id + 1 (logical number of
>> logical cores) and procpkg to max_vcpu_id (max cores minus 1)
> 
> I'm afraid it takes more than this to explain why the change is
> needed or at least desirable.
Apologies for my clumsiness in the commit message. I should have explained
properly why we need this for this series in the first place.

Currently use initial_apicid as vcpu_id * 2,
and doubled the leafs 1 and 4 values (proccount and procpkg) which means we will
address 8 LAPICIDs (tohugh only 4 will be used). Example topology and algorithm
below to facilitate discussion:

# Maximum logical addressable IDs (logical processors in a package)
proccount = CPUID.1:EBX[23:16]

# Maximum core addressable IDs - 1 (maximum cores in a package - 1)
procpkg = CPUID.(4,0):EAX[31:26]

# MSB (Calculate most significant bit)
SMT_Mask_width = MSB(proccount / (procpkg + 1))
Core_Mask_width = MSB(procpkg + 1)
CoreSMT_Mask_width = SMT_Mask_width + Core_Mask_width
Pkg_Mask_width = 1 << CoreSMT_Mask_width

SMT_ID = APICID & ((1 << SMT_Mask_width) - 1)
Core_ID = (APICID >> SMT_Mask_width) & ((1 << Core_Mask_width) - 1)
Pkg_ID = (APICID & Pkg_Mask_width) >> CoreSMT_Mask_width

So as it is right now, the topology on a 4 vcpu HVM guest looks like:

=> topology(proccount = 16, procpkg = 7) # current
APICID=0 SMT_ID=0 CORE_ID=0 PKG_ID=0 # VCPU 0
APICID=1 SMT_ID=1 CORE_ID=0 PKG_ID=0
APICID=2 SMT_ID=0 CORE_ID=1 PKG_ID=0 # VCPU 1
APICID=3 SMT_ID=1 CORE_ID=1 PKG_ID=0
APICID=4 SMT_ID=0 CORE_ID=2 PKG_ID=0 # VCPU 2
APICID=5 SMT_ID=1 CORE_ID=2 PKG_ID=0
APICID=6 SMT_ID=0 CORE_ID=3 PKG_ID=0 # VCPU 3
APICID=7 SMT_ID=1 CORE_ID=3 PKG_ID=0
[...]
APICID=14 SMT_ID=0 CORE_ID=7 PKG_ID=0
APICID=15 SMT_ID=1 CORE_ID=7 PKG_ID=0

As you know, APICID describes the SMT, Core and PKG. Problem with having APICID
in even numbers (0, 2, 4, ... N) is that we can't describe the SMT/siblings in
the topology. Thus turning the APICID ID space into (0, 1, 2 .. N) like this
patch proposes means we can know calculate all possibilities on both
topology kinds. Note that is a prerequisite patch so that a later
patch in this series sets the proccount and procpkg to enable seeing some cores
as SMT siblings.

=> topology(proccount = 4, procpkg = 3) # with this patch
APICID=0 SMT_ID=0 CORE_ID=0 PKG_ID=0
APICID=1 SMT_ID=0 CORE_ID=1 PKG_ID=0
APICID=2 SMT_ID=0 CORE_ID=2 PKG_ID=0
APICID=3 SMT_ID=0 CORE_ID=3 PKG_ID=0

x2APIC isn't addressed here for this RFC but it has the same issue (and
consequently exposure of FEATURE_XTOPOLOGY, CPUID.(EAX=0xB, ECX=N)). One
difference is that the SMT,Core,Pkg mask widths are fetched from each subleaf
directly as opposed to a calculation between procpkg and proccount.

> In particular I'd like to suggest that
> you do some archeology to understand why things are the way
> they are.
Digging in the history and threads, this behavior seems to be introduced by
commit c21d85b ("[HVM] Change VCPU->LAPIC_ID mapping so that VCPU0 has ID0")
where the main issue looked like a conflict between VCPU 0 LAPICID and IOAPIC
ID. Previous commits (a41ba62, facdf41) made IOAPIC on 0 and vLAPIC on 1..N but
it broke on old kernels (for the lack of LAPIC 0), so it ended up having a
vLAPIC ID space with 0, 2, 4, 6 and assign vIOAPICID = 1. This way all of
{L,IO}APICs have unique IDs - this thread
(http://lists.xen.org/archives/html/xen-devel/2008-09/msg00986.html) seems to
mention something along these lines too.

But the manuals aren't exactly clear on this ID uniqueness between LAPICs and
I/O APICs on more recent processors. Any lights on this would be great.

Intel 82093AA (IOAPIC) datasheet [0] says the following:

"This register contains the 4-bit APIC ID. The ID serves as a physical name of
the IOAPIC. All APIC devices using the APIC bus should have a unique APIC ID."

Though looking at the Intel SDM Volume 3, Chapter 10.1, Figure 10-2 and 10-3,
the APIC bus seems to be used only up to P6 family processors (Figure 10-2)
and it's indeed shared between I/OAPIC and LAPIC . For its successor (Pentium 4
and later) it's no longer the case (Figure 10-3).

My Broadwell machine in fact have conflicting APIC IDs between IOAPIC and LAPIC
in my MADT table. And it does seem that it's the case too for SeaBIOS (commit
e39b938 ("report real I/O APIC ID (0) on MADT and MP-table (v3)") ) and QEMU.
Though it wouldn't justify as reason for doing this on Xen.

[0] http://www.intel.com/design/chipsets/datashts/29056601.pdf

>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -4633,7 +4633,7 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
>> unsigned int *ebx,
>>      case 0x1:
>>          /* Fix up VLAPIC details. */
>>          *ebx &= 0x00FFFFFFu;
>> -        *ebx |= (v->vcpu_id * 2) << 24;
>> +        *ebx |= (v->vcpu_id) << 24;
>>          if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
>>              __clear_bit(X86_FEATURE_APIC & 31, edx);
> 
> In no case is this sufficient as adjustment to the hypervisor side,
> as it gets things out of sync with e.g. hvm/vlapic.c.
Yes, and also the firmware like this chunk below (tested too):
diff mbox

Patch

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index b838cf9..9c1f2f7 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -47,7 +47,7 @@  extern struct bios_config ovmf_config;
 #define IOAPIC_VERSION      0x11

 #define LAPIC_BASE_ADDRESS  0xfee00000
-#define LAPIC_ID(vcpu_id)   ((vcpu_id) * 2)
+#define LAPIC_ID(vcpu_id)   (vcpu_id)

 #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
 #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 01a8430..80fc6a1 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1206,7 +1206,7 @@  void vlapic_reset(struct vlapic *vlapic)
     if ( !has_vlapic(v->domain) )
         return;

-    vlapic_set_reg(vlapic, APIC_ID,  (v->vcpu_id * 2) << 24);
+    vlapic_set_reg(vlapic, APIC_ID,  v->vcpu_id << 24);
     vlapic_set_reg(vlapic, APIC_LVR, VLAPIC_VERSION);

     for ( i = 0; i < 8; i++ )