diff mbox

[for-4.8] libxc/x86: Report consistent initial APIC value for PV guests

Message ID ac4cf44e-8edf-ba18-193a-79dfde53dd5c@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andrew Cooper Nov. 10, 2016, 2:55 p.m. UTC
On 10/11/16 14:50, Boris Ostrovsky wrote:
> Currently hypervisor provides PV guest's CPUID(1).EBX[31:24] (initial
> APIC ID) with contents of that field on the processor that launched
> the guest. This results in the guest reporting different initial
> APIC IDs across runs.
>
> We should be consistent in how this value is reported, let's set
> it to 0 (which is also what Linux guests expect).
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

This surely wants to go along with:

andrewcoop@andrewcoop:/local/xen.git/xen$ git diff

Which brings the PV CPUID handling in line with HVM handling.  Otherwise
a guest will see an APIC ID of 0 in all vcpus, which will surely confuse it.

~Andrew

> ---
>
> I think this should go to stable branches as well. This has been causing
> problems lately in Linux with introduction of topology maps.
>
>
>  tools/libxc/xc_cpuid_x86.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> index d761805..1f26294 100644
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -618,6 +618,12 @@ static void xc_cpuid_pv_policy(xc_interface *xch,
>          /* Host topology exposed to PV guest.  Provide host value. */
>          bool host_htt = regs[3] & bitmaskof(X86_FEATURE_HTT);
>  
> +        /*
> +         * Don't pick host's Initial APIC ID which can change from run
> +         * to run. 
> +         */
> +        regs[1] &= 0x00ffffffu;
> +
>          regs[2] = info->featureset[featureword_of(X86_FEATURE_SSE3)];
>          regs[3] = (info->featureset[featureword_of(X86_FEATURE_FPU)] &
>                     ~bitmaskof(X86_FEATURE_HTT));

Comments

Boris Ostrovsky Nov. 10, 2016, 3:05 p.m. UTC | #1
On 11/10/2016 09:55 AM, Andrew Cooper wrote:
> On 10/11/16 14:50, Boris Ostrovsky wrote:
>> Currently hypervisor provides PV guest's CPUID(1).EBX[31:24] (initial
>> APIC ID) with contents of that field on the processor that launched
>> the guest. This results in the guest reporting different initial
>> APIC IDs across runs.
>>
>> We should be consistent in how this value is reported, let's set
>> it to 0 (which is also what Linux guests expect).
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> This surely wants to go along with:

Probably, although Linux PV always reports APIC ID as zero (whole PV
APIC is a mess there as it is tied to topology discovery and we don't do
this well, to put it charitably).

> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index b51b51b..bdf9339 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -985,6 +985,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
>          uint32_t tmp, _ecx, _ebx;
>  
>      case 0x00000001:
> +        /* Fix up VLAPIC details. */
> +        b &= 0x00FFFFFFu;
> +        b |= (curr->vcpu_id * 2) << 24;

Do we also need to multiply by two for PV guests? Or is it just to be
consistent with HVM?


-boris

> +
>          c &= pv_featureset[FEATURESET_1c];
>          d &= pv_featureset[FEATURESET_1d];
>  
>
> Which brings the PV CPUID handling in line with HVM handling.  Otherwise
> a guest will see an APIC ID of 0 in all vcpus, which will surely confuse it.
>
> ~Andrew
>
>> ---
>>
>> I think this should go to stable branches as well. This has been causing
>> problems lately in Linux with introduction of topology maps.
>>
>>
>>  tools/libxc/xc_cpuid_x86.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
>> index d761805..1f26294 100644
>> --- a/tools/libxc/xc_cpuid_x86.c
>> +++ b/tools/libxc/xc_cpuid_x86.c
>> @@ -618,6 +618,12 @@ static void xc_cpuid_pv_policy(xc_interface *xch,
>>          /* Host topology exposed to PV guest.  Provide host value. */
>>          bool host_htt = regs[3] & bitmaskof(X86_FEATURE_HTT);
>>  
>> +        /*
>> +         * Don't pick host's Initial APIC ID which can change from run
>> +         * to run. 
>> +         */
>> +        regs[1] &= 0x00ffffffu;
>> +
>>          regs[2] = info->featureset[featureword_of(X86_FEATURE_SSE3)];
>>          regs[3] = (info->featureset[featureword_of(X86_FEATURE_FPU)] &
>>                     ~bitmaskof(X86_FEATURE_HTT));
Andrew Cooper Nov. 10, 2016, 3:08 p.m. UTC | #2
On 10/11/16 15:05, Boris Ostrovsky wrote:
> On 11/10/2016 09:55 AM, Andrew Cooper wrote:
>> On 10/11/16 14:50, Boris Ostrovsky wrote:
>>> Currently hypervisor provides PV guest's CPUID(1).EBX[31:24] (initial
>>> APIC ID) with contents of that field on the processor that launched
>>> the guest. This results in the guest reporting different initial
>>> APIC IDs across runs.
>>>
>>> We should be consistent in how this value is reported, let's set
>>> it to 0 (which is also what Linux guests expect).
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> This surely wants to go along with:
> Probably, although Linux PV always reports APIC ID as zero (whole PV
> APIC is a mess there as it is tied to topology discovery and we don't do
> this well, to put it charitably).

If PV linux always overrides this to 0, why do you need the toolstack
fix in the first place?

>
>> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
>> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
>> index b51b51b..bdf9339 100644
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -985,6 +985,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
>>          uint32_t tmp, _ecx, _ebx;
>>  
>>      case 0x00000001:
>> +        /* Fix up VLAPIC details. */
>> +        b &= 0x00FFFFFFu;
>> +        b |= (curr->vcpu_id * 2) << 24;
> Do we also need to multiply by two for PV guests? Or is it just to be
> consistent with HVM?

Frankly, until I get CPUID phase 2 sorted, this is all held together
with good wishes, rather than duck tape.  I am astounded it has held
together this long.

HVM chooses an even APIC ID to prevent the VM thinking it has hyperthreads.

~Andrew
Boris Ostrovsky Nov. 10, 2016, 3:24 p.m. UTC | #3
On 11/10/2016 10:08 AM, Andrew Cooper wrote:
> On 10/11/16 15:05, Boris Ostrovsky wrote:
>> On 11/10/2016 09:55 AM, Andrew Cooper wrote:
>>> On 10/11/16 14:50, Boris Ostrovsky wrote:
>>>> Currently hypervisor provides PV guest's CPUID(1).EBX[31:24] (initial
>>>> APIC ID) with contents of that field on the processor that launched
>>>> the guest. This results in the guest reporting different initial
>>>> APIC IDs across runs.
>>>>
>>>> We should be consistent in how this value is reported, let's set
>>>> it to 0 (which is also what Linux guests expect).
>>>>
>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> This surely wants to go along with:
>> Probably, although Linux PV always reports APIC ID as zero (whole PV
>> APIC is a mess there as it is tied to topology discovery and we don't do
>> this well, to put it charitably).
> If PV linux always overrides this to 0, why do you need the toolstack
> fix in the first place?

I meant that Linux overrides APICID read from APIC (which, of course, we
don't have) to zero, not CPUID.

>
>>> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
>>> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
>>> index b51b51b..bdf9339 100644
>>> --- a/xen/arch/x86/traps.c
>>> +++ b/xen/arch/x86/traps.c
>>> @@ -985,6 +985,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
>>>          uint32_t tmp, _ecx, _ebx;
>>>  
>>>      case 0x00000001:
>>> +        /* Fix up VLAPIC details. */
>>> +        b &= 0x00FFFFFFu;
>>> +        b |= (curr->vcpu_id * 2) << 24;
>> Do we also need to multiply by two for PV guests? Or is it just to be
>> consistent with HVM?
> Frankly, until I get CPUID phase 2 sorted, this is all held together
> with good wishes, rather than duck tape.  I am astounded it has held
> together this long.

It does not anymore. We are being yelled at by Linux x86 maintainers to
get our stuff in order.

>
> HVM chooses an even APIC ID to prevent the VM thinking it has hyperthreads.


Right, but it's not needed (I think) by PV. But we might do it that way
to keep the two in sync. Let me test it and see what breaks.

-boris
Jan Beulich Nov. 10, 2016, 4:12 p.m. UTC | #4
>>> On 10.11.16 at 15:55, <andrew.cooper3@citrix.com> wrote:
> On 10/11/16 14:50, Boris Ostrovsky wrote:
>> Currently hypervisor provides PV guest's CPUID(1).EBX[31:24] (initial
>> APIC ID) with contents of that field on the processor that launched
>> the guest. This results in the guest reporting different initial
>> APIC IDs across runs.
>>
>> We should be consistent in how this value is reported, let's set
>> it to 0 (which is also what Linux guests expect).
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> 
> This surely wants to go along with:
> 
> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index b51b51b..bdf9339 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -985,6 +985,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
>          uint32_t tmp, _ecx, _ebx;
>  
>      case 0x00000001:
> +        /* Fix up VLAPIC details. */
> +        b &= 0x00FFFFFFu;
> +        b |= (curr->vcpu_id * 2) << 24;
> +
>          c &= pv_featureset[FEATURESET_1c];
>          d &= pv_featureset[FEATURESET_1d];
>  
> 
> Which brings the PV CPUID handling in line with HVM handling.  Otherwise
> a guest will see an APIC ID of 0 in all vcpus, which will surely confuse it.

Which will still result in multiple identical APIC IDs once there are
128 or more vCPU-s in a PV guest.

Jan
Boris Ostrovsky Nov. 10, 2016, 4:24 p.m. UTC | #5
On 11/10/2016 11:12 AM, Jan Beulich wrote:
>>>> On 10.11.16 at 15:55, <andrew.cooper3@citrix.com> wrote:
>> On 10/11/16 14:50, Boris Ostrovsky wrote:
>>> Currently hypervisor provides PV guest's CPUID(1).EBX[31:24] (initial
>>> APIC ID) with contents of that field on the processor that launched
>>> the guest. This results in the guest reporting different initial
>>> APIC IDs across runs.
>>>
>>> We should be consistent in how this value is reported, let's set
>>> it to 0 (which is also what Linux guests expect).
>>>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> This surely wants to go along with:
>>
>> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
>> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
>> index b51b51b..bdf9339 100644
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -985,6 +985,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
>>          uint32_t tmp, _ecx, _ebx;
>>  
>>      case 0x00000001:
>> +        /* Fix up VLAPIC details. */
>> +        b &= 0x00FFFFFFu;
>> +        b |= (curr->vcpu_id * 2) << 24;
>> +
>>          c &= pv_featureset[FEATURESET_1c];
>>          d &= pv_featureset[FEATURESET_1d];
>>  
>>
>> Which brings the PV CPUID handling in line with HVM handling.  Otherwise
>> a guest will see an APIC ID of 0 in all vcpus, which will surely confuse it.
> Which will still result in multiple identical APIC IDs once there are
> 128 or more vCPU-s in a PV guest.


And this change (either with or without *2) makes Linux PV problem of
mismatched APIC[0x20] vs CPUID(1).EBX[31:24] to be back (yes, because
Linux is broken in that it currently makes APIC[0x20] return zero).

You'd obviously be within your rights to say that it's up to Linux to
deal with this but I will ask you to consider taking only the toolstack
part of this for now.


-boris
Andrew Cooper Nov. 10, 2016, 4:24 p.m. UTC | #6
On 10/11/16 16:24, Boris Ostrovsky wrote:
> On 11/10/2016 11:12 AM, Jan Beulich wrote:
>>>>> On 10.11.16 at 15:55, <andrew.cooper3@citrix.com> wrote:
>>> On 10/11/16 14:50, Boris Ostrovsky wrote:
>>>> Currently hypervisor provides PV guest's CPUID(1).EBX[31:24] (initial
>>>> APIC ID) with contents of that field on the processor that launched
>>>> the guest. This results in the guest reporting different initial
>>>> APIC IDs across runs.
>>>>
>>>> We should be consistent in how this value is reported, let's set
>>>> it to 0 (which is also what Linux guests expect).
>>>>
>>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>> This surely wants to go along with:
>>>
>>> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
>>> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
>>> index b51b51b..bdf9339 100644
>>> --- a/xen/arch/x86/traps.c
>>> +++ b/xen/arch/x86/traps.c
>>> @@ -985,6 +985,10 @@ void pv_cpuid(struct cpu_user_regs *regs)
>>>          uint32_t tmp, _ecx, _ebx;
>>>  
>>>      case 0x00000001:
>>> +        /* Fix up VLAPIC details. */
>>> +        b &= 0x00FFFFFFu;
>>> +        b |= (curr->vcpu_id * 2) << 24;
>>> +
>>>          c &= pv_featureset[FEATURESET_1c];
>>>          d &= pv_featureset[FEATURESET_1d];
>>>  
>>>
>>> Which brings the PV CPUID handling in line with HVM handling.  Otherwise
>>> a guest will see an APIC ID of 0 in all vcpus, which will surely confuse it.
>> Which will still result in multiple identical APIC IDs once there are
>> 128 or more vCPU-s in a PV guest.

You can already crash Xen with fewer PV vcpus than that, due to long
running for_each_vcpu() loops.

The current ABI limit of 8192 is implausible for production, as is the
current 128 limit for HVM guests.

>
> And this change (either with or without *2) makes Linux PV problem of
> mismatched APIC[0x20] vs CPUID(1).EBX[31:24] to be back (yes, because
> Linux is broken in that it currently makes APIC[0x20] return zero).
>
> You'd obviously be within your rights to say that it's up to Linux to
> deal with this but I will ask you to consider taking only the toolstack
> part of this for now.

If the toolstack along is sufficient duct tape, perhaps best to go with
just that for now.

I will have to change all of this for CPUID phase-2.  We can see about
fixing things more correctly then.

~Andrew
diff mbox

Patch

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index b51b51b..bdf9339 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -985,6 +985,10 @@  void pv_cpuid(struct cpu_user_regs *regs)
         uint32_t tmp, _ecx, _ebx;
 
     case 0x00000001:
+        /* Fix up VLAPIC details. */
+        b &= 0x00FFFFFFu;
+        b |= (curr->vcpu_id * 2) << 24;
+
         c &= pv_featureset[FEATURESET_1c];
         d &= pv_featureset[FEATURESET_1d];