diff mbox series

[v4] xen/x86: On x2APIC mode, derive LDR from APIC ID

Message ID 20231123173057.1325-1-alejandro.vallejo@cloud.com (mailing list archive)
State Superseded
Headers show
Series [v4] xen/x86: On x2APIC mode, derive LDR from APIC ID | expand

Commit Message

Alejandro Vallejo Nov. 23, 2023, 5:30 p.m. UTC
Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID
registers are derivable from each other through a fixed formula.

Xen uses that formula, but applies it to vCPU IDs (which are sequential)
rather than x2APIC IDs (which are not, at the moment). As I understand it,
this is an attempt to tightly pack vCPUs into clusters so each cluster has
16 vCPUs rather than 8, but this is problematic for OSs that might read the
x2APIC ID and internally derive LDR (or the other way around)

This patch fixes the implementation so we follow the rules in the x2APIC
spec(s) and covers migrations from broken hypervisors, so LDRs are
preserved even for hotppluggable CPUs and across APIC resets.

While touching that area, I removed the existing printk statement in
vlapic_load_fixup() (as the checks it performed didn't make sense in x2APIC
mode and wouldn't affect the outcome) and put another printk as an else
branch so we get warnings trying to load nonsensical LDR values we don't
know about.

Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
v4:
  * Removed "See <function>()" part of comment in set_x2apic_id()
  * Removed _with_ in field name
  * Trimmed down comments further
  * Rephrased the Xen versions in the comments so it's implied not every
    Xen 4.X is affected (as they won't be after this patch is backported)
  * Replace Xen 4.18 reference with date+4.19 dev window
---
 xen/arch/x86/hvm/vlapic.c             | 66 ++++++++++++++++++---------
 xen/arch/x86/include/asm/hvm/domain.h |  3 ++
 2 files changed, 47 insertions(+), 22 deletions(-)

Comments

Andrew Cooper Nov. 24, 2023, 10:05 p.m. UTC | #1
A few minor grammar notes.

"x86/vlapic: In x2APIC ..."


On 23/11/2023 5:30 pm, Alejandro Vallejo wrote:
> Both Intel and AMD manuals agree that on x2APIC mode, the APIC LDR and ID
> registers are derivable from each other through a fixed formula.
>
> Xen uses that formula, but applies it to vCPU IDs (which are sequential)
> rather than x2APIC IDs (which are not, at the moment). As I understand it,
> this is an attempt to tightly pack vCPUs into clusters so each cluster has
> 16 vCPUs rather than 8, but this is problematic for OSs that might read the
> x2APIC ID and internally derive LDR (or the other way around)

Missing full stop.  But I'd also phrase this explicitly as "violating
the spec", rather than "problematic".

Intel is crystal clear on the matter:

  Logical x2APIC ID = [(x2APIC ID[19:4] « 16) | (1 « x2APIC ID[3:0])]

and Xen isn't implementing this.

> This patch fixes the implementation so we follow the rules in the x2APIC
> spec(s) and covers migrations from broken hypervisors, so LDRs are
> preserved even for hotppluggable CPUs and across APIC resets.

"This patch fixes the implementation so we follow the x2APIC spec for
new VMs, while preserving the behaviour (buggy or fixed) for migrated-in
VMs."

Hotpluggable isn't relevant.  We have the state, and it's either as it
was before, or it's fixed.


>
> While touching that area, I removed the existing printk statement in
> vlapic_load_fixup() (as the checks it performed didn't make sense in x2APIC
> mode and wouldn't affect the outcome) and put another printk as an else
> branch so we get warnings trying to load nonsensical LDR values we don't
> know about.
>
> Fixes: f9e0cccf7b35 ("x86/HVM: fix ID handling of x2APIC emulation")
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> v4:
>   * Removed "See <function>()" part of comment in set_x2apic_id()
>   * Removed _with_ in field name
>   * Trimmed down comments further
>   * Rephrased the Xen versions in the comments so it's implied not every
>     Xen 4.X is affected (as they won't be after this patch is backported)
>   * Replace Xen 4.18 reference with date+4.19 dev window
> ---
>  xen/arch/x86/hvm/vlapic.c             | 66 ++++++++++++++++++---------
>  xen/arch/x86/include/asm/hvm/domain.h |  3 ++
>  2 files changed, 47 insertions(+), 22 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 5cb87f8649..cd4701c5a2 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1061,13 +1061,26 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = {
>      .write = vlapic_mmio_write,
>  };
>  
> +static uint32_t x2apic_ldr_from_id(uint32_t id)
> +{
> +    return ((id & ~0xf) << 12) | (1 << (id & 0xf));
> +}
> +
>  static void set_x2apic_id(struct vlapic *vlapic)
>  {

const struct vcpu *v = vlapic_vcpu(vlapic);

seeing as you've got the expression used twice already.

With that, the following logic is shorter too; you can get away with
dropping the vcpu_id variable as v->vcpu_id is the more common form to
use in Xen.


> -    u32 id = vlapic_vcpu(vlapic)->vcpu_id;
> -    u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf));
> +    uint32_t vcpu_id = vlapic_vcpu(vlapic)->vcpu_id;
> +    uint32_t apic_id = vcpu_id * 2;

/* TODO: Fix topology */ as a suffix here.

Just to make it clear that we're aware that this *2 logic is faulty, but
it needs to remain like this in the short term.

> +    uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
> +
> +    /*
> +     * Workaround for migrated domains to derive LDRs as the source host
> +     * would've.
> +     */
> +    if ( vlapic_domain(vlapic)->arch.hvm.bug_x2apic_ldr_vcpu_id )
> +        apic_ldr = x2apic_ldr_from_id(vcpu_id);
>  
> -    vlapic_set_reg(vlapic, APIC_ID, id * 2);
> -    vlapic_set_reg(vlapic, APIC_LDR, ldr);
> +    vlapic_set_reg(vlapic, APIC_ID, apic_id);
> +    vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
>  }
>  
>  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
> @@ -1498,27 +1511,36 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
>   */
>  static void lapic_load_fixup(struct vlapic *vlapic)
>  {

Again, const struct vcpu *v = vlapic_vcpu(vlapic); here helps legibility.

> -    uint32_t id = vlapic->loaded.id;
> +    uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>  
> -    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
> -    {
> +    /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
> +    if ( !vlapic_x2apic_mode(vlapic) ||
> +         (vlapic->loaded.ldr == good_ldr) )
> +        return;
> +
> +    if ( vlapic->loaded.ldr == 1 )
> +       /*
> +        * Xen <= 4.4 may have a bug by which all the APICs configured in
> +        * x2APIC mode got LDR = 1. We can't leave it as-is because it
> +        * assigned the same LDR to every CPU.  We'll fix fix the bug now
> +        * and assign an LDR value consistent with the APIC ID.
> +        */
> +        set_x2apic_id(vlapic);
> +    else if ( vlapic->loaded.ldr ==

I know these are single statement if's, and the coding style permits
them without braces, but the comment makes it visually awkward to follow.

This is an example where braces help readability generally, but also (as
it happens) help make a more readable diff.


> +              x2apic_ldr_from_id(vlapic_vcpu(vlapic)->vcpu_id) )
>          /*
> -         * This is optional: ID != 0 contradicts LDR == 1. It's being added
> -         * to aid in eventual debugging of issues arising from the fixup done
> -         * here, but can be dropped as soon as it is found to conflict with
> -         * other (future) changes.
> +         * Migrations from Xen 4.4 to date (4.19 dev window, Nov 2023) may
> +         * show this bug.

"may have had LDR derived from the vCPU_ID, not the APIC_ID."

Better to clearly state what the bug is.

>  We must preserve LDRs so new vCPUs use consistent
> +         * derivations and existing guests, which may have already read the
> +         * LDR at the source host, aren't surprised when interrupts stop
> +         * working the way they did at the other end.
>           */
> -        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
> -             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
> -            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
> -                   vlapic_vcpu(vlapic), id);
> -        set_x2apic_id(vlapic);
> -    }
> -    else /* Undo an eventual earlier fixup. */
> -    {
> -        vlapic_set_reg(vlapic, APIC_ID, id);
> -        vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr);
> -    }
> +        vlapic_domain(vlapic)->arch.hvm.bug_x2apic_ldr_vcpu_id = true;
> +    else
> +        printk(XENLOG_G_WARNING
> +               "%pv: bogus x2APIC loaded id=%#x ldr=%#x (expected ldr=%#x)\n",

%pv: bogus loaded x2APIC ID %#x, LDR %#x, expected LDR %#x\n

If you properly capitalise x2APIC, you should capitalise ID and LDR. 
The other changes are matter of taste, but make for a less cluttered log
message IMO.


This is a long list of minor niggles, but they're all style/comment
issues, and nothing to do with the logic itself.  I'm happy to fix them
all up on commit, and here is the result with them merged:

https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=commitdiff;h=953dcb0317d20959ffee14e404595cfbb66c607a

for you to glance over.

~Andrew
Jan Beulich Nov. 27, 2023, 8:40 a.m. UTC | #2
On 23.11.2023 18:30, Alejandro Vallejo wrote:
> @@ -1498,27 +1511,36 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
>   */
>  static void lapic_load_fixup(struct vlapic *vlapic)
>  {
> -    uint32_t id = vlapic->loaded.id;
> +    uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>  
> -    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
> -    {
> +    /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
> +    if ( !vlapic_x2apic_mode(vlapic) ||
> +         (vlapic->loaded.ldr == good_ldr) )
> +        return;
> +
> +    if ( vlapic->loaded.ldr == 1 )
> +       /*
> +        * Xen <= 4.4 may have a bug by which all the APICs configured in
> +        * x2APIC mode got LDR = 1. We can't leave it as-is because it
> +        * assigned the same LDR to every CPU.  We'll fix fix the bug now
> +        * and assign an LDR value consistent with the APIC ID.
> +        */

Just one comment on top of Andrew's: Is the double "fix" really intended
here? (I could see it might be, but then "fix the bug fix" would read
slightly more smoothly to me as a non-native speaker.)

Another aspect here is what exactly the comment states (and does not
state). Original comments made explicit that LDR == 1 contradicts ID == 0.
In the new comment you only emphasize that all CPUs cannot have that same
LDR. But the value of 1 being bogus in the first place doesn't become clear
anymore.

Jan
Alejandro Vallejo Nov. 27, 2023, 12:08 p.m. UTC | #3
On 24/11/2023 22:05, Andrew Cooper wrote:
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index 5cb87f8649..cd4701c5a2 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -1061,13 +1061,26 @@ static const struct hvm_mmio_ops vlapic_mmio_ops = {
>>       .write = vlapic_mmio_write,
>>   };
>>   
>> +static uint32_t x2apic_ldr_from_id(uint32_t id)
>> +{
>> +    return ((id & ~0xf) << 12) | (1 << (id & 0xf));
>> +}
>> +
>>   static void set_x2apic_id(struct vlapic *vlapic)
>>   {
> 
> const struct vcpu *v = vlapic_vcpu(vlapic);
> 
> seeing as you've got the expression used twice already.
> 
> With that, the following logic is shorter too; you can get away with
> dropping the vcpu_id variable as v->vcpu_id is the more common form to
> use in Xen.

Twice? I can see a vague raison-d'etre in lapic_load_fixup(), but
there's a single occurence here.

> 
>>   We must preserve LDRs so new vCPUs use consistent
>> +         * derivations and existing guests, which may have already read the
>> +         * LDR at the source host, aren't surprised when interrupts stop
>> +         * working the way they did at the other end.
>>            */
>> -        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
>> -             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
>> -            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
>> -                   vlapic_vcpu(vlapic), id);
>> -        set_x2apic_id(vlapic);
>> -    }
>> -    else /* Undo an eventual earlier fixup. */
>> -    {
>> -        vlapic_set_reg(vlapic, APIC_ID, id);
>> -        vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr);
>> -    }
>> +        vlapic_domain(vlapic)->arch.hvm.bug_x2apic_ldr_vcpu_id = true;
>> +    else
>> +        printk(XENLOG_G_WARNING
>> +               "%pv: bogus x2APIC loaded id=%#x ldr=%#x (expected ldr=%#x)\n",
> 
> %pv: bogus loaded x2APIC ID %#x, LDR %#x, expected LDR %#x\n
> 
> If you properly capitalise x2APIC, you should capitalise ID and LDR.
> The other changes are matter of taste, but make for a less cluttered log
> message IMO.
> 

"bogus x2APIC loaded" is meant to be a sentence followed by key-value 
pairs. Uppercasing the keys is fine (albeit unnecessary, IMO), but you 
choice of word order feels backwards.

> 
> This is a long list of minor niggles, but they're all style/comment
> issues, and nothing to do with the logic itself.  I'm happy to fix them
> all up on commit, and here is the result with them merged:
> 
> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=commitdiff;h=953dcb0317d20959ffee14e404595cfbb66c607a
> 
> for you to glance over.
> 
> ~Andrew

Except for the 2 bits pointed out, the others are cosmetic changes
I don't particularly mind. Jan spotted a typo in the second comment 
block in lapic_load_fixup() that should be corrected too.
(s/fix fix/fix/)

Cheers,
Alejandro
Alejandro Vallejo Nov. 27, 2023, 12:17 p.m. UTC | #4
On 27/11/2023 08:40, Jan Beulich wrote:
> On 23.11.2023 18:30, Alejandro Vallejo wrote:
>> @@ -1498,27 +1511,36 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
>>    */
>>   static void lapic_load_fixup(struct vlapic *vlapic)
>>   {
>> -    uint32_t id = vlapic->loaded.id;
>> +    uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>>   
>> -    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
>> -    {
>> +    /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
>> +    if ( !vlapic_x2apic_mode(vlapic) ||
>> +         (vlapic->loaded.ldr == good_ldr) )
>> +        return;
>> +
>> +    if ( vlapic->loaded.ldr == 1 )
>> +       /*
>> +        * Xen <= 4.4 may have a bug by which all the APICs configured in
>> +        * x2APIC mode got LDR = 1. We can't leave it as-is because it
>> +        * assigned the same LDR to every CPU.  We'll fix fix the bug now
>> +        * and assign an LDR value consistent with the APIC ID.
>> +        */
> 
> Just one comment on top of Andrew's: Is the double "fix" really intended
> here? (I could see it might be, but then "fix the bug fix" would read
> slightly more smoothly to me as a non-native speaker.)

It's not intended indeed. s/fix fix/fix/

> 
> Another aspect here is what exactly the comment states (and does not
> state). Original comments made explicit that LDR == 1 contradicts ID == 0.
> In the new comment you only emphasize that all CPUs cannot have that same
> LDR. But the value of 1 being bogus in the first place doesn't become clear
> anymore.
> 
> Jan

1 is bogus for id!=0, but so would be 3, 7 or 42. In particular we have
ID==2 contradicting LDR=2, and we're allowing it. The reason why we must
fix this other case is because all LDRs are equal, otherwise it would
get the same treatment as the other bug.

Cheers,
Alejandro
Andrew Cooper Nov. 27, 2023, 12:20 p.m. UTC | #5
On 27/11/2023 12:08 pm, Alejandro Vallejo wrote:
> On 24/11/2023 22:05, Andrew Cooper wrote:
>>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>>> index 5cb87f8649..cd4701c5a2 100644
>>> --- a/xen/arch/x86/hvm/vlapic.c
>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>> @@ -1061,13 +1061,26 @@ static const struct hvm_mmio_ops
>>> vlapic_mmio_ops = {
>>>       .write = vlapic_mmio_write,
>>>   };
>>>   +static uint32_t x2apic_ldr_from_id(uint32_t id)
>>> +{
>>> +    return ((id & ~0xf) << 12) | (1 << (id & 0xf));
>>> +}
>>> +
>>>   static void set_x2apic_id(struct vlapic *vlapic)
>>>   {
>>
>> const struct vcpu *v = vlapic_vcpu(vlapic);
>>
>> seeing as you've got the expression used twice already.
>>
>> With that, the following logic is shorter too; you can get away with
>> dropping the vcpu_id variable as v->vcpu_id is the more common form to
>> use in Xen.
>
> Twice? I can see a vague raison-d'etre in lapic_load_fixup(), but
> there's a single occurence here.

It's hidden in the vlapic_domain(), which is vlacpi_vcpu()->domain.

>
>>
>>>   We must preserve LDRs so new vCPUs use consistent
>>> +         * derivations and existing guests, which may have already
>>> read the
>>> +         * LDR at the source host, aren't surprised when interrupts
>>> stop
>>> +         * working the way they did at the other end.
>>>            */
>>> -        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
>>> -             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
>>> -            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
>>> -                   vlapic_vcpu(vlapic), id);
>>> -        set_x2apic_id(vlapic);
>>> -    }
>>> -    else /* Undo an eventual earlier fixup. */
>>> -    {
>>> -        vlapic_set_reg(vlapic, APIC_ID, id);
>>> -        vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr);
>>> -    }
>>> +        vlapic_domain(vlapic)->arch.hvm.bug_x2apic_ldr_vcpu_id = true;
>>> +    else
>>> +        printk(XENLOG_G_WARNING
>>> +               "%pv: bogus x2APIC loaded id=%#x ldr=%#x (expected
>>> ldr=%#x)\n",
>>
>> %pv: bogus loaded x2APIC ID %#x, LDR %#x, expected LDR %#x\n
>>
>> If you properly capitalise x2APIC, you should capitalise ID and LDR.
>> The other changes are matter of taste, but make for a less cluttered log
>> message IMO.
>>
>
> "bogus x2APIC loaded" is meant to be a sentence followed by key-value
> pairs. Uppercasing the keys is fine (albeit unnecessary, IMO), but you
> choice of word order feels backwards.

The grammar is awkward either way.

How about "bogus x2APIC record: "

?

That is much clearer I think.

~Andrew
Jan Beulich Nov. 27, 2023, 12:24 p.m. UTC | #6
On 27.11.2023 13:17, Alejandro Vallejo wrote:
> On 27/11/2023 08:40, Jan Beulich wrote:
>> On 23.11.2023 18:30, Alejandro Vallejo wrote:
>>> @@ -1498,27 +1511,36 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
>>>    */
>>>   static void lapic_load_fixup(struct vlapic *vlapic)
>>>   {
>>> -    uint32_t id = vlapic->loaded.id;
>>> +    uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>>>   
>>> -    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
>>> -    {
>>> +    /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
>>> +    if ( !vlapic_x2apic_mode(vlapic) ||
>>> +         (vlapic->loaded.ldr == good_ldr) )
>>> +        return;
>>> +
>>> +    if ( vlapic->loaded.ldr == 1 )
>>> +       /*
>>> +        * Xen <= 4.4 may have a bug by which all the APICs configured in
>>> +        * x2APIC mode got LDR = 1. We can't leave it as-is because it
>>> +        * assigned the same LDR to every CPU.  We'll fix fix the bug now
>>> +        * and assign an LDR value consistent with the APIC ID.
>>> +        */
>>
>> Just one comment on top of Andrew's: Is the double "fix" really intended
>> here? (I could see it might be, but then "fix the bug fix" would read
>> slightly more smoothly to me as a non-native speaker.)
> 
> It's not intended indeed. s/fix fix/fix/
> 
>>
>> Another aspect here is what exactly the comment states (and does not
>> state). Original comments made explicit that LDR == 1 contradicts ID == 0.
>> In the new comment you only emphasize that all CPUs cannot have that same
>> LDR. But the value of 1 being bogus in the first place doesn't become clear
>> anymore.
> 
> 1 is bogus for id!=0, but so would be 3, 7 or 42.

Yet 3, 7, and 42 aren't interesting in the context of that older bug.

> In particular we have
> ID==2 contradicting LDR=2, and we're allowing it. The reason why we must
> fix this other case is because all LDRs are equal, otherwise it would
> get the same treatment as the other bug.

I understand all that; still there's loss of information in the comments,
from my perspective.

Jan
Alejandro Vallejo Nov. 27, 2023, 1:25 p.m. UTC | #7
On 27/11/2023 12:20, Andrew Cooper wrote:
> On 27/11/2023 12:08 pm, Alejandro Vallejo wrote:
>> On 24/11/2023 22:05, Andrew Cooper wrote:
>>>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>>>> index 5cb87f8649..cd4701c5a2 100644
>>>> --- a/xen/arch/x86/hvm/vlapic.c
>>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>>> @@ -1061,13 +1061,26 @@ static const struct hvm_mmio_ops
>>>> vlapic_mmio_ops = {
>>>>        .write = vlapic_mmio_write,
>>>>    };
>>>>    +static uint32_t x2apic_ldr_from_id(uint32_t id)
>>>> +{
>>>> +    return ((id & ~0xf) << 12) | (1 << (id & 0xf));
>>>> +}
>>>> +
>>>>    static void set_x2apic_id(struct vlapic *vlapic)
>>>>    {
>>>
>>> const struct vcpu *v = vlapic_vcpu(vlapic);
>>>
>>> seeing as you've got the expression used twice already.
>>>
>>> With that, the following logic is shorter too; you can get away with
>>> dropping the vcpu_id variable as v->vcpu_id is the more common form to
>>> use in Xen.
>>
>> Twice? I can see a vague raison-d'etre in lapic_load_fixup(), but
>> there's a single occurence here.
> 
> It's hidden in the vlapic_domain(), which is vlacpi_vcpu()->domain.
> 
>>
>>>
>>>>    We must preserve LDRs so new vCPUs use consistent
>>>> +         * derivations and existing guests, which may have already
>>>> read the
>>>> +         * LDR at the source host, aren't surprised when interrupts
>>>> stop
>>>> +         * working the way they did at the other end.
>>>>             */
>>>> -        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
>>>> -             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
>>>> -            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
>>>> -                   vlapic_vcpu(vlapic), id);
>>>> -        set_x2apic_id(vlapic);
>>>> -    }
>>>> -    else /* Undo an eventual earlier fixup. */
>>>> -    {
>>>> -        vlapic_set_reg(vlapic, APIC_ID, id);
>>>> -        vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr);
>>>> -    }
>>>> +        vlapic_domain(vlapic)->arch.hvm.bug_x2apic_ldr_vcpu_id = true;
>>>> +    else
>>>> +        printk(XENLOG_G_WARNING
>>>> +               "%pv: bogus x2APIC loaded id=%#x ldr=%#x (expected
>>>> ldr=%#x)\n",
>>>
>>> %pv: bogus loaded x2APIC ID %#x, LDR %#x, expected LDR %#x\n
>>>
>>> If you properly capitalise x2APIC, you should capitalise ID and LDR.
>>> The other changes are matter of taste, but make for a less cluttered log
>>> message IMO.
>>>
>>
>> "bogus x2APIC loaded" is meant to be a sentence followed by key-value
>> pairs. Uppercasing the keys is fine (albeit unnecessary, IMO), but you
>> choice of word order feels backwards.
> 
> The grammar is awkward either way.
> 
> How about "bogus x2APIC record: "
> 
> ?
> 
> That is much clearer I think.
> 
> ~Andrew

LGTM.

Do you want me to send a v5 with it all?

Cheers,
Alejandro

Cheers,
Alejandro
Alejandro Vallejo Nov. 27, 2023, 1:29 p.m. UTC | #8
On 27/11/2023 12:24, Jan Beulich wrote:
> On 27.11.2023 13:17, Alejandro Vallejo wrote:
>> On 27/11/2023 08:40, Jan Beulich wrote:
>>> On 23.11.2023 18:30, Alejandro Vallejo wrote:
>>>> @@ -1498,27 +1511,36 @@ static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
>>>>    */
>>>>   static void lapic_load_fixup(struct vlapic *vlapic)
>>>>   {
>>>> -    uint32_t id = vlapic->loaded.id;
>>>> +    uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
>>>>   
>>>> -    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
>>>> -    {
>>>> +    /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
>>>> +    if ( !vlapic_x2apic_mode(vlapic) ||
>>>> +         (vlapic->loaded.ldr == good_ldr) )
>>>> +        return;
>>>> +
>>>> +    if ( vlapic->loaded.ldr == 1 )
>>>> +       /*
>>>> +        * Xen <= 4.4 may have a bug by which all the APICs configured in
>>>> +        * x2APIC mode got LDR = 1. We can't leave it as-is because it
>>>> +        * assigned the same LDR to every CPU.  We'll fix fix the bug now
>>>> +        * and assign an LDR value consistent with the APIC ID.
>>>> +        */
>>>
>>> Just one comment on top of Andrew's: Is the double "fix" really intended
>>> here? (I could see it might be, but then "fix the bug fix" would read
>>> slightly more smoothly to me as a non-native speaker.)
>>
>> It's not intended indeed. s/fix fix/fix/
>>
>>>
>>> Another aspect here is what exactly the comment states (and does not
>>> state). Original comments made explicit that LDR == 1 contradicts ID == 0.
>>> In the new comment you only emphasize that all CPUs cannot have that same
>>> LDR. But the value of 1 being bogus in the first place doesn't become clear
>>> anymore.
>>
>> 1 is bogus for id!=0, but so would be 3, 7 or 42.
> 
> Yet 3, 7, and 42 aren't interesting in the context of that older bug.
> 
>> In particular we have
>> ID==2 contradicting LDR=2, and we're allowing it. The reason why we must
>> fix this other case is because all LDRs are equal, otherwise it would
>> get the same treatment as the other bug.
> 
> I understand all that; still there's loss of information in the comments,
> from my perspective.
> 
> Jan

v2 did have a "Note that "x2apic_id == 0" has always been correct and
can't be used to discriminate these cases." and another in front of the
early exit "No need to perform fixups in non-x2apic mode, and x2apic_id
== 0 has always been correct.". They were trimmed as versions went on.

As mentioned before this is all cosmetic, so I'm happy either way. I'll
reinstate something to this effect in a v5.

Cheers,
Alejandro
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 5cb87f8649..cd4701c5a2 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1061,13 +1061,26 @@  static const struct hvm_mmio_ops vlapic_mmio_ops = {
     .write = vlapic_mmio_write,
 };
 
+static uint32_t x2apic_ldr_from_id(uint32_t id)
+{
+    return ((id & ~0xf) << 12) | (1 << (id & 0xf));
+}
+
 static void set_x2apic_id(struct vlapic *vlapic)
 {
-    u32 id = vlapic_vcpu(vlapic)->vcpu_id;
-    u32 ldr = ((id & ~0xf) << 12) | (1 << (id & 0xf));
+    uint32_t vcpu_id = vlapic_vcpu(vlapic)->vcpu_id;
+    uint32_t apic_id = vcpu_id * 2;
+    uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
+
+    /*
+     * Workaround for migrated domains to derive LDRs as the source host
+     * would've.
+     */
+    if ( vlapic_domain(vlapic)->arch.hvm.bug_x2apic_ldr_vcpu_id )
+        apic_ldr = x2apic_ldr_from_id(vcpu_id);
 
-    vlapic_set_reg(vlapic, APIC_ID, id * 2);
-    vlapic_set_reg(vlapic, APIC_LDR, ldr);
+    vlapic_set_reg(vlapic, APIC_ID, apic_id);
+    vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
 }
 
 int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
@@ -1498,27 +1511,36 @@  static int cf_check lapic_save_regs(struct vcpu *v, hvm_domain_context_t *h)
  */
 static void lapic_load_fixup(struct vlapic *vlapic)
 {
-    uint32_t id = vlapic->loaded.id;
+    uint32_t good_ldr = x2apic_ldr_from_id(vlapic->loaded.id);
 
-    if ( vlapic_x2apic_mode(vlapic) && id && vlapic->loaded.ldr == 1 )
-    {
+    /* Skip fixups on xAPIC mode, or if the x2APIC LDR is already correct */
+    if ( !vlapic_x2apic_mode(vlapic) ||
+         (vlapic->loaded.ldr == good_ldr) )
+        return;
+
+    if ( vlapic->loaded.ldr == 1 )
+       /*
+        * Xen <= 4.4 may have a bug by which all the APICs configured in
+        * x2APIC mode got LDR = 1. We can't leave it as-is because it
+        * assigned the same LDR to every CPU.  We'll fix fix the bug now
+        * and assign an LDR value consistent with the APIC ID.
+        */
+        set_x2apic_id(vlapic);
+    else if ( vlapic->loaded.ldr ==
+              x2apic_ldr_from_id(vlapic_vcpu(vlapic)->vcpu_id) )
         /*
-         * This is optional: ID != 0 contradicts LDR == 1. It's being added
-         * to aid in eventual debugging of issues arising from the fixup done
-         * here, but can be dropped as soon as it is found to conflict with
-         * other (future) changes.
+         * Migrations from Xen 4.4 to date (4.19 dev window, Nov 2023) may
+         * show this bug. We must preserve LDRs so new vCPUs use consistent
+         * derivations and existing guests, which may have already read the
+         * LDR at the source host, aren't surprised when interrupts stop
+         * working the way they did at the other end.
          */
-        if ( GET_xAPIC_ID(id) != vlapic_vcpu(vlapic)->vcpu_id * 2 ||
-             id != SET_xAPIC_ID(GET_xAPIC_ID(id)) )
-            printk(XENLOG_G_WARNING "%pv: bogus APIC ID %#x loaded\n",
-                   vlapic_vcpu(vlapic), id);
-        set_x2apic_id(vlapic);
-    }
-    else /* Undo an eventual earlier fixup. */
-    {
-        vlapic_set_reg(vlapic, APIC_ID, id);
-        vlapic_set_reg(vlapic, APIC_LDR, vlapic->loaded.ldr);
-    }
+        vlapic_domain(vlapic)->arch.hvm.bug_x2apic_ldr_vcpu_id = true;
+    else
+        printk(XENLOG_G_WARNING
+               "%pv: bogus x2APIC loaded id=%#x ldr=%#x (expected ldr=%#x)\n",
+               vlapic_vcpu(vlapic), vlapic->loaded.id, vlapic->loaded.ldr,
+               good_ldr);
 }
 
 static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t *h)
diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
index 6e53ce4449..dd9d837e84 100644
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -106,6 +106,9 @@  struct hvm_domain {
 
     bool                   is_s3_suspended;
 
+    /* Compatibility setting for a bug in x2APIC LDR */
+    bool bug_x2apic_ldr_vcpu_id;
+
     /* hypervisor intercepted msix table */
     struct list_head       msixtbl_list;