diff mbox series

[v2,3/4] apic: Use 32bit APIC ID for migration instance ID

Message ID 20191016022933.7276-4-peterx@redhat.com (mailing list archive)
State New, archived
Headers show
Series apic: Fix migration breakage of >255 vcpus | expand

Commit Message

Peter Xu Oct. 16, 2019, 2:29 a.m. UTC
Migration is silently broken now with x2apic config like this:

     -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
     -device intel-iommu,intremap=on,eim=on

After migration, the guest kernel could hang at anything, due to
x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
any operations related to x2apic could be broken then (e.g., RDMSR on
x2apic MSRs could fail because KVM would think that the vcpu hasn't
enabled x2apic at all).

The issue is that the x2apic bit was never applied correctly for vcpus
whose ID > 255 when migrate completes, and that's because when we
migrate APIC we use the APICCommonState.id as instance ID of the
migration stream, while that's too short for x2apic.

Let's use the newly introduced initial_apic_id for that.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/intc/apic_common.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Eduardo Habkost Oct. 16, 2019, 2:42 a.m. UTC | #1
On Wed, Oct 16, 2019 at 10:29:32AM +0800, Peter Xu wrote:
> Migration is silently broken now with x2apic config like this:
> 
>      -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
>      -device intel-iommu,intremap=on,eim=on
> 
> After migration, the guest kernel could hang at anything, due to
> x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> any operations related to x2apic could be broken then (e.g., RDMSR on
> x2apic MSRs could fail because KVM would think that the vcpu hasn't
> enabled x2apic at all).
> 
> The issue is that the x2apic bit was never applied correctly for vcpus
> whose ID > 255 when migrate completes, and that's because when we
> migrate APIC we use the APICCommonState.id as instance ID of the
> migration stream, while that's too short for x2apic.
> 
> Let's use the newly introduced initial_apic_id for that.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Juan Quintela Oct. 16, 2019, 8:44 a.m. UTC | #2
Peter Xu <peterx@redhat.com> wrote:
> Migration is silently broken now with x2apic config like this:
>
>      -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
>      -device intel-iommu,intremap=on,eim=on
>
> After migration, the guest kernel could hang at anything, due to
> x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> any operations related to x2apic could be broken then (e.g., RDMSR on
> x2apic MSRs could fail because KVM would think that the vcpu hasn't
> enabled x2apic at all).
>
> The issue is that the x2apic bit was never applied correctly for vcpus
> whose ID > 255 when migrate completes, and that's because when we
> migrate APIC we use the APICCommonState.id as instance ID of the
> migration stream, while that's too short for x2apic.
>
> Let's use the newly introduced initial_apic_id for that.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>
diff mbox series

Patch

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index fabfa7320b..f0d88a1b14 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -315,7 +315,10 @@  static void apic_common_realize(DeviceState *dev, Error **errp)
     APICCommonState *s = APIC_COMMON(dev);
     APICCommonClass *info;
     static DeviceState *vapic;
-    uint32_t instance_id = s->id;
+    uint32_t instance_id = s->initial_apic_id;
+
+    /* Normally initial APIC ID should be no more than hundreds */
+    assert(instance_id != VMSTATE_INSTANCE_ID_ANY);
 
     info = APIC_COMMON_GET_CLASS(s);
     info->realize(dev, errp);