diff mbox series

[2/2] apic: Use 32bit APIC ID for migration instance ID

Message ID 20191015075444.10955-3-peterx@redhat.com (mailing list archive)
State New, archived
Headers show
Series apic: Fix migration breakage of >255 vcpus | expand

Commit Message

Peter Xu Oct. 15, 2019, 7:54 a.m. UTC
Migration is silently broken now with x2apic config like this:

     -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
     -device intel-iommu,intremap=on,eim=on

After migration, the guest kernel could hang at anything, due to
x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
any operations related to x2apic could be broken then (e.g., RDMSR on
x2apic MSRs could fail because KVM would think that the vcpu hasn't
enabled x2apic at all).

The issue is that the x2apic bit was never applied correctly for vcpus
whose ID > 255 when migrate completes, and that's because when we
migrate APIC we use the APICCommonState.id as instance ID of the
migration stream, while that's too short for x2apic.

Let's use the newly introduced initial_apic_id for that.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 hw/intc/apic_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Juan Quintela Oct. 15, 2019, 8:30 a.m. UTC | #1
Peter Xu <peterx@redhat.com> wrote:
> Migration is silently broken now with x2apic config like this:
>
>      -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
>      -device intel-iommu,intremap=on,eim=on
>
> After migration, the guest kernel could hang at anything, due to
> x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> any operations related to x2apic could be broken then (e.g., RDMSR on
> x2apic MSRs could fail because KVM would think that the vcpu hasn't
> enabled x2apic at all).
>
> The issue is that the x2apic bit was never applied correctly for vcpus
> whose ID > 255 when migrate completes, and that's because when we
> migrate APIC we use the APICCommonState.id as instance ID of the
> migration stream, while that's too short for x2apic.
>
> Let's use the newly introduced initial_apic_id for that.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/intc/apic_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index aafd8e0e33..6024a3e06a 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -315,7 +315,7 @@ static void apic_common_realize(DeviceState *dev, Error **errp)
>      APICCommonState *s = APIC_COMMON(dev);
>      APICCommonClass *info;
>      static DeviceState *vapic;
> -    int instance_id = s->id;
> +    int64_t instance_id = s->initial_apic_id;

int is ok here.

But damn thing, initial_apic_id is uint32_t.  Sniff.

Later, Juan.
Dr. David Alan Gilbert Oct. 15, 2019, 9:22 a.m. UTC | #2
* Peter Xu (peterx@redhat.com) wrote:
> Migration is silently broken now with x2apic config like this:
> 
>      -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
>      -device intel-iommu,intremap=on,eim=on
> 
> After migration, the guest kernel could hang at anything, due to
> x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> any operations related to x2apic could be broken then (e.g., RDMSR on
> x2apic MSRs could fail because KVM would think that the vcpu hasn't
> enabled x2apic at all).
> 
> The issue is that the x2apic bit was never applied correctly for vcpus
> whose ID > 255 when migrate completes, and that's because when we
> migrate APIC we use the APICCommonState.id as instance ID of the
> migration stream, while that's too short for x2apic.
> 
> Let's use the newly introduced initial_apic_id for that.

I'd like to understand a few things:
   a) Does this change the instance ID of existing APICs on the
migration stream? 
     a1) Ever for <256 CPUs?
     a2) For >=256 CPUs?

    [Because changing the ID breaks migration]

  b) Is the instance ID constant - I can see it's a property on the
     APIC, but I cna't see who sets it

  c) In the case where it fails, did we end up registering two
     devices with the same name and instance ID?  If so, is it worth
     adding a check that would error if we tried?

Dave

> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  hw/intc/apic_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index aafd8e0e33..6024a3e06a 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -315,7 +315,7 @@ static void apic_common_realize(DeviceState *dev, Error **errp)
>      APICCommonState *s = APIC_COMMON(dev);
>      APICCommonClass *info;
>      static DeviceState *vapic;
> -    int instance_id = s->id;
> +    int64_t instance_id = s->initial_apic_id;
>  
>      info = APIC_COMMON_GET_CLASS(s);
>      info->realize(dev, errp);
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu Oct. 15, 2019, 10:16 a.m. UTC | #3
On Tue, Oct 15, 2019 at 10:22:18AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > Migration is silently broken now with x2apic config like this:
> > 
> >      -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
> >      -device intel-iommu,intremap=on,eim=on
> > 
> > After migration, the guest kernel could hang at anything, due to
> > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> > any operations related to x2apic could be broken then (e.g., RDMSR on
> > x2apic MSRs could fail because KVM would think that the vcpu hasn't
> > enabled x2apic at all).
> > 
> > The issue is that the x2apic bit was never applied correctly for vcpus
> > whose ID > 255 when migrate completes, and that's because when we
> > migrate APIC we use the APICCommonState.id as instance ID of the
> > migration stream, while that's too short for x2apic.
> > 
> > Let's use the newly introduced initial_apic_id for that.
> 
> I'd like to understand a few things:
>    a) Does this change the instance ID of existing APICs on the
> migration stream? 
>      a1) Ever for <256 CPUs?

No.

>      a2) For >=256 CPUs?

Yes.

> 
>     [Because changing the ID breaks migration]

But if we don't change it, the stream is broken too. :)

Then the destination VM will receive e.g. two apic_id==0 instances (I
think the apic_id==256 instance will wrongly overwrite the apic_id==0
one), while the vcpu with apic_id==256 will use the initial apic
values.

So IMHO we should still fix this, even if it changes the migration
stream.  At least we start to make it right.

> 
>   b) Is the instance ID constant - I can see it's a property on the
>      APIC, but I cna't see who sets it

For each vcpu, I think yes it should be a constant as long as the
topology is the same.  This is how I understand it to be set:

(1) In pc_cpus_init(), we init these:

    possible_cpus = mc->possible_cpu_arch_ids(ms);
    for (i = 0; i < ms->smp.cpus; i++) {
        pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
    }

(2) In x86_cpu_apic_create(), we apply the apic_id to "id" property:

    qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);

> 
>   c) In the case where it fails, did we end up registering two
>      devices with the same name and instance ID?  If so, is it worth
>      adding a check that would error if we tried?

Sounds doable.

Thanks,
Dr. David Alan Gilbert Oct. 15, 2019, 11:02 a.m. UTC | #4
* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Oct 15, 2019 at 10:22:18AM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > Migration is silently broken now with x2apic config like this:
> > > 
> > >      -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
> > >      -device intel-iommu,intremap=on,eim=on
> > > 
> > > After migration, the guest kernel could hang at anything, due to
> > > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> > > any operations related to x2apic could be broken then (e.g., RDMSR on
> > > x2apic MSRs could fail because KVM would think that the vcpu hasn't
> > > enabled x2apic at all).
> > > 
> > > The issue is that the x2apic bit was never applied correctly for vcpus
> > > whose ID > 255 when migrate completes, and that's because when we
> > > migrate APIC we use the APICCommonState.id as instance ID of the
> > > migration stream, while that's too short for x2apic.
> > > 
> > > Let's use the newly introduced initial_apic_id for that.
> > 
> > I'd like to understand a few things:
> >    a) Does this change the instance ID of existing APICs on the
> > migration stream? 
> >      a1) Ever for <256 CPUs?
> 
> No.
> 
> >      a2) For >=256 CPUs?
> 
> Yes.
> 
> > 
> >     [Because changing the ID breaks migration]
> 
> But if we don't change it, the stream is broken too. :)
> 
> Then the destination VM will receive e.g. two apic_id==0 instances (I
> think the apic_id==256 instance will wrongly overwrite the apic_id==0
> one), while the vcpu with apic_id==256 will use the initial apic
> values.
> 
> So IMHO we should still fix this, even if it changes the migration
> stream.  At least we start to make it right.

Yes, that makes sense.
It deserves a doc mention somewhere.

> > 
> >   b) Is the instance ID constant - I can see it's a property on the
> >      APIC, but I cna't see who sets it
> 
> For each vcpu, I think yes it should be a constant as long as the
> topology is the same.  This is how I understand it to be set:
> 
> (1) In pc_cpus_init(), we init these:
> 
>     possible_cpus = mc->possible_cpu_arch_ids(ms);
>     for (i = 0; i < ms->smp.cpus; i++) {
>         pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
>     }
> 
> (2) In x86_cpu_apic_create(), we apply the apic_id to "id" property:
> 
>     qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);

OK, that's fine - as long as it's constaatn and not guest influenced.

> > 
> >   c) In the case where it fails, did we end up registering two
> >      devices with the same name and instance ID?  If so, is it worth
> >      adding a check that would error if we tried?
> 
> Sounds doable.
> 

Great,

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Eduardo Habkost Oct. 15, 2019, 7:49 p.m. UTC | #5
On Tue, Oct 15, 2019 at 12:02:53PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Oct 15, 2019 at 10:22:18AM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > Migration is silently broken now with x2apic config like this:
> > > > 
> > > >      -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \
> > > >      -device intel-iommu,intremap=on,eim=on
> > > > 
> > > > After migration, the guest kernel could hang at anything, due to
> > > > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so
> > > > any operations related to x2apic could be broken then (e.g., RDMSR on
> > > > x2apic MSRs could fail because KVM would think that the vcpu hasn't
> > > > enabled x2apic at all).
> > > > 
> > > > The issue is that the x2apic bit was never applied correctly for vcpus
> > > > whose ID > 255 when migrate completes, and that's because when we
> > > > migrate APIC we use the APICCommonState.id as instance ID of the
> > > > migration stream, while that's too short for x2apic.
> > > > 
> > > > Let's use the newly introduced initial_apic_id for that.
> > > 
> > > I'd like to understand a few things:
> > >    a) Does this change the instance ID of existing APICs on the
> > > migration stream? 
> > >      a1) Ever for <256 CPUs?
> > 
> > No.
> > 
> > >      a2) For >=256 CPUs?
> > 
> > Yes.
> > 
> > > 
> > >     [Because changing the ID breaks migration]
> > 
> > But if we don't change it, the stream is broken too. :)
> > 
> > Then the destination VM will receive e.g. two apic_id==0 instances (I
> > think the apic_id==256 instance will wrongly overwrite the apic_id==0
> > one), while the vcpu with apic_id==256 will use the initial apic
> > values.
> > 
> > So IMHO we should still fix this, even if it changes the migration
> > stream.  At least we start to make it right.
> 
> Yes, that makes sense.
> It deserves a doc mention somewhere.
> 
> > > 
> > >   b) Is the instance ID constant - I can see it's a property on the
> > >      APIC, but I cna't see who sets it
> > 
> > For each vcpu, I think yes it should be a constant as long as the
> > topology is the same.  This is how I understand it to be set:
> > 
> > (1) In pc_cpus_init(), we init these:
> > 
> >     possible_cpus = mc->possible_cpu_arch_ids(ms);
> >     for (i = 0; i < ms->smp.cpus; i++) {
> >         pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
> >     }
> > 
> > (2) In x86_cpu_apic_create(), we apply the apic_id to "id" property:
> > 
> >     qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);
> 
> OK, that's fine - as long as it's constaatn and not guest influenced.

The guest may change the CPU APIC ID (although they rarely do),
but I believe X86CPU::apic_id is always going to be the initial
APIC ID.  I'll double check (and maybe send a patch to rename it
to initial_apic_id).
diff mbox series

Patch

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index aafd8e0e33..6024a3e06a 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -315,7 +315,7 @@  static void apic_common_realize(DeviceState *dev, Error **errp)
     APICCommonState *s = APIC_COMMON(dev);
     APICCommonClass *info;
     static DeviceState *vapic;
-    int instance_id = s->id;
+    int64_t instance_id = s->initial_apic_id;
 
     info = APIC_COMMON_GET_CLASS(s);
     info->realize(dev, errp);