Message ID | 20191015075444.10955-3-peterx@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | apic: Fix migration breakage of >255 vcpus | expand |
Peter Xu <peterx@redhat.com> wrote: > Migration is silently broken now with x2apic config like this: > > -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \ > -device intel-iommu,intremap=on,eim=on > > After migration, the guest kernel could hang at anything, due to > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so > any operations related to x2apic could be broken then (e.g., RDMSR on > x2apic MSRs could fail because KVM would think that the vcpu hasn't > enabled x2apic at all). > > The issue is that the x2apic bit was never applied correctly for vcpus > whose ID > 255 when migrate completes, and that's because when we > migrate APIC we use the APICCommonState.id as instance ID of the > migration stream, while that's too short for x2apic. > > Let's use the newly introduced initial_apic_id for that. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > hw/intc/apic_common.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c > index aafd8e0e33..6024a3e06a 100644 > --- a/hw/intc/apic_common.c > +++ b/hw/intc/apic_common.c > @@ -315,7 +315,7 @@ static void apic_common_realize(DeviceState *dev, Error **errp) > APICCommonState *s = APIC_COMMON(dev); > APICCommonClass *info; > static DeviceState *vapic; > - int instance_id = s->id; > + int64_t instance_id = s->initial_apic_id; int is ok here. But damn thing, initial_apic_id is uint32_t. Sniff. Later, Juan.
* Peter Xu (peterx@redhat.com) wrote: > Migration is silently broken now with x2apic config like this: > > -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \ > -device intel-iommu,intremap=on,eim=on > > After migration, the guest kernel could hang at anything, due to > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so > any operations related to x2apic could be broken then (e.g., RDMSR on > x2apic MSRs could fail because KVM would think that the vcpu hasn't > enabled x2apic at all). > > The issue is that the x2apic bit was never applied correctly for vcpus > whose ID > 255 when migrate completes, and that's because when we > migrate APIC we use the APICCommonState.id as instance ID of the > migration stream, while that's too short for x2apic. > > Let's use the newly introduced initial_apic_id for that. I'd like to understand a few things: a) Does this change the instance ID of existing APICs on the migration stream? a1) Ever for <256 CPUs? a2) For >=256 CPUs? [Because changing the ID breaks migration] b) Is the instance ID constant - I can see it's a property on the APIC, but I cna't see who sets it c) In the case where it fails, did we end up registering two devices with the same name and instance ID? If so, is it worth adding a check that would error if we tried? Dave > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > hw/intc/apic_common.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c > index aafd8e0e33..6024a3e06a 100644 > --- a/hw/intc/apic_common.c > +++ b/hw/intc/apic_common.c > @@ -315,7 +315,7 @@ static void apic_common_realize(DeviceState *dev, Error **errp) > APICCommonState *s = APIC_COMMON(dev); > APICCommonClass *info; > static DeviceState *vapic; > - int instance_id = s->id; > + int64_t instance_id = s->initial_apic_id; > > info = APIC_COMMON_GET_CLASS(s); > info->realize(dev, errp); > -- > 2.21.0 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, Oct 15, 2019 at 10:22:18AM +0100, Dr. David Alan Gilbert wrote: > * Peter Xu (peterx@redhat.com) wrote: > > Migration is silently broken now with x2apic config like this: > > > > -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \ > > -device intel-iommu,intremap=on,eim=on > > > > After migration, the guest kernel could hang at anything, due to > > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so > > any operations related to x2apic could be broken then (e.g., RDMSR on > > x2apic MSRs could fail because KVM would think that the vcpu hasn't > > enabled x2apic at all). > > > > The issue is that the x2apic bit was never applied correctly for vcpus > > whose ID > 255 when migrate completes, and that's because when we > > migrate APIC we use the APICCommonState.id as instance ID of the > > migration stream, while that's too short for x2apic. > > > > Let's use the newly introduced initial_apic_id for that. > > I'd like to understand a few things: > a) Does this change the instance ID of existing APICs on the > migration stream? > a1) Ever for <256 CPUs? No. > a2) For >=256 CPUs? Yes. > > [Because changing the ID breaks migration] But if we don't change it, the stream is broken too. :) Then the destination VM will receive e.g. two apic_id==0 instances (I think the apic_id==256 instance will wrongly overwrite the apic_id==0 one), while the vcpu with apic_id==256 will use the initial apic values. So IMHO we should still fix this, even if it changes the migration stream. At least we start to make it right. > > b) Is the instance ID constant - I can see it's a property on the > APIC, but I cna't see who sets it For each vcpu, I think yes it should be a constant as long as the topology is the same. This is how I understand it to be set: (1) In pc_cpus_init(), we init these: possible_cpus = mc->possible_cpu_arch_ids(ms); for (i = 0; i < ms->smp.cpus; i++) { pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal); } (2) In x86_cpu_apic_create(), we apply the apic_id to "id" property: qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id); > > c) In the case where it fails, did we end up registering two > devices with the same name and instance ID? If so, is it worth > adding a check that would error if we tried? Sounds doable. Thanks,
* Peter Xu (peterx@redhat.com) wrote: > On Tue, Oct 15, 2019 at 10:22:18AM +0100, Dr. David Alan Gilbert wrote: > > * Peter Xu (peterx@redhat.com) wrote: > > > Migration is silently broken now with x2apic config like this: > > > > > > -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \ > > > -device intel-iommu,intremap=on,eim=on > > > > > > After migration, the guest kernel could hang at anything, due to > > > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so > > > any operations related to x2apic could be broken then (e.g., RDMSR on > > > x2apic MSRs could fail because KVM would think that the vcpu hasn't > > > enabled x2apic at all). > > > > > > The issue is that the x2apic bit was never applied correctly for vcpus > > > whose ID > 255 when migrate completes, and that's because when we > > > migrate APIC we use the APICCommonState.id as instance ID of the > > > migration stream, while that's too short for x2apic. > > > > > > Let's use the newly introduced initial_apic_id for that. > > > > I'd like to understand a few things: > > a) Does this change the instance ID of existing APICs on the > > migration stream? > > a1) Ever for <256 CPUs? > > No. > > > a2) For >=256 CPUs? > > Yes. > > > > > [Because changing the ID breaks migration] > > But if we don't change it, the stream is broken too. :) > > Then the destination VM will receive e.g. two apic_id==0 instances (I > think the apic_id==256 instance will wrongly overwrite the apic_id==0 > one), while the vcpu with apic_id==256 will use the initial apic > values. > > So IMHO we should still fix this, even if it changes the migration > stream. At least we start to make it right. Yes, that makes sense. It deserves a doc mention somewhere. > > > > b) Is the instance ID constant - I can see it's a property on the > > APIC, but I cna't see who sets it > > For each vcpu, I think yes it should be a constant as long as the > topology is the same. This is how I understand it to be set: > > (1) In pc_cpus_init(), we init these: > > possible_cpus = mc->possible_cpu_arch_ids(ms); > for (i = 0; i < ms->smp.cpus; i++) { > pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal); > } > > (2) In x86_cpu_apic_create(), we apply the apic_id to "id" property: > > qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id); OK, that's fine - as long as it's constaatn and not guest influenced. > > > > c) In the case where it fails, did we end up registering two > > devices with the same name and instance ID? If so, is it worth > > adding a check that would error if we tried? > > Sounds doable. > Great, Dave > Thanks, > > -- > Peter Xu -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, Oct 15, 2019 at 12:02:53PM +0100, Dr. David Alan Gilbert wrote: > * Peter Xu (peterx@redhat.com) wrote: > > On Tue, Oct 15, 2019 at 10:22:18AM +0100, Dr. David Alan Gilbert wrote: > > > * Peter Xu (peterx@redhat.com) wrote: > > > > Migration is silently broken now with x2apic config like this: > > > > > > > > -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \ > > > > -device intel-iommu,intremap=on,eim=on > > > > > > > > After migration, the guest kernel could hang at anything, due to > > > > x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so > > > > any operations related to x2apic could be broken then (e.g., RDMSR on > > > > x2apic MSRs could fail because KVM would think that the vcpu hasn't > > > > enabled x2apic at all). > > > > > > > > The issue is that the x2apic bit was never applied correctly for vcpus > > > > whose ID > 255 when migrate completes, and that's because when we > > > > migrate APIC we use the APICCommonState.id as instance ID of the > > > > migration stream, while that's too short for x2apic. > > > > > > > > Let's use the newly introduced initial_apic_id for that. > > > > > > I'd like to understand a few things: > > > a) Does this change the instance ID of existing APICs on the > > > migration stream? > > > a1) Ever for <256 CPUs? > > > > No. > > > > > a2) For >=256 CPUs? > > > > Yes. > > > > > > > > [Because changing the ID breaks migration] > > > > But if we don't change it, the stream is broken too. :) > > > > Then the destination VM will receive e.g. two apic_id==0 instances (I > > think the apic_id==256 instance will wrongly overwrite the apic_id==0 > > one), while the vcpu with apic_id==256 will use the initial apic > > values. > > > > So IMHO we should still fix this, even if it changes the migration > > stream. At least we start to make it right. > > Yes, that makes sense. > It deserves a doc mention somewhere. > > > > > > > b) Is the instance ID constant - I can see it's a property on the > > > APIC, but I cna't see who sets it > > > > For each vcpu, I think yes it should be a constant as long as the > > topology is the same. This is how I understand it to be set: > > > > (1) In pc_cpus_init(), we init these: > > > > possible_cpus = mc->possible_cpu_arch_ids(ms); > > for (i = 0; i < ms->smp.cpus; i++) { > > pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal); > > } > > > > (2) In x86_cpu_apic_create(), we apply the apic_id to "id" property: > > > > qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id); > > OK, that's fine - as long as it's constaatn and not guest influenced. The guest may change the CPU APIC ID (although they rarely do), but I believe X86CPU::apic_id is always going to be the initial APIC ID. I'll double check (and maybe send a patch to rename it to initial_apic_id).
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index aafd8e0e33..6024a3e06a 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -315,7 +315,7 @@ static void apic_common_realize(DeviceState *dev, Error **errp) APICCommonState *s = APIC_COMMON(dev); APICCommonClass *info; static DeviceState *vapic; - int instance_id = s->id; + int64_t instance_id = s->initial_apic_id; info = APIC_COMMON_GET_CLASS(s); info->realize(dev, errp);
Migration is silently broken now with x2apic config like this: -smp 200,maxcpus=288,sockets=2,cores=72,threads=2 \ -device intel-iommu,intremap=on,eim=on After migration, the guest kernel could hang at anything, due to x2apic bit not migrated correctly in IA32_APIC_BASE on some vcpus, so any operations related to x2apic could be broken then (e.g., RDMSR on x2apic MSRs could fail because KVM would think that the vcpu hasn't enabled x2apic at all). The issue is that the x2apic bit was never applied correctly for vcpus whose ID > 255 when migrate completes, and that's because when we migrate APIC we use the APICCommonState.id as instance ID of the migration stream, while that's too short for x2apic. Let's use the newly introduced initial_apic_id for that. Signed-off-by: Peter Xu <peterx@redhat.com> --- hw/intc/apic_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)