diff mbox series

[06/13] spapr/xive: fix migration of the XiveTCTX under TCG

Message ID 20190107183946.7230-7-clg@kaod.org (mailing list archive)
State New, archived
Headers show
Series spapr: add KVM support to the XIVE interrupt mode | expand

Commit Message

Cédric Le Goater Jan. 7, 2019, 6:39 p.m. UTC
When the thread interrupt management state is retrieved from the KVM
VCPU, word2 is saved under the QEMU XIVE thread context to print out
the OS CAM line under the QEMU monitor.

This breaks the migration of a TCG guest (and with KVM when
kernel_irqchip=off) because the matching algorithm of the presenter
relies on the OS CAM value. Fix with an extra reset of the thread
contexts to restore the expected value.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr_irq.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

Comments

David Gibson Feb. 8, 2019, 5:36 a.m. UTC | #1
On Mon, Jan 07, 2019 at 07:39:39PM +0100, Cédric Le Goater wrote:
> When the thread interrupt management state is retrieved from the KVM
> VCPU, word2 is saved under the QEMU XIVE thread context to print out
> the OS CAM line under the QEMU monitor.
> 
> This breaks the migration of a TCG guest (and with KVM when
> kernel_irqchip=off) because the matching algorithm of the presenter
> relies on the OS CAM value. Fix with an extra reset of the thread
> contexts to restore the expected value.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Why is the CAM value you get from KVM different from the one you
expect in qemu?

> ---
>  hw/ppc/spapr_irq.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 233c97c5ecd9..ba27d9d8e972 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -363,7 +363,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
>  
>  static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>  {
> -    return spapr_xive_post_load(spapr->xive, version_id);
> +    CPUState *cs;
> +    int ret;
> +
> +    ret = spapr_xive_post_load(spapr->xive, version_id);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    /*
> +     * When the states are collected from the KVM XIVE device, word2
> +     * of the XiveTCTX is set to print out the OS CAM line under the
> +     * QEMU monitor.
> +     *
> +     * This breaks the migration on a TCG guest (or on KVM with
> +     * kernel_irqchip=off) because the matching algorithm of the
> +     * presenter relies on the OS CAM value. Fix with an extra reset
> +     * of the thread contexts to restore the expected value.
> +     */
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +        /* (TCG) Set the OS CAM line of the thread interrupt context. */
> +        spapr_xive_set_tctx_os_cam(cpu->tctx);
> +    }
> +    return 0;
>  }
>  
>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
Cédric Le Goater Feb. 8, 2019, 7:12 a.m. UTC | #2
On 2/8/19 6:36 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:39:39PM +0100, Cédric Le Goater wrote:
>> When the thread interrupt management state is retrieved from the KVM
>> VCPU, word2 is saved under the QEMU XIVE thread context to print out
>> the OS CAM line under the QEMU monitor.
>>
>> This breaks the migration of a TCG guest (and with KVM when
>> kernel_irqchip=off) because the matching algorithm of the presenter
>> relies on the OS CAM value. Fix with an extra reset of the thread
>> contexts to restore the expected value.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> Why is the CAM value you get from KVM different from the one you
> expect in qemu?

An NVT base identifier is allocated for each VM at the OPAL level
and each vCPU getd an increment of this value. It is pushed in the 
OS CAM line when the vCPU is scheduled to run.
 
KVM XIVE names this identifier a VP id. 

The QEMU emulation of XIVE uses a fixed value for the NVT base 
identifier.

C.

 
>> ---
>>  hw/ppc/spapr_irq.c | 26 +++++++++++++++++++++++++-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 233c97c5ecd9..ba27d9d8e972 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -363,7 +363,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
>>  
>>  static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>>  {
>> -    return spapr_xive_post_load(spapr->xive, version_id);
>> +    CPUState *cs;
>> +    int ret;
>> +
>> +    ret = spapr_xive_post_load(spapr->xive, version_id);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    /*
>> +     * When the states are collected from the KVM XIVE device, word2
>> +     * of the XiveTCTX is set to print out the OS CAM line under the
>> +     * QEMU monitor.
>> +     *
>> +     * This breaks the migration on a TCG guest (or on KVM with
>> +     * kernel_irqchip=off) because the matching algorithm of the
>> +     * presenter relies on the OS CAM value. Fix with an extra reset
>> +     * of the thread contexts to restore the expected value.
>> +     */
>> +    CPU_FOREACH(cs) {
>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>> +
>> +        /* (TCG) Set the OS CAM line of the thread interrupt context. */
>> +        spapr_xive_set_tctx_os_cam(cpu->tctx);
>> +    }
>> +    return 0;
>>  }
>>  
>>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
>
David Gibson Feb. 12, 2019, 12:22 a.m. UTC | #3
On Fri, Feb 08, 2019 at 08:12:12AM +0100, Cédric Le Goater wrote:
> On 2/8/19 6:36 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:39:39PM +0100, Cédric Le Goater wrote:
> >> When the thread interrupt management state is retrieved from the KVM
> >> VCPU, word2 is saved under the QEMU XIVE thread context to print out
> >> the OS CAM line under the QEMU monitor.
> >>
> >> This breaks the migration of a TCG guest (and with KVM when
> >> kernel_irqchip=off) because the matching algorithm of the presenter
> >> relies on the OS CAM value. Fix with an extra reset of the thread
> >> contexts to restore the expected value.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > 
> > Why is the CAM value you get from KVM different from the one you
> > expect in qemu?
> 
> An NVT base identifier is allocated for each VM at the OPAL level
> and each vCPU getd an increment of this value. It is pushed in the 
> OS CAM line when the vCPU is scheduled to run.
>  
> KVM XIVE names this identifier a VP id. 
> 
> The QEMU emulation of XIVE uses a fixed value for the NVT base 
> identifier.

Ah, I see.

Hmm.  Really this highlights why I'm uneasy migrating the whole TCTX
as just a blob of registers, even though only some of them are really
runtime state, and others are machine configuration that can be worked
out separately at the two ends.
Cédric Le Goater Feb. 12, 2019, 6:58 a.m. UTC | #4
On 2/12/19 1:22 AM, David Gibson wrote:
> On Fri, Feb 08, 2019 at 08:12:12AM +0100, Cédric Le Goater wrote:
>> On 2/8/19 6:36 AM, David Gibson wrote:
>>> On Mon, Jan 07, 2019 at 07:39:39PM +0100, Cédric Le Goater wrote:
>>>> When the thread interrupt management state is retrieved from the KVM
>>>> VCPU, word2 is saved under the QEMU XIVE thread context to print out
>>>> the OS CAM line under the QEMU monitor.
>>>>
>>>> This breaks the migration of a TCG guest (and with KVM when
>>>> kernel_irqchip=off) because the matching algorithm of the presenter
>>>> relies on the OS CAM value. Fix with an extra reset of the thread
>>>> contexts to restore the expected value.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>
>>> Why is the CAM value you get from KVM different from the one you
>>> expect in qemu?
>>
>> An NVT base identifier is allocated for each VM at the OPAL level
>> and each vCPU getd an increment of this value. It is pushed in the 
>> OS CAM line when the vCPU is scheduled to run.
>>  
>> KVM XIVE names this identifier a VP id. 
>>
>> The QEMU emulation of XIVE uses a fixed value for the NVT base 
>> identifier.
> 
> Ah, I see.
> 
> Hmm.  Really this highlights why I'm uneasy migrating the whole TCTX
> as just a blob of registers, even though only some of them are really
> runtime state, and others are machine configuration that can be worked
> out separately at the two ends.

This is really a special case :

1. migration kernel_irqchip=on -> kernel_irqchip=off
2. debug info from KVM is squashing TCG state

We could ignore the VP id configured by KVM but it seems interesting
to retrieve. 

C.
diff mbox series

Patch

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 233c97c5ecd9..ba27d9d8e972 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -363,7 +363,31 @@  static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
 
 static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 {
-    return spapr_xive_post_load(spapr->xive, version_id);
+    CPUState *cs;
+    int ret;
+
+    ret = spapr_xive_post_load(spapr->xive, version_id);
+    if (ret) {
+        return ret;
+    }
+
+    /*
+     * When the states are collected from the KVM XIVE device, word2
+     * of the XiveTCTX is set to print out the OS CAM line under the
+     * QEMU monitor.
+     *
+     * This breaks the migration on a TCG guest (or on KVM with
+     * kernel_irqchip=off) because the matching algorithm of the
+     * presenter relies on the OS CAM value. Fix with an extra reset
+     * of the thread contexts to restore the expected value.
+     */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        /* (TCG) Set the OS CAM line of the thread interrupt context. */
+        spapr_xive_set_tctx_os_cam(cpu->tctx);
+    }
+    return 0;
 }
 
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)