Message ID | 87poa0g62t.fsf@localhost.localdomain.i-did-not-set--mail-host-address--so-tickle-me (mailing list archive) |
---|---|
State | New, archived |
On 10/06/2017 08:10 AM, Nikunj A Dadhania wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>
>> Hello,
>>
>> When a CPU is stopped with the 'stop-self' RTAS call, its state
>> 'halted' is switched to 1 and, in this case, the MSR is not taken into
>> account anymore in the cpu_has_work() routine. Only the pending
>> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>>
>> If the DECR timer fires after 'stop-self' is called and before the CPU
>> 'stop' state is reached, the nearly-dead CPU will have some work to do
>> and the guest will crash. This case happens very frequently with the
>> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>> occasionally fired but after 'stop' state, so no work is to be done
>> and the guest survives.
>>
>> I suspect there is a race between the QEMU mainloop triggering the
>> timers and the TCG CPU thread but I could not quite identify the root
>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>> when the CPU is halted and reenable it when the CPU is restarted.
>
> Moreover, disabling the DECR in the reset path solves the TCG multi cpu
> reboot case, as reboot path does not call stop-cpu rtas call.

Yes. I was going to restart the thread on the topic. Let's see how these
two little patches are discussed. Then we/you can resend the missing hunk
in reset, which is needed to perform a TCG reboot.

Thanks,

C.

> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 3e20b1d886..c5150ee590 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
>      cs->halted = 1;
>
>      env->spr[SPR_HIOR] = 0;
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>  }
>
>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
>
>
> Regards
> Nikunj
>

On Fri, 2017-10-06 at 11:40 +0530, Nikunj A Dadhania wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>
> > Hello,
> >
> > When a CPU is stopped with the 'stop-self' RTAS call, its state
> > 'halted' is switched to 1 and, in this case, the MSR is not taken into
> > account anymore in the cpu_has_work() routine. Only the pending
> > hardware interrupts are checked with their LPCR:PECE* enablement bit.
> >
> > If the DECR timer fires after 'stop-self' is called and before the CPU
> > 'stop' state is reached, the nearly-dead CPU will have some work to do
> > and the guest will crash. This case happens very frequently with the
> > not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
> > occasionally fired but after 'stop' state, so no work is to be done
> > and the guest survives.
> >
> > I suspect there is a race between the QEMU mainloop triggering the
> > timers and the TCG CPU thread but I could not quite identify the root
> > cause. To be safe, let's disable the decrementer interrupt in the LPCR
> > when the CPU is halted and reenable it when the CPU is restarted.
>
> Moreover, disabling the DECR in the reset path solves the TCG multi cpu
> reboot case, as reboot path does not call stop-cpu rtas call.

Shouldn't we do it in set_papr too and only turn it on for the boot CPU
and in start-cpu RTAS call? Same with the other PECEs in fact...

> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 3e20b1d886..c5150ee590 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
>      cs->halted = 1;
>
>      env->spr[SPR_HIOR] = 0;
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>  }
>
>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
>
>
> Regards
> Nikunj

On 10/06/2017 09:46 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2017-10-06 at 11:40 +0530, Nikunj A Dadhania wrote:
>> Cédric Le Goater <clg@kaod.org> writes:
>>
>>> Hello,
>>>
>>> When a CPU is stopped with the 'stop-self' RTAS call, its state
>>> 'halted' is switched to 1 and, in this case, the MSR is not taken into
>>> account anymore in the cpu_has_work() routine. Only the pending
>>> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>>>
>>> If the DECR timer fires after 'stop-self' is called and before the CPU
>>> 'stop' state is reached, the nearly-dead CPU will have some work to do
>>> and the guest will crash. This case happens very frequently with the
>>> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>>> occasionally fired but after 'stop' state, so no work is to be done
>>> and the guest survives.
>>>
>>> I suspect there is a race between the QEMU mainloop triggering the
>>> timers and the TCG CPU thread but I could not quite identify the root
>>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>>> when the CPU is halted and reenable it when the CPU is restarted.
>>
>> Moreover, disabling the DECR in the reset path solves the TCG multi cpu
>> reboot case, as reboot path does not call stop-cpu rtas call.
>
> Shouldn't we do it in set_papr too and only turn it on for the boot CPU
> and in start-cpu RTAS call? Same with the other PECEs in fact...

Yes, I agree. In cpu_ppc_set_papr(), we should set the PECE* bits only
for the boot CPU and then let the RTAS calls start-cpu and stop-self do
the enablement and disablement.

I will respin the patchset.

C.

>> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
>> index 3e20b1d886..c5150ee590 100644
>> --- a/hw/ppc/spapr_cpu_core.c
>> +++ b/hw/ppc/spapr_cpu_core.c
>> @@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
>>      cs->halted = 1;
>>
>>      env->spr[SPR_HIOR] = 0;
>> +    /* Disable DECR for secondary cpus */
>> +    if (cs != first_cpu) {
>> +        if (env->mmu_model == POWERPC_MMU_3_00) {
>> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
>> +        } else {
>> +            /* P7 and P8 both have same bit for DECR */
>> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
>> +        }
>> +    }
>>  }
>>
>>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
>>
>>
>> Regards
>> Nikunj

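The plan agreed in the reply above (wake-up bits set only for the boot CPU at setup time, then toggled by start-cpu and stop-self) can be sketched as a small standalone C model. This is only an illustration under stated assumptions: `ToyCPU`, `TOY_LPCR_PECE_MASK`, and the `toy_*` functions are hypothetical names, the mask value is made up, and none of this is the code from the respun patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask standing in for all LPCR wake-up enable (PECE*) bits.
 * The value is illustrative and is NOT the real LPCR layout. */
#define TOY_LPCR_PECE_MASK 0x7u

/* Hypothetical, heavily reduced CPU state. */
typedef struct ToyCPU {
    bool halted;
    uint32_t lpcr;
} ToyCPU;

/* Setup time: only the boot CPU gets its wake-up bits enabled. */
static void toy_set_papr(ToyCPU *cpu, bool is_boot_cpu)
{
    if (is_boot_cpu) {
        cpu->lpcr |= TOY_LPCR_PECE_MASK;
    } else {
        cpu->lpcr &= ~TOY_LPCR_PECE_MASK;
    }
}

/* start-cpu: enable wake-up sources, then let the CPU run. */
static void toy_rtas_start_cpu(ToyCPU *cpu)
{
    cpu->lpcr |= TOY_LPCR_PECE_MASK;
    cpu->halted = false;
}

/* stop-self: halt the CPU and disable every wake-up source, so a late
 * timer interrupt can no longer give the stopped CPU work to do. */
static void toy_rtas_stop_self(ToyCPU *cpu)
{
    cpu->halted = true;
    cpu->lpcr &= ~TOY_LPCR_PECE_MASK;
}
```

In this model a secondary CPU has no wake-up bit set from setup until `toy_rtas_start_cpu()` runs, and again after `toy_rtas_stop_self()`, which is the invariant the thread is asking for.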
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Fri, 2017-10-06 at 11:40 +0530, Nikunj A Dadhania wrote:
>> Cédric Le Goater <clg@kaod.org> writes:
>>
>> > Hello,
>> >
>> > When a CPU is stopped with the 'stop-self' RTAS call, its state
>> > 'halted' is switched to 1 and, in this case, the MSR is not taken into
>> > account anymore in the cpu_has_work() routine. Only the pending
>> > hardware interrupts are checked with their LPCR:PECE* enablement bit.
>> >
>> > If the DECR timer fires after 'stop-self' is called and before the CPU
>> > 'stop' state is reached, the nearly-dead CPU will have some work to do
>> > and the guest will crash. This case happens very frequently with the
>> > not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
>> > occasionally fired but after 'stop' state, so no work is to be done
>> > and the guest survives.
>> >
>> > I suspect there is a race between the QEMU mainloop triggering the
>> > timers and the TCG CPU thread but I could not quite identify the root
>> > cause. To be safe, let's disable the decrementer interrupt in the LPCR
>> > when the CPU is halted and reenable it when the CPU is restarted.
>>
>> Moreover, disabling the DECR in the reset path solves the TCG multi cpu
>> reboot case, as reboot path does not call stop-cpu rtas call.
>
> Shouldn't we do it in set_papr too and only turn it on for the boot CPU
> and in start-cpu RTAS call? Same with the other PECEs in fact...

Yes, +1 for that

Regards
Nikunj

On Fri, Oct 06, 2017 at 11:40:02AM +0530, Nikunj A Dadhania wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>
> > Hello,
> >
> > When a CPU is stopped with the 'stop-self' RTAS call, its state
> > 'halted' is switched to 1 and, in this case, the MSR is not taken into
> > account anymore in the cpu_has_work() routine. Only the pending
> > hardware interrupts are checked with their LPCR:PECE* enablement bit.
> >
> > If the DECR timer fires after 'stop-self' is called and before the CPU
> > 'stop' state is reached, the nearly-dead CPU will have some work to do
> > and the guest will crash. This case happens very frequently with the
> > not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
> > occasionally fired but after 'stop' state, so no work is to be done
> > and the guest survives.
> >
> > I suspect there is a race between the QEMU mainloop triggering the
> > timers and the TCG CPU thread but I could not quite identify the root
> > cause. To be safe, let's disable the decrementer interrupt in the LPCR
> > when the CPU is halted and reenable it when the CPU is restarted.
>
> Moreover, disabling the DECR in the reset path solves the TCG multi cpu
> reboot case, as reboot path does not call stop-cpu rtas call.
>
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 3e20b1d886..c5150ee590 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
>      cs->halted = 1;
>
>      env->spr[SPR_HIOR] = 0;
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>  }

This seems reasonable.

>
>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
>
>
> Regards
> Nikunj
>

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 3e20b1d886..c5150ee590 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
     cs->halted = 1;
 
     env->spr[SPR_HIOR] = 0;
+    /* Disable DECR for secondary cpus */
+    if (cs != first_cpu) {
+        if (env->mmu_model == POWERPC_MMU_3_00) {
+            env->spr[SPR_LPCR] &= ~LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
+        }
+    }
 }
 
 static void spapr_cpu_destroy(PowerPCCPU *cpu)
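
To make the failure mode discussed in the thread concrete, here is a minimal standalone C model of the "has work" check for a halted CPU and of what the reset hunk changes. It is a sketch under assumptions, not QEMU code: `SketchCPU`, the `SKETCH_*` bit values, and both functions are hypothetical, and the non-halted branch is a deliberate simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; these are NOT the real LPCR or
 * interrupt encodings used by QEMU. */
#define SKETCH_LPCR_PECE_EXT  (1u << 0) /* wake on external interrupt */
#define SKETCH_LPCR_PECE_DECR (1u << 1) /* wake on decrementer */

#define SKETCH_IRQ_EXT  (1u << 0) /* pending external interrupt */
#define SKETCH_IRQ_DECR (1u << 1) /* pending decrementer interrupt */

/* Hypothetical, heavily reduced CPU state. */
typedef struct SketchCPU {
    bool halted;      /* set to true by stop-self and by reset */
    bool msr_ee;      /* stands in for the MSR interrupt-enable state */
    uint32_t lpcr;    /* wake-up enable bits */
    uint32_t pending; /* pending hardware interrupts */
} SketchCPU;

static bool sketch_cpu_has_work(const SketchCPU *cpu)
{
    if (!cpu->halted) {
        /* Simplified running case: the MSR gates interrupt delivery. */
        return cpu->msr_ee && cpu->pending != 0;
    }
    /* Halted case: the MSR is ignored; each pending interrupt counts
     * only if its wake-up enable bit is set in the LPCR. */
    if ((cpu->pending & SKETCH_IRQ_EXT) &&
        (cpu->lpcr & SKETCH_LPCR_PECE_EXT)) {
        return true;
    }
    if ((cpu->pending & SKETCH_IRQ_DECR) &&
        (cpu->lpcr & SKETCH_LPCR_PECE_DECR)) {
        return true;
    }
    return false;
}

/* Mirrors the intent of the reset hunk: every CPU comes out of reset
 * halted, and secondary CPUs also lose their decrementer wake-up bit. */
static void sketch_cpu_reset(SketchCPU *cpu, bool is_boot_cpu)
{
    cpu->halted = true;
    if (!is_boot_cpu) {
        cpu->lpcr &= ~SKETCH_LPCR_PECE_DECR;
    }
}
```

With the decrementer wake-up bit still set, a halted CPU that has a pending decrementer interrupt reports work even though its MSR says interrupts are off; after `sketch_cpu_reset()` on a secondary CPU, the same pending interrupt no longer wakes it.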