diff mbox

[v3] spapr: disable decrementer during reset

Message ID 20170717041639.16137-1-nikunj@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nikunj A. Dadhania July 17, 2017, 4:16 a.m. UTC
Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.

When reset happens, all the CPUs are in halted state. First CPU is brought out
of reset and secondary CPUs would be initialized by the guest kernel using a
rtas call start-cpu.

However, in case of TCG, decrementer interrupts keep on coming and waking the
secondary CPUs up.

These secondary CPUs would see the decrementer interrupt pending, which makes
cpu::has_work() to bring them out of wait loop and start executing
tcg_exec_cpu().

The problem with this is all the CPUs wake up and start booting SLOF image,
causing the following exception(4 CPUs TCG VM):

[   81.440850] reboot: Restarting system

SLOF
S
SLOF
SLOFLOF[0[0m **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m *************************************m**********[?25l **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
***********************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
ERROR: Flatten device tree not available!

 exception 300
SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
ERROR: Flatten device tree not available!

 exception 300
SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8

During reset, disable decrementer interrupt for secondary CPUs and enable them
when the secondary CPUs are brought online by rtas start-cpu call.

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 hw/ppc/spapr_cpu_core.c | 9 +++++++++
 hw/ppc/spapr_rtas.c     | 8 ++++++++
 2 files changed, 17 insertions(+)

Comments

David Gibson July 18, 2017, 4:46 a.m. UTC | #1
On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> 
> When reset happens, all the CPUs are in halted state. First CPU is brought out
> of reset and secondary CPUs would be initialized by the guest kernel using a
> rtas call start-cpu.
> 
> However, in case of TCG, decrementer interrupts keep on coming and waking the
> secondary CPUs up.
> 
> These secondary CPUs would see the decrementer interrupt pending, which makes
> cpu::has_work() to bring them out of wait loop and start executing
> tcg_exec_cpu().
> 
> The problem with this is all the CPUs wake up and start booting SLOF image,
> causing the following exception(4 CPUs TCG VM):

Ok, I'm still trying to understand why the behaviour on reboot is
different from the first boot.  AFAICT on initial boot, the LPCR will
have DEE / PECE3 enabled.  So why aren't we getting the same problem
then?


> 
> [   81.440850] reboot: Restarting system
> 
> SLOF
> S
> SLOF
> SLOFLOF[0[0m **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> [0m **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> [0m *************************************m**********[?25l **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> ***********************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> ERROR: Flatten device tree not available!
> 
>  exception 300
> SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
> SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
> ERROR: Flatten device tree not available!
> 
>  exception 300
> SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
> SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
> 
> During reset, disable decrementer interrupt for secondary CPUs and enable them
> when the secondary CPUs are brought online by rtas start-cpu call.
> 
> Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_cpu_core.c | 9 +++++++++
>  hw/ppc/spapr_rtas.c     | 8 ++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index ea278ce..bbfe8c2 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -87,6 +87,15 @@ static void spapr_cpu_reset(void *opaque)
>  
>      env->spr[SPR_HIOR] = 0;
>  
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>      /*
>       * This is a hack for the benefit of KVM PR - it abuses the SDR1
>       * slot in kvm_sregs to communicate the userspace address of the
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 94a2799..4623d1d 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -174,6 +174,14 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>          kvm_cpu_synchronize_state(cs);
>  
>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
> +
> +        /* Enable DECR interrupt */
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] |= LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
> +        }
>          env->nip = start;
>          env->gpr[3] = r3;
>          cs->halted = 0;
Nikunj A. Dadhania July 18, 2017, 5:17 a.m. UTC | #2
David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
>> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>> 
>> When reset happens, all the CPUs are in halted state. First CPU is brought out
>> of reset and secondary CPUs would be initialized by the guest kernel using a
>> rtas call start-cpu.
>> 
>> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> secondary CPUs up.
>> 
>> These secondary CPUs would see the decrementer interrupt pending, which makes
>> cpu::has_work() to bring them out of wait loop and start executing
>> tcg_exec_cpu().
>> 
>> The problem with this is all the CPUs wake up and start booting SLOF image,
>> causing the following exception(4 CPUs TCG VM):
>
> Ok, I'm still trying to understand why the behaviour on reboot is
> different from the first boot.

During first boot, the cpu is in the stopped state, so
cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
until rtas start-cpu. Therefore, we never check the cpu_has_work()

In case of reboot, all CPUs are resumed after reboot. So we check the
next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
DECR interrupt and remove the CPU from halted state as the CPU has work.

> AFAICT on initial boot, the LPCR will
> have DEE / PECE3 enabled.  So why aren't we getting the same problem
> then?

Regards
Nikunj
Nikunj A. Dadhania July 18, 2017, 5:23 a.m. UTC | #3
David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
>> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>> 
>> When reset happens, all the CPUs are in halted state. First CPU is brought out
>> of reset and secondary CPUs would be initialized by the guest kernel using a
>> rtas call start-cpu.
>> 
>> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> secondary CPUs up.
>> 
>> These secondary CPUs would see the decrementer interrupt pending, which makes
>> cpu::has_work() to bring them out of wait loop and start executing
>> tcg_exec_cpu().
>> 
>> The problem with this is all the CPUs wake up and start booting SLOF image,
>> causing the following exception(4 CPUs TCG VM):
>
> Ok, I'm still trying to understand why the behaviour on reboot is
> different from the first boot.

During first boot, the cpu is in the stopped state, so
cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
until rtas start-cpu. Therefore, we never check the cpu_has_work()

In case of reboot, all CPUs are resumed after reboot. So we check the
next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
DECR interrupt and remove the CPU from halted state as the CPU has work.

> AFAICT on initial boot, the LPCR will
> have DEE / PECE3 enabled.  So why aren't we getting the same problem
> then?

Regards
Nikunj
Nikunj A. Dadhania July 18, 2017, 5:26 a.m. UTC | #4
David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
>> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>> 
>> When reset happens, all the CPUs are in halted state. First CPU is brought out
>> of reset and secondary CPUs would be initialized by the guest kernel using a
>> rtas call start-cpu.
>> 
>> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> secondary CPUs up.
>> 
>> These secondary CPUs would see the decrementer interrupt pending, which makes
>> cpu::has_work() to bring them out of wait loop and start executing
>> tcg_exec_cpu().
>> 
>> The problem with this is all the CPUs wake up and start booting SLOF image,
>> causing the following exception(4 CPUs TCG VM):
>
> Ok, I'm still trying to understand why the behaviour on reboot is
> different from the first boot.

During first boot, the cpu is in the stopped state, so
cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
until rtas start-cpu. Therefore, we never check the cpu_has_work()

In case of reboot, all CPUs are resumed after reboot. So we check the
next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
DECR interrupt and remove the CPU from halted state as the CPU has work.

> AFAICT on initial boot, the LPCR will
> have DEE / PECE3 enabled.  So why aren't we getting the same problem
> then?

Regards
Nikunj
David Gibson July 18, 2017, 6:50 a.m. UTC | #5
On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> >> 
> >> When reset happens, all the CPUs are in halted state. First CPU is brought out
> >> of reset and secondary CPUs would be initialized by the guest kernel using a
> >> rtas call start-cpu.
> >> 
> >> However, in case of TCG, decrementer interrupts keep on coming and waking the
> >> secondary CPUs up.
> >> 
> >> These secondary CPUs would see the decrementer interrupt pending, which makes
> >> cpu::has_work() to bring them out of wait loop and start executing
> >> tcg_exec_cpu().
> >> 
> >> The problem with this is all the CPUs wake up and start booting SLOF image,
> >> causing the following exception(4 CPUs TCG VM):
> >
> > Ok, I'm still trying to understand why the behaviour on reboot is
> > different from the first boot.
> 
> During first boot, the cpu is in the stopped state, so
> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
> until rtas start-cpu. Therefore, we never check the cpu_has_work()
> 
> In case of reboot, all CPUs are resumed after reboot. So we check the
> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
> DECR interrupt and remove the CPU from halted state as the CPU has
> work.

Ok, so it sounds like we should set stopped on all the secondary CPUs
on reset as well.  What's causing them to be resumed after the reset
at the moment?

> > AFAICT on initial boot, the LPCR will
> > have DEE / PECE3 enabled.  So why aren't we getting the same problem
> > then?
> 
> Regards
> Nikunj
>
Benjamin Herrenschmidt July 18, 2017, 11:01 a.m. UTC | #6
On Tue, 2017-07-18 at 10:56 +0530, Nikunj A Dadhania wrote:
> In case of reboot, all CPUs are resumed after reboot. So we check the
> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
> DECR interrupt and remove the CPU from halted state as the CPU has work.

Shouldn't we put them back in the stopped state then ?

Cheers,
Ben.
Nikunj A. Dadhania July 19, 2017, 3:50 a.m. UTC | #7
David Gibson <david@gibson.dropbear.id.au> writes:

> On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> 
>> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
>> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>> >> 
>> >> When reset happens, all the CPUs are in halted state. First CPU is brought out
>> >> of reset and secondary CPUs would be initialized by the guest kernel using a
>> >> rtas call start-cpu.
>> >> 
>> >> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> >> secondary CPUs up.
>> >> 
>> >> These secondary CPUs would see the decrementer interrupt pending, which makes
>> >> cpu::has_work() to bring them out of wait loop and start executing
>> >> tcg_exec_cpu().
>> >> 
>> >> The problem with this is all the CPUs wake up and start booting SLOF image,
>> >> causing the following exception(4 CPUs TCG VM):
>> >
>> > Ok, I'm still trying to understand why the behaviour on reboot is
>> > different from the first boot.
>> 
>> During first boot, the cpu is in the stopped state, so
>> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
>> until rtas start-cpu. Therefore, we never check the cpu_has_work()
>> 
>> In case of reboot, all CPUs are resumed after reboot. So we check the
>> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
>> DECR interrupt and remove the CPU from halted state as the CPU has
>> work.
>
> Ok, so it sounds like we should set stopped on all the secondary CPUs
> on reset as well.  What's causing them to be resumed after the reset
> at the moment?

That is part of the main loop in vl.c, when reset is requested. All the
vcpus are paused (stopped == true) then system reset is issued, and all
cpus are resumed (stopped == false). Which is correct.

static bool main_loop_should_exit(void)
{
[...]
    request = qemu_reset_requested();
    if (request) {
        pause_all_vcpus();
        qemu_system_reset(request);
        resume_all_vcpus();
        if (!runstate_check(RUN_STATE_RUNNING) &&
                !runstate_check(RUN_STATE_INMIGRATE)) {
            runstate_set(RUN_STATE_PRELAUNCH);
        }
    }
[...]
}

The CPUs are in halted state, i.e. cpu::halted = 1. We have set
cpu::halted = 0 for the primary CPU, aka first_cpu, in
spapr.c::ppc_spapr_reset()

In case of TCG, we have a decr callback (per CPU) scheduled once the
machine is started which keeps on running and interrupting irrespective
of the state of the machine.

tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu);

Which keeps on queueing the interrupt to the CPUs. All the other CPUs
are in a tight loop checking cpu_thread_is_idle(), which returns false
when it sees the decrementer interrupt, the cpu starts executing:

cpu_exec()
  -> cpu_handle_halt()
     -> sees cpu_has_work() and sets cpu->halted = 0

And the execution resumes, when it shouldnt have. Ideally, for secondary
CPUs, cpu::halted flag is cleared in rtas start-cpu call.

Initially, I thought we should not have interrupt during reset state.
That was the reason of ignoring decr interrupt when msr_ee was disabled
in my previous patch. BenH suggested that it is wrong from HW
perspective.

Regards,
Nikunj
David Gibson July 19, 2017, 4:32 a.m. UTC | #8
On Wed, Jul 19, 2017 at 09:20:52AM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote:
> >> David Gibson <david@gibson.dropbear.id.au> writes:
> >> 
> >> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
> >> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> >> >> 
> >> >> When reset happens, all the CPUs are in halted state. First CPU is brought out
> >> >> of reset and secondary CPUs would be initialized by the guest kernel using a
> >> >> rtas call start-cpu.
> >> >> 
> >> >> However, in case of TCG, decrementer interrupts keep on coming and waking the
> >> >> secondary CPUs up.
> >> >> 
> >> >> These secondary CPUs would see the decrementer interrupt pending, which makes
> >> >> cpu::has_work() to bring them out of wait loop and start executing
> >> >> tcg_exec_cpu().
> >> >> 
> >> >> The problem with this is all the CPUs wake up and start booting SLOF image,
> >> >> causing the following exception(4 CPUs TCG VM):
> >> >
> >> > Ok, I'm still trying to understand why the behaviour on reboot is
> >> > different from the first boot.
> >> 
> >> During first boot, the cpu is in the stopped state, so
> >> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
> >> until rtas start-cpu. Therefore, we never check the cpu_has_work()
> >> 
> >> In case of reboot, all CPUs are resumed after reboot. So we check the
> >> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
> >> DECR interrupt and remove the CPU from halted state as the CPU has
> >> work.
> >
> > Ok, so it sounds like we should set stopped on all the secondary CPUs
> > on reset as well.  What's causing them to be resumed after the reset
> > at the moment?
> 
> That is part of the main loop in vl.c, when reset is requested. All the
> vcpus are paused (stopped == true) then system reset is issued, and all
> cpus are resumed (stopped == false). Which is correct.

is it?  Seems we have a different value of 'stopped' on the first boot
compared to reoboots, which doesn't seem right.

> static bool main_loop_should_exit(void)
> {
> [...]
>     request = qemu_reset_requested();
>     if (request) {
>         pause_all_vcpus();
>         qemu_system_reset(request);
>         resume_all_vcpus();
>         if (!runstate_check(RUN_STATE_RUNNING) &&
>                 !runstate_check(RUN_STATE_INMIGRATE)) {
>             runstate_set(RUN_STATE_PRELAUNCH);
>         }
>     }
> [...]
> }
> 
> The CPUs are in halted state, i.e. cpu::halted = 1. We have set
> cpu::halted = 0 for the primary CPU, aka first_cpu, in
> spapr.c::ppc_spapr_reset()
> 
> In case of TCG, we have a decr callback (per CPU) scheduled once the
> machine is started which keeps on running and interrupting irrespective
> of the state of the machine.

Right.  The thing is "halted" means waiting-for-interrupt; it's mostly
used for things like x86 hlt instruction, H_CEDE, short-term nap modes
and so forth.  We're abusing it a little for keeping the secondary
CPUs offline until they're explicitly started.

Trouble is, there isn't a clearly better option - stopped is
automatically turned off after reset as above, so it can't be used for
"stopped under firmware / hypervisor authority".

> tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu);
> 
> Which keeps on queueing the interrupt to the CPUs. All the other CPUs
> are in a tight loop checking cpu_thread_is_idle(), which returns false
> when it sees the decrementer interrupt, the cpu starts executing:
> 
> cpu_exec()
>   -> cpu_handle_halt()
>      -> sees cpu_has_work() and sets cpu->halted = 0
> 
> And the execution resumes, when it shouldnt have. Ideally, for secondary
> CPUs, cpu::halted flag is cleared in rtas start-cpu call.
> 
> Initially, I thought we should not have interrupt during reset state.
> That was the reason of ignoring decr interrupt when msr_ee was disabled
> in my previous patch. BenH suggested that it is wrong from HW
> perspective.
> 
> Regards,
> Nikunj
>
Cédric Le Goater Sept. 27, 2017, 8:36 p.m. UTC | #9
On 07/17/2017 06:16 AM, Nikunj A Dadhania wrote:
> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> 
> When reset happens, all the CPUs are in halted state. First CPU is brought out
> of reset and secondary CPUs would be initialized by the guest kernel using a
> rtas call start-cpu.
> 
> However, in case of TCG, decrementer interrupts keep on coming and waking the
> secondary CPUs up.
> 
> These secondary CPUs would see the decrementer interrupt pending, which makes
> cpu::has_work() to bring them out of wait loop and start executing
> tcg_exec_cpu().
> 
> The problem with this is all the CPUs wake up and start booting SLOF image,
> causing the following exception(4 CPUs TCG VM):
> 
> [   81.440850] reboot: Restarting system
> 
> SLOF
> S
> SLOF
> SLOFLOF[0[0m **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> [0m **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> [0m *************************************m**********[?25l **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> ***********************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> ERROR: Flatten device tree not available!
> 
>  exception 300
> SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
> SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
> ERROR: Flatten device tree not available!
> 
>  exception 300
> SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
> SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
> 
> During reset, disable decrementer interrupt for secondary CPUs and enable them
> when the secondary CPUs are brought online by rtas start-cpu call.
> 
> Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_cpu_core.c | 9 +++++++++
>  hw/ppc/spapr_rtas.c     | 8 ++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index ea278ce..bbfe8c2 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -87,6 +87,15 @@ static void spapr_cpu_reset(void *opaque)
>  
>      env->spr[SPR_HIOR] = 0;
>  
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>      /*
>       * This is a hack for the benefit of KVM PR - it abuses the SDR1
>       * slot in kvm_sregs to communicate the userspace address of the
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 94a2799..4623d1d 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -174,6 +174,14 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>          kvm_cpu_synchronize_state(cs);
>  
>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
> +
> +        /* Enable DECR interrupt */
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] |= LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
> +        }
>          env->nip = start;
>          env->gpr[3] = r3;
>          cs->halted = 0;


you should also disable the decrementer in the stop-cpu call
when a cpu is hot unplugged.

One thing I don't explain is why the decrementer interrupt 
does not wake up the cpu after being  hot unplugged. I am not 
sure the code doing :

	env->msr = 0;

is responsible of this behavior. 

With the xive patchset, I am having issues in that area. 
the decrementer wakes up the cpu just after hot unplugged
and I need to set the LPCR to disable it. 

I wonder why XICS is immune to that. 

C.
diff mbox

Patch

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index ea278ce..bbfe8c2 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -87,6 +87,15 @@  static void spapr_cpu_reset(void *opaque)
 
     env->spr[SPR_HIOR] = 0;
 
+    /* Disable DECR for secondary cpus */
+    if (cs != first_cpu) {
+        if (env->mmu_model == POWERPC_MMU_3_00) {
+            env->spr[SPR_LPCR] &= ~LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
+        }
+    }
     /*
      * This is a hack for the benefit of KVM PR - it abuses the SDR1
      * slot in kvm_sregs to communicate the userspace address of the
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 94a2799..4623d1d 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -174,6 +174,14 @@  static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
         kvm_cpu_synchronize_state(cs);
 
         env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
+
+        /* Enable DECR interrupt */
+        if (env->mmu_model == POWERPC_MMU_3_00) {
+            env->spr[SPR_LPCR] |= LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
+        }
         env->nip = start;
         env->gpr[3] = r3;
         cs->halted = 0;