diff mbox series

xen: Don't call panic if ARM TF cpu off returns DENIED

Message ID 20220616135541.3333760-1-dmitry.semenets@gmail.com (mailing list archive)
State New, archived
Headers show
Series xen: Don't call panic if ARM TF cpu off returns DENIED | expand

Commit Message

Dmytro Semenets June 16, 2022, 1:55 p.m. UTC
From: Dmytro Semenets <dmytro_semenets@epam.com>

According to PSCI specification ARM TF can return DENIED on CPU OFF.
This patch brings the hypervisor into compliance with the PSCI
specification.
Refer to "Arm Power State Coordination Interface (DEN0022D.b)"
section 5.5.2

Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 xen/arch/arm/psci.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Julien Grall June 16, 2022, 3:12 p.m. UTC | #1
Hi,

On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
> From: Dmytro Semenets <dmytro_semenets@epam.com>
> 
> According to PSCI specification ARM TF can return DENIED on CPU OFF.

I am confused. The spec is talking about Trusted OS and not firmware. 
The docummentation is also not specific to ARM Trusted Firmware. So did 
you mean "Trusted OS"?

Also, did you reproduce on HW? If so, on which CPU this will fail?

> This patch brings the hypervisor into compliance with the PSCI
> specification.

Now it means CPU off will never be turned off using PSCI. Instead, we 
would end up to spin in Xen. This would be a problem because we could 
save less power.

> Refer to "Arm Power State Coordination Interface (DEN0022D.b)"
> section 5.5.2

Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when the 
trusted OS can only run on one core.

Some of the trusted OS are migratable. So I think we should first 
attempt to migrate the CPU. Then if it doesn't work, we should prevent 
the CPU to go offline.

That said, upstream doesn't support cpu offlining (I don't know for your 
use case). In case of shutdown, it is not necessary to offline the CPU, 
so we could avoid to call CPU_OFF on all CPUs but one. Something like:

diff --git a/xen/arch/arm/shutdown.c b/xen/arch/arm/shutdown.c
index 3dc6819d56de..d956812ef8f4 100644
--- a/xen/arch/arm/shutdown.c
+++ b/xen/arch/arm/shutdown.c
@@ -8,7 +8,9 @@

  static void noreturn halt_this_cpu(void *arg)
  {
-    stop_cpu();
+    ASSERT(!local_irq_enable());
+    while ( 1 )
+        wfi();
  }

  void machine_halt(void)
@@ -21,10 +23,6 @@ void machine_halt(void)
      smp_call_function(halt_this_cpu, NULL, 0);
      local_irq_disable();

-    /* Wait at most another 10ms for all other CPUs to go offline. */
-    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
-        mdelay(1);
-
      /* This is mainly for PSCI-0.2, which does not return if success. */
      call_psci_system_off();

> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com>
> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>

I don't recall to see patch on the ML recently for this. So is this an 
internal review?

> ---
>   xen/arch/arm/psci.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
> index 0c90c2305c..55787fde58 100644
> --- a/xen/arch/arm/psci.c
> +++ b/xen/arch/arm/psci.c
> @@ -63,8 +63,9 @@ void call_psci_cpu_off(void)
>   
>           /* If successfull the PSCI cpu_off call doesn't return */
>           arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, &res);
> -        panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(),
> -              PSCI_RET(res));
> +        if ( PSCI_RET(res) != PSCI_DENIED )
> +            panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(),
> +                PSCI_RET(res));
>       }
>   }
>   

Cheers,
Volodymyr Babchuk June 16, 2022, 6:40 p.m. UTC | #2
Hi Julien,

Julien Grall <julien@xen.org> writes:

> Hi,
>
> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
>> From: Dmytro Semenets <dmytro_semenets@epam.com>
>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
>
> I am confused. The spec is talking about Trusted OS and not
> firmware. The docummentation is also not specific to ARM Trusted
> Firmware. So did you mean "Trusted OS"?

It should be "firmware", I believe.

>
> Also, did you reproduce on HW? If so, on which CPU this will fail?
>

Yes, we reproduced this on HW. In our case it failed on CPU0. To be
fair, in our case it had nothing to do with Trusted OS. It is just
platform limitation - it can't turn off CPU0. But from Xen perspective
there is no difference - CPU_OFF call returns DENIED.

>> This patch brings the hypervisor into compliance with the PSCI
>> specification.
>
> Now it means CPU off will never be turned off using PSCI. Instead, we
> would end up to spin in Xen. This would be a problem because we could
> save less power.

Agreed.

>
>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)"
>> section 5.5.2
>
> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when
> the trusted OS can only run on one core.
>
> Some of the trusted OS are migratable. So I think we should first
> attempt to migrate the CPU. Then if it doesn't work, we should prevent
> the CPU to go offline.
>
> That said, upstream doesn't support cpu offlining (I don't know for
> your use case). In case of shutdown, it is not necessary to offline
> the CPU, so we could avoid to call CPU_OFF on all CPUs but
> one. Something like:
>

This is even better approach yes. But you mentioned CPU_OFF. Did you
mean SYSTEM_RESET?

> diff --git a/xen/arch/arm/shutdown.c b/xen/arch/arm/shutdown.c
> index 3dc6819d56de..d956812ef8f4 100644
> --- a/xen/arch/arm/shutdown.c
> +++ b/xen/arch/arm/shutdown.c
> @@ -8,7 +8,9 @@
>
>  static void noreturn halt_this_cpu(void *arg)
>  {
> -    stop_cpu();
> +    ASSERT(!local_irq_enable());
> +    while ( 1 )
> +        wfi();
>  }
>
>  void machine_halt(void)
> @@ -21,10 +23,6 @@ void machine_halt(void)
>      smp_call_function(halt_this_cpu, NULL, 0);
>      local_irq_disable();
>
> -    /* Wait at most another 10ms for all other CPUs to go offline. */
> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
> -        mdelay(1);
> -
>      /* This is mainly for PSCI-0.2, which does not return if success. */
>      call_psci_system_off();
>
>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com>
>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>
> I don't recall to see patch on the ML recently for this. So is this an
> internal review?

Yeah, sorry about that. Dmytro is a new member in our team and he is not
yet familiar with differences in internal reviews and reviews in ML.

If you are interested, we had internal review at [1]:

[1] https://github.com/xen-troops/xen/pull/184

>
>> ---
>>   xen/arch/arm/psci.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
>> index 0c90c2305c..55787fde58 100644
>> --- a/xen/arch/arm/psci.c
>> +++ b/xen/arch/arm/psci.c
>> @@ -63,8 +63,9 @@ void call_psci_cpu_off(void)
>>             /* If successfull the PSCI cpu_off call doesn't return
>> */
>>           arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, &res);
>> -        panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(),
>> -              PSCI_RET(res));
>> +        if ( PSCI_RET(res) != PSCI_DENIED )
>> +            panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(),
>> +                PSCI_RET(res));
>>       }
>>   }
>>   
>
> Cheers,
Julien Grall June 16, 2022, 7:09 p.m. UTC | #3
On 16/06/2022 19:40, Volodymyr Babchuk wrote:
> 
> Hi Julien,

Hi Volodymyr,

> 
> Julien Grall <julien@xen.org> writes:
> 
>> Hi,
>>
>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
>>> From: Dmytro Semenets <dmytro_semenets@epam.com>
>>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
>>
>> I am confused. The spec is talking about Trusted OS and not
>> firmware. The docummentation is also not specific to ARM Trusted
>> Firmware. So did you mean "Trusted OS"?
> 
> It should be "firmware", I believe.

Hmmm... I couldn't find a reference in the spec suggesting that CPU_OFF 
could return DENIED because of the firmware. Do you have a pointer to 
the spec?

> 
>>
>> Also, did you reproduce on HW? If so, on which CPU this will fail?
>>
> 
> Yes, we reproduced this on HW. In our case it failed on CPU0. To be
> fair, in our case it had nothing to do with Trusted OS. It is just
> platform limitation - it can't turn off CPU0. But from Xen perspective
> there is no difference - CPU_OFF call returns DENIED.

Thanks for the clarification. I think I have seen that in the wild also 
but it never got on top of my queue. It is good that we are fixing it.

> 
>>> This patch brings the hypervisor into compliance with the PSCI
>>> specification.
>>
>> Now it means CPU off will never be turned off using PSCI. Instead, we
>> would end up to spin in Xen. This would be a problem because we could
>> save less power.
> 
> Agreed.
> 
>>
>>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)"
>>> section 5.5.2
>>
>> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when
>> the trusted OS can only run on one core.
>>
>> Some of the trusted OS are migratable. So I think we should first
>> attempt to migrate the CPU. Then if it doesn't work, we should prevent
>> the CPU to go offline.
>>
>> That said, upstream doesn't support cpu offlining (I don't know for
>> your use case). In case of shutdown, it is not necessary to offline
>> the CPU, so we could avoid to call CPU_OFF on all CPUs but
>> one. Something like:
>>
> 
> This is even better approach yes. But you mentioned CPU_OFF. Did you
> mean SYSTEM_RESET?

By CPU_OFF I was referring to the fact that Xen will issue the call all 
CPUs but one. The remaining CPU will issue the command to reset/shutdown 
the system.

>>   void machine_halt(void)
>> @@ -21,10 +23,6 @@ void machine_halt(void)
>>       smp_call_function(halt_this_cpu, NULL, 0);
>>       local_irq_disable();
>>
>> -    /* Wait at most another 10ms for all other CPUs to go offline. */
>> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
>> -        mdelay(1);
>> -
>>       /* This is mainly for PSCI-0.2, which does not return if success. */
>>       call_psci_system_off();
>>
>>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com>
>>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>
>> I don't recall to see patch on the ML recently for this. So is this an
>> internal review?
> 
> Yeah, sorry about that. Dmytro is a new member in our team and he is not
> yet familiar with differences in internal reviews and reviews in ML.

No worries. I usually classify internal review anything that was done 
privately. This looks to be a public review, althought not on xen-devel.

I understand that not all some of the patches are still in PoC stage and 
doing the review on your github is a good idea. But for those are meant 
to be for upstream (e.g. bug fixes, small patches), I would suggest to 
do the review on xen-devel directly.

Cheers,
Dmytro Semenets June 16, 2022, 7:32 p.m. UTC | #4
чт, 16 июн. 2022 г. в 22:09, Julien Grall <julien@xen.org>:

>
>
> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
> >
> > Hi Julien,
>
> Hi Volodymyr,
>
> >
> > Julien Grall <julien@xen.org> writes:
> >
> >> Hi,
> >>
> >> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
> >>> From: Dmytro Semenets <dmytro_semenets@epam.com>
> >>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
> >>
> >> I am confused. The spec is talking about Trusted OS and not
> >> firmware. The docummentation is also not specific to ARM Trusted
> >> Firmware. So did you mean "Trusted OS"?
> >
> > It should be "firmware", I believe.
>
> Hmmm... I couldn't find a reference in the spec suggesting that CPU_OFF
> could return DENIED because of the firmware. Do you have a pointer to
> the spec?
>
Actually CPU_OFF performed by Trusted OS. But Trusted OS is called by the
ARM TF.
ARM TF for our platform doesn't call Trusted OS for CPU0 and returns DENIED
instead.

>
> >
> >>
> >> Also, did you reproduce on HW? If so, on which CPU this will fail?
> >>
> >
> > Yes, we reproduced this on HW. In our case it failed on CPU0. To be
> > fair, in our case it had nothing to do with Trusted OS. It is just
> > platform limitation - it can't turn off CPU0. But from Xen perspective
> > there is no difference - CPU_OFF call returns DENIED.
>
> Thanks for the clarification. I think I have seen that in the wild also
> but it never got on top of my queue. It is good that we are fixing it.
>
> >
> >>> This patch brings the hypervisor into compliance with the PSCI
> >>> specification.
> >>
> >> Now it means CPU off will never be turned off using PSCI. Instead, we
> >> would end up to spin in Xen. This would be a problem because we could
> >> save less power.
> >
> > Agreed.
> >
> >>
> >>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)"
> >>> section 5.5.2
> >>
> >> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when
> >> the trusted OS can only run on one core.
> >>
> >> Some of the trusted OS are migratable. So I think we should first
> >> attempt to migrate the CPU. Then if it doesn't work, we should prevent
> >> the CPU to go offline.
> >>
> >> That said, upstream doesn't support cpu offlining (I don't know for
> >> your use case). In case of shutdown, it is not necessary to offline
> >> the CPU, so we could avoid to call CPU_OFF on all CPUs but
> >> one. Something like:
> >>
> >
> > This is even better approach yes. But you mentioned CPU_OFF. Did you
> > mean SYSTEM_RESET?
>
> By CPU_OFF I was referring to the fact that Xen will issue the call all
> CPUs but one. The remaining CPU will issue the command to reset/shutdown
> the system.
>
> >>   void machine_halt(void)
> >> @@ -21,10 +23,6 @@ void machine_halt(void)
> >>       smp_call_function(halt_this_cpu, NULL, 0);
> >>       local_irq_disable();
> >>
> >> -    /* Wait at most another 10ms for all other CPUs to go offline. */
> >> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
> >> -        mdelay(1);
> >> -
> >>       /* This is mainly for PSCI-0.2, which does not return if success.
> */
> >>       call_psci_system_off();
> >>
> >>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com>
> >>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> >>
> >> I don't recall to see patch on the ML recently for this. So is this an
> >> internal review?
> >
> > Yeah, sorry about that. Dmytro is a new member in our team and he is not
> > yet familiar with differences in internal reviews and reviews in ML.
>
> No worries. I usually classify internal review anything that was done
> privately. This looks to be a public review, althought not on xen-devel.
>
> I understand that not all some of the patches are still in PoC stage and
> doing the review on your github is a good idea. But for those are meant
> to be for upstream (e.g. bug fixes, small patches), I would suggest to
> do the review on xen-devel directly
>
Sorry about that

>
> Cheers,
>
> --
> Julien Grall
>
Volodymyr Babchuk June 17, 2022, 9:10 a.m. UTC | #5
Hi Julien,


Julien Grall <julien@xen.org> writes:

> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
>> Hi Julien,
>
> Hi Volodymyr,
>
>> Julien Grall <julien@xen.org> writes:
>> 
>>> Hi,
>>>
>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
>>>> From: Dmytro Semenets <dmytro_semenets@epam.com>
>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
>>>
>>> I am confused. The spec is talking about Trusted OS and not
>>> firmware. The docummentation is also not specific to ARM Trusted
>>> Firmware. So did you mean "Trusted OS"?
>> It should be "firmware", I believe.
>
> Hmmm... I couldn't find a reference in the spec suggesting that
> CPU_OFF could return DENIED because of the firmware. Do you have a
> pointer to the spec?

Ah, looks like we are talking about different things. Indeed, CPU_OFF
can return DENIED only because of Trusted OS. But entity that *returns*
the error to a caller is a firmware.

>> 
>>>
>>> Also, did you reproduce on HW? If so, on which CPU this will fail?
>>>
>> Yes, we reproduced this on HW. In our case it failed on CPU0. To be
>> fair, in our case it had nothing to do with Trusted OS. It is just
>> platform limitation - it can't turn off CPU0. But from Xen perspective
>> there is no difference - CPU_OFF call returns DENIED.
>
> Thanks for the clarification. I think I have seen that in the wild
> also but it never got on top of my queue. It is good that we are
> fixing it.
>
>> 
>>>> This patch brings the hypervisor into compliance with the PSCI
>>>> specification.
>>>
>>> Now it means CPU off will never be turned off using PSCI. Instead, we
>>> would end up to spin in Xen. This would be a problem because we could
>>> save less power.
>> Agreed.
>> 
>>>
>>>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)"
>>>> section 5.5.2
>>>
>>> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when
>>> the trusted OS can only run on one core.
>>>
>>> Some of the trusted OS are migratable. So I think we should first
>>> attempt to migrate the CPU. Then if it doesn't work, we should prevent
>>> the CPU to go offline.
>>>
>>> That said, upstream doesn't support cpu offlining (I don't know for
>>> your use case). In case of shutdown, it is not necessary to offline
>>> the CPU, so we could avoid to call CPU_OFF on all CPUs but
>>> one. Something like:
>>>
>> This is even better approach yes. But you mentioned CPU_OFF. Did you
>> mean SYSTEM_RESET?
>
> By CPU_OFF I was referring to the fact that Xen will issue the call
> all CPUs but one. The remaining CPU will issue the command to
> reset/shutdown the system.
>

I just want to clarify: change that you suggested removes call to
stop_cpu() in halt_this_cpu(). So no CPU_OFF will be sent at all.

All CPUs except one will spin in

    while ( 1 )
        wfi();

while last cpu will issue SYSTEM_OFF or SYSTEM_RESET.

Is this correct?

>>>   void machine_halt(void)
>>> @@ -21,10 +23,6 @@ void machine_halt(void)
>>>       smp_call_function(halt_this_cpu, NULL, 0);
>>>       local_irq_disable();
>>>
>>> -    /* Wait at most another 10ms for all other CPUs to go offline. */
>>> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
>>> -        mdelay(1);
>>> -
>>>       /* This is mainly for PSCI-0.2, which does not return if success. */
>>>       call_psci_system_off();
>>>
>>>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com>
>>>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>>
>>> I don't recall to see patch on the ML recently for this. So is this an
>>> internal review?
>> Yeah, sorry about that. Dmytro is a new member in our team and he is
>> not
>> yet familiar with differences in internal reviews and reviews in ML.
>
> No worries. I usually classify internal review anything that was done
> privately. This looks to be a public review, althought not on
> xen-devel.
>
> I understand that not all some of the patches are still in PoC stage
> and doing the review on your github is a good idea. But for those are
> meant to be for upstream (e.g. bug fixes, small patches), I would
> suggest to do the review on xen-devel directly.

It not always clear if patch is eligible for upstream. At first we
thought that problem is platform-specific and we weren't sure that we
will find a proper upstreamable fix. Probably you saw that PR's name
quite differs from final patch. This is because initial solution was
completely different from final one.
Julien Grall June 17, 2022, 11:12 a.m. UTC | #6
Hi,

On 17/06/2022 10:10, Volodymyr Babchuk wrote:
> Julien Grall <julien@xen.org> writes:
> 
>> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
>>> Hi Julien,
>>
>> Hi Volodymyr,
>>
>>> Julien Grall <julien@xen.org> writes:
>>>
>>>> Hi,
>>>>
>>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
>>>>> From: Dmytro Semenets <dmytro_semenets@epam.com>
>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
>>>>
>>>> I am confused. The spec is talking about Trusted OS and not
>>>> firmware. The docummentation is also not specific to ARM Trusted
>>>> Firmware. So did you mean "Trusted OS"?
>>> It should be "firmware", I believe.
>>
>> Hmmm... I couldn't find a reference in the spec suggesting that
>> CPU_OFF could return DENIED because of the firmware. Do you have a
>> pointer to the spec?
> 
> Ah, looks like we are talking about different things. Indeed, CPU_OFF
> can return DENIED only because of Trusted OS. But entity that *returns*
> the error to a caller is a firmware.

Right, the interesting part is *why* DENIED is returned not *who* 
returns it.
>>>>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)"
>>>>> section 5.5.2
>>>>
>>>> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when
>>>> the trusted OS can only run on one core.
>>>>
>>>> Some of the trusted OS are migratable. So I think we should first
>>>> attempt to migrate the CPU. Then if it doesn't work, we should prevent
>>>> the CPU to go offline.
>>>>
>>>> That said, upstream doesn't support cpu offlining (I don't know for
>>>> your use case). In case of shutdown, it is not necessary to offline
>>>> the CPU, so we could avoid to call CPU_OFF on all CPUs but
>>>> one. Something like:
>>>>
>>> This is even better approach yes. But you mentioned CPU_OFF. Did you
>>> mean SYSTEM_RESET?
>>
>> By CPU_OFF I was referring to the fact that Xen will issue the call
>> all CPUs but one. The remaining CPU will issue the command to
>> reset/shutdown the system.
>>
> 
> I just want to clarify: change that you suggested removes call to
> stop_cpu() in halt_this_cpu(). So no CPU_OFF will be sent at all.

I was describing the existing behavior.

> 
> All CPUs except one will spin in
> 
>      while ( 1 )
>          wfi();
> 
> while last cpu will issue SYSTEM_OFF or SYSTEM_RESET.
> 
> Is this correct?

Yes.

> 
>>>>    void machine_halt(void)
>>>> @@ -21,10 +23,6 @@ void machine_halt(void)
>>>>        smp_call_function(halt_this_cpu, NULL, 0);
>>>>        local_irq_disable();
>>>>
>>>> -    /* Wait at most another 10ms for all other CPUs to go offline. */
>>>> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
>>>> -        mdelay(1);
>>>> -
>>>>        /* This is mainly for PSCI-0.2, which does not return if success. */
>>>>        call_psci_system_off();
>>>>
>>>>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com>
>>>>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>>>>
>>>> I don't recall to see patch on the ML recently for this. So is this an
>>>> internal review?
>>> Yeah, sorry about that. Dmytro is a new member in our team and he is
>>> not
>>> yet familiar with differences in internal reviews and reviews in ML.
>>
>> No worries. I usually classify internal review anything that was done
>> privately. This looks to be a public review, althought not on
>> xen-devel.
>>
>> I understand that not all some of the patches are still in PoC stage
>> and doing the review on your github is a good idea. But for those are
>> meant to be for upstream (e.g. bug fixes, small patches), I would
>> suggest to do the review on xen-devel directly.
> 
> It not always clear if patch is eligible for upstream. At first we
> thought that problem is platform-specific and we weren't sure that we
> will find a proper upstreamable fix. 

You can guess but not be sure until you send it to upstrema :). In fact,...

> Probably you saw that PR's name
> quite differs from final patch. This is because initial solution was
> completely different from final one.

... even before looking at your PR, this was the first solution I had in 
mind. I am still pondering whether this could be the best approach 
because I have the suspicion there might be some platform out relying on 
receiving the shutdown request on CPU0.

Anyway, this is so far just theorical, my proposal should solve your 
problem.

On a separate topic, the community is aiming to support a wide range of 
platforms out-of-the-box. I think platform-specific patches are 
acceptable so long they are self-contained (to some extend. I.e if you 
ask to support Xen on RPI3 then I would still probably argue against :)) 
or have a limited impact to the rest of the users (this is why we have 
alternative in Xen).

My point here is your initial solution may have been the preferred 
approach for upstream. So if you involve the community early, you are 
reducing the risk to have to backtrack and/or spend extra time in the 
wrong directions.

Cheers,
Dmytro Semenets June 18, 2022, 5:43 p.m. UTC | #7
пт, 17 июн. 2022 г. в 14:12, Julien Grall <julien@xen.org>:
Hi Julien,
>
> Hi,
>
> On 17/06/2022 10:10, Volodymyr Babchuk wrote:
> > Julien Grall <julien@xen.org> writes:
> >
> >> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
> >>> Hi Julien,
> >>
> >> Hi Volodymyr,
> >>
> >>> Julien Grall <julien@xen.org> writes:
> >>>
> >>>> Hi,
> >>>>
> >>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
> >>>>> From: Dmytro Semenets <dmytro_semenets@epam.com>
> >>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
> >>>>
> >>>> I am confused. The spec is talking about Trusted OS and not
> >>>> firmware. The docummentation is also not specific to ARM Trusted
> >>>> Firmware. So did you mean "Trusted OS"?
> >>> It should be "firmware", I believe.
> >>
> >> Hmmm... I couldn't find a reference in the spec suggesting that
> >> CPU_OFF could return DENIED because of the firmware. Do you have a
> >> pointer to the spec?
> >
> > Ah, looks like we are talking about different things. Indeed, CPU_OFF
> > can return DENIED only because of Trusted OS. But entity that *returns*
> > the error to a caller is a firmware.
>
> Right, the interesting part is *why* DENIED is returned not *who*
> returns it.
ARM TF returns DENIED *only* for the platform I have.
We have a dissonance between spec and xen implementation because
DENIED returned by
ARM TF or Thrusted OS or whatever is not a reason for panic. And we
have issues with this.
If machine_restart() behaviour is more or less correct  (sometimes
reports about panic but restarts the machine)
but machine_halt() doesn't work at all.
Transit execution to CPU0 for my understanding is a workaround and
this approach will fix
machine_restart() function but will not fix machine_halt(). Approach
you suggested (spinning all cpus) will work but
will save less energy.
> >>>>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)"
> >>>>> section 5.5.2
> >>>>
> >>>> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when
> >>>> the trusted OS can only run on one core.
> >>>>
> >>>> Some of the trusted OS are migratable. So I think we should first
> >>>> attempt to migrate the CPU. Then if it doesn't work, we should prevent
> >>>> the CPU to go offline.
> >>>>
> >>>> That said, upstream doesn't support cpu offlining (I don't know for
> >>>> your use case). In case of shutdown, it is not necessary to offline
> >>>> the CPU, so we could avoid to call CPU_OFF on all CPUs but
> >>>> one. Something like:
> >>>>
> >>> This is even better approach yes. But you mentioned CPU_OFF. Did you
> >>> mean SYSTEM_RESET?
> >>
> >> By CPU_OFF I was referring to the fact that Xen will issue the call
> >> all CPUs but one. The remaining CPU will issue the command to
> >> reset/shutdown the system.
> >>
> >
> > I just want to clarify: change that you suggested removes call to
> > stop_cpu() in halt_this_cpu(). So no CPU_OFF will be sent at all.
>
> I was describing the existing behavior.
>
> >
> > All CPUs except one will spin in
> >
> >      while ( 1 )
> >          wfi();
> >
> > while last cpu will issue SYSTEM_OFF or SYSTEM_RESET.
> >
> > Is this correct?
>
> Yes.
>
> >
> >>>>    void machine_halt(void)
> >>>> @@ -21,10 +23,6 @@ void machine_halt(void)
> >>>>        smp_call_function(halt_this_cpu, NULL, 0);
> >>>>        local_irq_disable();
> >>>>
> >>>> -    /* Wait at most another 10ms for all other CPUs to go offline. */
> >>>> -    while ( (num_online_cpus() > 1) && (timeout-- > 0) )
> >>>> -        mdelay(1);
> >>>> -
> >>>>        /* This is mainly for PSCI-0.2, which does not return if success. */
> >>>>        call_psci_system_off();
> >>>>
> >>>>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com>
> >>>>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> >>>>
> >>>> I don't recall to see patch on the ML recently for this. So is this an
> >>>> internal review?
> >>> Yeah, sorry about that. Dmytro is a new member in our team and he is
> >>> not
> >>> yet familiar with differences in internal reviews and reviews in ML.
> >>
> >> No worries. I usually classify internal review anything that was done
> >> privately. This looks to be a public review, althought not on
> >> xen-devel.
> >>
> >> I understand that not all some of the patches are still in PoC stage
> >> and doing the review on your github is a good idea. But for those are
> >> meant to be for upstream (e.g. bug fixes, small patches), I would
> >> suggest to do the review on xen-devel directly.
> >
> > It not always clear if patch is eligible for upstream. At first we
> > thought that problem is platform-specific and we weren't sure that we
> > will find a proper upstreamable fix.
>
> You can guess but not be sure until you send it to upstrema :). In fact,...
>
> > Probably you saw that PR's name
> > quite differs from final patch. This is because initial solution was
> > completely different from final one.
>
> ... even before looking at your PR, this was the first solution I had in
> mind. I am still pondering whether this could be the best approach
> because I have the suspicion there might be some platform out relying on
> receiving the shutdown request on CPU0.
>
> Anyway, this is so far just theorical, my proposal should solve your
> problem.
>
> On a separate topic, the community is aiming to support a wide range of
> platforms out-of-the-box. I think platform-specific patches are
> acceptable so long they are self-contained (to some extend. I.e if you
> ask to support Xen on RPI3 then I would still probably argue against :))
> or have a limited impact to the rest of the users (this is why we have
> alternative in Xen).
>
> My point here is your initial solution may have been the preferred
> approach for upstream. So if you involve the community early, you are
> reducing the risk to have to backtrack and/or spend extra time in the
> wrong directions.
>
> Cheers,
>
> --
> Julien Grall
Julien Grall June 19, 2022, 9:04 a.m. UTC | #8
Hi,

On 18/06/2022 18:43, Dmytro Semenets wrote:
> пт, 17 июн. 2022 г. в 14:12, Julien Grall <julien@xen.org>:
> Hi Julien,
>>
>> Hi,
>>
>> On 17/06/2022 10:10, Volodymyr Babchuk wrote:
>>> Julien Grall <julien@xen.org> writes:
>>>
>>>> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
>>>>> Hi Julien,
>>>>
>>>> Hi Volodymyr,
>>>>
>>>>> Julien Grall <julien@xen.org> writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
>>>>>>> From: Dmytro Semenets <dmytro_semenets@epam.com>
>>>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
>>>>>>
>>>>>> I am confused. The spec is talking about Trusted OS and not
>>>>>> firmware. The docummentation is also not specific to ARM Trusted
>>>>>> Firmware. So did you mean "Trusted OS"?
>>>>> It should be "firmware", I believe.
>>>>
>>>> Hmmm... I couldn't find a reference in the spec suggesting that
>>>> CPU_OFF could return DENIED because of the firmware. Do you have a
>>>> pointer to the spec?
>>>
>>> Ah, looks like we are talking about different things. Indeed, CPU_OFF
>>> can return DENIED only because of Trusted OS. But entity that *returns*
>>> the error to a caller is a firmware.
>>
>> Right, the interesting part is *why* DENIED is returned not *who*
>> returns it.
> ARM TF returns DENIED *only* for the platform I have.
> We have a dissonance between spec and xen implementation because
> DENIED returned by
> ARM TF or Thrusted OS or whatever is not a reason for panic.

I agree that's not a reason for panic. However, knowing the reason does 
help to figure out the correct approach.

For instance, one could have suggest to migrate the trusted OS to 
another pCPU. But this would not have worked for you because the DENIED 
is not about that.

> And we
> have issues with this.
> If machine_restart() behaviour is more or less correct  (sometimes
> reports about panic but restarts the machine)

Right...

> but machine_halt() doesn't work at al
... this should also be the case here because machine_halt() could also 
be called from cpu0. So I am a bit confused why you are saying it never 
works.

> Transit execution to CPU0 for my understanding is a workaround and
> this approach will fix
> machine_restart() function but will not fix machine_halt().

I would say it is a more specific case of what the spec suggests (see 
below). But it should fix both machine_restart() and machine_halt() 
because the last CPU running will be CPU0. So Xen would call SYSTEM_* 
rather than CPU_OF. So I don't understand why you think it will fix one 
but not the other.

In fact, the idea to always run the request from a given CPU is quite 
similar to what the specification suggests (5.10.3 DEN0022D.b):

"
One way in which cores can be placed into a known state is to use calls 
to CPU_OFF on all online cores
except for the last one, which instead uses SYSTEM_OFF. If a UP Trusted 
OS is present, this method
only works if the core that calls SYSTEM_OFF is the one where the 
Trusted OS is resident, as calls to
CPU_OFF on this core return a DENIED error. Any core can call SYSTEM_OFF.
"

For Xen, we would need to detect if the trusted OS is UP and where it is 
running. Then we could always restart/halt from that CPU or CPU0.

> Approach
> you suggested (spinning all cpus) will work but
> will save less energy.

I am not sure to understand what's the concern about the energy here. 
 From my understanding of the specification, SYSTEM_OFF will take care 
of switching off the power for all the cores. So at worse, the CPUs will 
spin for a few ms. This would like be more efficient than a call to PSCI 
CPU off.

This is different compare just turning off one CPU (i.e. CPU hot-unplug) 
because the CPU will end up to spin for a very long time. And this is 
why I wasn't OK with conditionally avoiding the panic.

Cheers,
Dmytro Semenets June 21, 2022, 8:55 a.m. UTC | #9
Hi Julien,
> >>
> >> Hi,
> >>
> >> On 17/06/2022 10:10, Volodymyr Babchuk wrote:
> >>> Julien Grall <julien@xen.org> writes:
> >>>
> >>>> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
> >>>>> Hi Julien,
> >>>>
> >>>> Hi Volodymyr,
> >>>>
> >>>>> Julien Grall <julien@xen.org> writes:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
> >>>>>>> From: Dmytro Semenets <dmytro_semenets@epam.com>
> >>>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
> >>>>>>
> >>>>>> I am confused. The spec is talking about Trusted OS and not
> >>>>>> firmware. The docummentation is also not specific to ARM Trusted
> >>>>>> Firmware. So did you mean "Trusted OS"?
> >>>>> It should be "firmware", I believe.
> >>>>
> >>>> Hmmm... I couldn't find a reference in the spec suggesting that
> >>>> CPU_OFF could return DENIED because of the firmware. Do you have a
> >>>> pointer to the spec?
> >>>
> >>> Ah, looks like we are talking about different things. Indeed, CPU_OFF
> >>> can return DENIED only because of Trusted OS. But entity that *returns*
> >>> the error to a caller is a firmware.
> >>
> >> Right, the interesting part is *why* DENIED is returned not *who*
> >> returns it.
> > ARM TF returns DENIED *only* for the platform I have.
> > We have a dissonance between spec and xen implementation because
> > DENIED returned by
> > ARM TF or Thrusted OS or whatever is not a reason for panic.
>
> I agree that's not a reason for panic. However, knowing the reason does
> help to figure out the correct approach.
>
> For instance, one could have suggest to migrate the trusted OS to
> another pCPU. But this would not have worked for you because the DENIED
> is not about that.
>
> > And we
> > have issues with this.
> > If machine_restart() behaviour is more or less correct  (sometimes
> > reports about panic but restarts the machine)
>
> Right...
>
> > but machine_halt() doesn't work at al
> ... this should also be the case here because machine_halt() could also
> be called from cpu0. So I am a bit confused why you are saying it never
> works.
If machine_halt() called on a CPU other than CPU0 it caused panic and reboot.
If it called on a CPU0 it also caused panic but after system power off
and reboot
is not issued. In this state you can still call the xen console. But
you can't reboot the system.
>
> > Transit execution to CPU0 for my understanding is a workaround and
> > this approach will fix
> > machine_restart() function but will not fix machine_halt().
>
> I would say it is a more specific case of what the spec suggests (see
> below). But it should fix both machine_restart() and machine_halt()
> because the last CPU running will be CPU0. So Xen would call SYSTEM_*
> rather than CPU_OF. So I don't understand why you think it will fix one
> but not the other.
Looks like this is specific for my HW case. SYSTEM_OFF doesn't stop
the whole system.
>
> In fact, the idea to always run the request from a given CPU is quite
> similar to what the specification suggests (5.10.3 DEN0022D.b):
>
> "
> One way in which cores can be placed into a known state is to use calls
> to CPU_OFF on all online cores
> except for the last one, which instead uses SYSTEM_OFF. If a UP Trusted
> OS is present, this method
> only works if the core that calls SYSTEM_OFF is the one where the
> Trusted OS is resident, as calls to
> CPU_OFF on this core return a DENIED error. Any core can call SYSTEM_OFF.
> "
>
> For Xen, we would need to detect if the trusted OS is UP and where it is
> running. Then we could always restart/halt from that CPU or CPU0.
>
> > Approach
> > you suggested (spinning all cpus) will work but
> > will save less energy.
>
> I am not sure to understand what's the concern about the energy here.
>  From my understanding of the specification, SYSTEM_OFF will take care
> of switching off the power for all the cores. So at worse, the CPUs will
> spin for a few ms. This would like be more efficient than a call to PSCI
> CPU off.
>
> This is different compare just turning off one CPU (i.e. CPU hot-unplug)
> because the CPU will end up to spin for a very long time. And this is
> why I wasn't OK with conditionally avoiding the panic.
>
> Cheers,
>
> --
> Julien Grall
Julien Grall June 21, 2022, 9:51 a.m. UTC | #10
On 21/06/2022 09:55, Dmytro Semenets wrote:
> Hi Julien,

Hi Dmytro,

>>>>
>>>> Hi,
>>>>
>>>> On 17/06/2022 10:10, Volodymyr Babchuk wrote:
>>>>> Julien Grall <julien@xen.org> writes:
>>>>>
>>>>>> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
>>>>>>> Hi Julien,
>>>>>>
>>>>>> Hi Volodymyr,
>>>>>>
>>>>>>> Julien Grall <julien@xen.org> writes:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
>>>>>>>>> From: Dmytro Semenets <dmytro_semenets@epam.com>
>>>>>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
>>>>>>>>
>>>>>>>> I am confused. The spec is talking about Trusted OS and not
>>>>>>>> firmware. The docummentation is also not specific to ARM Trusted
>>>>>>>> Firmware. So did you mean "Trusted OS"?
>>>>>>> It should be "firmware", I believe.
>>>>>>
>>>>>> Hmmm... I couldn't find a reference in the spec suggesting that
>>>>>> CPU_OFF could return DENIED because of the firmware. Do you have a
>>>>>> pointer to the spec?
>>>>>
>>>>> Ah, looks like we are talking about different things. Indeed, CPU_OFF
>>>>> can return DENIED only because of Trusted OS. But entity that *returns*
>>>>> the error to a caller is a firmware.
>>>>
>>>> Right, the interesting part is *why* DENIED is returned not *who*
>>>> returns it.
>>> ARM TF returns DENIED *only* for the platform I have.
>>> We have a dissonance between spec and xen implementation because
>>> DENIED returned by
>>> ARM TF or Thrusted OS or whatever is not a reason for panic.
>>
>> I agree that's not a reason for panic. However, knowing the reason does
>> help to figure out the correct approach.
>>
>> For instance, one could have suggest to migrate the trusted OS to
>> another pCPU. But this would not have worked for you because the DENIED
>> is not about that.
>>
>>> And we
>>> have issues with this.
>>> If machine_restart() behaviour is more or less correct  (sometimes
>>> reports about panic but restarts the machine)
>>
>> Right...
>>
>>> but machine_halt() doesn't work at al
>> ... this should also be the case here because machine_halt() could also
>> be called from cpu0. So I am a bit confused why you are saying it never
>> works.
> If machine_halt() called on a CPU other than CPU0 it caused panic and reboot.
> If it called on a CPU0 it also caused panic but after system power off
> and reboot
> is not issued. In this state you can still call the xen console. But
> you can't reboot the system.

I am lost. In a previous e-mail you said that PSCI CPU_OFF would return 
DENIED on CPU0. IOW, I understood that for other CPUs, it would succeed.

But here, you are tell me the opposite:

"If it called on a CPU0 it also caused panic but after system power off
  and reboot".

If machine_halt() is called from CPU0, then CPU_OFF should not be called 
on CPU0. So where is that panic coming from?

>>
>>> Transit execution to CPU0 for my understanding is a workaround and
>>> this approach will fix
>>> machine_restart() function but will not fix machine_halt().
>>
>> I would say it is a more specific case of what the spec suggests (see
>> below). But it should fix both machine_restart() and machine_halt()
>> because the last CPU running will be CPU0. So Xen would call SYSTEM_*
>> rather than CPU_OF. So I don't understand why you think it will fix one
>> but not the other.
> Looks like this is specific for my HW case. SYSTEM_OFF doesn't stop
> the whole system.

Hmmm... All the other CPUs should be off (or spinning with interrupt 
disabled), so are you saying that SYSTEM_OFF return?

Cheers,
Dmytro Semenets June 21, 2022, 10:05 a.m. UTC | #11
Hi Julien,
>
> Hi Dmytro,
>
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 17/06/2022 10:10, Volodymyr Babchuk wrote:
> >>>>> Julien Grall <julien@xen.org> writes:
> >>>>>
> >>>>>> On 16/06/2022 19:40, Volodymyr Babchuk wrote:
> >>>>>>> Hi Julien,
> >>>>>>
> >>>>>> Hi Volodymyr,
> >>>>>>
> >>>>>>> Julien Grall <julien@xen.org> writes:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote:
> >>>>>>>>> From: Dmytro Semenets <dmytro_semenets@epam.com>
> >>>>>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF.
> >>>>>>>>
> >>>>>>>> I am confused. The spec is talking about Trusted OS and not
> >>>>>>>> firmware. The docummentation is also not specific to ARM Trusted
> >>>>>>>> Firmware. So did you mean "Trusted OS"?
> >>>>>>> It should be "firmware", I believe.
> >>>>>>
> >>>>>> Hmmm... I couldn't find a reference in the spec suggesting that
> >>>>>> CPU_OFF could return DENIED because of the firmware. Do you have a
> >>>>>> pointer to the spec?
> >>>>>
> >>>>> Ah, looks like we are talking about different things. Indeed, CPU_OFF
> >>>>> can return DENIED only because of Trusted OS. But entity that *returns*
> >>>>> the error to a caller is a firmware.
> >>>>
> >>>> Right, the interesting part is *why* DENIED is returned not *who*
> >>>> returns it.
> >>> ARM TF returns DENIED *only* for the platform I have.
> >>> We have a dissonance between spec and xen implementation because
> >>> DENIED returned by
> >>> ARM TF or Thrusted OS or whatever is not a reason for panic.
> >>
> >> I agree that's not a reason for panic. However, knowing the reason does
> >> help to figure out the correct approach.
> >>
> >> For instance, one could have suggest to migrate the trusted OS to
> >> another pCPU. But this would not have worked for you because the DENIED
> >> is not about that.
> >>
> >>> And we
> >>> have issues with this.
> >>> If machine_restart() behaviour is more or less correct  (sometimes
> >>> reports about panic but restarts the machine)
> >>
> >> Right...
> >>
> >>> but machine_halt() doesn't work at al
> >> ... this should also be the case here because machine_halt() could also
> >> be called from cpu0. So I am a bit confused why you are saying it never
> >> works.
> > If machine_halt() called on a CPU other than CPU0 it caused panic and reboot.
> > If it called on a CPU0 it also caused panic but after system power off
> > and reboot
> > is not issued. In this state you can still call the xen console. But
> > you can't reboot the system.
>
> I am lost. In a previous e-mail you said that PSCI CPU_OFF would return
> DENIED on CPU0. IOW, I understood that for other CPUs, it would succeed.
I'm sorry I confused You.
Yes it causes panic and prints it will be rebooted but actual reboot
doesn't happen.

>
> But here, you are tell me the opposite:
>
> "If it called on a CPU0 it also caused panic but after system power off
>   and reboot".
>
> If machine_halt() is called from CPU0, then CPU_OFF should not be called
> on CPU0. So where is that panic coming from?
>
> >>
> >>> Transit execution to CPU0 for my understanding is a workaround and
> >>> this approach will fix
> >>> machine_restart() function but will not fix machine_halt().
> >>
> >> I would say it is a more specific case of what the spec suggests (see
> >> below). But it should fix both machine_restart() and machine_halt()
> >> because the last CPU running will be CPU0. So Xen would call SYSTEM_*
> >> rather than CPU_OF. So I don't understand why you think it will fix one
> >> but not the other.
> > Looks like this is specific for my HW case. SYSTEM_OFF doesn't stop
> > the whole system.
>
> Hmmm... All the other CPUs should be off (or spinning with interrupt
> disabled), so are you saying that SYSTEM_OFF return?
Yes. SYSTEM_OFF returns on my HW. This is reason when CPU_OFF for CPU0 happens.
>
> Cheers,
>
> --
> Julien Grall
Julien Grall June 21, 2022, 10:45 a.m. UTC | #12
Hi,

On 21/06/2022 11:05, Dmytro Semenets wrote:
>>>>> but machine_halt() doesn't work at al
>>>> ... this should also be the case here because machine_halt() could also
>>>> be called from cpu0. So I am a bit confused why you are saying it never
>>>> works.
>>> If machine_halt() called on a CPU other than CPU0 it caused panic and reboot.
>>> If it called on a CPU0 it also caused panic but after system power off
>>> and reboot
>>> is not issued. In this state you can still call the xen console. But
>>> you can't reboot the system.
>>
>> I am lost. In a previous e-mail you said that PSCI CPU_OFF would return
>> DENIED on CPU0. IOW, I understood that for other CPUs, it would succeed.
> I'm sorry I confused You.
> Yes it causes panic and prints it will be rebooted but actual reboot
> doesn't happen.

Ok. That's most likely because of the call to smp_call_function() in 
machine_restart(). It is using cpu_online_map to decide which CPUs to 
send the IPI.

This will block because some of the CPUs are already off. So they will 
never acknowledge the IPI.

> 
>>
>> But here, you are tell me the opposite:
>>
>> "If it called on a CPU0 it also caused panic but after system power off
>>    and reboot".
>>
>> If machine_halt() is called from CPU0, then CPU_OFF should not be called
>> on CPU0. So where is that panic coming from?
>>
>>>>
>>>>> Transit execution to CPU0 for my understanding is a workaround and
>>>>> this approach will fix
>>>>> machine_restart() function but will not fix machine_halt().
>>>>
>>>> I would say it is a more specific case of what the spec suggests (see
>>>> below). But it should fix both machine_restart() and machine_halt()
>>>> because the last CPU running will be CPU0. So Xen would call SYSTEM_*
>>>> rather than CPU_OF. So I don't understand why you think it will fix one
>>>> but not the other.
>>> Looks like this is specific for my HW case. SYSTEM_OFF doesn't stop
>>> the whole system.
>>
>> Hmmm... All the other CPUs should be off (or spinning with interrupt
>> disabled), so are you saying that SYSTEM_OFF return?
> Yes. SYSTEM_OFF returns on my HW.

Hmmm... This is not compliant with the specification. Could you check 
why PSCI SYSTEM_OFF is returning?

> This is reason when CPU_OFF for CPU0 happens.

Right, machine_halt() will call halt_this_cpu() that in turn will call 
PSCI CPU_OFF.

If you modify halt_this_cpu() to avoid calling PSCI CPU_OFF (as I 
suggested before) then the panic() will never happen. Instead, the CPU 
will execute "wfi" in a loop with interrupt disabled.

To summarize there are two parts to resolve:
   1) Understand why PSCI SYSTEM_OFF returns on your platform
   2) Modify stop_cpu() to not call PSCI CPU_OFF

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index 0c90c2305c..55787fde58 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -63,8 +63,9 @@  void call_psci_cpu_off(void)
 
         /* If successfull the PSCI cpu_off call doesn't return */
         arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, &res);
-        panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(),
-              PSCI_RET(res));
+        if ( PSCI_RET(res) != PSCI_DENIED )
+            panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(),
+                PSCI_RET(res));
     }
 }