Message ID | 20220616135541.3333760-1-dmitry.semenets@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xen: Don't call panic if ARM TF cpu off returns DENIED | expand |
Hi, On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: > From: Dmytro Semenets <dmytro_semenets@epam.com> > > According to PSCI specification ARM TF can return DENIED on CPU OFF. I am confused. The spec is talking about Trusted OS and not firmware. The docummentation is also not specific to ARM Trusted Firmware. So did you mean "Trusted OS"? Also, did you reproduce on HW? If so, on which CPU this will fail? > This patch brings the hypervisor into compliance with the PSCI > specification. Now it means CPU off will never be turned off using PSCI. Instead, we would end up to spin in Xen. This would be a problem because we could save less power. > Refer to "Arm Power State Coordination Interface (DEN0022D.b)" > section 5.5.2 Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when the trusted OS can only run on one core. Some of the trusted OS are migratable. So I think we should first attempt to migrate the CPU. Then if it doesn't work, we should prevent the CPU to go offline. That said, upstream doesn't support cpu offlining (I don't know for your use case). In case of shutdown, it is not necessary to offline the CPU, so we could avoid to call CPU_OFF on all CPUs but one. Something like: diff --git a/xen/arch/arm/shutdown.c b/xen/arch/arm/shutdown.c index 3dc6819d56de..d956812ef8f4 100644 --- a/xen/arch/arm/shutdown.c +++ b/xen/arch/arm/shutdown.c @@ -8,7 +8,9 @@ static void noreturn halt_this_cpu(void *arg) { - stop_cpu(); + ASSERT(!local_irq_enable()); + while ( 1 ) + wfi(); } void machine_halt(void) @@ -21,10 +23,6 @@ void machine_halt(void) smp_call_function(halt_this_cpu, NULL, 0); local_irq_disable(); - /* Wait at most another 10ms for all other CPUs to go offline. */ - while ( (num_online_cpus() > 1) && (timeout-- > 0) ) - mdelay(1); - /* This is mainly for PSCI-0.2, which does not return if success. */ call_psci_system_off(); > Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com> > Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> I don't recall to see patch on the ML recently for this. So is this an internal review? > --- > xen/arch/arm/psci.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c > index 0c90c2305c..55787fde58 100644 > --- a/xen/arch/arm/psci.c > +++ b/xen/arch/arm/psci.c > @@ -63,8 +63,9 @@ void call_psci_cpu_off(void) > > /* If successfull the PSCI cpu_off call doesn't return */ > arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, &res); > - panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(), > - PSCI_RET(res)); > + if ( PSCI_RET(res) != PSCI_DENIED ) > + panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(), > + PSCI_RET(res)); > } > } > Cheers,
Hi Julien, Julien Grall <julien@xen.org> writes: > Hi, > > On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: >> From: Dmytro Semenets <dmytro_semenets@epam.com> >> According to PSCI specification ARM TF can return DENIED on CPU OFF. > > I am confused. The spec is talking about Trusted OS and not > firmware. The docummentation is also not specific to ARM Trusted > Firmware. So did you mean "Trusted OS"? It should be "firmware", I believe. > > Also, did you reproduce on HW? If so, on which CPU this will fail? > Yes, we reproduced this on HW. In our case it failed on CPU0. To be fair, in our case it had nothing to do with Trusted OS. It is just platform limitation - it can't turn off CPU0. But from Xen perspective there is no difference - CPU_OFF call returns DENIED. >> This patch brings the hypervisor into compliance with the PSCI >> specification. > > Now it means CPU off will never be turned off using PSCI. Instead, we > would end up to spin in Xen. This would be a problem because we could > save less power. Agreed. > >> Refer to "Arm Power State Coordination Interface (DEN0022D.b)" >> section 5.5.2 > > Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when > the trusted OS can only run on one core. > > Some of the trusted OS are migratable. So I think we should first > attempt to migrate the CPU. Then if it doesn't work, we should prevent > the CPU to go offline. > > That said, upstream doesn't support cpu offlining (I don't know for > your use case). In case of shutdown, it is not necessary to offline > the CPU, so we could avoid to call CPU_OFF on all CPUs but > one. Something like: > This is even better approach yes. But you mentioned CPU_OFF. Did you mean SYSTEM_RESET? > diff --git a/xen/arch/arm/shutdown.c b/xen/arch/arm/shutdown.c > index 3dc6819d56de..d956812ef8f4 100644 > --- a/xen/arch/arm/shutdown.c > +++ b/xen/arch/arm/shutdown.c > @@ -8,7 +8,9 @@ > > static void noreturn halt_this_cpu(void *arg) > { > - stop_cpu(); > + ASSERT(!local_irq_enable()); > + while ( 1 ) > + wfi(); > } > > void machine_halt(void) > @@ -21,10 +23,6 @@ void machine_halt(void) > smp_call_function(halt_this_cpu, NULL, 0); > local_irq_disable(); > > - /* Wait at most another 10ms for all other CPUs to go offline. */ > - while ( (num_online_cpus() > 1) && (timeout-- > 0) ) > - mdelay(1); > - > /* This is mainly for PSCI-0.2, which does not return if success. */ > call_psci_system_off(); > >> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com> >> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> > > I don't recall to see patch on the ML recently for this. So is this an > internal review? Yeah, sorry about that. Dmytro is a new member in our team and he is not yet familiar with differences in internal reviews and reviews in ML. If you are interested, we had internal review at [1]: [1] https://github.com/xen-troops/xen/pull/184 > >> --- >> xen/arch/arm/psci.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c >> index 0c90c2305c..55787fde58 100644 >> --- a/xen/arch/arm/psci.c >> +++ b/xen/arch/arm/psci.c >> @@ -63,8 +63,9 @@ void call_psci_cpu_off(void) >> /* If successfull the PSCI cpu_off call doesn't return >> */ >> arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, &res); >> - panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(), >> - PSCI_RET(res)); >> + if ( PSCI_RET(res) != PSCI_DENIED ) >> + panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(), >> + PSCI_RET(res)); >> } >> } >> > > Cheers,
On 16/06/2022 19:40, Volodymyr Babchuk wrote: > > Hi Julien, Hi Volodymyr, > > Julien Grall <julien@xen.org> writes: > >> Hi, >> >> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: >>> From: Dmytro Semenets <dmytro_semenets@epam.com> >>> According to PSCI specification ARM TF can return DENIED on CPU OFF. >> >> I am confused. The spec is talking about Trusted OS and not >> firmware. The docummentation is also not specific to ARM Trusted >> Firmware. So did you mean "Trusted OS"? > > It should be "firmware", I believe. Hmmm... I couldn't find a reference in the spec suggesting that CPU_OFF could return DENIED because of the firmware. Do you have a pointer to the spec? > >> >> Also, did you reproduce on HW? If so, on which CPU this will fail? >> > > Yes, we reproduced this on HW. In our case it failed on CPU0. To be > fair, in our case it had nothing to do with Trusted OS. It is just > platform limitation - it can't turn off CPU0. But from Xen perspective > there is no difference - CPU_OFF call returns DENIED. Thanks for the clarification. I think I have seen that in the wild also but it never got on top of my queue. It is good that we are fixing it. > >>> This patch brings the hypervisor into compliance with the PSCI >>> specification. >> >> Now it means CPU off will never be turned off using PSCI. Instead, we >> would end up to spin in Xen. This would be a problem because we could >> save less power. > > Agreed. > >> >>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)" >>> section 5.5.2 >> >> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when >> the trusted OS can only run on one core. >> >> Some of the trusted OS are migratable. So I think we should first >> attempt to migrate the CPU. Then if it doesn't work, we should prevent >> the CPU to go offline. >> >> That said, upstream doesn't support cpu offlining (I don't know for >> your use case). In case of shutdown, it is not necessary to offline >> the CPU, so we could avoid to call CPU_OFF on all CPUs but >> one. Something like: >> > > This is even better approach yes. But you mentioned CPU_OFF. Did you > mean SYSTEM_RESET? By CPU_OFF I was referring to the fact that Xen will issue the call all CPUs but one. The remaining CPU will issue the command to reset/shutdown the system. >> void machine_halt(void) >> @@ -21,10 +23,6 @@ void machine_halt(void) >> smp_call_function(halt_this_cpu, NULL, 0); >> local_irq_disable(); >> >> - /* Wait at most another 10ms for all other CPUs to go offline. */ >> - while ( (num_online_cpus() > 1) && (timeout-- > 0) ) >> - mdelay(1); >> - >> /* This is mainly for PSCI-0.2, which does not return if success. */ >> call_psci_system_off(); >> >>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com> >>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> >> >> I don't recall to see patch on the ML recently for this. So is this an >> internal review? > > Yeah, sorry about that. Dmytro is a new member in our team and he is not > yet familiar with differences in internal reviews and reviews in ML. No worries. I usually classify internal review anything that was done privately. This looks to be a public review, althought not on xen-devel. I understand that not all some of the patches are still in PoC stage and doing the review on your github is a good idea. But for those are meant to be for upstream (e.g. bug fixes, small patches), I would suggest to do the review on xen-devel directly. Cheers,
чт, 16 июн. 2022 г. в 22:09, Julien Grall <julien@xen.org>: > > > On 16/06/2022 19:40, Volodymyr Babchuk wrote: > > > > Hi Julien, > > Hi Volodymyr, > > > > > Julien Grall <julien@xen.org> writes: > > > >> Hi, > >> > >> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: > >>> From: Dmytro Semenets <dmytro_semenets@epam.com> > >>> According to PSCI specification ARM TF can return DENIED on CPU OFF. > >> > >> I am confused. The spec is talking about Trusted OS and not > >> firmware. The docummentation is also not specific to ARM Trusted > >> Firmware. So did you mean "Trusted OS"? > > > > It should be "firmware", I believe. > > Hmmm... I couldn't find a reference in the spec suggesting that CPU_OFF > could return DENIED because of the firmware. Do you have a pointer to > the spec? > Actually CPU_OFF performed by Trusted OS. But Trusted OS is called by the ARM TF. ARM TF for our platform doesn't call Trusted OS for CPU0 and returns DENIED instead. > > > > >> > >> Also, did you reproduce on HW? If so, on which CPU this will fail? > >> > > > > Yes, we reproduced this on HW. In our case it failed on CPU0. To be > > fair, in our case it had nothing to do with Trusted OS. It is just > > platform limitation - it can't turn off CPU0. But from Xen perspective > > there is no difference - CPU_OFF call returns DENIED. > > Thanks for the clarification. I think I have seen that in the wild also > but it never got on top of my queue. It is good that we are fixing it. > > > > >>> This patch brings the hypervisor into compliance with the PSCI > >>> specification. > >> > >> Now it means CPU off will never be turned off using PSCI. Instead, we > >> would end up to spin in Xen. This would be a problem because we could > >> save less power. > > > > Agreed. > > > >> > >>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)" > >>> section 5.5.2 > >> > >> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when > >> the trusted OS can only run on one core. > >> > >> Some of the trusted OS are migratable. So I think we should first > >> attempt to migrate the CPU. Then if it doesn't work, we should prevent > >> the CPU to go offline. > >> > >> That said, upstream doesn't support cpu offlining (I don't know for > >> your use case). In case of shutdown, it is not necessary to offline > >> the CPU, so we could avoid to call CPU_OFF on all CPUs but > >> one. Something like: > >> > > > > This is even better approach yes. But you mentioned CPU_OFF. Did you > > mean SYSTEM_RESET? > > By CPU_OFF I was referring to the fact that Xen will issue the call all > CPUs but one. The remaining CPU will issue the command to reset/shutdown > the system. > > >> void machine_halt(void) > >> @@ -21,10 +23,6 @@ void machine_halt(void) > >> smp_call_function(halt_this_cpu, NULL, 0); > >> local_irq_disable(); > >> > >> - /* Wait at most another 10ms for all other CPUs to go offline. */ > >> - while ( (num_online_cpus() > 1) && (timeout-- > 0) ) > >> - mdelay(1); > >> - > >> /* This is mainly for PSCI-0.2, which does not return if success. > */ > >> call_psci_system_off(); > >> > >>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com> > >>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> > >> > >> I don't recall to see patch on the ML recently for this. So is this an > >> internal review? > > > > Yeah, sorry about that. Dmytro is a new member in our team and he is not > > yet familiar with differences in internal reviews and reviews in ML. > > No worries. I usually classify internal review anything that was done > privately. This looks to be a public review, althought not on xen-devel. > > I understand that not all some of the patches are still in PoC stage and > doing the review on your github is a good idea. But for those are meant > to be for upstream (e.g. bug fixes, small patches), I would suggest to > do the review on xen-devel directly > Sorry about that > > Cheers, > > -- > Julien Grall >
Hi Julien, Julien Grall <julien@xen.org> writes: > On 16/06/2022 19:40, Volodymyr Babchuk wrote: >> Hi Julien, > > Hi Volodymyr, > >> Julien Grall <julien@xen.org> writes: >> >>> Hi, >>> >>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: >>>> From: Dmytro Semenets <dmytro_semenets@epam.com> >>>> According to PSCI specification ARM TF can return DENIED on CPU OFF. >>> >>> I am confused. The spec is talking about Trusted OS and not >>> firmware. The docummentation is also not specific to ARM Trusted >>> Firmware. So did you mean "Trusted OS"? >> It should be "firmware", I believe. > > Hmmm... I couldn't find a reference in the spec suggesting that > CPU_OFF could return DENIED because of the firmware. Do you have a > pointer to the spec? Ah, looks like we are talking about different things. Indeed, CPU_OFF can return DENIED only because of Trusted OS. But entity that *returns* the error to a caller is a firmware. >> >>> >>> Also, did you reproduce on HW? If so, on which CPU this will fail? >>> >> Yes, we reproduced this on HW. In our case it failed on CPU0. To be >> fair, in our case it had nothing to do with Trusted OS. It is just >> platform limitation - it can't turn off CPU0. But from Xen perspective >> there is no difference - CPU_OFF call returns DENIED. > > Thanks for the clarification. I think I have seen that in the wild > also but it never got on top of my queue. It is good that we are > fixing it. > >> >>>> This patch brings the hypervisor into compliance with the PSCI >>>> specification. >>> >>> Now it means CPU off will never be turned off using PSCI. Instead, we >>> would end up to spin in Xen. This would be a problem because we could >>> save less power. >> Agreed. >> >>> >>>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)" >>>> section 5.5.2 >>> >>> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when >>> the trusted OS can only run on one core. >>> >>> Some of the trusted OS are migratable. So I think we should first >>> attempt to migrate the CPU. Then if it doesn't work, we should prevent >>> the CPU to go offline. >>> >>> That said, upstream doesn't support cpu offlining (I don't know for >>> your use case). In case of shutdown, it is not necessary to offline >>> the CPU, so we could avoid to call CPU_OFF on all CPUs but >>> one. Something like: >>> >> This is even better approach yes. But you mentioned CPU_OFF. Did you >> mean SYSTEM_RESET? > > By CPU_OFF I was referring to the fact that Xen will issue the call > all CPUs but one. The remaining CPU will issue the command to > reset/shutdown the system. > I just want to clarify: change that you suggested removes call to stop_cpu() in halt_this_cpu(). So no CPU_OFF will be sent at all. All CPUs except one will spin in while ( 1 ) wfi(); while last cpu will issue SYSTEM_OFF or SYSTEM_RESET. Is this correct? >>> void machine_halt(void) >>> @@ -21,10 +23,6 @@ void machine_halt(void) >>> smp_call_function(halt_this_cpu, NULL, 0); >>> local_irq_disable(); >>> >>> - /* Wait at most another 10ms for all other CPUs to go offline. */ >>> - while ( (num_online_cpus() > 1) && (timeout-- > 0) ) >>> - mdelay(1); >>> - >>> /* This is mainly for PSCI-0.2, which does not return if success. */ >>> call_psci_system_off(); >>> >>>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com> >>>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> >>> >>> I don't recall to see patch on the ML recently for this. So is this an >>> internal review? >> Yeah, sorry about that. Dmytro is a new member in our team and he is >> not >> yet familiar with differences in internal reviews and reviews in ML. > > No worries. I usually classify internal review anything that was done > privately. This looks to be a public review, althought not on > xen-devel. > > I understand that not all some of the patches are still in PoC stage > and doing the review on your github is a good idea. But for those are > meant to be for upstream (e.g. bug fixes, small patches), I would > suggest to do the review on xen-devel directly. It not always clear if patch is eligible for upstream. At first we thought that problem is platform-specific and we weren't sure that we will find a proper upstreamable fix. Probably you saw that PR's name quite differs from final patch. This is because initial solution was completely different from final one.
Hi, On 17/06/2022 10:10, Volodymyr Babchuk wrote: > Julien Grall <julien@xen.org> writes: > >> On 16/06/2022 19:40, Volodymyr Babchuk wrote: >>> Hi Julien, >> >> Hi Volodymyr, >> >>> Julien Grall <julien@xen.org> writes: >>> >>>> Hi, >>>> >>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: >>>>> From: Dmytro Semenets <dmytro_semenets@epam.com> >>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF. >>>> >>>> I am confused. The spec is talking about Trusted OS and not >>>> firmware. The docummentation is also not specific to ARM Trusted >>>> Firmware. So did you mean "Trusted OS"? >>> It should be "firmware", I believe. >> >> Hmmm... I couldn't find a reference in the spec suggesting that >> CPU_OFF could return DENIED because of the firmware. Do you have a >> pointer to the spec? > > Ah, looks like we are talking about different things. Indeed, CPU_OFF > can return DENIED only because of Trusted OS. But entity that *returns* > the error to a caller is a firmware. Right, the interesting part is *why* DENIED is returned not *who* returns it. >>>>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)" >>>>> section 5.5.2 >>>> >>>> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when >>>> the trusted OS can only run on one core. >>>> >>>> Some of the trusted OS are migratable. So I think we should first >>>> attempt to migrate the CPU. Then if it doesn't work, we should prevent >>>> the CPU to go offline. >>>> >>>> That said, upstream doesn't support cpu offlining (I don't know for >>>> your use case). In case of shutdown, it is not necessary to offline >>>> the CPU, so we could avoid to call CPU_OFF on all CPUs but >>>> one. Something like: >>>> >>> This is even better approach yes. But you mentioned CPU_OFF. Did you >>> mean SYSTEM_RESET? >> >> By CPU_OFF I was referring to the fact that Xen will issue the call >> all CPUs but one. The remaining CPU will issue the command to >> reset/shutdown the system. >> > > I just want to clarify: change that you suggested removes call to > stop_cpu() in halt_this_cpu(). So no CPU_OFF will be sent at all. I was describing the existing behavior. > > All CPUs except one will spin in > > while ( 1 ) > wfi(); > > while last cpu will issue SYSTEM_OFF or SYSTEM_RESET. > > Is this correct? Yes. > >>>> void machine_halt(void) >>>> @@ -21,10 +23,6 @@ void machine_halt(void) >>>> smp_call_function(halt_this_cpu, NULL, 0); >>>> local_irq_disable(); >>>> >>>> - /* Wait at most another 10ms for all other CPUs to go offline. */ >>>> - while ( (num_online_cpus() > 1) && (timeout-- > 0) ) >>>> - mdelay(1); >>>> - >>>> /* This is mainly for PSCI-0.2, which does not return if success. */ >>>> call_psci_system_off(); >>>> >>>>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com> >>>>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> >>>> >>>> I don't recall to see patch on the ML recently for this. So is this an >>>> internal review? >>> Yeah, sorry about that. Dmytro is a new member in our team and he is >>> not >>> yet familiar with differences in internal reviews and reviews in ML. >> >> No worries. I usually classify internal review anything that was done >> privately. This looks to be a public review, althought not on >> xen-devel. >> >> I understand that not all some of the patches are still in PoC stage >> and doing the review on your github is a good idea. But for those are >> meant to be for upstream (e.g. bug fixes, small patches), I would >> suggest to do the review on xen-devel directly. > > It not always clear if patch is eligible for upstream. At first we > thought that problem is platform-specific and we weren't sure that we > will find a proper upstreamable fix. You can guess but not be sure until you send it to upstrema :). In fact,... > Probably you saw that PR's name > quite differs from final patch. This is because initial solution was > completely different from final one. ... even before looking at your PR, this was the first solution I had in mind. I am still pondering whether this could be the best approach because I have the suspicion there might be some platform out relying on receiving the shutdown request on CPU0. Anyway, this is so far just theorical, my proposal should solve your problem. On a separate topic, the community is aiming to support a wide range of platforms out-of-the-box. I think platform-specific patches are acceptable so long they are self-contained (to some extend. I.e if you ask to support Xen on RPI3 then I would still probably argue against :)) or have a limited impact to the rest of the users (this is why we have alternative in Xen). My point here is your initial solution may have been the preferred approach for upstream. So if you involve the community early, you are reducing the risk to have to backtrack and/or spend extra time in the wrong directions. Cheers,
пт, 17 июн. 2022 г. в 14:12, Julien Grall <julien@xen.org>: Hi Julien, > > Hi, > > On 17/06/2022 10:10, Volodymyr Babchuk wrote: > > Julien Grall <julien@xen.org> writes: > > > >> On 16/06/2022 19:40, Volodymyr Babchuk wrote: > >>> Hi Julien, > >> > >> Hi Volodymyr, > >> > >>> Julien Grall <julien@xen.org> writes: > >>> > >>>> Hi, > >>>> > >>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: > >>>>> From: Dmytro Semenets <dmytro_semenets@epam.com> > >>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF. > >>>> > >>>> I am confused. The spec is talking about Trusted OS and not > >>>> firmware. The docummentation is also not specific to ARM Trusted > >>>> Firmware. So did you mean "Trusted OS"? > >>> It should be "firmware", I believe. > >> > >> Hmmm... I couldn't find a reference in the spec suggesting that > >> CPU_OFF could return DENIED because of the firmware. Do you have a > >> pointer to the spec? > > > > Ah, looks like we are talking about different things. Indeed, CPU_OFF > > can return DENIED only because of Trusted OS. But entity that *returns* > > the error to a caller is a firmware. > > Right, the interesting part is *why* DENIED is returned not *who* > returns it. ARM TF returns DENIED *only* for the platform I have. We have a dissonance between spec and xen implementation because DENIED returned by ARM TF or Thrusted OS or whatever is not a reason for panic. And we have issues with this. If machine_restart() behaviour is more or less correct (sometimes reports about panic but restarts the machine) but machine_halt() doesn't work at all. Transit execution to CPU0 for my understanding is a workaround and this approach will fix machine_restart() function but will not fix machine_halt(). Approach you suggested (spinning all cpus) will work but will save less energy. > >>>>> Refer to "Arm Power State Coordination Interface (DEN0022D.b)" > >>>>> section 5.5.2 > >>>> > >>>> Reading both 5.5.2 and 5.9.1 together, DENIED would be returned when > >>>> the trusted OS can only run on one core. > >>>> > >>>> Some of the trusted OS are migratable. So I think we should first > >>>> attempt to migrate the CPU. Then if it doesn't work, we should prevent > >>>> the CPU to go offline. > >>>> > >>>> That said, upstream doesn't support cpu offlining (I don't know for > >>>> your use case). In case of shutdown, it is not necessary to offline > >>>> the CPU, so we could avoid to call CPU_OFF on all CPUs but > >>>> one. Something like: > >>>> > >>> This is even better approach yes. But you mentioned CPU_OFF. Did you > >>> mean SYSTEM_RESET? > >> > >> By CPU_OFF I was referring to the fact that Xen will issue the call > >> all CPUs but one. The remaining CPU will issue the command to > >> reset/shutdown the system. > >> > > > > I just want to clarify: change that you suggested removes call to > > stop_cpu() in halt_this_cpu(). So no CPU_OFF will be sent at all. > > I was describing the existing behavior. > > > > > All CPUs except one will spin in > > > > while ( 1 ) > > wfi(); > > > > while last cpu will issue SYSTEM_OFF or SYSTEM_RESET. > > > > Is this correct? > > Yes. > > > > >>>> void machine_halt(void) > >>>> @@ -21,10 +23,6 @@ void machine_halt(void) > >>>> smp_call_function(halt_this_cpu, NULL, 0); > >>>> local_irq_disable(); > >>>> > >>>> - /* Wait at most another 10ms for all other CPUs to go offline. */ > >>>> - while ( (num_online_cpus() > 1) && (timeout-- > 0) ) > >>>> - mdelay(1); > >>>> - > >>>> /* This is mainly for PSCI-0.2, which does not return if success. */ > >>>> call_psci_system_off(); > >>>> > >>>>> Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com> > >>>>> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> > >>>> > >>>> I don't recall to see patch on the ML recently for this. So is this an > >>>> internal review? > >>> Yeah, sorry about that. Dmytro is a new member in our team and he is > >>> not > >>> yet familiar with differences in internal reviews and reviews in ML. > >> > >> No worries. I usually classify internal review anything that was done > >> privately. This looks to be a public review, althought not on > >> xen-devel. > >> > >> I understand that not all some of the patches are still in PoC stage > >> and doing the review on your github is a good idea. But for those are > >> meant to be for upstream (e.g. bug fixes, small patches), I would > >> suggest to do the review on xen-devel directly. > > > > It not always clear if patch is eligible for upstream. At first we > > thought that problem is platform-specific and we weren't sure that we > > will find a proper upstreamable fix. > > You can guess but not be sure until you send it to upstrema :). In fact,... > > > Probably you saw that PR's name > > quite differs from final patch. This is because initial solution was > > completely different from final one. > > ... even before looking at your PR, this was the first solution I had in > mind. I am still pondering whether this could be the best approach > because I have the suspicion there might be some platform out relying on > receiving the shutdown request on CPU0. > > Anyway, this is so far just theorical, my proposal should solve your > problem. > > On a separate topic, the community is aiming to support a wide range of > platforms out-of-the-box. I think platform-specific patches are > acceptable so long they are self-contained (to some extend. I.e if you > ask to support Xen on RPI3 then I would still probably argue against :)) > or have a limited impact to the rest of the users (this is why we have > alternative in Xen). > > My point here is your initial solution may have been the preferred > approach for upstream. So if you involve the community early, you are > reducing the risk to have to backtrack and/or spend extra time in the > wrong directions. > > Cheers, > > -- > Julien Grall
Hi, On 18/06/2022 18:43, Dmytro Semenets wrote: > пт, 17 июн. 2022 г. в 14:12, Julien Grall <julien@xen.org>: > Hi Julien, >> >> Hi, >> >> On 17/06/2022 10:10, Volodymyr Babchuk wrote: >>> Julien Grall <julien@xen.org> writes: >>> >>>> On 16/06/2022 19:40, Volodymyr Babchuk wrote: >>>>> Hi Julien, >>>> >>>> Hi Volodymyr, >>>> >>>>> Julien Grall <julien@xen.org> writes: >>>>> >>>>>> Hi, >>>>>> >>>>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: >>>>>>> From: Dmytro Semenets <dmytro_semenets@epam.com> >>>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF. >>>>>> >>>>>> I am confused. The spec is talking about Trusted OS and not >>>>>> firmware. The docummentation is also not specific to ARM Trusted >>>>>> Firmware. So did you mean "Trusted OS"? >>>>> It should be "firmware", I believe. >>>> >>>> Hmmm... I couldn't find a reference in the spec suggesting that >>>> CPU_OFF could return DENIED because of the firmware. Do you have a >>>> pointer to the spec? >>> >>> Ah, looks like we are talking about different things. Indeed, CPU_OFF >>> can return DENIED only because of Trusted OS. But entity that *returns* >>> the error to a caller is a firmware. >> >> Right, the interesting part is *why* DENIED is returned not *who* >> returns it. > ARM TF returns DENIED *only* for the platform I have. > We have a dissonance between spec and xen implementation because > DENIED returned by > ARM TF or Thrusted OS or whatever is not a reason for panic. I agree that's not a reason for panic. However, knowing the reason does help to figure out the correct approach. For instance, one could have suggest to migrate the trusted OS to another pCPU. But this would not have worked for you because the DENIED is not about that. > And we > have issues with this. > If machine_restart() behaviour is more or less correct (sometimes > reports about panic but restarts the machine) Right... > but machine_halt() doesn't work at al ... this should also be the case here because machine_halt() could also be called from cpu0. So I am a bit confused why you are saying it never works. > Transit execution to CPU0 for my understanding is a workaround and > this approach will fix > machine_restart() function but will not fix machine_halt(). I would say it is a more specific case of what the spec suggests (see below). But it should fix both machine_restart() and machine_halt() because the last CPU running will be CPU0. So Xen would call SYSTEM_* rather than CPU_OF. So I don't understand why you think it will fix one but not the other. In fact, the idea to always run the request from a given CPU is quite similar to what the specification suggests (5.10.3 DEN0022D.b): " One way in which cores can be placed into a known state is to use calls to CPU_OFF on all online cores except for the last one, which instead uses SYSTEM_OFF. If a UP Trusted OS is present, this method only works if the core that calls SYSTEM_OFF is the one where the Trusted OS is resident, as calls to CPU_OFF on this core return a DENIED error. Any core can call SYSTEM_OFF. " For Xen, we would need to detect if the trusted OS is UP and where it is running. Then we could always restart/halt from that CPU or CPU0. > Approach > you suggested (spinning all cpus) will work but > will save less energy. I am not sure to understand what's the concern about the energy here. From my understanding of the specification, SYSTEM_OFF will take care of switching off the power for all the cores. So at worse, the CPUs will spin for a few ms. This would like be more efficient than a call to PSCI CPU off. This is different compare just turning off one CPU (i.e. CPU hot-unplug) because the CPU will end up to spin for a very long time. And this is why I wasn't OK with conditionally avoiding the panic. Cheers,
Hi Julien, > >> > >> Hi, > >> > >> On 17/06/2022 10:10, Volodymyr Babchuk wrote: > >>> Julien Grall <julien@xen.org> writes: > >>> > >>>> On 16/06/2022 19:40, Volodymyr Babchuk wrote: > >>>>> Hi Julien, > >>>> > >>>> Hi Volodymyr, > >>>> > >>>>> Julien Grall <julien@xen.org> writes: > >>>>> > >>>>>> Hi, > >>>>>> > >>>>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: > >>>>>>> From: Dmytro Semenets <dmytro_semenets@epam.com> > >>>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF. > >>>>>> > >>>>>> I am confused. The spec is talking about Trusted OS and not > >>>>>> firmware. The docummentation is also not specific to ARM Trusted > >>>>>> Firmware. So did you mean "Trusted OS"? > >>>>> It should be "firmware", I believe. > >>>> > >>>> Hmmm... I couldn't find a reference in the spec suggesting that > >>>> CPU_OFF could return DENIED because of the firmware. Do you have a > >>>> pointer to the spec? > >>> > >>> Ah, looks like we are talking about different things. Indeed, CPU_OFF > >>> can return DENIED only because of Trusted OS. But entity that *returns* > >>> the error to a caller is a firmware. > >> > >> Right, the interesting part is *why* DENIED is returned not *who* > >> returns it. > > ARM TF returns DENIED *only* for the platform I have. > > We have a dissonance between spec and xen implementation because > > DENIED returned by > > ARM TF or Thrusted OS or whatever is not a reason for panic. > > I agree that's not a reason for panic. However, knowing the reason does > help to figure out the correct approach. > > For instance, one could have suggest to migrate the trusted OS to > another pCPU. But this would not have worked for you because the DENIED > is not about that. > > > And we > > have issues with this. > > If machine_restart() behaviour is more or less correct (sometimes > > reports about panic but restarts the machine) > > Right... > > > but machine_halt() doesn't work at al > ... this should also be the case here because machine_halt() could also > be called from cpu0. So I am a bit confused why you are saying it never > works. If machine_halt() called on a CPU other than CPU0 it caused panic and reboot. If it called on a CPU0 it also caused panic but after system power off and reboot is not issued. In this state you can still call the xen console. But you can't reboot the system. > > > Transit execution to CPU0 for my understanding is a workaround and > > this approach will fix > > machine_restart() function but will not fix machine_halt(). > > I would say it is a more specific case of what the spec suggests (see > below). But it should fix both machine_restart() and machine_halt() > because the last CPU running will be CPU0. So Xen would call SYSTEM_* > rather than CPU_OF. So I don't understand why you think it will fix one > but not the other. Looks like this is specific for my HW case. SYSTEM_OFF doesn't stop the whole system. > > In fact, the idea to always run the request from a given CPU is quite > similar to what the specification suggests (5.10.3 DEN0022D.b): > > " > One way in which cores can be placed into a known state is to use calls > to CPU_OFF on all online cores > except for the last one, which instead uses SYSTEM_OFF. If a UP Trusted > OS is present, this method > only works if the core that calls SYSTEM_OFF is the one where the > Trusted OS is resident, as calls to > CPU_OFF on this core return a DENIED error. Any core can call SYSTEM_OFF. > " > > For Xen, we would need to detect if the trusted OS is UP and where it is > running. Then we could always restart/halt from that CPU or CPU0. > > > Approach > > you suggested (spinning all cpus) will work but > > will save less energy. > > I am not sure to understand what's the concern about the energy here. > From my understanding of the specification, SYSTEM_OFF will take care > of switching off the power for all the cores. So at worse, the CPUs will > spin for a few ms. This would like be more efficient than a call to PSCI > CPU off. > > This is different compare just turning off one CPU (i.e. CPU hot-unplug) > because the CPU will end up to spin for a very long time. And this is > why I wasn't OK with conditionally avoiding the panic. > > Cheers, > > -- > Julien Grall
On 21/06/2022 09:55, Dmytro Semenets wrote: > Hi Julien, Hi Dmytro, >>>> >>>> Hi, >>>> >>>> On 17/06/2022 10:10, Volodymyr Babchuk wrote: >>>>> Julien Grall <julien@xen.org> writes: >>>>> >>>>>> On 16/06/2022 19:40, Volodymyr Babchuk wrote: >>>>>>> Hi Julien, >>>>>> >>>>>> Hi Volodymyr, >>>>>> >>>>>>> Julien Grall <julien@xen.org> writes: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: >>>>>>>>> From: Dmytro Semenets <dmytro_semenets@epam.com> >>>>>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF. >>>>>>>> >>>>>>>> I am confused. The spec is talking about Trusted OS and not >>>>>>>> firmware. The docummentation is also not specific to ARM Trusted >>>>>>>> Firmware. So did you mean "Trusted OS"? >>>>>>> It should be "firmware", I believe. >>>>>> >>>>>> Hmmm... I couldn't find a reference in the spec suggesting that >>>>>> CPU_OFF could return DENIED because of the firmware. Do you have a >>>>>> pointer to the spec? >>>>> >>>>> Ah, looks like we are talking about different things. Indeed, CPU_OFF >>>>> can return DENIED only because of Trusted OS. But entity that *returns* >>>>> the error to a caller is a firmware. >>>> >>>> Right, the interesting part is *why* DENIED is returned not *who* >>>> returns it. >>> ARM TF returns DENIED *only* for the platform I have. >>> We have a dissonance between spec and xen implementation because >>> DENIED returned by >>> ARM TF or Thrusted OS or whatever is not a reason for panic. >> >> I agree that's not a reason for panic. However, knowing the reason does >> help to figure out the correct approach. >> >> For instance, one could have suggest to migrate the trusted OS to >> another pCPU. But this would not have worked for you because the DENIED >> is not about that. >> >>> And we >>> have issues with this. >>> If machine_restart() behaviour is more or less correct (sometimes >>> reports about panic but restarts the machine) >> >> Right... >> >>> but machine_halt() doesn't work at al >> ... this should also be the case here because machine_halt() could also >> be called from cpu0. So I am a bit confused why you are saying it never >> works. > If machine_halt() called on a CPU other than CPU0 it caused panic and reboot. > If it called on a CPU0 it also caused panic but after system power off > and reboot > is not issued. In this state you can still call the xen console. But > you can't reboot the system. I am lost. In a previous e-mail you said that PSCI CPU_OFF would return DENIED on CPU0. IOW, I understood that for other CPUs, it would succeed. But here, you are tell me the opposite: "If it called on a CPU0 it also caused panic but after system power off and reboot". If machine_halt() is called from CPU0, then CPU_OFF should not be called on CPU0. So where is that panic coming from? >> >>> Transit execution to CPU0 for my understanding is a workaround and >>> this approach will fix >>> machine_restart() function but will not fix machine_halt(). >> >> I would say it is a more specific case of what the spec suggests (see >> below). But it should fix both machine_restart() and machine_halt() >> because the last CPU running will be CPU0. So Xen would call SYSTEM_* >> rather than CPU_OF. So I don't understand why you think it will fix one >> but not the other. > Looks like this is specific for my HW case. SYSTEM_OFF doesn't stop > the whole system. Hmmm... All the other CPUs should be off (or spinning with interrupt disabled), so are you saying that SYSTEM_OFF return? Cheers,
Hi Julien, > > Hi Dmytro, > > >>>> > >>>> Hi, > >>>> > >>>> On 17/06/2022 10:10, Volodymyr Babchuk wrote: > >>>>> Julien Grall <julien@xen.org> writes: > >>>>> > >>>>>> On 16/06/2022 19:40, Volodymyr Babchuk wrote: > >>>>>>> Hi Julien, > >>>>>> > >>>>>> Hi Volodymyr, > >>>>>> > >>>>>>> Julien Grall <julien@xen.org> writes: > >>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> On 16/06/2022 14:55, dmitry.semenets@gmail.com wrote: > >>>>>>>>> From: Dmytro Semenets <dmytro_semenets@epam.com> > >>>>>>>>> According to PSCI specification ARM TF can return DENIED on CPU OFF. > >>>>>>>> > >>>>>>>> I am confused. The spec is talking about Trusted OS and not > >>>>>>>> firmware. The docummentation is also not specific to ARM Trusted > >>>>>>>> Firmware. So did you mean "Trusted OS"? > >>>>>>> It should be "firmware", I believe. > >>>>>> > >>>>>> Hmmm... I couldn't find a reference in the spec suggesting that > >>>>>> CPU_OFF could return DENIED because of the firmware. Do you have a > >>>>>> pointer to the spec? > >>>>> > >>>>> Ah, looks like we are talking about different things. Indeed, CPU_OFF > >>>>> can return DENIED only because of Trusted OS. But entity that *returns* > >>>>> the error to a caller is a firmware. > >>>> > >>>> Right, the interesting part is *why* DENIED is returned not *who* > >>>> returns it. > >>> ARM TF returns DENIED *only* for the platform I have. > >>> We have a dissonance between spec and xen implementation because > >>> DENIED returned by > >>> ARM TF or Thrusted OS or whatever is not a reason for panic. > >> > >> I agree that's not a reason for panic. However, knowing the reason does > >> help to figure out the correct approach. > >> > >> For instance, one could have suggest to migrate the trusted OS to > >> another pCPU. But this would not have worked for you because the DENIED > >> is not about that. > >> > >>> And we > >>> have issues with this. > >>> If machine_restart() behaviour is more or less correct (sometimes > >>> reports about panic but restarts the machine) > >> > >> Right... > >> > >>> but machine_halt() doesn't work at al > >> ... this should also be the case here because machine_halt() could also > >> be called from cpu0. So I am a bit confused why you are saying it never > >> works. > > If machine_halt() called on a CPU other than CPU0 it caused panic and reboot. > > If it called on a CPU0 it also caused panic but after system power off > > and reboot > > is not issued. In this state you can still call the xen console. But > > you can't reboot the system. > > I am lost. In a previous e-mail you said that PSCI CPU_OFF would return > DENIED on CPU0. IOW, I understood that for other CPUs, it would succeed. I'm sorry I confused You. Yes it causes panic and prints it will be rebooted but actual reboot doesn't happen. > > But here, you are tell me the opposite: > > "If it called on a CPU0 it also caused panic but after system power off > and reboot". > > If machine_halt() is called from CPU0, then CPU_OFF should not be called > on CPU0. So where is that panic coming from? > > >> > >>> Transit execution to CPU0 for my understanding is a workaround and > >>> this approach will fix > >>> machine_restart() function but will not fix machine_halt(). > >> > >> I would say it is a more specific case of what the spec suggests (see > >> below). But it should fix both machine_restart() and machine_halt() > >> because the last CPU running will be CPU0. So Xen would call SYSTEM_* > >> rather than CPU_OF. So I don't understand why you think it will fix one > >> but not the other. > > Looks like this is specific for my HW case. SYSTEM_OFF doesn't stop > > the whole system. > > Hmmm... All the other CPUs should be off (or spinning with interrupt > disabled), so are you saying that SYSTEM_OFF return? Yes. SYSTEM_OFF returns on my HW. This is reason when CPU_OFF for CPU0 happens. > > Cheers, > > -- > Julien Grall
Hi, On 21/06/2022 11:05, Dmytro Semenets wrote: >>>>> but machine_halt() doesn't work at al >>>> ... this should also be the case here because machine_halt() could also >>>> be called from cpu0. So I am a bit confused why you are saying it never >>>> works. >>> If machine_halt() called on a CPU other than CPU0 it caused panic and reboot. >>> If it called on a CPU0 it also caused panic but after system power off >>> and reboot >>> is not issued. In this state you can still call the xen console. But >>> you can't reboot the system. >> >> I am lost. In a previous e-mail you said that PSCI CPU_OFF would return >> DENIED on CPU0. IOW, I understood that for other CPUs, it would succeed. > I'm sorry I confused You. > Yes it causes panic and prints it will be rebooted but actual reboot > doesn't happen. Ok. That's most likely because of the call to smp_call_function() in machine_restart(). It is using cpu_online_map to decide which CPUs to send the IPI. This will block because some of the CPUs are already off. So they will never acknowledge the IPI. > >> >> But here, you are tell me the opposite: >> >> "If it called on a CPU0 it also caused panic but after system power off >> and reboot". >> >> If machine_halt() is called from CPU0, then CPU_OFF should not be called >> on CPU0. So where is that panic coming from? >> >>>> >>>>> Transit execution to CPU0 for my understanding is a workaround and >>>>> this approach will fix >>>>> machine_restart() function but will not fix machine_halt(). >>>> >>>> I would say it is a more specific case of what the spec suggests (see >>>> below). But it should fix both machine_restart() and machine_halt() >>>> because the last CPU running will be CPU0. So Xen would call SYSTEM_* >>>> rather than CPU_OF. So I don't understand why you think it will fix one >>>> but not the other. >>> Looks like this is specific for my HW case. SYSTEM_OFF doesn't stop >>> the whole system. >> >> Hmmm... All the other CPUs should be off (or spinning with interrupt >> disabled), so are you saying that SYSTEM_OFF return? > Yes. SYSTEM_OFF returns on my HW. Hmmm... This is not compliant with the specification. Could you check why PSCI SYSTEM_OFF is returning? > This is reason when CPU_OFF for CPU0 happens. Right, machine_halt() will call halt_this_cpu() that in turn will call PSCI CPU_OFF. If you modify halt_this_cpu() to avoid calling PSCI CPU_OFF (as I suggested before) then the panic() will never happen. Instead, the CPU will execute "wfi" in a loop with interrupt disabled. To summarize there are two parts to resolve: 1) Understand why PSCI SYSTEM_OFF returns on your platform 2) Modify stop_cpu() to not call PSCI CPU_OFF Cheers,
diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c index 0c90c2305c..55787fde58 100644 --- a/xen/arch/arm/psci.c +++ b/xen/arch/arm/psci.c @@ -63,8 +63,9 @@ void call_psci_cpu_off(void) /* If successfull the PSCI cpu_off call doesn't return */ arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, &res); - panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(), - PSCI_RET(res)); + if ( PSCI_RET(res) != PSCI_DENIED ) + panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(), + PSCI_RET(res)); } }