diff mbox series

KVM: arm64: Allow PSCI SYSTEM_OFF/RESET to return

Message ID 20201229160059.64135-1-dbrazdil@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: arm64: Allow PSCI SYSTEM_OFF/RESET to return | expand

Commit Message

David Brazdil Dec. 29, 2020, 4 p.m. UTC
The KVM/arm64 PSCI relay assumes that SYSTEM_OFF and SYSTEM_RESET should
not return, as dictated by the PSCI spec. However, there is firmware out
there which breaks this assumption, leading to a hyp panic. Make KVM
more robust to broken firmware by allowing these to return.

Signed-off-by: David Brazdil <dbrazdil@google.com>
---
 arch/arm64/kvm/hyp/nvhe/psci-relay.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

Comments

Russell King (Oracle) Dec. 29, 2020, 5:04 p.m. UTC | #1
On Tue, Dec 29, 2020 at 04:00:59PM +0000, David Brazdil wrote:
> The KVM/arm64 PSCI relay assumes that SYSTEM_OFF and SYSTEM_RESET should
> not return, as dictated by the PSCI spec. However, there is firmware out
> there which breaks this assumption, leading to a hyp panic. Make KVM
> more robust to broken firmware by allowing these to return.

Are you sure you should just return?

We've had issues in the past with Linux reboot(2) that returns
to userspace, allowing on 32-bit ARM for example watchdogs to
unexpectedly continue being serviced.
Marc Zyngier Dec. 29, 2020, 5:16 p.m. UTC | #2
Hi David,

On 2020-12-29 16:00, David Brazdil wrote:
> The KVM/arm64 PSCI relay assumes that SYSTEM_OFF and SYSTEM_RESET 
> should
> not return, as dictated by the PSCI spec. However, there is firmware 
> out
> there which breaks this assumption, leading to a hyp panic. Make KVM
> more robust to broken firmware by allowing these to return.
> 
> Signed-off-by: David Brazdil <dbrazdil@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/psci-relay.c | 13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> index e3947846ffcb..8e7128cb7667 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> @@ -77,12 +77,6 @@ static unsigned long psci_forward(struct
> kvm_cpu_context *host_ctxt)
>  			 cpu_reg(host_ctxt, 2), cpu_reg(host_ctxt, 3));
>  }
> 
> -static __noreturn unsigned long psci_forward_noreturn(struct
> kvm_cpu_context *host_ctxt)
> -{
> -	psci_forward(host_ctxt);
> -	hyp_panic(); /* unreachable */
> -}
> -
>  static unsigned int find_cpu_id(u64 mpidr)
>  {
>  	unsigned int i;
> @@ -251,10 +245,13 @@ static unsigned long psci_0_2_handler(u64
> func_id, struct kvm_cpu_context *host_
>  	case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
>  	case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
>  		return psci_forward(host_ctxt);
> +	/*
> +	 * SYSTEM_OFF/RESET should not return according to the spec.
> +	 * Allow it so as to stay robust to broken firmware.
> +	 */
>  	case PSCI_0_2_FN_SYSTEM_OFF:
>  	case PSCI_0_2_FN_SYSTEM_RESET:
> -		psci_forward_noreturn(host_ctxt);
> -		unreachable();
> +		return psci_forward(host_ctxt);
>  	case PSCI_0_2_FN64_CPU_SUSPEND:
>  		return psci_cpu_suspend(func_id, host_ctxt);
>  	case PSCI_0_2_FN64_CPU_ON:

Thanks for having tracked this.

I wonder whether we should also taint the kernel in this case,
because this is completely unexpected, and a major spec violation.

Ideally, we'd be able to detect this case and prevent pKVM from
getting initialised at all, but I guess there is no way to detect
the sucker without ... calling SYSTEM_RESET?

         M.
Marc Zyngier Dec. 30, 2020, 10:06 a.m. UTC | #3
On 2020-12-29 17:04, Russell King - ARM Linux admin wrote:
> On Tue, Dec 29, 2020 at 04:00:59PM +0000, David Brazdil wrote:
>> The KVM/arm64 PSCI relay assumes that SYSTEM_OFF and SYSTEM_RESET 
>> should
>> not return, as dictated by the PSCI spec. However, there is firmware 
>> out
>> there which breaks this assumption, leading to a hyp panic. Make KVM
>> more robust to broken firmware by allowing these to return.
> 
> Are you sure you should just return?
> 
> We've had issues in the past with Linux reboot(2) that returns
> to userspace, allowing on 32-bit ARM for example watchdogs to
> unexpectedly continue being serviced.

I don't think this changes anything compared to the case where
the PSCI relay isn't enabled. The EL1 part of the kernel would
see the SYSTEM_RESET call return, and handle it accordingly
(stay in a while(1) loop).

This is consistent with the PSCI relay design goal of being
invisible to the EL1 kernel.

Thanks,

         M.
David Brazdil Dec. 30, 2020, 11:03 a.m. UTC | #4
On Tue, Dec 29, 2020 at 05:16:41PM +0000, Marc Zyngier wrote:
> Hi David,
> 
> On 2020-12-29 16:00, David Brazdil wrote:
> > The KVM/arm64 PSCI relay assumes that SYSTEM_OFF and SYSTEM_RESET should
> > not return, as dictated by the PSCI spec. However, there is firmware out
> > there which breaks this assumption, leading to a hyp panic. Make KVM
> > more robust to broken firmware by allowing these to return.
> > 
> > Signed-off-by: David Brazdil <dbrazdil@google.com>
> > ---
> >  arch/arm64/kvm/hyp/nvhe/psci-relay.c | 13 +++++--------
> >  1 file changed, 5 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> > b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> > index e3947846ffcb..8e7128cb7667 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> > @@ -77,12 +77,6 @@ static unsigned long psci_forward(struct
> > kvm_cpu_context *host_ctxt)
> >  			 cpu_reg(host_ctxt, 2), cpu_reg(host_ctxt, 3));
> >  }
> > 
> > -static __noreturn unsigned long psci_forward_noreturn(struct
> > kvm_cpu_context *host_ctxt)
> > -{
> > -	psci_forward(host_ctxt);
> > -	hyp_panic(); /* unreachable */
> > -}
> > -
> >  static unsigned int find_cpu_id(u64 mpidr)
> >  {
> >  	unsigned int i;
> > @@ -251,10 +245,13 @@ static unsigned long psci_0_2_handler(u64
> > func_id, struct kvm_cpu_context *host_
> >  	case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
> >  	case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
> >  		return psci_forward(host_ctxt);
> > +	/*
> > +	 * SYSTEM_OFF/RESET should not return according to the spec.
> > +	 * Allow it so as to stay robust to broken firmware.
> > +	 */
> >  	case PSCI_0_2_FN_SYSTEM_OFF:
> >  	case PSCI_0_2_FN_SYSTEM_RESET:
> > -		psci_forward_noreturn(host_ctxt);
> > -		unreachable();
> > +		return psci_forward(host_ctxt);
> >  	case PSCI_0_2_FN64_CPU_SUSPEND:
> >  		return psci_cpu_suspend(func_id, host_ctxt);
> >  	case PSCI_0_2_FN64_CPU_ON:
> 
> Thanks for having tracked this.
> 
> I wonder whether we should also taint the kernel in this case,
> because this is completely unexpected, and a major spec violation.
> 
> Ideally, we'd be able to detect this case and prevent pKVM from
> getting initialised at all, but I guess there is no way to detect
> the sucker without ... calling SYSTEM_RESET?

Yeah, there are no bits to check, unfortunately. :(

David
Marc Zyngier Jan. 15, 2021, 11:33 a.m. UTC | #5
On Tue, 29 Dec 2020 16:00:59 +0000, David Brazdil wrote:
> The KVM/arm64 PSCI relay assumes that SYSTEM_OFF and SYSTEM_RESET should
> not return, as dictated by the PSCI spec. However, there is firmware out
> there which breaks this assumption, leading to a hyp panic. Make KVM
> more robust to broken firmware by allowing these to return.

Applied to next, thanks!

[1/1] KVM: arm64: Allow PSCI SYSTEM_OFF/RESET to return
      commit: 2c91ef39216149df6703c3fa6a47dd9a1e6091c1

Cheers,

	M.
diff mbox series

Patch

diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index e3947846ffcb..8e7128cb7667 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -77,12 +77,6 @@  static unsigned long psci_forward(struct kvm_cpu_context *host_ctxt)
 			 cpu_reg(host_ctxt, 2), cpu_reg(host_ctxt, 3));
 }
 
-static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *host_ctxt)
-{
-	psci_forward(host_ctxt);
-	hyp_panic(); /* unreachable */
-}
-
 static unsigned int find_cpu_id(u64 mpidr)
 {
 	unsigned int i;
@@ -251,10 +245,13 @@  static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_
 	case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
 	case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
 		return psci_forward(host_ctxt);
+	/*
+	 * SYSTEM_OFF/RESET should not return according to the spec.
+	 * Allow it so as to stay robust to broken firmware.
+	 */
 	case PSCI_0_2_FN_SYSTEM_OFF:
 	case PSCI_0_2_FN_SYSTEM_RESET:
-		psci_forward_noreturn(host_ctxt);
-		unreachable();
+		return psci_forward(host_ctxt);
 	case PSCI_0_2_FN64_CPU_SUSPEND:
 		return psci_cpu_suspend(func_id, host_ctxt);
 	case PSCI_0_2_FN64_CPU_ON: