
[4/7] KVM: arm64: Allocate guard pages near hyp stacks

Message ID: 20220210224220.4076151-5-kaleshsingh@google.com
State: New, archived
Series: KVM: arm64: Hypervisor stack enhancements

Commit Message

Kalesh Singh Feb. 10, 2022, 10:41 p.m. UTC
From: Quentin Perret <qperret@google.com>

Allocate unbacked VA space underneath each stack page to ensure stack
overflows get trapped and don't corrupt memory silently.

The stack (PAGE_SIZE) is aligned to twice its size, meaning that any
valid stack address has bit PAGE_SHIFT set to 0. This allows us to
easily check for overflow on exception entry without corrupting any GPRs.

Signed-off-by: Quentin Perret <qperret@google.com>
[ Kalesh - Update commit text and comments,
           refactor, add overflow handling ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 arch/arm64/kvm/hyp/nvhe/host.S   | 16 ++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c  | 19 ++++++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/switch.c |  5 +++++
 3 files changed, 39 insertions(+), 1 deletion(-)
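
To illustrate the layout this series sets up and the check it relies on,
here is a minimal C sketch (not the actual test_sp_overflow macro; the
PAGE_SHIFT value, the example addresses and the hyp_sp_overflowed() name
are assumptions for illustration):

/*
 * Per-CPU hyp VA layout produced by this patch (example addresses):
 *
 *   0x...3000  +------------------------+  <- stack_hyp_va (top of stack)
 *              |       stack page       |     valid SPs have bit PAGE_SHIFT == 0
 *   0x...2000  +------------------------+
 *              |  guard page (unbacked) |     SPs in here have bit PAGE_SHIFT == 1
 *   0x...1000  +------------------------+
 *
 * Detecting an overflow is therefore a single bit test on SP, which the
 * EL2 vectors can do without clobbering any general-purpose registers.
 */
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT	12			/* assumed 4K pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

static bool hyp_sp_overflowed(uint64_t sp)
{
	/* Bit PAGE_SHIFT set means SP has left the stack page. */
	return sp & PAGE_SIZE;
}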

Comments

Marc Zyngier Feb. 14, 2022, 2:06 p.m. UTC | #1
On Thu, 10 Feb 2022 22:41:45 +0000,
Kalesh Singh <kaleshsingh@google.com> wrote:
> 
> From: Quentin Perret <qperret@google.com>
> 
> Allocate unbacked VA space underneath each stack page to ensure stack
> overflows get trapped and don't corrupt memory silently.
> 
> The stack (PAGE_SIZE) is aligned to twice its size, meaning that any
> valid stack address has bit PAGE_SHIFT set to 0. This allows us to
> easily check for overflow on exception entry without corrupting any GPRs.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> [ Kalesh - Update commit text and comments,
>            refactor, add overflow handling ]
> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/host.S   | 16 ++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/setup.c  | 19 ++++++++++++++++++-
>  arch/arm64/kvm/hyp/nvhe/switch.c |  5 +++++
>  3 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
> index 3d613e721a75..78e4b612ac06 100644
> --- a/arch/arm64/kvm/hyp/nvhe/host.S
> +++ b/arch/arm64/kvm/hyp/nvhe/host.S
> @@ -153,6 +153,10 @@ SYM_FUNC_END(__host_hvc)
>  
>  .macro invalid_host_el2_vect
>  	.align 7
> +
> +	/* Test stack overflow without corrupting GPRs */
> +	test_sp_overflow PAGE_SHIFT, .L__hyp_sp_overflow\@
> +

I am definitely concerned with this in a system not using pKVM (which
is on average 100% of the upstream users so far! ;-). This is more or
less guaranteed to do the wrong thing 50% of the time, depending on
the alignment of the stack.
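
Concretely, a small self-contained sketch of that failure mode (the VAs
and PAGE_SHIFT = 12 are made-up values, and looks_like_overflow() is just
an illustrative C rendering of the bit test): when the stack was not
placed with the 2 * PAGE_SIZE scheme, bit PAGE_SHIFT of an in-bounds SP
depends only on where the stack happened to land, so the check either
always fires or never does.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12	/* assumed 4K pages */

/* The check added by this series: bit PAGE_SHIFT set in SP => "overflow". */
static int looks_like_overflow(uint64_t sp)
{
	return (sp >> PAGE_SHIFT) & 1;
}

int main(void)
{
	/* Two equally legitimate placements of a one-page stack that was
	 * mapped without the guard-page alignment guarantee. */
	uint64_t sp_odd_page  = 0xffff800008001800ULL;	/* stack page at ...1000 */
	uint64_t sp_even_page = 0xffff800008002800ULL;	/* stack page at ...2000 */

	/* Prints "1 0": an in-bounds SP is flagged as an overflow in one
	 * placement but not in the other. */
	printf("%d %d\n", looks_like_overflow(sp_odd_page),
	       looks_like_overflow(sp_even_page));
	return 0;
}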

>  	/* If a guest is loaded, panic out of it. */
>  	stp	x0, x1, [sp, #-16]!
>  	get_loaded_vcpu x0, x1
> @@ -165,6 +169,18 @@ SYM_FUNC_END(__host_hvc)
>  	 * been partially clobbered by __host_enter.
>  	 */
>  	b	hyp_panic
> +
> +.L__hyp_sp_overflow\@:
> +	/*
> +	 * Reset SP to the top of the stack, to allow handling the hyp_panic.
> +	 * This corrupts the stack but is ok, since we won't be attempting
> +	 * any unwinding here.
> +	 */
> +	ldr_this_cpu	x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
> +	mov	sp, x0
> +
> +	bl	hyp_panic_bad_stack
> +	ASM_BUG()
>  .endm
>  
>  .macro invalid_host_el1_vect
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 99e178cf4249..114053dff228 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -105,7 +105,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
>  		if (ret)
>  			return ret;
>  
> -		/* Map stack pages in the 'private' VA range */
> +		/*
> +		 * Allocate 'private' VA range for stack guard pages.
> +		 *
> +		 * The 'private' VA range grows upward and stacks downwards, so
> +		 * allocate the guard page first. But make sure to align the
> +		 * stack itself with PAGE_SIZE * 2 granularity to ease overflow
> +		 * detection in the entry assembly code.
> +		 */
> +		do {
> +			start = (void *)hyp_alloc_private_va_range(PAGE_SIZE);
> +			if (IS_ERR(start))
> +				return PTR_ERR(start);
> +		} while (IS_ALIGNED((u64) start, PAGE_SIZE * 2));

This seems cumbersome. Can't we tweak hyp_alloc_private_va_range() to
perform the required alignment? It could easily be convinced to return
an address that is aligned on the size of the region, which would
avoid this sort of loop.
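
A rough sketch of that idea, purely for illustration (this is neither the
current hyp_alloc_private_va_range() nor necessarily what a respin would
do; the cursor variable, the missing locking/range checks and the helper
name are all assumptions):

/* Round the private VA cursor up to the alignment of the request, so a
 * power-of-two sized region (e.g. guard page + stack = 2 * PAGE_SIZE)
 * comes back naturally aligned and callers need no retry loop. */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((unsigned long)(a) - 1))

static unsigned long hyp_private_va_cur;	/* grows upward (assumed) */

static unsigned long hyp_alloc_private_va_range_aligned(unsigned long size)
{
	/* size is assumed to be a power of two. */
	unsigned long addr = ALIGN_UP(hyp_private_va_cur, size);

	/* Checks against the end of the private VA window omitted here. */
	hyp_private_va_cur = addr + size;

	return addr;
}

With something along these lines, recreate_hyp_mappings() could allocate a
single 2 * PAGE_SIZE region per CPU, leave the lower page unbacked as the
guard and map the stack into the upper page, instead of looping until the
allocator happens to return a suitably aligned address.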

Thanks,

	M.
Kalesh Singh Feb. 14, 2022, 10:03 p.m. UTC | #2
On Mon, Feb 14, 2022 at 6:06 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 10 Feb 2022 22:41:45 +0000,
> Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > From: Quentin Perret <qperret@google.com>
> >
> > Allocate unbacked VA space underneath each stack page to ensure stack
> > overflows get trapped and don't corrupt memory silently.
> >
> > The stack (PAGE_SIZE) is aligned to twice its size, meaning that any
> > valid stack address has bit PAGE_SHIFT set to 0. This allows us to
> > easily check for overflow on exception entry without corrupting any GPRs.
> >
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > [ Kalesh - Update commit text and comments,
> >            refactor, add overflow handling ]
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >  arch/arm64/kvm/hyp/nvhe/host.S   | 16 ++++++++++++++++
> >  arch/arm64/kvm/hyp/nvhe/setup.c  | 19 ++++++++++++++++++-
> >  arch/arm64/kvm/hyp/nvhe/switch.c |  5 +++++
> >  3 files changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
> > index 3d613e721a75..78e4b612ac06 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/host.S
> > +++ b/arch/arm64/kvm/hyp/nvhe/host.S
> > @@ -153,6 +153,10 @@ SYM_FUNC_END(__host_hvc)
> >
> >  .macro invalid_host_el2_vect
> >       .align 7
> > +
> > +     /* Test stack overflow without corrupting GPRs */
> > +     test_sp_overflow PAGE_SHIFT, .L__hyp_sp_overflow\@
> > +
>
> I am definitely concerned with this in a system not using pKVM (which
> is on average 100% of the upstream users so far! ;-). This is more or
> less guaranteed to do the wrong thing 50% of the time, depending on
> the alignment of the stack.

Thanks for pointing this out, Marc. I'll rework this to work for both
protected and non-protected modes.

>
> >       /* If a guest is loaded, panic out of it. */
> >       stp     x0, x1, [sp, #-16]!
> >       get_loaded_vcpu x0, x1
> > @@ -165,6 +169,18 @@ SYM_FUNC_END(__host_hvc)
> >        * been partially clobbered by __host_enter.
> >        */
> >       b       hyp_panic
> > +
> > +.L__hyp_sp_overflow\@:
> > +     /*
> > +      * Reset SP to the top of the stack, to allow handling the hyp_panic.
> > +      * This corrupts the stack but is ok, since we won't be attempting
> > +      * any unwinding here.
> > +      */
> > +     ldr_this_cpu    x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
> > +     mov     sp, x0
> > +
> > +     bl      hyp_panic_bad_stack
> > +     ASM_BUG()
> >  .endm
> >
> >  .macro invalid_host_el1_vect
> > diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> > index 99e178cf4249..114053dff228 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> > @@ -105,7 +105,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> >               if (ret)
> >                       return ret;
> >
> > -             /* Map stack pages in the 'private' VA range */
> > +             /*
> > +              * Allocate 'private' VA range for stack guard pages.
> > +              *
> > +              * The 'private' VA range grows upward and stacks downwards, so
> > +              * allocate the guard page first. But make sure to align the
> > +              * stack itself with PAGE_SIZE * 2 granularity to ease overflow
> > +              * detection in the entry assembly code.
> > +              */
> > +             do {
> > +                     start = (void *)hyp_alloc_private_va_range(PAGE_SIZE);
> > +                     if (IS_ERR(start))
> > +                             return PTR_ERR(start);
> > +             } while (IS_ALIGNED((u64) start, PAGE_SIZE * 2));
>
> This seems cumbersome. Can't we tweak hyp_alloc_private_va_range() to
> perform the required alignment? It could easily be convinced to return
> an address that is aligned on the size of the region, which would
> avoid this sort of loop.

Ack. I'll update it for v2.

- Kalesh

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Patch

diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 3d613e721a75..78e4b612ac06 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -153,6 +153,10 @@  SYM_FUNC_END(__host_hvc)
 
 .macro invalid_host_el2_vect
 	.align 7
+
+	/* Test stack overflow without corrupting GPRs */
+	test_sp_overflow PAGE_SHIFT, .L__hyp_sp_overflow\@
+
 	/* If a guest is loaded, panic out of it. */
 	stp	x0, x1, [sp, #-16]!
 	get_loaded_vcpu x0, x1
@@ -165,6 +169,18 @@  SYM_FUNC_END(__host_hvc)
 	 * been partially clobbered by __host_enter.
 	 */
 	b	hyp_panic
+
+.L__hyp_sp_overflow\@:
+	/*
+	 * Reset SP to the top of the stack, to allow handling the hyp_panic.
+	 * This corrupts the stack but is ok, since we won't be attempting
+	 * any unwinding here.
+	 */
+	ldr_this_cpu	x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
+	mov	sp, x0
+
+	bl	hyp_panic_bad_stack
+	ASM_BUG()
 .endm
 
 .macro invalid_host_el1_vect
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 99e178cf4249..114053dff228 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -105,7 +105,24 @@  static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 		if (ret)
 			return ret;
 
-		/* Map stack pages in the 'private' VA range */
+		/*
+		 * Allocate 'private' VA range for stack guard pages.
+		 *
+		 * The 'private' VA range grows upward and stacks downwards, so
+		 * allocate the guard page first. But make sure to align the
+		 * stack itself with PAGE_SIZE * 2 granularity to ease overflow
+		 * detection in the entry assembly code.
+		 */
+		do {
+			start = (void *)hyp_alloc_private_va_range(PAGE_SIZE);
+			if (IS_ERR(start))
+				return PTR_ERR(start);
+		} while (IS_ALIGNED((u64) start, PAGE_SIZE * 2));
+
+		/*
+		 * Map stack pages in the 'private' VA range above the allocated
+		 * guard pages.
+		 */
 		end = (void *)__hyp_pa(per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va);
 		start = end - PAGE_SIZE;
 		start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6410d21d8695..5a2e1ab79913 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -369,6 +369,11 @@  void __noreturn hyp_panic(void)
 	unreachable();
 }
 
+void __noreturn hyp_panic_bad_stack(void)
+{
+	hyp_panic();
+}
+
 asmlinkage void kvm_unexpected_el2_exception(void)
 {
 	return __kvm_unexpected_el2_exception();