Message ID | 20230515165917.1306922-2-ltykernel@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv |
On Mon, May 15, 2023 at 12:59:03PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <tiala@microsoft.com>
>
> Add a #HV exception handler that uses IST stack.

Urgh.. that is entirely insufficient. Like it doesn't even begin to start to cover things.

The whole existing VC IST stack abuse is already a nightmare and you're duplicating that.. without any explanation for why this would be needed and how it is correct.

Please try again.
On 5/16/2023 5:30 PM, Peter Zijlstra wrote:
> On Mon, May 15, 2023 at 12:59:03PM -0400, Tianyu Lan wrote:
>> From: Tianyu Lan <tiala@microsoft.com>
>>
>> Add a #HV exception handler that uses IST stack.
>>
> Urgh.. that is entirely insufficient. Like it doesn't even begin to
> start to cover things.
>
> The whole existing VC IST stack abuse is already a nightmare and you're
> duplicating that.. without any explanation for why this would be needed
> and how it is correct.
>
> Please try again.

Hi Peter:

Thanks for your review. Will add more explanation in the next version.
>> Add a #HV exception handler that uses IST stack.
>>
>
> Urgh.. that is entirely insufficient. Like it doesn't even begin to
> start to cover things.
>
> The whole existing VC IST stack abuse is already a nightmare and you're
> duplicating that.. without any explanation for why this would be needed
> and how it is correct.
>
> Please try again.

The #HV handler handles both #NMI & #MCE in the guest, and a nested #HV is never raised by the hypervisor: the next #HV exception is only raised by the hypervisor after the guest acknowledges the pending #HV exception by clearing the "NoFurtherSignal" bit in the doorbell page. There is still protection (please see hv_switch_off_ist()) to gracefully exit the guest if by any chance a malicious hypervisor sends a nested #HV.

This avoids most of the nested IST stack pitfalls with #NMI & #MCE; #DB is also handled in a noinstr code block (exc_vmm_communication() -> vc_is_db { ... }), which avoids any recursive #DBs.

Do you see anything else that needs to be handled in the #HV IST handling?

Thanks,
Pankaj
On Tue, May 30, 2023 at 02:16:55PM +0200, Gupta, Pankaj wrote:
>
> > > Add a #HV exception handler that uses IST stack.
> > >
> >
> > Urgh.. that is entirely insufficient. Like it doesn't even begin to
> > start to cover things.
> >
> > The whole existing VC IST stack abuse is already a nightmare and you're
> > duplicating that.. without any explanation for why this would be needed
> > and how it is correct.
> >
> > Please try again.
>
> #HV handler handles both #NMI & #MCE in the guest and nested #HV is never
> raised by the hypervisor.

I thought all this confidential computing nonsense was about not trusting the hypervisor, so how come we're now relying on the hypervisor being sane?
On 5/30/23 05:16, Gupta, Pankaj wrote:
> #HV handler handles both #NMI & #MCE in the guest and nested #HV is
> never raised by the hypervisor. Next #HV exception is only raised by the
> hypervisor when Guest acknowledges the pending #HV exception by clearing
> "NoFurtherSignal" bit in the doorbell page.

There's a big difference between "is never raised by" and "cannot be raised by".

Either way, this series (and this patch in particular) needs some much better changelogs so that this behavior is clear.

It would also be nice to reference the relevant parts of the hardware specs if the "hardware"* is helping to provide these guarantees.

* I say "hardware" in quotes because on TDX a big chunk of this behavior is implemented in software in the TDX module. SEV probably does it in microcode (or maybe in the secure processor), but I kinda doubt it's purely silicon.
On 5/30/23 09:35, Peter Zijlstra wrote:
> On Tue, May 30, 2023 at 02:16:55PM +0200, Gupta, Pankaj wrote:
>>
>>>> Add a #HV exception handler that uses IST stack.
>>>>
>>>
>>> Urgh.. that is entirely insufficient. Like it doesn't even begin to
>>> start to cover things.
>>>
>>> The whole existing VC IST stack abuse is already a nightmare and you're
>>> duplicating that.. without any explanation for why this would be needed
>>> and how it is correct.
>>>
>>> Please try again.
>>
>> #HV handler handles both #NMI & #MCE in the guest and nested #HV is never
>> raised by the hypervisor.
>
> I thought all this confidental computing nonsense was about not trusting
> the hypervisor, so how come we're now relying on the hypervisor being
> sane?

That should really say that a nested #HV should never be raised by the hypervisor, but if it is, then the guest should detect that and self-terminate knowing that the hypervisor is possibly being malicious.

Thanks,
Tom
On Tue, May 30, 2023 at 10:59:01AM -0500, Tom Lendacky wrote:
> On 5/30/23 09:35, Peter Zijlstra wrote:
> > On Tue, May 30, 2023 at 02:16:55PM +0200, Gupta, Pankaj wrote:
> > >
> > > > > Add a #HV exception handler that uses IST stack.
> > > > >
> > > >
> > > > Urgh.. that is entirely insufficient. Like it doesn't even begin to
> > > > start to cover things.
> > > >
> > > > The whole existing VC IST stack abuse is already a nightmare and you're
> > > > duplicating that.. without any explanation for why this would be needed
> > > > and how it is correct.
> > > >
> > > > Please try again.
> > >
> > > #HV handler handles both #NMI & #MCE in the guest and nested #HV is never
> > > raised by the hypervisor.
> >
> > I thought all this confidental computing nonsense was about not trusting
> > the hypervisor, so how come we're now relying on the hypervisor being
> > sane?
>
> That should really say that a nested #HV should never be raised by the
> hypervisor, but if it is, then the guest should detect that and
> self-terminate knowing that the hypervisor is possibly being malicious.

I've yet to see code that can do that reliably.
On 5/30/23 11:52, Peter Zijlstra wrote:
>> That should really say that a nested #HV should never be raised by the
>> hypervisor, but if it is, then the guest should detect that and
>> self-terminate knowing that the hypervisor is possibly being malicious.
> I've yet to see code that can do that reliably.

By "#HV should never be raised by the hypervisor", I think Tom means: #HV can and will be raised by malicious hypervisors and the guest must be able to unambiguously handle it in a way that will not result in the guest getting rooted.

Right? ;)
On Tue, May 30, 2023 at 08:52:32PM +0200, Peter Zijlstra wrote:
> > That should really say that a nested #HV should never be raised by the
> > hypervisor, but if it is, then the guest should detect that and
> > self-terminate knowing that the hypervisor is possibly being malicious.
>
> I've yet to see code that can do that reliably.

Tom; could you please investigate if this can be enforced in ucode?

Ideally #HV would have an internal latch such that a recursive #HV will terminate the guest (much like double #MC and triple-fault).

But unlike the #MC trainwreck, can we please not leave a glaring hole in this latch and use a spare bit in the IRET frame please?

So have #HV delivery:
 - check internal latch; if set, terminate machine
 - set latch
 - write IRET frame with magic bit set

have IRET:
 - check magic bit and reset #HV latch
>> That should really say that a nested #HV should never be raised by the
>> hypervisor, but if it is, then the guest should detect that and
>> self-terminate knowing that the hypervisor is possibly being malicious.
>
> I've yet to see code that can do that reliably.

- Currently, we detect a direct nested #HV with the check below and the guest self-terminates:

<snip>
	if (get_stack_info_noinstr(stack, current, &info) &&
	    (info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV) ||
	     info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV2)))
		panic("Nested #HV exception, HV IST corrupted, stack type = %d\n",
		      info.type);
</snip>

- We are thinking about the solution below to detect a nested #HV reliably:

  -- Make IST stack switching reliable for the #VC -> #HV -> #VC case (similar to what is done in __sev_es_ist_enter()/__sev_es_ist_exit() for the NMI IST stack).

  -- In addition to this, we can make nested #HV detection (with another exception type) more reliable with refcounting (percpu?).

Need your inputs before I implement this solution. Or any other idea in software you have in mind?

Thanks,
Pankaj
On Tue, Jun 06, 2023 at 08:00:32AM +0200, Gupta, Pankaj wrote:
>
> > > That should really say that a nested #HV should never be raised by the
> > > hypervisor, but if it is, then the guest should detect that and
> > > self-terminate knowing that the hypervisor is possibly being malicious.
> >
> > I've yet to see code that can do that reliably.
>
> - Currently, we are detecting the direct nested #HV with below check and
> guest self terminate.
>
> <snip>
>	if (get_stack_info_noinstr(stack, current, &info) &&
>	    (info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV) ||
>	     info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV2)))
>		panic("Nested #HV exception, HV IST corrupted, stack type = %d\n",
>		      info.type);
> </snip>
>
> - Thinking about below solution to detect the nested
> #HV reliably:
>
> -- Make reliable IST stack switching for #VC -> #HV -> #VC case
> (similar to done in __sev_es_ist_enter/__sev_es_ist_exit for NMI
> IST stack).

I'm not convinced any of that is actually correct; there is a *huge* window between NMI hitting and calling __sev_es_ist_enter(), idem on the exit side.

> -- In addition to this, we can make nested #HV detection (with another
> exception type) more reliable with refcounting (percpu?).

There is also #DB and the MOVSS shadow.

And no, I don't think any of that is what you'd call 'robust'.

This is what I call a trainwreck :/ And I'm more than willing to say no until the hardware is more sane. Supervisor Shadow Stack support is in the same boat, that's on hold until FRED makes things workable.
On 5/31/23 04:14, Peter Zijlstra wrote:
> On Tue, May 30, 2023 at 08:52:32PM +0200, Peter Zijlstra wrote:
>
>>> That should really say that a nested #HV should never be raised by the
>>> hypervisor, but if it is, then the guest should detect that and
>>> self-terminate knowing that the hypervisor is possibly being malicious.
>>
>> I've yet to see code that can do that reliably.
>
> Tom; could you please investigate if this can be enforced in ucode?
>
> Ideally #HV would have an internal latch such that a recursive #HV will
> terminate the guest (much like double #MC and tripple-fault).
>
> But unlike the #MC trainwreck, can we please not leave a glaring hole in
> this latch and use a spare bit in the IRET frame please?
>
> So have #HV delivery:
>  - check internal latch; if set, terminate machine
>  - set latch
>  - write IRET frame with magic bit set
>
> have IRET:
>  - check magic bit and reset #HV latch

Hi Peter,

I talked with the hardware team about this and, unfortunately, it is not practical to implement. The main concerns are that there are already two generations of hardware out there with the current support and, given limited patch space, in addition to the ucode support to track and perform the latch support, additional ucode support would be required to save/restore the latch information when handling a VMEXIT during #HV processing.

Thanks,
Tom
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index eccc3431e515..653b1f10699b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -496,7 +496,7 @@ SYM_CODE_END(\asmsym)

 #ifdef CONFIG_AMD_MEM_ENCRYPT
 /**
- * idtentry_vc - Macro to generate entry stub for #VC
+ * idtentry_sev - Macro to generate entry stub for #VC
  * @vector:	Vector number
  * @asmsym:	ASM symbol for the entry point
  * @cfunc:	C function to be called
@@ -515,14 +515,18 @@ SYM_CODE_END(\asmsym)
  *
  * The macro is only used for one vector, but it is planned to be extended in
  * the future for the #HV exception.
- */
-.macro idtentry_vc vector asmsym cfunc
+*/
+.macro idtentry_sev vector asmsym cfunc has_error_code:req
 SYM_CODE_START(\asmsym)
	UNWIND_HINT_IRET_REGS
	ENDBR
	ASM_CLAC
	cld

+	.if \vector == X86_TRAP_HV
+		pushq	$-1		/* ORIG_RAX: no syscall */
+	.endif
+
	/*
	 * If the entry is from userspace, switch stacks and treat it as
	 * a normal entry.
@@ -545,7 +549,12 @@ SYM_CODE_START(\asmsym)
	 * stack.
	 */
	movq	%rsp, %rdi		/* pt_regs pointer */
-	call	vc_switch_off_ist
+	.if \vector == X86_TRAP_VC
+		call	vc_switch_off_ist
+	.else
+		call	hv_switch_off_ist
+	.endif
+
	movq	%rax, %rsp		/* Switch to new stack */

	ENCODE_FRAME_POINTER
@@ -568,10 +577,7 @@ SYM_CODE_START(\asmsym)

	/* Switch to the regular task stack */
 .Lfrom_usermode_switch_stack_\@:
-	idtentry_body user_\cfunc, has_error_code=1
-
-_ASM_NOKPROBE(\asmsym)
-SYM_CODE_END(\asmsym)
+	idtentry_body user_\cfunc, \has_error_code
 .endm
 #endif
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 462fc34f1317..2186ed601b4a 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -30,6 +30,10 @@
	char	VC_stack[optional_stack_size];			\
	char	VC2_stack_guard[guardsize];			\
	char	VC2_stack[optional_stack_size];			\
+	char	HV_stack_guard[guardsize];			\
+	char	HV_stack[optional_stack_size];			\
+	char	HV2_stack_guard[guardsize];			\
+	char	HV2_stack[optional_stack_size];			\
	char	IST_top_guard[guardsize];			\

 /* The exception stacks' physical storage. No guard pages required */
@@ -52,6 +56,8 @@ enum exception_stack_ordering {
	ESTACK_MCE,
	ESTACK_VC,
	ESTACK_VC2,
+	ESTACK_HV,
+	ESTACK_HV2,
	N_EXCEPTION_STACKS
 };
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b241af4ce9b4..b0f3501b2767 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -317,6 +317,19 @@ static __always_inline void __##func(struct pt_regs *regs)
	__visible noinstr void kernel_##func(struct pt_regs *regs, unsigned long error_code);	\
	__visible noinstr void user_##func(struct pt_regs *regs, unsigned long error_code)

+
+/**
+ * DECLARE_IDTENTRY_HV - Declare functions for the HV entry point
+ * @vector:	Vector number (ignored for C)
+ * @func:	Function name of the entry point
+ *
+ * Maps to DECLARE_IDTENTRY_RAW, but declares also the user C handler.
+ */
+#define DECLARE_IDTENTRY_HV(vector, func)				\
+	DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func);			\
+	__visible noinstr void kernel_##func(struct pt_regs *regs);	\
+	__visible noinstr void user_##func(struct pt_regs *regs)
+
 /**
  * DEFINE_IDTENTRY_IST - Emit code for IST entry points
  * @func:	Function name of the entry point
@@ -376,6 +389,26 @@ static __always_inline void __##func(struct pt_regs *regs)
 #define DEFINE_IDTENTRY_VC_USER(func)				\
	DEFINE_IDTENTRY_RAW_ERRORCODE(user_##func)

+/**
+ * DEFINE_IDTENTRY_HV_KERNEL - Emit code for HV injection handler
+ *			       when raised from kernel mode
+ * @func:	Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_KERNEL(func)				\
+	DEFINE_IDTENTRY_RAW(kernel_##func)
+
+/**
+ * DEFINE_IDTENTRY_HV_USER - Emit code for HV injection handler
+ *			     when raised from user mode
+ * @func:	Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_USER(func)				\
+	DEFINE_IDTENTRY_RAW(user_##func)
+
 #else /* CONFIG_X86_64 */

 /**
@@ -463,8 +496,10 @@ __visible noinstr void func(struct pt_regs *regs,			\
	DECLARE_IDTENTRY(vector, func)

 # define DECLARE_IDTENTRY_VC(vector, func)				\
-	idtentry_vc vector asm_##func func
+	idtentry_sev vector asm_##func func has_error_code=1

+# define DECLARE_IDTENTRY_HV(vector, func)				\
+	idtentry_sev vector asm_##func func has_error_code=0
 #else
 # define DECLARE_IDTENTRY_MCE(vector, func)				\
	DECLARE_IDTENTRY(vector, func)
@@ -618,9 +653,10 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF,	xenpv_exc_double_fault);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP,	exc_control_protection);
 #endif

-/* #VC */
+/* #VC & #HV */
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 DECLARE_IDTENTRY_VC(X86_TRAP_VC,	exc_vmm_communication);
+DECLARE_IDTENTRY_HV(X86_TRAP_HV,	exc_hv_injection);
 #endif

 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index e9e2c3ba5923..0bd7dab676c5 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -29,6 +29,7 @@
 #define	IST_INDEX_DB		2
 #define	IST_INDEX_MCE		3
 #define	IST_INDEX_VC		4
+#define	IST_INDEX_HV		5

 /*
  * Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/include/asm/trapnr.h b/arch/x86/include/asm/trapnr.h
index f5d2325aa0b7..c6583631cecb 100644
--- a/arch/x86/include/asm/trapnr.h
+++ b/arch/x86/include/asm/trapnr.h
@@ -26,6 +26,7 @@
 #define X86_TRAP_XF		19	/* SIMD Floating-Point Exception */
 #define X86_TRAP_VE		20	/* Virtualization Exception */
 #define X86_TRAP_CP		21	/* Control Protection Exception */
+#define X86_TRAP_HV		28	/* HV injected exception in SNP restricted mode */
 #define X86_TRAP_VC		29	/* VMM Communication Exception */
 #define X86_TRAP_IRET		32	/* IRET Exception */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..6795d3e517d6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -16,6 +16,7 @@ asmlinkage __visible notrace struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
 void __init trap_init(void);
 asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
+asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *eregs);
 #endif

 extern bool ibt_selftest(void);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8cd4126d8253..5bc44bcf6e48 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2172,6 +2172,7 @@ static inline void tss_setup_ist(struct tss_struct *tss)
	tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
	/* Only mapped when SEV-ES is active */
	tss->x86_tss.ist[IST_INDEX_VC] = __this_cpu_ist_top_va(VC);
+	tss->x86_tss.ist[IST_INDEX_HV] = __this_cpu_ist_top_va(HV);
 }

 #else /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f05339fee778..6d8f8864810c 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -26,11 +26,14 @@ static const char * const exception_stack_names[] = {
	[ ESTACK_MCE	]	= "#MC",
	[ ESTACK_VC	]	= "#VC",
	[ ESTACK_VC2	]	= "#VC2",
+	[ ESTACK_HV	]	= "#HV",
+	[ ESTACK_HV2	]	= "#HV2",
+
 };

 const char *stack_type_name(enum stack_type type)
 {
-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);

	if (type == STACK_TYPE_TASK)
		return "TASK";
@@ -89,6 +92,8 @@ struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
	EPAGERANGE(MCE),
	EPAGERANGE(VC),
	EPAGERANGE(VC2),
+	EPAGERANGE(HV),
+	EPAGERANGE(HV2),
 };

 static __always_inline bool in_exception_stack(unsigned long *stack, struct stack_info *info)
@@ -98,7 +103,7 @@ static __always_inline bool in_exception_stack(unsigned long *stack, struct stac
	struct pt_regs *regs;
	unsigned int k;

-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);

	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
	/*
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index a58c6bc1cd68..48c0a7e1dbcb 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -113,6 +113,7 @@ static const __initconst struct idt_data def_idts[] = {
 #ifdef CONFIG_AMD_MEM_ENCRYPT
	ISTG(X86_TRAP_VC,		asm_exc_vmm_communication, IST_INDEX_VC),
+	ISTG(X86_TRAP_HV,		asm_exc_hv_injection, IST_INDEX_HV),
 #endif

	SYSG(X86_TRAP_OF,		asm_exc_overflow),
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b031244d6d2d..e25445de0957 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2006,6 +2006,59 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
	irqentry_exit_to_user_mode(regs);
 }

+static bool hv_raw_handle_exception(struct pt_regs *regs)
+{
+	return false;
+}
+
+static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
+{
+	unsigned long sp = (unsigned long)regs;
+
+	return (sp >= __this_cpu_ist_bottom_va(HV2) && sp < __this_cpu_ist_top_va(HV2));
+}
+
+DEFINE_IDTENTRY_HV_USER(exc_hv_injection)
+{
+	irqentry_enter_from_user_mode(regs);
+	instrumentation_begin();
+
+	if (!hv_raw_handle_exception(regs)) {
+		/*
+		 * Do not kill the machine if user-space triggered the
+		 * exception. Send SIGBUS instead and let user-space deal
+		 * with it.
+		 */
+		force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
+	}
+
+	instrumentation_end();
+	irqentry_exit_to_user_mode(regs);
+}
+
+DEFINE_IDTENTRY_HV_KERNEL(exc_hv_injection)
+{
+	irqentry_state_t irq_state;
+
+	irq_state = irqentry_enter(regs);
+	instrumentation_begin();
+
+	if (!hv_raw_handle_exception(regs)) {
+		pr_emerg("PANIC: Unhandled #HV exception in kernel space\n");
+
+		/* Show some debug info */
+		show_regs(regs);
+
+		/* Ask hypervisor to sev_es_terminate */
+		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
+		panic("Returned from Terminate-Request to Hypervisor\n");
+	}
+
+	instrumentation_end();
+	irqentry_exit(regs, irq_state);
+}
+
 bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
 {
	unsigned long exit_code = regs->orig_ax;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d317dc3d06a3..5dca05d0fa38 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -905,6 +905,64 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r
	return regs_ret;
 }

+
+asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *regs)
+{
+	unsigned long sp, *stack;
+	struct stack_info info;
+	struct pt_regs *regs_ret;
+
+	/*
+	 * In the SYSCALL entry path the RSP value comes from user-space - don't
+	 * trust it and switch to the current kernel stack
+	 */
+	if (ip_within_syscall_gap(regs)) {
+		sp = this_cpu_read(pcpu_hot.top_of_stack);
+		goto sync;
+	}
+
+	/*
+	 * From here on the RSP value is trusted. Now check whether entry
+	 * happened from a safe stack. Not safe are the entry or unknown stacks,
+	 * use the fall-back stack instead in this case.
+	 */
+	sp    = regs->sp;
+	stack = (unsigned long *)sp;
+
+	/*
+	 * We support nested #HV exceptions once the IST stack is
+	 * switched out. The HV can always inject an #HV, but as per
+	 * GHCB specs, the HV will not inject another #HV, if
+	 * PendingEvent.NoFurtherSignal is set and we only clear this
+	 * after switching out the IST stack and handling the current
+	 * #HV. But there is still a window before the IST stack is
+	 * switched out, where a malicious HV can inject nested #HV.
+	 * The code below checks the interrupted stack to check if
+	 * it is the IST stack, and if so panic as this is
+	 * not supported and this nested #HV would have corrupted
+	 * the iret frame of the previous #HV on the IST stack.
+	 */
+	if (get_stack_info_noinstr(stack, current, &info) &&
+	    (info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV) ||
+	     info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV2)))
+		panic("Nested #HV exception, HV IST corrupted, stack type = %d\n", info.type);
+
+	if (!get_stack_info_noinstr(stack, current, &info) || info.type == STACK_TYPE_ENTRY ||
+	    info.type > STACK_TYPE_EXCEPTION_LAST)
+		sp = __this_cpu_ist_top_va(HV2);
+sync:
+	/*
+	 * Found a safe stack - switch to it as if the entry didn't happen via
+	 * IST stack. The code below only copies pt_regs, the real switch happens
+	 * in assembly code.
+	 */
+	sp = ALIGN_DOWN(sp, 8) - sizeof(*regs_ret);
+
+	regs_ret = (struct pt_regs *)sp;
+	*regs_ret = *regs;
+
+	return regs_ret;
+}
 #endif

 asmlinkage __visible noinstr struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs)
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index e91500a80963..97554fa0ff30 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -160,6 +160,8 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
		if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
			cea_map_stack(VC);
			cea_map_stack(VC2);
+			cea_map_stack(HV);
+			cea_map_stack(HV2);
		}
	}
 }