Message ID | 20230515165917.1306922-3-ltykernel@gmail.com
---|---
State | New, archived
Series | x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv
On Mon, May 15, 2023 at 12:59:04PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <tiala@microsoft.com>
>
> Add check_hv_pending() and check_hv_pending_after_irq() to
> check queued #HV event when irq is disabled.
>
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
>  arch/x86/entry/entry_64.S       | 18 ++++++++++++++++
>  arch/x86/include/asm/irqflags.h | 14 +++++++++++-
>  arch/x86/kernel/sev.c           | 38 +++++++++++++++++++++++++++++++++
>  3 files changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 653b1f10699b..147b850babf6 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
>   * R15 - old SPEC_CTRL
>   */
>  SYM_CODE_START_LOCAL(paranoid_exit)
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +	/*
> +	 * If a #HV was delivered during execution and interrupts were
> +	 * disabled, then check if it can be handled before the iret
> +	 * (which may re-enable interrupts).
> +	 */
> +	mov	%rsp, %rdi
> +	call	check_hv_pending
> +#endif
>  	UNWIND_HINT_REGS
>
>  	/*
> @@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
>  SYM_CODE_END(error_entry)
>
>  SYM_CODE_START_LOCAL(error_return)
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +	/*
> +	 * If a #HV was delivered during execution and interrupts were
> +	 * disabled, then check if it can be handled before the iret
> +	 * (which may re-enable interrupts).
> +	 */
> +	mov	%rsp, %rdi
> +	call	check_hv_pending
> +#endif
>  	UNWIND_HINT_REGS
>  	DEBUG_ENTRY_ASSERT_IRQS_OFF
>  	testb	$3, CS(%rsp)

Oh hell no... so now you're adding unconditional calls to every single
interrupt and nmi exit path, with the grand total of 0 justification.
On 5/16/2023 5:32 PM, Peter Zijlstra wrote:
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
>>   * R15 - old SPEC_CTRL
>>   */
>>  SYM_CODE_START_LOCAL(paranoid_exit)
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +	/*
>> +	 * If a #HV was delivered during execution and interrupts were
>> +	 * disabled, then check if it can be handled before the iret
>> +	 * (which may re-enable interrupts).
>> +	 */
>> +	mov	%rsp, %rdi
>> +	call	check_hv_pending
>> +#endif
>>  	UNWIND_HINT_REGS
>>
>>  	/*
>> @@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
>>  SYM_CODE_END(error_entry)
>>
>>  SYM_CODE_START_LOCAL(error_return)
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +	/*
>> +	 * If a #HV was delivered during execution and interrupts were
>> +	 * disabled, then check if it can be handled before the iret
>> +	 * (which may re-enable interrupts).
>> +	 */
>> +	mov	%rsp, %rdi
>> +	call	check_hv_pending
>> +#endif
>>  	UNWIND_HINT_REGS
>>  	DEBUG_ENTRY_ASSERT_IRQS_OFF
>>  	testb	$3, CS(%rsp)
> Oh hell no... so now you're adding unconditional calls to every single
> interrupt and nmi exit path, with the grand total of 0 justification.
>

Sorry, the check is currently inside check_hv_pending(). Will move the check
before calling check_hv_pending() in the next version. Thanks.
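One way to make the call conditional at the call site itself is to patch it in
with the kernel's ALTERNATIVE mechanism, so that kernels and guests without the
feature only ever execute padding NOPs on the exit paths. The fragment below is
a hypothetical sketch, not code from this series; the X86_FEATURE_SEV_SNP flag
is an assumption, and the real change would need whichever synthetic feature
bit is actually set for SNP guests running with Restricted Injection.

#ifdef CONFIG_AMD_MEM_ENCRYPT
	/*
	 * Hypothetical sketch: assembled as padding NOPs unless the
	 * (assumed) feature bit is set, so non-SNP systems pay no call
	 * on the interrupt/NMI exit path.
	 */
	ALTERNATIVE "", "mov %rsp, %rdi; call check_hv_pending", X86_FEATURE_SEV_SNP
#endif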
On Wed, May 17, 2023 at 05:55:45PM +0800, Tianyu Lan wrote:
> On 5/16/2023 5:32 PM, Peter Zijlstra wrote:
> > > --- a/arch/x86/entry/entry_64.S
> > > +++ b/arch/x86/entry/entry_64.S
> > > @@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
> > >   * R15 - old SPEC_CTRL
> > >   */
> > >  SYM_CODE_START_LOCAL(paranoid_exit)
> > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > +	/*
> > > +	 * If a #HV was delivered during execution and interrupts were
> > > +	 * disabled, then check if it can be handled before the iret
> > > +	 * (which may re-enable interrupts).
> > > +	 */
> > > +	mov	%rsp, %rdi
> > > +	call	check_hv_pending
> > > +#endif
> > >  	UNWIND_HINT_REGS
> > >
> > >  	/*
> > > @@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
> > >  SYM_CODE_END(error_entry)
> > >
> > >  SYM_CODE_START_LOCAL(error_return)
> > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > +	/*
> > > +	 * If a #HV was delivered during execution and interrupts were
> > > +	 * disabled, then check if it can be handled before the iret
> > > +	 * (which may re-enable interrupts).
> > > +	 */
> > > +	mov	%rsp, %rdi
> > > +	call	check_hv_pending
> > > +#endif
> > >  	UNWIND_HINT_REGS
> > >  	DEBUG_ENTRY_ASSERT_IRQS_OFF
> > >  	testb	$3, CS(%rsp)
> > Oh hell no... so now you're adding unconditional calls to every single
> > interrupt and nmi exit path, with the grand total of 0 justification.
> >
> Sorry, the check is currently inside check_hv_pending(). Will move the check
> before calling check_hv_pending() in the next version. Thanks.

You will also explain, in the Changelog, in excruciating detail, *WHY*
any of this is required.

Any additional code in these paths that is only required for some
random hypervisor had better come with proof that it is absolutely
required, that no alternative solution exists, and that it has no
performance impact on normal users.

If this is due to Hyper-V design idiocies rather than something
fundamentally required by the hardware design you'll get a NAK.
From: Peter Zijlstra <peterz@infradead.org>
Sent: Wednesday, May 17, 2023 6:10 AM
>
> On Wed, May 17, 2023 at 05:55:45PM +0800, Tianyu Lan wrote:
> > On 5/16/2023 5:32 PM, Peter Zijlstra wrote:
> > > > --- a/arch/x86/entry/entry_64.S
> > > > +++ b/arch/x86/entry/entry_64.S
> > > > @@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
> > > >   * R15 - old SPEC_CTRL
> > > >   */
> > > >  SYM_CODE_START_LOCAL(paranoid_exit)
> > > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > > +	/*
> > > > +	 * If a #HV was delivered during execution and interrupts were
> > > > +	 * disabled, then check if it can be handled before the iret
> > > > +	 * (which may re-enable interrupts).
> > > > +	 */
> > > > +	mov	%rsp, %rdi
> > > > +	call	check_hv_pending
> > > > +#endif
> > > >  	UNWIND_HINT_REGS
> > > >
> > > >  	/*
> > > > @@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
> > > >  SYM_CODE_END(error_entry)
> > > >
> > > >  SYM_CODE_START_LOCAL(error_return)
> > > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > > +	/*
> > > > +	 * If a #HV was delivered during execution and interrupts were
> > > > +	 * disabled, then check if it can be handled before the iret
> > > > +	 * (which may re-enable interrupts).
> > > > +	 */
> > > > +	mov	%rsp, %rdi
> > > > +	call	check_hv_pending
> > > > +#endif
> > > >  	UNWIND_HINT_REGS
> > > >  	DEBUG_ENTRY_ASSERT_IRQS_OFF
> > > >  	testb	$3, CS(%rsp)
> > > Oh hell no... so now you're adding unconditional calls to every single
> > > interrupt and nmi exit path, with the grand total of 0 justification.
> > >
> > Sorry, the check is currently inside check_hv_pending(). Will move the check
> > before calling check_hv_pending() in the next version. Thanks.
>
> You will also explain, in the Changelog, in excruciating detail, *WHY*
> any of this is required.
>
> Any additional code in these paths that is only required for some
> random hypervisor had better come with proof that it is absolutely
> required, that no alternative solution exists, and that it has no
> performance impact on normal users.
>
> If this is due to Hyper-V design idiocies rather than something
> fundamentally required by the hardware design you'll get a NAK.

I'm jumping in to answer some of the basic questions here. Yesterday, there
was a discussion about nested #HV exceptions, so maybe some of this is
already understood, but let me recap at a higher level, provide some
references, and suggest the path forward.

This code and some of the other patches in this series are for handling the
#HV exception that is introduced by the Restricted Interrupt Injection
feature of the SEV-SNP architecture. See Section 15.36.16 of [1], and
Section 5 of [2]. There's also an AMD presentation from LPC last fall [3].

Hyper-V requires that the guest implement Restricted Interrupt Injection to
handle the case of a compromised hypervisor injecting an exception (and
forcing the running of that exception handler), even when it should be
disallowed by guest state. For example, the hypervisor could inject an
interrupt while the guest has interrupts disabled. In time, presumably other
hypervisors like KVM will at least have an option where they expect SEV-SNP
guests to implement Restricted Interrupt Injection functionality, so it's
not Hyper-V specific.

Naming the new exception #HV and using "hv" as the Linux prefix for related
functions and variable names is a bit unfortunate. It conflicts with the
existing use of the "hv" prefix to denote Hyper-V specific code in the Linux
kernel, and at first glance makes this code look like it is Hyper-V specific
code. Maybe we can choose a different prefix ("hvex"?) for this #HV
exception related code to avoid that "first glance" confusion.

I've talked with Tianyu offline, and he will do the following:

1) Split this patch set into two patch sets. The first patch set is the
Hyper-V specific code for managing communication pages that must be shared
between the guest and Hyper-V, for starting APs, etc. The second patch set
will contain only the Restricted Interrupt Injection and #HV code.

2) For the Restricted Interrupt Injection code, Tianyu will look at how to
absolutely minimize the impact in the hot code paths, particularly when
SEV-SNP is not active. Hopefully the impact can be a couple of instructions
at most, or even less with the use of other existing kernel techniques.
He'll look at the other things you've commented on and get the code into a
better state. I'll work with him on writing commit messages and comments
that explain what's going on.

Michael

[1] https://www.amd.com/system/files/TechDocs/24593.pdf
[2] https://www.amd.com/system/files/TechDocs/56421-guest-hypervisor-communication-block-standardization.pdf
[3] https://lpc.events/event/16/contributions/1321/attachments/965/1886/SNP_Interrupt_Security.pptx
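For the C-side hot path -- check_hv_pending_irq_enable() is called from
native_irq_enable(), i.e. on every local_irq_enable() -- one "existing kernel
technique" along the lines Michael describes could be a static branch that is
flipped once during SNP guest bring-up. The sketch below is an assumption
about how that might look, not code from the series; the names
hv_restricted_inj_enabled, hv_check_pending_on_irq_enable() and
hv_restricted_inj_init() are invented for illustration. It only addresses the
C callers; the asm entry paths would still need something like the
ALTERNATIVE approach sketched earlier in the thread.

/* arch/x86/include/asm/irqflags.h (hypothetical sketch) */
#include <linux/jump_label.h>

DECLARE_STATIC_KEY_FALSE(hv_restricted_inj_enabled);
void check_hv_pending_irq_enable(void);

static __always_inline void hv_check_pending_on_irq_enable(void)
{
	/*
	 * Compiles to a single patched NOP (or JMP) at every
	 * local_irq_enable() call site; only SNP guests that flipped
	 * the key at boot take the out-of-line call.
	 */
	if (static_branch_unlikely(&hv_restricted_inj_enabled))
		check_hv_pending_irq_enable();
}

static __always_inline void native_irq_enable(void)
{
	asm volatile("sti": : :"memory");
	hv_check_pending_on_irq_enable();
}

/* arch/x86/kernel/sev.c (hypothetical sketch) */
DEFINE_STATIC_KEY_FALSE(hv_restricted_inj_enabled);

void __init hv_restricted_inj_init(void)
{
	/* Flip the key once, during SNP guest bring-up. */
	if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
		static_branch_enable(&hv_restricted_inj_enabled);
}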
On Wed, May 31, 2023 at 02:50:50PM +0000, Michael Kelley (LINUX) wrote:

> I'm jumping in to answer some of the basic questions here. Yesterday, there
> was a discussion about nested #HV exceptions, so maybe some of this is
> already understood, but let me recap at a higher level, provide some
> references, and suggest the path forward.

> 2) For the Restricted Interrupt Injection code, Tianyu will look at how to
> absolutely minimize the impact in the hot code paths, particularly when
> SEV-SNP is not active. Hopefully the impact can be a couple of instructions
> at most, or even less with the use of other existing kernel techniques.
> He'll look at the other things you've commented on and get the code into a
> better state. I'll work with him on writing commit messages and comments
> that explain what's going on.

So from what I understand of all this SEV-SNP/#HV muck, it is near
impossible to get right without ucode/hw changes. Hence my request to
Tom to look into that.

The feature as specified in the AMD documentation seems fundamentally
buggered.

Specifically, #HV needs to be IST because the hypervisor can inject at
any moment, irrespective of IF or anything else -- even during #HV
itself. This means also in the syscall gap.

Since it is IST, a nested #HV is instant stack corruption -- #HV can
attempt to play stack games as per the copied #VC crap (which I'm not at
all convinced is correct itself), but this doesn't actually fix
anything; all you need is a single instruction window to wreck things.

Because, as stated, the whole premise is that the hypervisor is out to
get you; you must not leave it room to wiggle. As is, this is security
through prayer, and we don't do that.

In short: I really want solid proof that what you propose to implement
is correct and not wishful thinking.
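For context, the "#VC stack games" Peter refers to are the IST adjustment the
kernel already performs for #VC, roughly paraphrased below from
__sev_es_ist_enter() in arch/x86/kernel/sev.c with comments added; this is
existing mainline code shown for reference, not part of the #HV patch. The
objection is that an exception arriving in the window before the final IST
write completes would still be delivered on the in-use IST stack.

/* Paraphrase of the existing #VC IST adjustment, for reference only. */
void noinstr __sev_es_ist_enter(struct pt_regs *regs)
{
	unsigned long old_ist, new_ist;

	/* Read the currently programmed #VC IST entry. */
	new_ist = old_ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);

	/*
	 * If the interrupted context was already running on the #VC IST
	 * stack, move the IST entry below regs->sp so a subsequent #VC
	 * does not overwrite the live stack frame.
	 */
	if (on_vc_stack(regs->sp))
		new_ist = ALIGN_DOWN(regs->sp, 8) - sizeof(old_ist);

	/* Save the old value so __sev_es_ist_exit() can restore it. */
	new_ist -= sizeof(old_ist);
	*(unsigned long *)new_ist = old_ist;

	/*
	 * Only after this write does the TSS point below the live frame;
	 * an exception delivered before it still lands on the old,
	 * in-use stack -- the single-instruction window Peter describes.
	 */
	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
}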
From: Peter Zijlstra <peterz@infradead.org>
Sent: Wednesday, May 31, 2023 8:49 AM
>
> On Wed, May 31, 2023 at 02:50:50PM +0000, Michael Kelley (LINUX) wrote:
>
> > I'm jumping in to answer some of the basic questions here. Yesterday, there
> > was a discussion about nested #HV exceptions, so maybe some of this is
> > already understood, but let me recap at a higher level, provide some
> > references, and suggest the path forward.
>
> > 2) For the Restricted Interrupt Injection code, Tianyu will look at how to
> > absolutely minimize the impact in the hot code paths, particularly when
> > SEV-SNP is not active. Hopefully the impact can be a couple of instructions
> > at most, or even less with the use of other existing kernel techniques.
> > He'll look at the other things you've commented on and get the code into a
> > better state. I'll work with him on writing commit messages and comments
> > that explain what's going on.
>
> So from what I understand of all this SEV-SNP/#HV muck, it is near
> impossible to get right without ucode/hw changes. Hence my request to
> Tom to look into that.
>
> The feature as specified in the AMD documentation seems fundamentally
> buggered.
>
> Specifically, #HV needs to be IST because the hypervisor can inject at
> any moment, irrespective of IF or anything else -- even during #HV
> itself. This means also in the syscall gap.
>
> Since it is IST, a nested #HV is instant stack corruption -- #HV can
> attempt to play stack games as per the copied #VC crap (which I'm not at
> all convinced is correct itself), but this doesn't actually fix
> anything; all you need is a single instruction window to wreck things.
>
> Because, as stated, the whole premise is that the hypervisor is out to
> get you; you must not leave it room to wiggle. As is, this is security
> through prayer, and we don't do that.
>
> In short: I really want solid proof that what you propose to implement
> is correct and not wishful thinking.

Fair enough. We will be sync'ing with the AMD folks to make sure that one
way or another this really will work.

Michael
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 653b1f10699b..147b850babf6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
  * R15 - old SPEC_CTRL
  */
 SYM_CODE_START_LOCAL(paranoid_exit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	/*
+	 * If a #HV was delivered during execution and interrupts were
+	 * disabled, then check if it can be handled before the iret
+	 * (which may re-enable interrupts).
+	 */
+	mov	%rsp, %rdi
+	call	check_hv_pending
+#endif
 	UNWIND_HINT_REGS
 
 	/*
@@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
 SYM_CODE_END(error_entry)
 
 SYM_CODE_START_LOCAL(error_return)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	/*
+	 * If a #HV was delivered during execution and interrupts were
+	 * disabled, then check if it can be handled before the iret
+	 * (which may re-enable interrupts).
+	 */
+	mov	%rsp, %rdi
+	call	check_hv_pending
+#endif
 	UNWIND_HINT_REGS
 	DEBUG_ENTRY_ASSERT_IRQS_OFF
 	testb	$3, CS(%rsp)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8c5ae649d2df..d09ec6d76591 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -11,6 +11,10 @@
 /*
  * Interrupt control:
  */
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+void check_hv_pending(struct pt_regs *regs);
+void check_hv_pending_irq_enable(void);
+#endif
 
 /* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
 extern inline unsigned long native_save_fl(void);
@@ -40,12 +44,20 @@ static __always_inline void native_irq_disable(void)
 static __always_inline void native_irq_enable(void)
 {
 	asm volatile("sti": : :"memory");
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	check_hv_pending_irq_enable();
+#endif
 }
 
 static __always_inline void native_safe_halt(void)
 {
 	mds_idle_clear_cpu_buffers();
-	asm volatile("sti; hlt": : :"memory");
+	asm volatile("sti": : :"memory");
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	check_hv_pending_irq_enable();
+#endif
+	asm volatile("hlt": : :"memory");
 }
 
 static __always_inline void native_halt(void)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e25445de0957..ff5eab48bfe2 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -181,6 +181,44 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
 	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
 }
 
+static void do_exc_hv(struct pt_regs *regs)
+{
+	/* Handle #HV exception. */
+}
+
+void check_hv_pending(struct pt_regs *regs)
+{
+	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+		return;
+
+	if ((regs->flags & X86_EFLAGS_IF) == 0)
+		return;
+
+	do_exc_hv(regs);
+}
+
+void check_hv_pending_irq_enable(void)
+{
+	struct pt_regs regs;
+
+	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+		return;
+
+	memset(&regs, 0, sizeof(struct pt_regs));
+	asm volatile("movl %%cs, %%eax;" : "=a" (regs.cs));
+	asm volatile("movl %%ss, %%eax;" : "=a" (regs.ss));
+	regs.orig_ax = 0xffffffff;
+	regs.flags = native_save_fl();
+
+	/*
+	 * Disable irq when handle pending #HV events after
+	 * re-enabling irq.
+	 */
+	asm volatile("cli" : : : "memory");
+	do_exc_hv(&regs);
+	asm volatile("sti" : : : "memory");
+}
+
 void noinstr __sev_es_ist_exit(void)
 {
 	unsigned long ist;