[11/16] KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO

Message ID 20240228024147.41573-12-seanjc@google.com
State New, archived
Series KVM: x86/mmu: Page fault and MMIO cleanups

Commit Message

Sean Christopherson Feb. 28, 2024, 2:41 a.m. UTC
Explicitly detect and disallow private accesses to emulated MMIO in
kvm_handle_noslot_fault() instead of relying on kvm_faultin_pfn_private()
to perform the check.  This will allow the page fault path to go straight
to kvm_handle_noslot_fault() without bouncing through __kvm_faultin_pfn().

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Huang, Kai March 6, 2024, 10:35 p.m. UTC | #1
On 28/02/2024 3:41 pm, Sean Christopherson wrote:
> Explicitly detect and disallow private accesses to emulated MMIO in
> kvm_handle_noslot_fault() instead of relying on kvm_faultin_pfn_private()
> to perform the check.  This will allow the page fault path to go straight
> to kvm_handle_noslot_fault() without bouncing through __kvm_faultin_pfn().
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5c8caab64ba2..ebdb3fcce3dc 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3314,6 +3314,11 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
>   {
>   	gva_t gva = fault->is_tdp ? 0 : fault->addr;
>   
> +	if (fault->is_private) {
> +		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> +		return -EFAULT;
> +	}
> +

As mentioned in another reply in this series, unless I am mistaken, for a
TDX guest the _first_ MMIO access would still cause an EPT violation with
the MMIO GFN being private.

Returning to userspace cannot really help here because the MMIO mapping 
is inside the guest.

I am hoping I am missing something here?
Sean Christopherson March 6, 2024, 10:43 p.m. UTC | #2
On Thu, Mar 07, 2024, Kai Huang wrote:
> As mentioned in another reply in this series, unless I am mistaken, for TDX
> guest the _first_ MMIO access would still cause EPT violation with MMIO GFN
> being private.
> 
> Returning to userspace cannot really help here because the MMIO mapping is
> inside the guest.

That's a guest bug.  The guest *knows* it's a TDX VM, it *has* to know.  Accessing
emulated MMIO and thus taking a #VE before enabling paging is nonsensical.  Either
enable paging and set up MMIO regions as shared, or go straight to TDCALL.

> 
> I am hoping I am missing something here?
Huang, Kai March 6, 2024, 10:49 p.m. UTC | #3
On 7/03/2024 11:43 am, Sean Christopherson wrote:
> On Thu, Mar 07, 2024, Kai Huang wrote:
>> As mentioned in another reply in this series, unless I am mistaken, for TDX
>> guest the _first_ MMIO access would still cause EPT violation with MMIO GFN
>> being private.
>>
>> Returning to userspace cannot really help here because the MMIO mapping is
>> inside the guest.
> 
> That's a guest bug.  The guest *knows* it's a TDX VM, it *has* to know.  Accessing
> emulated MMIO and thus taking a #VE before enabling paging is nonsensical.  Either
> enable paging and setup MMIO regions as shared, or go straight to TDCALL.

+Kirill,

I kinda forgot the details, but I am afraid there might be a bunch
of existing TDX guests (since the TDX guest code is upstreamed) using
unmodified drivers, which I suppose don't map MMIO regions as shared.

Kirill,

Could you clarify whether the TDX guest code has mapped MMIO regions as
shared since the beginning?
Sean Christopherson March 6, 2024, 11:01 p.m. UTC | #4
On Thu, Mar 07, 2024, Kai Huang wrote:
> 
> 
> On 7/03/2024 11:43 am, Sean Christopherson wrote:
> > On Thu, Mar 07, 2024, Kai Huang wrote:
> > > As mentioned in another reply in this series, unless I am mistaken, for TDX
> > > guest the _first_ MMIO access would still cause EPT violation with MMIO GFN
> > > being private.
> > > 
> > > Returning to userspace cannot really help here because the MMIO mapping is
> > > inside the guest.
> > 
> > That's a guest bug.  The guest *knows* it's a TDX VM, it *has* to know.  Accessing
> > emulated MMIO and thus taking a #VE before enabling paging is nonsensical.  Either
> > enable paging and setup MMIO regions as shared, or go straight to TDCALL.
> 
> +Kirill,
> 
> I kinda forgot the detail, but what I am afraid is there might be bunch of
> existing TDX guests (since TDX guest code is upstream-ed) using unmodified
> drivers, which doesn't map MMIO regions as shared I suppose.
> 
> Kirill,
> 
> Could you clarify whether TDX guest code maps MMIO regions as shared since
> beginning?

Y'all get the same answer we gave the SNP folks: KVM does not yet support TDX,
so as far as KVM is concerned, there is no existing functionality to support.

s/firmware/Linux if this is a Linux kernel problem.

  On Thu, Feb 08, 2024, Paolo Bonzini wrote:
  > On Thu, Feb 8, 2024 at 6:27 PM Sean Christopherson <seanjc@google.com> wrote:
  > > No.  KVM does not yet support SNP, so as far as KVM's ABI goes, there are no
  > > existing guests.  Yes, I realize that I am burying my head in the sand to some
  > > extent, but it is simply not sustainable for KVM to keep trying to pick up the
  > > pieces of poorly defined hardware specs and broken guest firmware.
  > 
  > 101% agreed. There are cases in which we have to and should bend
  > over backwards for guests (e.g. older Linux kernels), but not for
  > code that---according to current practices---is chosen by the host
  > admin.
  > 
  > (I am of the opinion that "bring your own firmware" is the only sane
  > way to handle attestation/measurement, but that's not how things are
  > done currently).
Huang, Kai March 6, 2024, 11:20 p.m. UTC | #5
On 7/03/2024 12:01 pm, Sean Christopherson wrote:
> On Thu, Mar 07, 2024, Kai Huang wrote:
>>
>>
>> On 7/03/2024 11:43 am, Sean Christopherson wrote:
>>> On Thu, Mar 07, 2024, Kai Huang wrote:
>>>> As mentioned in another reply in this series, unless I am mistaken, for TDX
>>>> guest the _first_ MMIO access would still cause EPT violation with MMIO GFN
>>>> being private.
>>>>
>>>> Returning to userspace cannot really help here because the MMIO mapping is
>>>> inside the guest.
>>>
>>> That's a guest bug.  The guest *knows* it's a TDX VM, it *has* to know.  Accessing
>>> emulated MMIO and thus taking a #VE before enabling paging is nonsensical.  Either
>>> enable paging and setup MMIO regions as shared, or go straight to TDCALL.
>>
>> +Kirill,
>>
>> I kinda forgot the detail, but what I am afraid is there might be bunch of
>> existing TDX guests (since TDX guest code is upstream-ed) using unmodified
>> drivers, which doesn't map MMIO regions as shared I suppose.
>>
>> Kirill,
>>
>> Could you clarify whether TDX guest code maps MMIO regions as shared since
>> beginning?
> 
> Y'all get the same answer we gave the SNP folks: KVM does not yet support TDX,
> so as far is KVM is concerned, there is no existing functionality to support.
> 
> s/firmware/Linux if this is a Linux kernel problem.
> 
>    On Thu, Feb 08, 2024, Paolo Bonzini wrote:
>    > On Thu, Feb 8, 2024 at 6:27 PM Sean Christopherson <seanjc@google.com> wrote:
>    > > No.  KVM does not yet support SNP, so as far as KVM's ABI goes, there are no
>    > > existing guests.  Yes, I realize that I am burying my head in the sand to some
>    > > extent, but it is simply not sustainable for KVM to keep trying to pick up the
>    > > pieces of poorly defined hardware specs and broken guest firmware.
>    >
>    > 101% agreed. There are cases in which we have to and should bend
>    > together backwards for guests (e.g. older Linux kernels), but not for
>    > code that---according to current practices---is chosen by the host
>    > admin.
>    >
>    > (I am of the opinion that "bring your own firmware" is the only sane
>    > way to handle attestation/measurement, but that's not how things are
>    > done currently).

Fair enough, and good to know. :-)

(Still better to hear from Kirill, though.)
kirill.shutemov@linux.intel.com March 7, 2024, 5:10 p.m. UTC | #6
On Thu, Mar 07, 2024 at 11:49:11AM +1300, Huang, Kai wrote:
> 
> 
> On 7/03/2024 11:43 am, Sean Christopherson wrote:
> > On Thu, Mar 07, 2024, Kai Huang wrote:
> > > As mentioned in another reply in this series, unless I am mistaken, for TDX
> > > guest the _first_ MMIO access would still cause EPT violation with MMIO GFN
> > > being private.
> > > 
> > > Returning to userspace cannot really help here because the MMIO mapping is
> > > inside the guest.
> > 
> > That's a guest bug.  The guest *knows* it's a TDX VM, it *has* to know.  Accessing
> > emulated MMIO and thus taking a #VE before enabling paging is nonsensical.  Either
> > enable paging and setup MMIO regions as shared, or go straight to TDCALL.
> 
> +Kirill,
> 
> I kinda forgot the detail, but what I am afraid is there might be bunch of
> existing TDX guests (since TDX guest code is upstream-ed) using unmodified
> drivers, which doesn't map MMIO regions as shared I suppose.

Unmodified drivers get their MMIO regions mapped with ioremap(), which sets
the shared bit unless explicitly asked to make the mapping private (encrypted).
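Kirill's point can be illustrated with a small model of the address bit involved. This is a sketch under stated assumptions, not kernel code: it assumes the TDX convention that the shared bit is the top bit of the guest physical address width (bit 47 with a 48-bit GPAW, bit 51 with a 52-bit GPAW), and `tdx_shared_mask()`, `model_mkdec()`, `model_mkenc()`, `model_ioremap_gpa()` and `model_gpa_is_private()` are hypothetical helpers that only loosely mirror the shared/encrypted choice ioremap() makes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model (assumption): in a TDX guest the "shared" bit is the
 * top bit of the guest physical address width, i.e. bit 47 for a 48-bit
 * GPAW or bit 51 for a 52-bit GPAW.
 */
static uint64_t tdx_shared_mask(unsigned int gpa_width)
{
	assert(gpa_width == 48 || gpa_width == 52);
	return 1ULL << (gpa_width - 1);
}

/* Model of the "decrypt" step on TDX: set the shared bit. */
static uint64_t model_mkdec(uint64_t gpa, uint64_t shared_mask)
{
	return gpa | shared_mask;
}

/* Model of the "encrypt" step on TDX: clear the shared bit (private). */
static uint64_t model_mkenc(uint64_t gpa, uint64_t shared_mask)
{
	return gpa & ~shared_mask;
}

/*
 * Hypothetical model of ioremap()'s choice: a plain mapping is shared
 * unless the caller explicitly asked for an encrypted (private) mapping.
 */
static uint64_t model_ioremap_gpa(uint64_t gpa, unsigned int gpa_width,
				  bool want_encrypted)
{
	uint64_t mask = tdx_shared_mask(gpa_width);

	return want_encrypted ? model_mkenc(gpa, mask) : model_mkdec(gpa, mask);
}

/* A GPA is private in this model when its shared bit is clear. */
static bool model_gpa_is_private(uint64_t gpa, unsigned int gpa_width)
{
	return (gpa & tdx_shared_mask(gpa_width)) == 0;
}
```

In this model a plain ioremap()-style mapping yields a GPA with the shared bit set, so the guest's MMIO access would reach KVM as a shared fault and would not hit the new `fault->is_private` check; only an explicitly encrypted mapping keeps the GPA private.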
Huang, Kai March 8, 2024, 12:09 a.m. UTC | #7
On 8/03/2024 6:10 am, Kirill A. Shutemov wrote:
> On Thu, Mar 07, 2024 at 11:49:11AM +1300, Huang, Kai wrote:
>>
>>
>> On 7/03/2024 11:43 am, Sean Christopherson wrote:
>>> On Thu, Mar 07, 2024, Kai Huang wrote:
>>>> As mentioned in another reply in this series, unless I am mistaken, for TDX
>>>> guest the _first_ MMIO access would still cause EPT violation with MMIO GFN
>>>> being private.
>>>>
>>>> Returning to userspace cannot really help here because the MMIO mapping is
>>>> inside the guest.
>>>
>>> That's a guest bug.  The guest *knows* it's a TDX VM, it *has* to know.  Accessing
>>> emulated MMIO and thus taking a #VE before enabling paging is nonsensical.  Either
>>> enable paging and setup MMIO regions as shared, or go straight to TDCALL.
>>
>> +Kirill,
>>
>> I kinda forgot the detail, but what I am afraid is there might be bunch of
>> existing TDX guests (since TDX guest code is upstream-ed) using unmodified
>> drivers, which doesn't map MMIO regions as shared I suppose.
> 
> Unmodified drivers gets their MMIO regions mapped with ioremap() that sets
> shared bit, unless asked explicitly to make it private (encrypted).
> 

Thanks for the clarification.  Obviously I misremembered.

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5c8caab64ba2..ebdb3fcce3dc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3314,6 +3314,11 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 {
 	gva_t gva = fault->is_tdp ? 0 : fault->addr;
 
+	if (fault->is_private) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
 	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
 			     access & shadow_mmio_access_mask);
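The decision the patch adds to kvm_handle_noslot_fault() can be modeled outside the kernel. This is a simplified user-space sketch, not KVM code: `struct fault_model`, `struct memory_fault_exit`, `handle_noslot_fault_model()`, `MODEL_PAGE_SHIFT` and `RET_EMULATE_MMIO` are hypothetical stand-ins, and only the `is_private` test, the memory-fault report (GPA plus one page), and the `-EFAULT` return are meant to mirror the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12

/* Hypothetical stand-in for the struct kvm_page_fault fields used here. */
struct fault_model {
	uint64_t gfn;		/* faulting guest frame number */
	bool is_private;	/* access targeted private guest memory */
};

/* Hypothetical stand-in for the memory-fault details reported to userspace. */
struct memory_fault_exit {
	bool valid;
	uint64_t gpa;
	uint64_t size;
	bool private_access;
};

/* Hypothetical result meaning "cache MMIO info and emulate the access". */
#define RET_EMULATE_MMIO 1

/*
 * Model of the check added at the top of kvm_handle_noslot_fault(): a
 * private access to a GFN without a memslot is not treated as emulated
 * MMIO; instead it is reported as a memory fault and fails with -EFAULT.
 */
static int handle_noslot_fault_model(const struct fault_model *fault,
				     struct memory_fault_exit *exit_info)
{
	assert(fault && exit_info);

	if (fault->is_private) {
		exit_info->valid = true;
		exit_info->gpa = fault->gfn << MODEL_PAGE_SHIFT;
		exit_info->size = 1ULL << MODEL_PAGE_SHIFT;
		exit_info->private_access = true;
		return -EFAULT;
	}

	/* Shared access: continue down the emulated-MMIO path. */
	return RET_EMULATE_MMIO;
}
```

For example, a shared fault on GFN 0xfec00 returns `RET_EMULATE_MMIO` and leaves `exit_info` untouched, while a private fault on the same GFN returns `-EFAULT` and records GPA 0xfec00000. In KVM proper, the `-EFAULT` propagates out of KVM_RUN and userspace sees a `KVM_EXIT_MEMORY_FAULT` exit with `KVM_MEMORY_EXIT_FLAG_PRIVATE` set.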