| Message ID | 20240228024147.41573-12-seanjc@google.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | KVM: x86/mmu: Page fault and MMIO cleanups |
On 28/02/2024 3:41 pm, Sean Christopherson wrote:
> Explicitly detect and disallow private accesses to emulated MMIO in
> kvm_handle_noslot_fault() instead of relying on kvm_faultin_pfn_private()
> to perform the check. This will allow the page fault path to go straight
> to kvm_handle_noslot_fault() without bouncing through __kvm_faultin_pfn().
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5c8caab64ba2..ebdb3fcce3dc 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3314,6 +3314,11 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
>  {
>  	gva_t gva = fault->is_tdp ? 0 : fault->addr;
>
> +	if (fault->is_private) {
> +		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> +		return -EFAULT;
> +	}
> +

As mentioned in another reply in this series, unless I am mistaken, for a
TDX guest the _first_ MMIO access would still cause an EPT violation with
the MMIO GFN being private.

Returning to userspace cannot really help here because the MMIO mapping is
inside the guest.

I am hoping I am missing something here?
On Thu, Mar 07, 2024, Kai Huang wrote:
> On 28/02/2024 3:41 pm, Sean Christopherson wrote:
> > Explicitly detect and disallow private accesses to emulated MMIO in
> > kvm_handle_noslot_fault() instead of relying on kvm_faultin_pfn_private()
> > to perform the check. This will allow the page fault path to go straight
> > to kvm_handle_noslot_fault() without bouncing through __kvm_faultin_pfn().
> >
> > [...]
>
> As mentioned in another reply in this series, unless I am mistaken, for a
> TDX guest the _first_ MMIO access would still cause an EPT violation with
> the MMIO GFN being private.
>
> Returning to userspace cannot really help here because the MMIO mapping is
> inside the guest.

That's a guest bug. The guest *knows* it's a TDX VM, it *has* to know. Accessing
emulated MMIO and thus taking a #VE before enabling paging is nonsensical. Either
enable paging and set up MMIO regions as shared, or go straight to TDCALL.

> I am hoping I am missing something here?
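For context on the "go straight to TDCALL" option: instead of touching an MMIO
address and taking a #VE, a TDX guest can invoke the MMIO hypercall directly.
Below is a rough sketch loosely modeled on the Linux guest's #VE MMIO handler
(arch/x86/coco/tdx/tdx.c); the helper names (`_tdx_hypercall()`, `hcall_func()`)
and register layout follow that file, but treat the details as illustrative
rather than authoritative.

```c
/*
 * Illustrative sketch: a TDX guest performing an emulated-MMIO write via
 * TDVMCALL directly, rather than relying on a #VE.  Loosely modeled on
 * mmio_write() in arch/x86/coco/tdx/tdx.c; not a verbatim copy.
 */
#define EPT_READ	0
#define EPT_WRITE	1

static bool direct_mmio_write(int size, unsigned long addr, unsigned long val)
{
	/*
	 * TDG.VP.VMCALL<#VE.RequestMMIO>: the hypercall leaf is the EPT
	 * violation exit reason; the VMM emulates the access on the guest's
	 * behalf, so no guest-side shared mapping of the MMIO GPA is needed.
	 */
	return !_tdx_hypercall(hcall_func(EXIT_REASON_EPT_VIOLATION),
			       size, EPT_WRITE, addr, val);
}
```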
On 7/03/2024 11:43 am, Sean Christopherson wrote:
> On Thu, Mar 07, 2024, Kai Huang wrote:
>> [...]
>> As mentioned in another reply in this series, unless I am mistaken, for a
>> TDX guest the _first_ MMIO access would still cause an EPT violation with
>> the MMIO GFN being private.
>>
>> Returning to userspace cannot really help here because the MMIO mapping is
>> inside the guest.
>
> That's a guest bug. The guest *knows* it's a TDX VM, it *has* to know. Accessing
> emulated MMIO and thus taking a #VE before enabling paging is nonsensical. Either
> enable paging and set up MMIO regions as shared, or go straight to TDCALL.

+Kirill,

I kinda forgot the details, but what I am afraid of is that there might be a
bunch of existing TDX guests (since the TDX guest code is upstreamed) using
unmodified drivers, which I suppose don't map MMIO regions as shared.

Kirill,

Could you clarify whether the TDX guest code has mapped MMIO regions as shared
since the beginning?
On Thu, Mar 07, 2024, Kai Huang wrote:
> On 7/03/2024 11:43 am, Sean Christopherson wrote:
> > [...]
> > That's a guest bug. The guest *knows* it's a TDX VM, it *has* to know. Accessing
> > emulated MMIO and thus taking a #VE before enabling paging is nonsensical. Either
> > enable paging and set up MMIO regions as shared, or go straight to TDCALL.
>
> +Kirill,
>
> I kinda forgot the details, but what I am afraid of is that there might be a
> bunch of existing TDX guests (since the TDX guest code is upstreamed) using
> unmodified drivers, which I suppose don't map MMIO regions as shared.
>
> Kirill,
>
> Could you clarify whether the TDX guest code has mapped MMIO regions as shared
> since the beginning?

Y'all get the same answer we gave the SNP folks: KVM does not yet support TDX,
so as far as KVM is concerned, there is no existing functionality to support.
s/firmware/Linux if this is a Linux kernel problem.

On Thu, Feb 08, 2024, Paolo Bonzini wrote:
> On Thu, Feb 8, 2024 at 6:27 PM Sean Christopherson <seanjc@google.com> wrote:
> > No. KVM does not yet support SNP, so as far as KVM's ABI goes, there are no
> > existing guests. Yes, I realize that I am burying my head in the sand to some
> > extent, but it is simply not sustainable for KVM to keep trying to pick up the
> > pieces of poorly defined hardware specs and broken guest firmware.
>
> 101% agreed. There are cases in which we have to and should bend over
> backwards for guests (e.g. older Linux kernels), but not for code
> that---according to current practices---is chosen by the host admin.
>
> (I am of the opinion that "bring your own firmware" is the only sane
> way to handle attestation/measurement, but that's not how things are
> done currently).
On 7/03/2024 12:01 pm, Sean Christopherson wrote:
> [...]
> Y'all get the same answer we gave the SNP folks: KVM does not yet support TDX,
> so as far as KVM is concerned, there is no existing functionality to support.
> [...]

Fair enough, and good to know. :-)

(Still better to hear from Kirill, though.)
On Thu, Mar 07, 2024 at 11:49:11AM +1300, Huang, Kai wrote:
> [...]
> I kinda forgot the details, but what I am afraid of is that there might be a
> bunch of existing TDX guests (since the TDX guest code is upstreamed) using
> unmodified drivers, which I suppose don't map MMIO regions as shared.

Unmodified drivers get their MMIO regions mapped with ioremap(), which sets the
shared bit, unless explicitly asked to make the mapping private (encrypted).
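To make the shared-by-default behavior concrete: on x86 the ioremap() path
applies pgprot_decrypted(), which for a TDX guest sets the shared bit in the
page-table entry. A condensed sketch of the protection selection follows,
simplified from arch/x86/mm/ioremap.c; treat it as illustrative, not as the
exact upstream code.

```c
#include <linux/pgtable.h>

/*
 * Condensed sketch of how the x86 ioremap() path picks page protections.
 * For TDX, pgprot_decrypted() sets the SHARED bit (via cc_mkdec()), so a
 * driver's plain ioremap() yields shared MMIO mappings by default; only
 * an explicit ioremap_encrypted() keeps the mapping private.
 */
static pgprot_t ioremap_prot_sketch(bool encrypted)
{
	pgprot_t prot = PAGE_KERNEL_IO;

	if (encrypted)				/* ioremap_encrypted() */
		return pgprot_encrypted(prot);

	return pgprot_decrypted(prot);		/* plain ioremap(): shared */
}
```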
On 8/03/2024 6:10 am, Kirill A. Shutemov wrote:
> [...]
> Unmodified drivers get their MMIO regions mapped with ioremap(), which sets the
> shared bit, unless explicitly asked to make the mapping private (encrypted).

Thanks for the clarification. Obviously my memory failed me.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5c8caab64ba2..ebdb3fcce3dc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3314,6 +3314,11 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 {
 	gva_t gva = fault->is_tdp ? 0 : fault->addr;
 
+	if (fault->is_private) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
 	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
 			     access & shadow_mmio_access_mask);
Explicitly detect and disallow private accesses to emulated MMIO in
kvm_handle_noslot_fault() instead of relying on kvm_faultin_pfn_private()
to perform the check. This will allow the page fault path to go straight
to kvm_handle_noslot_fault() without bouncing through __kvm_faultin_pfn().

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 5 +++++
 1 file changed, 5 insertions(+)
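For completeness, here is what the new check looks like from userspace:
kvm_mmu_prepare_memory_fault_exit() fills vcpu->run->memory_fault, and the
-EFAULT return surfaces as KVM_EXIT_MEMORY_FAULT. A minimal VMM-side sketch
follows; the function name and the terminate-the-guest policy are illustrative,
while the kvm_run fields and the KVM_MEMORY_EXIT_FLAG_PRIVATE flag come from
the KVM UAPI.

```c
#include <errno.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/*
 * Minimal sketch of a VMM vcpu loop handling the exit this patch produces
 * for a private access to emulated MMIO.  Note: KVM_EXIT_MEMORY_FAULT is
 * unusual in that KVM_RUN returns -1 with errno == EFAULT, so the exit
 * reason must be checked even when the ioctl "fails".
 */
static void vcpu_run_once(int vcpu_fd, struct kvm_run *run)
{
	int ret = ioctl(vcpu_fd, KVM_RUN, NULL);

	if (ret < 0 && errno == EFAULT &&
	    run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
		int private = !!(run->memory_fault.flags &
				 KVM_MEMORY_EXIT_FLAG_PRIVATE);

		fprintf(stderr, "memory fault: gpa=0x%llx size=0x%llx (%s)\n",
			(unsigned long long)run->memory_fault.gpa,
			(unsigned long long)run->memory_fault.size,
			private ? "private" : "shared");

		/*
		 * A private fault on a GPA with no memslot (emulated MMIO)
		 * is a guest bug per this thread; terminating is one policy.
		 */
		exit(1);
	}

	/* ... dispatch other exit reasons here ... */
}
```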