
[v18,044/121] KVM: x86/mmu: Assume guest MMIOs are shared

Message ID cce34006a4db0e1995ce007c917f834b117b12af.1705965635.git.isaku.yamahata@intel.com
State New, archived
Series KVM TDX basic feature support

Commit Message

Isaku Yamahata Jan. 22, 2024, 11:53 p.m. UTC
From: Chao Gao <chao.gao@intel.com>

Guest TD doesn't necessarily invoke MAP_GPA to convert the virtual MMIO
range to shared before accessing it.  When the TD tries to access the
virtual device's MMIO as shared, an EPT violation is raised first.
kvm_mem_is_private() checks whether the GFN is shared or private.  If
MAP_GPA has not been called for the GPA, KVM thinks the GPA is private,
refuses the shared access, and doesn't set up a shared EPT entry.  The
guest can't make progress.

Instead of requiring the guest to invoke MAP_GPA for regions of virtual
MMIO, assume in KVM as well that such regions (i.e., GPAs that either have
no kvm_memory_slot or are backed by host MMIO) are shared, so that guests
can access those MMIO regions.
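
As a rough illustration of the intended decision (simplified stand-ins,
not the actual KVM code):

/*
 * Illustrative sketch only: a slot-less GPA is treated as shared MMIO
 * without requiring a MAP_GPA conversion, while GPAs backed by a memslot
 * must still match the recorded private/shared attribute.
 */
#include <stdbool.h>

struct fault_info {
	bool has_slot;		/* GPA is backed by a kvm_memory_slot */
	bool is_private;	/* access is private per the faulting GPA */
	bool attr_private;	/* what kvm_mem_is_private() would report */
};

/* true => KVM must exit to userspace (or retry) to convert the GPA */
static bool needs_conversion(const struct fault_info *f)
{
	if (!f->has_slot)	/* virtual MMIO, assigned at boot: assume shared */
		return false;

	return f->is_private != f->attr_private;
}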

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Paolo Bonzini Feb. 12, 2024, 10:29 a.m. UTC | #1
On Tue, Jan 23, 2024 at 12:55 AM <isaku.yamahata@intel.com> wrote:
>
> From: Chao Gao <chao.gao@intel.com>
>
> Guest TD doesn't necessarily invoke MAP_GPA to convert the virtual MMIO
> range to shared before accessing it.  When TD tries to access the virtual
> device's MMIO as shared, an EPT violation is raised first.
> kvm_mem_is_private() checks whether the GFN is shared or private.  If
> MAP_GPA is not called for the GPA, KVM thinks the GPA is private and
> refuses shared access, and doesn't set up shared EPT entry.  The guest
> can't make progress.
>
> Instead of requiring the guest to invoke MAP_GPA for regions of virtual
> MMIOs assume regions of virtual MMIOs are shared in KVM as well (i.e., GPAs
> either have no kvm_memory_slot or are backed by host MMIOs). So that guests
> can access those MMIO regions.

I'm not sure how the patch below deals with host MMIOs?

> Signed-off-by: Chao Gao <chao.gao@intel.com>

Missing Signed-off-by.

Also, this patch conflicts with "[PATCH v11 09/35] KVM: x86: Determine
shared/private faults based on vm_type".  I think in general the logic
in that patch (which forces an exit to userspace if needed, to convert
the MMIO area to shared) can be applied to sw-protected and TDX guests
as well. I'm preparing a set of common patches that can be applied for
6.9 and will include something after testing with sw-protected VMs.
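
For reference, the userspace side of that flow would look roughly like the
sketch below (using the 6.8-era KVM_EXIT_MEMORY_FAULT /
KVM_SET_MEMORY_ATTRIBUTES uAPI; error handling and policy decisions are
omitted, so take it as illustrative rather than a drop-in VMM change):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Convert the faulting range to whatever the guest actually touched it as. */
static int handle_memory_fault_exit(int vm_fd, struct kvm_run *run)
{
	struct kvm_memory_attributes attrs = {
		.address    = run->memory_fault.gpa,
		.size       = run->memory_fault.size,
		.attributes = (run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) ?
			      KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
	};

	/* KVM_SET_MEMORY_ATTRIBUTES flips the private/shared attribute. */
	return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
}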

Paolo


> ---
>  arch/x86/kvm/mmu/mmu.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e93bc16a5e9b..583ae9d6bf5d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4371,7 +4371,12 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>                         return RET_PF_EMULATE;
>         }
>
> -       if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
> +       /*
> +        * !fault->slot means MMIO.  Don't require explicit GPA conversion for
> +        * MMIO because MMIO is assigned at the boot time.
> +        */
> +       if (fault->slot &&
> +           fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
>                 if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM)
>                         return RET_PF_RETRY;
>                 kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> --
> 2.25.1
>
Isaku Yamahata Feb. 26, 2024, 6:06 p.m. UTC | #2
On Mon, Feb 12, 2024 at 11:29:51AM +0100,
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On Tue, Jan 23, 2024 at 12:55 AM <isaku.yamahata@intel.com> wrote:
> >
> > From: Chao Gao <chao.gao@intel.com>
> >
> > Guest TD doesn't necessarily invoke MAP_GPA to convert the virtual MMIO
> > range to shared before accessing it.  When TD tries to access the virtual
> > device's MMIO as shared, an EPT violation is raised first.
> > kvm_mem_is_private() checks whether the GFN is shared or private.  If
> > MAP_GPA is not called for the GPA, KVM thinks the GPA is private and
> > refuses shared access, and doesn't set up shared EPT entry.  The guest
> > can't make progress.
> >
> > Instead of requiring the guest to invoke MAP_GPA for regions of virtual
> > MMIOs assume regions of virtual MMIOs are shared in KVM as well (i.e., GPAs
> > either have no kvm_memory_slot or are backed by host MMIOs). So that guests
> > can access those MMIO regions.
> 
> I'm not sure how the patch below deals with host MMIOs?

It falls back to the shared case and hits KVM_PFN_NOSLOT, so it will be
handled as MMIO.

Anyway, I found it breaks the SW_PROTECTED case, so I came up with the
following; a sketch of the intended per-vm_type policy follows the diff.
I think we'd like to handle it as:
  - SW_PROTECTED => KVM_EXIT_MEMORY_FAULT
  - SNP, TDX => MMIO.
  


-       if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+       /*
+        * !fault->slot means MMIO for SNP and TDX.  Don't require explicit GPA
+        * conversion for MMIO because MMIO is assigned at the boot time.  Fall
+        * to !is_private case to get pfn = KVM_PFN_NOSLOT.
+        */
+       force_mmio = !slot &&
+               vcpu->kvm->arch.vm_type != KVM_X86_DEFAULT_VM &&
+               vcpu->kvm->arch.vm_type != KVM_X86_SW_PROTECTED_VM;
+       if (!force_mmio &&
+           fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
                kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
                return -EFAULT;
        }
 
-       if (fault->is_private)
+       if (!force_mmio && fault->is_private)
                return kvm_faultin_pfn_private(vcpu, fault);
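
Expressed as a helper, that policy would look something like this
(illustrative only; it assumes the kvm_page_fault layout from
mmu_internal.h and isn't the final form):

static bool kvm_noslot_fault_is_mmio(struct kvm *kvm,
				     struct kvm_page_fault *fault)
{
	/* GPAs with a memslot always go through the attribute check. */
	if (fault->slot)
		return false;

	/*
	 * Only SNP/TDX-style VMs get the "slot-less GPA is MMIO" shortcut;
	 * default and sw-protected VMs still go through the attribute
	 * check, so a sw-protected mismatch exits with
	 * KVM_EXIT_MEMORY_FAULT as before.
	 */
	return kvm->arch.vm_type != KVM_X86_DEFAULT_VM &&
	       kvm->arch.vm_type != KVM_X86_SW_PROTECTED_VM;
}

The two force_mmio checks above would then just call this helper.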
Sean Christopherson Feb. 26, 2024, 6:39 p.m. UTC | #3
On Mon, Feb 26, 2024, Isaku Yamahata wrote:
> On Mon, Feb 12, 2024 at 11:29:51AM +0100,
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > On Tue, Jan 23, 2024 at 12:55 AM <isaku.yamahata@intel.com> wrote:
> > >
> > > From: Chao Gao <chao.gao@intel.com>
> > >
> > > Guest TD doesn't necessarily invoke MAP_GPA to convert the virtual MMIO
> > > range to shared before accessing it.  When TD tries to access the virtual
> > > device's MMIO as shared, an EPT violation is raised first.
> > > kvm_mem_is_private() checks whether the GFN is shared or private.  If
> > > MAP_GPA is not called for the GPA, KVM thinks the GPA is private and
> > > refuses shared access, and doesn't set up shared EPT entry.  The guest
> > > can't make progress.
> > >
> > > Instead of requiring the guest to invoke MAP_GPA for regions of virtual
> > > MMIOs assume regions of virtual MMIOs are shared in KVM as well (i.e., GPAs
> > > either have no kvm_memory_slot or are backed by host MMIOs). So that guests
> > > can access those MMIO regions.
> > 
> > I'm not sure how the patch below deals with host MMIOs?
> 
> It falls back to shared case to hit KVM_PFN_NOSLOT. It will be handled as
> MMIO.
> 
> Anyway I found it breaks SW_PROTECTED case.  So I came up with the following.
> I think we'd like to handle as
>   - SW_PROTECTED => KVM_EXIT_MEMORY_FAULT
>   - SNP, TDX => MMIO.
>   

FFS.  Stop lobbing patch bombs and start having actual conversations.

Seriously, the whole point of using mailing lists is to have *discussions* and
to coordinate development.  Throwing patches at kvm@ and then walking away DOES
NOT WORK.

Putting a "TODO: Drop this patch once the common patch is merged." in the
changelog[1] is not helpful.

Dropping a proposed common uAPI[2] into a 121 patch series without even *acknowledging*
that you received the message DOES NOT WORK.  You didn't even add a Suggested-by
or Cc: the people who expressed interest.  I can't read minds, and AFAIK no one
else working on KVM is a telepath either.

I do not know how to make it any clearer: for TDX support to go anywhere, there needs
to be a _lot_ more communication.

[1] https://lore.kernel.org/all/b2e5c92fd66a0113b472dd602220346d3d435732.1708933498.git.isaku.yamahata@intel.com
[2] https://lore.kernel.org/all/8b7380f1b02f8e3995f18bebb085e43165d5d682.1708933498.git.isaku.yamahata@intel.com

> -       if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
> +       /*
> +        * !fault->slot means MMIO for SNP and TDX.  Don't require explicit GPA
> +        * conversion for MMIO because MMIO is assigned at the boot time.  Fall
> +        * to !is_private case to get pfn = KVM_PFN_NOSLOT.
> +        */
> +       force_mmio = !slot &&

NAK, this already got shot down.

https://lore.kernel.org/all/ZcUO5sFEAIH68JIA@google.com

> +               vcpu->kvm->arch.vm_type != KVM_X86_DEFAULT_VM &&
> +               vcpu->kvm->arch.vm_type != KVM_X86_SW_PROTECTED_VM;
> +       if (!force_mmio &&
> +           fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
>                 kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>                 return -EFAULT;
>         }
>  
> -       if (fault->is_private)
> +       if (!force_mmio && fault->is_private)
>                 return kvm_faultin_pfn_private(vcpu, fault);
> 
> -- 
> Isaku Yamahata <isaku.yamahata@linux.intel.com>

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e93bc16a5e9b..583ae9d6bf5d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4371,7 +4371,12 @@  static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 			return RET_PF_EMULATE;
 	}
 
-	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+	/*
+	 * !fault->slot means MMIO.  Don't require explicit GPA conversion for
+	 * MMIO because MMIO is assigned at the boot time.
+	 */
+	if (fault->slot &&
+	    fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
 		if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM)
 			return RET_PF_RETRY;
 		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);