Message ID | 20240404185034.3184582-12-pbonzini@redhat.com (mailing list archive)
---|---
State | New, archived
Series | KVM: guest_memfd: New hooks and functionality for SEV-SNP and TDX
On Thu, Apr 04, 2024 at 02:50:33PM -0400, Paolo Bonzini wrote:
> From: Michael Roth <michael.roth@amd.com>
>
> In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
> 2MB mapping in the guest's nested page table depends on whether or not
> any subpages within the range have already been initialized as private
> in the RMP table. The existing mixed-attribute tracking in KVM is
> insufficient here, for instance:
>
> - gmem allocates 2MB page
> - guest issues PVALIDATE on 2MB page
> - guest later converts a subpage to shared
> - SNP host code issues PSMASH to split 2MB RMP mapping to 4K
> - KVM MMU splits NPT mapping to 4K

Binbin caught that I'd neglected to document the last step in the
theoretical sequence here. It should state something to the effect of:

  - guest later converts that shared page back to private

-Mike

>
> At this point there are no mixed attributes, and KVM would normally
> allow for 2MB NPT mappings again, but this is actually not allowed
> because the RMP table mappings are 4K and cannot be promoted on the
> hypervisor side, so the NPT mappings must still be limited to 4K to
> match this.
>
> Add a hook to determine the max NPT mapping size in situations like
> this.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Message-Id: <20231230172351.574091-31-michael.roth@amd.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h | 1 +
>  arch/x86/include/asm/kvm_host.h    | 2 ++
>  arch/x86/kvm/mmu/mmu.c             | 8 ++++++++
>  3 files changed, 11 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index c81990937ab4..2db87a6fd52a 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
>  KVM_X86_OP_OPTIONAL(get_untagged_addr)
>  KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
>  KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
> +KVM_X86_OP_OPTIONAL_RET0(gmem_validate_fault)
>  KVM_X86_OP_OPTIONAL(gmem_invalidate)
>
>  #undef KVM_X86_OP
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 59c7b95034fc..67dc108dd366 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1811,6 +1811,8 @@ struct kvm_x86_ops {
>  	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
>  	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
>  	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
> +	int (*gmem_validate_fault)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
> +				   u8 *max_level);
>  };
>
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 992e651540e8..13dd367b8af1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4338,6 +4338,14 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
>  					     fault->max_level);
>  	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
>
> +	r = static_call(kvm_x86_gmem_validate_fault)(vcpu->kvm, fault->pfn,
> +						     fault->gfn, fault->is_private,
> +						     &fault->max_level);
> +	if (r) {
> +		kvm_release_pfn_clean(fault->pfn);
> +		return r;
> +	}
> +
> 	return RET_PF_CONTINUE;
> }
>
> --
> 2.43.0
>
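The sequence in the commit message can be made concrete with a small userspace model. This is an illustrative sketch only: `struct region`, `max_level_by_attrs`, and `max_level_by_rmp` are made-up names that do not exist in KVM. The point it demonstrates is the one the patch argues: attribute tracking alone would re-allow a 2MB mapping once the guest converts the subpage back to private, while tracking the irreversible PSMASH correctly pins the mapping at 4K.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative model only -- none of these names exist in KVM. A 2MB
 * region covers 512 4K subpages, each either shared or private. Once
 * the 2MB RMP entry has been smashed to 4K (PSMASH), it cannot be
 * promoted back on the hypervisor side, so the maximum mapping level
 * must stay at 4K even if the attributes become uniform again.
 */
#define PAGES_PER_2M 512
#define LEVEL_4K 1
#define LEVEL_2M 2

enum page_state { PAGE_SHARED, PAGE_PRIVATE };

struct region {
	enum page_state state[PAGES_PER_2M];
	bool smashed;	/* 2MB RMP entry was split via PSMASH */
};

/* Attribute-only view: 2MB is allowed whenever no attributes are mixed. */
static int max_level_by_attrs(const struct region *r)
{
	for (int i = 1; i < PAGES_PER_2M; i++)
		if (r->state[i] != r->state[0])
			return LEVEL_4K;	/* mixed -> 4K only */
	return LEVEL_2M;			/* uniform -> 2MB ok */
}

/* RMP-aware view: a smashed RMP entry pins the mapping at 4K. */
static int max_level_by_rmp(const struct region *r)
{
	return r->smashed ? LEVEL_4K : max_level_by_attrs(r);
}
```

The divergence between the two views after the subpage is converted back to private is exactly the gap the new hook closes: only vendor code can see the smashed RMP state, so only vendor code can report the correct maximum level.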
On Tue, Apr 09, 2024 at 06:46:32PM -0500, Michael Roth <michael.roth@amd.com> wrote:
> On Thu, Apr 04, 2024 at 02:50:33PM -0400, Paolo Bonzini wrote:
> > From: Michael Roth <michael.roth@amd.com>

[...]

> > @@ -1811,6 +1811,8 @@ struct kvm_x86_ops {
> >  	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> >  	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> >  	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
> > +	int (*gmem_validate_fault)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
> > +				   u8 *max_level);
> >  };

I think you added is_private due to the TDX patches. As Yan pointed out at
https://lore.kernel.org/kvm/ZiHGoUUcGlZObQvx@yzhao56-desk.sh.intel.com/,
it's guaranteed that is_private is always true because the caller checks it:

	if (fault->is_private)
		kvm_faultin_pfn_private()

So we can drop the is_private parameter.
On Thu, Apr 04, 2024 at 02:50:33PM -0400, Paolo Bonzini wrote:
> From: Michael Roth <michael.roth@amd.com>

[...]

> At this point there are no mixed attributes, and KVM would normally
> allow for 2MB NPT mappings again, but this is actually not allowed
> because the RMP table mappings are 4K and cannot be promoted on the
> hypervisor side, so the NPT mappings must still be limited to 4K to
> match this.

Just curious how the mapping could be promoted back in this case?

Thanks,
Yilun

> Add a hook to determine the max NPT mapping size in situations like
> this.

[...]
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c81990937ab4..2db87a6fd52a 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -140,6 +140,7 @@ KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL(get_untagged_addr)
 KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL_RET0(gmem_validate_fault)
 KVM_X86_OP_OPTIONAL(gmem_invalidate)
 
 #undef KVM_X86_OP
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 59c7b95034fc..67dc108dd366 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1811,6 +1811,8 @@ struct kvm_x86_ops {
 	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
 	void (*gmem_invalidate)(kvm_pfn_t start, kvm_pfn_t end);
+	int (*gmem_validate_fault)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, bool is_private,
+				   u8 *max_level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 992e651540e8..13dd367b8af1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4338,6 +4338,14 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 					     fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
 
+	r = static_call(kvm_x86_gmem_validate_fault)(vcpu->kvm, fault->pfn,
+						     fault->gfn, fault->is_private,
+						     &fault->max_level);
+	if (r) {
+		kvm_release_pfn_clean(fault->pfn);
+		return r;
+	}
+
 	return RET_PF_CONTINUE;
 }
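The contract the hook establishes, clamp *max_level downward or return nonzero to fail the fault (the caller then releases the pfn), can be sketched as a tiny userspace model. This is a hypothetical illustration: `model_gmem_validate_fault`, `rmp_assigned`, and `rmp_level` are placeholder names standing in for whatever state a vendor backend (presumably the SEV-SNP code elsewhere in the series) reads out of its RMP-equivalent structure.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the gmem_validate_fault contract. The vendor
 * callback receives a pointer to the fault's max_level and may only
 * lower it; a nonzero return aborts the fault. rmp_assigned/rmp_level
 * stand in for a vendor-specific RMP lookup and are not real KVM names.
 */
#define MODEL_LEVEL_4K 1
#define MODEL_LEVEL_2M 2

static int model_gmem_validate_fault(bool rmp_assigned, int rmp_level,
				     unsigned char *max_level)
{
	/* A private fault against an unassigned page is a hard error. */
	if (!rmp_assigned)
		return -1;

	/* Never map at a larger granularity than the RMP entry allows. */
	if (*max_level > rmp_level)
		*max_level = (unsigned char)rmp_level;

	return 0;
}
```

Passing `max_level` by pointer rather than returning a level keeps the "only ever narrow the mapping" semantics explicit: KVM computes its own bound first (from memslots and mixed attributes), and the vendor hook can tighten but never loosen it.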