Message ID | 20210420112006.741541-1-pbonzini@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Apr 20, 2021, Paolo Bonzini wrote: > From 547d4d4edcd05fdfac6ce650d65db1d42bcd2807 Mon Sep 17 00:00:00 2001 > From: Paolo Bonzini <pbonzini@redhat.com> > Date: Tue, 20 Apr 2021 05:49:11 -0400 > Subject: [PATCH 1/3] KVM: SEV: mask CPUID[0x8000001F].eax according to > supported features Your mailer is obviously a bit wonky, took me a while to find this patch :-) > Do not return the SEV-ES bit from KVM_GET_SUPPORTED_CPUID unless > the corresponding module parameter is 1, and clear the memory encryption > leaf completely if SEV is disabled. Impeccable timing, I was planning on refreshing my SEV cleanup series[*] today. There's going to be an annoying conflict with the svm_set_cpu_caps() change (see below), any objecting to folding your unintentional feedback into my series? [*] https://lkml.kernel.org/r/20210306015905.186698-1-seanjc@google.com > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > arch/x86/kvm/cpuid.c | 5 ++++- > arch/x86/kvm/cpuid.h | 1 + > arch/x86/kvm/svm/svm.c | 7 +++++++ > 3 files changed, 12 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 2ae061586677..d791d1f093ab 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -944,8 +944,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) > break; > /* Support memory encryption cpuid if host supports it */ > case 0x8000001F: > - if (!boot_cpu_has(X86_FEATURE_SEV)) > + if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) { > entry->eax = entry->ebx = entry->ecx = entry->edx = 0; > + break; > + } > + cpuid_entry_override(entry, CPUID_8000_001F_EAX); I find this easier to read: if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) entry->eax = entry->ebx = entry->ecx = entry->edx = 0; else cpuid_entry_override(entry, CPUID_8000_001F_EAX); > break; > /*Add support for Centaur's CPUID instruction*/ > case 0xC0000000: > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h > index 888e88b42e8d..e873a60a4830 100644 > --- a/arch/x86/kvm/cpuid.h > +++ b/arch/x86/kvm/cpuid.h > @@ -99,6 +99,7 @@ static const struct cpuid_reg reverse_cpuid[] = { > [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, > [CPUID_7_1_EAX] = { 7, 1, CPUID_EAX}, > [CPUID_12_EAX] = {0x00000012, 0, CPUID_EAX}, > + [CPUID_8000_001F_EAX] = {0x8000001F, 0, CPUID_EAX}, > }; > > /* > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index cd8c333ed2dc..acdb8457289e 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -923,6 +923,13 @@ static __init void svm_set_cpu_caps(void) > if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || > boot_cpu_has(X86_FEATURE_AMD_SSBD)) > kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); > + > + /* CPUID 0x8000001F */ > + if (sev) { > + kvm_cpu_cap_set(X86_FEATURE_SEV); > + if (sev_es) > + kvm_cpu_cap_set(X86_FEATURE_SEV_ES); Gah, I completely spaced on the module params in my series, which is more problematic than normal because it also moves "sev" and "sev_es" to sev.c. The easy solution is to add sev_set_cpu_caps(). On the other, this misses SME_COHERENT. I also think it makes sense to call kvm_cpu_cap_mask() for the leaf, even if it's just to crush KVM's caps to zero. However, because of SME_COHERENT and other potential bits in the future, I think I prefer starting with the bits carried over from boot_cpu_data. E.g. kvm_cpu_cap_mask(CPUID_8000_001F_EAX, 0 /* SME */ | F(SEV) | 0 /* VM_PAGE_FLUSH */ | F(SEV_ES) | F(SME_COHERENT)); and (with renamed module params): if (sev_enabled) kvm_cpu_cap_clear(X86_FEATURE_SEV); if (sev_es_enabled) kvm_cpu_cap_clear(X86_FEATURE_SEV_ES); > + } > } > > static __init int svm_hardware_setup(void)
On Tue, Apr 20, 2021, Paolo Bonzini wrote: > From ef78673f78e3f2eedc498c1fbf9271146caa83cb Mon Sep 17 00:00:00 2001 > From: Ashish Kalra <ashish.kalra@amd.com> > Date: Thu, 15 Apr 2021 15:57:02 +0000 > Subject: [PATCH 2/3] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall > > This hypercall is used by the SEV guest to notify a change in the page > encryption status to the hypervisor. The hypercall should be invoked > only when the encryption attribute is changed from encrypted -> decrypted > and vice versa. By default all guest pages are considered encrypted. > > The hypercall exits to userspace to manage the guest shared regions and > integrate with the userspace VMM's migration code. ... > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index fd4a84911355..2bc353d1f356 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6766,3 +6766,14 @@ they will get passed on to user space. So user space still has to have > an implementation for these despite the in kernel acceleration. > > This capability is always enabled. > + > +8.32 KVM_CAP_EXIT_HYPERCALL > +--------------------------- > + > +:Capability: KVM_CAP_EXIT_HYPERCALL > +:Architectures: x86 > +:Type: vm > + > +This capability, if enabled, will cause KVM to exit to userspace > +with KVM_EXIT_HYPERCALL exit reason to process some hypercalls. > +Right now, the only such hypercall is KVM_HC_PAGE_ENC_STATUS. > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst > index cf62162d4be2..c9378d163b5a 100644 > --- a/Documentation/virt/kvm/cpuid.rst > +++ b/Documentation/virt/kvm/cpuid.rst > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID 15 guest checks this feature bit > before using extended destination > ID bits in MSI address bits 11-5. > > +KVM_FEATURE_HC_PAGE_ENC_STATUS 16 guest checks this feature bit before > + using the page encryption state > + hypercall to notify the page state > + change ... > int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) > { > unsigned long nr, a0, a1, a2, a3, ret; > @@ -8334,6 +8346,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) > kvm_sched_yield(vcpu, a0); > ret = 0; > break; > + case KVM_HC_PAGE_ENC_STATUS: { > + u64 gpa = a0, npages = a1, enc = a2; > + > + ret = -KVM_ENOSYS; > + if (!vcpu->kvm->arch.hypercall_exit_enabled) I don't follow, why does the hypercall need to be gated by a capability? What would break if this were changed to? if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > + break; > + > + if (!PAGE_ALIGNED(gpa) || !npages || > + gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) { > + ret = -EINVAL; > + break; > + } > + > + vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > + vcpu->run->hypercall.nr = KVM_HC_PAGE_ENC_STATUS; > + vcpu->run->hypercall.args[0] = gpa; > + vcpu->run->hypercall.args[1] = npages; > + vcpu->run->hypercall.args[2] = enc; > + vcpu->run->hypercall.longmode = op_64_bit; > + vcpu->arch.complete_userspace_io = complete_hypercall_exit; > + return 0; > + } > default: > ret = -KVM_ENOSYS; > break; ... > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 590cc811c99a..d696a9f13e33 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -3258,6 +3258,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > vcpu->arch.msr_kvm_poll_control = data; > break; > > + case MSR_KVM_MIGRATION_CONTROL: > + if (data & ~KVM_PAGE_ENC_STATUS_UPTODATE) > + return 1; > + > + if (data && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) Why let the guest write '0'? Letting the guest do WRMSR but not RDMSR is bizarre. > + return 1; > + break; > + > case MSR_IA32_MCG_CTL: > case MSR_IA32_MCG_STATUS: > case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: > @@ -3549,6 +3557,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF)) > return 1; > > + msr_info->data = 0; > + break; > + case MSR_KVM_MIGRATION_CONTROL: > + if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > + return 1; > + > msr_info->data = 0; > break; > case MSR_KVM_STEAL_TIME: > -- > 2.26.2 >
On Tue, Apr 20, 2021 at 05:31:07PM +0000, Sean Christopherson wrote: > On Tue, Apr 20, 2021, Paolo Bonzini wrote: > > From ef78673f78e3f2eedc498c1fbf9271146caa83cb Mon Sep 17 00:00:00 2001 > > From: Ashish Kalra <ashish.kalra@amd.com> > > Date: Thu, 15 Apr 2021 15:57:02 +0000 > > Subject: [PATCH 2/3] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall > > > > This hypercall is used by the SEV guest to notify a change in the page > > encryption status to the hypervisor. The hypercall should be invoked > > only when the encryption attribute is changed from encrypted -> decrypted > > and vice versa. By default all guest pages are considered encrypted. > > > > The hypercall exits to userspace to manage the guest shared regions and > > integrate with the userspace VMM's migration code. > > ... > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > index fd4a84911355..2bc353d1f356 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -6766,3 +6766,14 @@ they will get passed on to user space. So user space still has to have > > an implementation for these despite the in kernel acceleration. > > > > This capability is always enabled. > > + > > +8.32 KVM_CAP_EXIT_HYPERCALL > > +--------------------------- > > + > > +:Capability: KVM_CAP_EXIT_HYPERCALL > > +:Architectures: x86 > > +:Type: vm > > + > > +This capability, if enabled, will cause KVM to exit to userspace > > +with KVM_EXIT_HYPERCALL exit reason to process some hypercalls. > > +Right now, the only such hypercall is KVM_HC_PAGE_ENC_STATUS. > > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst > > index cf62162d4be2..c9378d163b5a 100644 > > --- a/Documentation/virt/kvm/cpuid.rst > > +++ b/Documentation/virt/kvm/cpuid.rst > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID 15 guest checks this feature bit > > before using extended destination > > ID bits in MSI address bits 11-5. > > > > +KVM_FEATURE_HC_PAGE_ENC_STATUS 16 guest checks this feature bit before > > + using the page encryption state > > + hypercall to notify the page state > > + change > > ... > > > int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) > > { > > unsigned long nr, a0, a1, a2, a3, ret; > > @@ -8334,6 +8346,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) > > kvm_sched_yield(vcpu, a0); > > ret = 0; > > break; > > + case KVM_HC_PAGE_ENC_STATUS: { > > + u64 gpa = a0, npages = a1, enc = a2; > > + > > + ret = -KVM_ENOSYS; > > + if (!vcpu->kvm->arch.hypercall_exit_enabled) > > I don't follow, why does the hypercall need to be gated by a capability? What > would break if this were changed to? > > if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > But, the above indicates host support for page_enc_status_hc, so we want to ensure that host supports and has enabled support for the hypercall exit, i.e., hypercall has been enabled. Thanks, Ashish > > + break; > > + > > + if (!PAGE_ALIGNED(gpa) || !npages || > > + gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) { > > + ret = -EINVAL; > > + break; > > + } > > + > > + vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > > + vcpu->run->hypercall.nr = KVM_HC_PAGE_ENC_STATUS; > > + vcpu->run->hypercall.args[0] = gpa; > > + vcpu->run->hypercall.args[1] = npages; > > + vcpu->run->hypercall.args[2] = enc; > > + vcpu->run->hypercall.longmode = op_64_bit; > > + vcpu->arch.complete_userspace_io = complete_hypercall_exit; > > + return 0; > > + } > > default: > > ret = -KVM_ENOSYS; > > break; > > ... > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index 590cc811c99a..d696a9f13e33 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -3258,6 +3258,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > vcpu->arch.msr_kvm_poll_control = data; > > break; > > > > + case MSR_KVM_MIGRATION_CONTROL: > > + if (data & ~KVM_PAGE_ENC_STATUS_UPTODATE) > > + return 1; > > + > > + if (data && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > > Why let the guest write '0'? Letting the guest do WRMSR but not RDMSR is > bizarre. > > > + return 1; > > + break; > > + > > case MSR_IA32_MCG_CTL: > > case MSR_IA32_MCG_STATUS: > > case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: > > @@ -3549,6 +3557,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF)) > > return 1; > > > > + msr_info->data = 0; > > + break; > > + case MSR_KVM_MIGRATION_CONTROL: > > + if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > > + return 1; > > + > > msr_info->data = 0; > > break; > > case MSR_KVM_STEAL_TIME: > > -- > > 2.26.2 > >
On 20/04/21 17:15, Sean Christopherson wrote: > On Tue, Apr 20, 2021, Paolo Bonzini wrote: >> Do not return the SEV-ES bit from KVM_GET_SUPPORTED_CPUID unless >> the corresponding module parameter is 1, and clear the memory encryption >> leaf completely if SEV is disabled. > > Impeccable timing, I was planning on refreshing my SEV cleanup series[*] today. > There's going to be an annoying conflict with the svm_set_cpu_caps() change > (see below), any objecting to folding your unintentional feedback into my series? That's fine of course. >> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h >> index 888e88b42e8d..e873a60a4830 100644 >> --- a/arch/x86/kvm/cpuid.h >> +++ b/arch/x86/kvm/cpuid.h >> @@ -99,6 +99,7 @@ static const struct cpuid_reg reverse_cpuid[] = { >> [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, >> [CPUID_7_1_EAX] = { 7, 1, CPUID_EAX}, >> [CPUID_12_EAX] = {0x00000012, 0, CPUID_EAX}, >> + [CPUID_8000_001F_EAX] = {0x8000001F, 0, CPUID_EAX}, >> }; >> >> /* >> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c >> index cd8c333ed2dc..acdb8457289e 100644 >> --- a/arch/x86/kvm/svm/svm.c >> +++ b/arch/x86/kvm/svm/svm.c >> @@ -923,6 +923,13 @@ static __init void svm_set_cpu_caps(void) >> if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || >> boot_cpu_has(X86_FEATURE_AMD_SSBD)) >> kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); >> + >> + /* CPUID 0x8000001F */ >> + if (sev) { >> + kvm_cpu_cap_set(X86_FEATURE_SEV); >> + if (sev_es) >> + kvm_cpu_cap_set(X86_FEATURE_SEV_ES); > > Gah, I completely spaced on the module params in my series, which is more > problematic than normal because it also moves "sev" and "sev_es" to sev.c. The > easy solution is to add sev_set_cpu_caps(). Sounds good. Paolo
On 20/04/21 19:31, Sean Christopherson wrote: >> + case KVM_HC_PAGE_ENC_STATUS: { >> + u64 gpa = a0, npages = a1, enc = a2; >> + >> + ret = -KVM_ENOSYS; >> + if (!vcpu->kvm->arch.hypercall_exit_enabled) > > I don't follow, why does the hypercall need to be gated by a capability? What > would break if this were changed to? > > if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) The problem is that it's valid to take KVM_GET_SUPPORTED_CPUID and send it unmodified to KVM_SET_CPUID2. For this reason, features that are conditional on other ioctls, or that require some kind of userspace support, must not be in KVM_GET_SUPPORTED_CPUID. For example: - TSC_DEADLINE because it is only implemented after KVM_CREATE_IRQCHIP (or after KVM_ENABLE_CAP of KVM_CAP_IRQCHIP_SPLIT) - MONITOR only makes sense if userspace enables KVM_CAP_X86_DISABLE_EXITS X2APIC is reported even though it shouldn't be. Too late to fix that, I think. In this particular case, if userspace sets the bit in CPUID2 but doesn't handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall. (I should document that, Jim asked for documentation around KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST many times). Paolo >> + break; >> + >> + if (!PAGE_ALIGNED(gpa) || !npages || >> + gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) { >> + ret = -EINVAL; >> + break; >> + } >> + >> + vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; >> + vcpu->run->hypercall.nr = KVM_HC_PAGE_ENC_STATUS; >> + vcpu->run->hypercall.args[0] = gpa; >> + vcpu->run->hypercall.args[1] = npages; >> + vcpu->run->hypercall.args[2] = enc; >> + vcpu->run->hypercall.longmode = op_64_bit; >> + vcpu->arch.complete_userspace_io = complete_hypercall_exit; >> + return 0; >> + } >> default: >> ret = -KVM_ENOSYS; >> break; > > ... > >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index 590cc811c99a..d696a9f13e33 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -3258,6 +3258,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) >> vcpu->arch.msr_kvm_poll_control = data; >> break; >> >> + case MSR_KVM_MIGRATION_CONTROL: >> + if (data & ~KVM_PAGE_ENC_STATUS_UPTODATE) >> + return 1; >> + >> + if (data && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > > Why let the guest write '0'? Letting the guest do WRMSR but not RDMSR is > bizarre. Because it was the simplest way to write the code, but returning 0 unconditionally from RDMSR is actually simpler. Paolo >> + return 1; >> + break; >> + >> case MSR_IA32_MCG_CTL: >> case MSR_IA32_MCG_STATUS: >> case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: >> @@ -3549,6 +3557,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) >> if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF)) >> return 1; >> >> + msr_info->data = 0; >> + break; >> + case MSR_KVM_MIGRATION_CONTROL: >> + if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) >> + return 1; >> + >> msr_info->data = 0; >> break; >> case MSR_KVM_STEAL_TIME: >> -- >> 2.26.2 >> >
On Tue, Apr 20, 2021, Ashish Kalra wrote: > On Tue, Apr 20, 2021 at 05:31:07PM +0000, Sean Christopherson wrote: > > On Tue, Apr 20, 2021, Paolo Bonzini wrote: > > > + case KVM_HC_PAGE_ENC_STATUS: { > > > + u64 gpa = a0, npages = a1, enc = a2; > > > + > > > + ret = -KVM_ENOSYS; > > > + if (!vcpu->kvm->arch.hypercall_exit_enabled) > > > > I don't follow, why does the hypercall need to be gated by a capability? What > > would break if this were changed to? > > > > if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > > > > But, the above indicates host support for page_enc_status_hc, so we want > to ensure that host supports and has enabled support for the hypercall > exit, i.e., hypercall has been enabled. I still don't see how parroting back KVM_GET_SUPPORTED_CPUID, i.e. "unintentionally" setting KVM_FEATURE_HC_PAGE_ENC_STATUS, would break anything. Sure, the guest does unnecessary hypercalls, but they're eaten by KVM. On the flip side, gating the hypercall on the capability, and especially only the capability, creates weird scenarios where the guest can observe KVM_FEATURE_HC_PAGE_ENC_STATUS=1 but fail the hypercall. Those would be fairly clearcut VMM bugs, but at the same time KVM is essentially going out of its way to manufacture the problem.
On Tue, Apr 20, 2021, Sean Christopherson wrote: > On Tue, Apr 20, 2021, Ashish Kalra wrote: > > On Tue, Apr 20, 2021 at 05:31:07PM +0000, Sean Christopherson wrote: > > > On Tue, Apr 20, 2021, Paolo Bonzini wrote: > > > > + case KVM_HC_PAGE_ENC_STATUS: { > > > > + u64 gpa = a0, npages = a1, enc = a2; > > > > + > > > > + ret = -KVM_ENOSYS; > > > > + if (!vcpu->kvm->arch.hypercall_exit_enabled) > > > > > > I don't follow, why does the hypercall need to be gated by a capability? What > > > would break if this were changed to? > > > > > > if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > > > > > > > But, the above indicates host support for page_enc_status_hc, so we want > > to ensure that host supports and has enabled support for the hypercall > > exit, i.e., hypercall has been enabled. > > I still don't see how parroting back KVM_GET_SUPPORTED_CPUID, i.e. "unintentionally" > setting KVM_FEATURE_HC_PAGE_ENC_STATUS, would break anything. Sure, the guest > does unnecessary hypercalls, but they're eaten by KVM. On the flip side, gating > the hypercall on the capability, and especially only the capability, creates > weird scenarios where the guest can observe KVM_FEATURE_HC_PAGE_ENC_STATUS=1 > but fail the hypercall. Those would be fairly clearcut VMM bugs, but at the > same time KVM is essentially going out of its way to manufacture the problem. Doh, I was thinking of the MSR behavior, not the hypercall. I'll respond to Paolo's mail, I have one more hail mary idea.
On Tue, Apr 20, 2021, Paolo Bonzini wrote: > On 20/04/21 19:31, Sean Christopherson wrote: > > > + case KVM_HC_PAGE_ENC_STATUS: { > > > + u64 gpa = a0, npages = a1, enc = a2; > > > + > > > + ret = -KVM_ENOSYS; > > > + if (!vcpu->kvm->arch.hypercall_exit_enabled) > > > > I don't follow, why does the hypercall need to be gated by a capability? What > > would break if this were changed to? > > > > if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > > The problem is that it's valid to take KVM_GET_SUPPORTED_CPUID and send it > unmodified to KVM_SET_CPUID2. For this reason, features that are > conditional on other ioctls, or that require some kind of userspace support, > must not be in KVM_GET_SUPPORTED_CPUID. For example: > > - TSC_DEADLINE because it is only implemented after KVM_CREATE_IRQCHIP (or > after KVM_ENABLE_CAP of KVM_CAP_IRQCHIP_SPLIT) > > - MONITOR only makes sense if userspace enables KVM_CAP_X86_DISABLE_EXITS > > X2APIC is reported even though it shouldn't be. Too late to fix that, I > think. > > In this particular case, if userspace sets the bit in CPUID2 but doesn't > handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of > assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall. Gah, I was thinking of the MSR behavior and forgot that the hypercall exiting behavior intentionally doesn't require extra filtering. It's also worth noting that guest_pv_has() is particularly useless since it will unconditionally return true for older VMMs that dont' enable KVM_CAP_ENFORCE_PV_FEATURE_CPUID. Bummer.
On Tue, Apr 20, 2021, Sean Christopherson wrote: > On Tue, Apr 20, 2021, Paolo Bonzini wrote: > > On 20/04/21 19:31, Sean Christopherson wrote: > > > > + case KVM_HC_PAGE_ENC_STATUS: { > > > > + u64 gpa = a0, npages = a1, enc = a2; > > > > + > > > > + ret = -KVM_ENOSYS; > > > > + if (!vcpu->kvm->arch.hypercall_exit_enabled) > > > > > > I don't follow, why does the hypercall need to be gated by a capability? What > > > would break if this were changed to? > > > > > > if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > > > > The problem is that it's valid to take KVM_GET_SUPPORTED_CPUID and send it > > unmodified to KVM_SET_CPUID2. For this reason, features that are > > conditional on other ioctls, or that require some kind of userspace support, > > must not be in KVM_GET_SUPPORTED_CPUID. For example: > > > > - TSC_DEADLINE because it is only implemented after KVM_CREATE_IRQCHIP (or > > after KVM_ENABLE_CAP of KVM_CAP_IRQCHIP_SPLIT) > > > > - MONITOR only makes sense if userspace enables KVM_CAP_X86_DISABLE_EXITS > > > > X2APIC is reported even though it shouldn't be. Too late to fix that, I > > think. > > > > In this particular case, if userspace sets the bit in CPUID2 but doesn't > > handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of > > assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall. > > Gah, I was thinking of the MSR behavior and forgot that the hypercall exiting > behavior intentionally doesn't require extra filtering. > > It's also worth noting that guest_pv_has() is particularly useless since it > will unconditionally return true for older VMMs that dont' enable > KVM_CAP_ENFORCE_PV_FEATURE_CPUID. > > Bummer. Oh! Almost forgot my hail mary idea. Instead of a new capability, can we reject the hypercall if userspace has _not_ set KVM_CAP_ENFORCE_PV_FEATURE_CPUID? if (vcpu->arch.pv_cpuid.enforce && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS) break;
On 20/04/21 22:16, Sean Christopherson wrote: > On Tue, Apr 20, 2021, Sean Christopherson wrote: >> On Tue, Apr 20, 2021, Paolo Bonzini wrote: >>> In this particular case, if userspace sets the bit in CPUID2 but doesn't >>> handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of >>> assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall. > > Oh! Almost forgot my hail mary idea. Instead of a new capability, can we > reject the hypercall if userspace has _not_ set KVM_CAP_ENFORCE_PV_FEATURE_CPUID? > > if (vcpu->arch.pv_cpuid.enforce && > !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS) > break; Couldn't userspace enable that capability and _still_ copy the supported CPUID blindly to the guest CPUID, without supporting the hypercall? (BTW, it's better to return a bitmask of hypercalls that will exit to userspace from KVM_CHECK_EXTENSION. Userspace can still reject with -ENOSYS those that it doesn't know, but it's important that it knows in general how to handle KVM_EXIT_HYPERCALL). Paolo
On Tue, Apr 20, 2021, Paolo Bonzini wrote: > On 20/04/21 22:16, Sean Christopherson wrote: > > On Tue, Apr 20, 2021, Sean Christopherson wrote: > > > On Tue, Apr 20, 2021, Paolo Bonzini wrote: > > > > In this particular case, if userspace sets the bit in CPUID2 but doesn't > > > > handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of > > > > assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall. > > > > Oh! Almost forgot my hail mary idea. Instead of a new capability, can we > > reject the hypercall if userspace has _not_ set KVM_CAP_ENFORCE_PV_FEATURE_CPUID? > > > > if (vcpu->arch.pv_cpuid.enforce && > > !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS) > > break; > > Couldn't userspace enable that capability and _still_ copy the supported > CPUID blindly to the guest CPUID, without supporting the hypercall? Yes. I was going to argue that we get to define the behavior, but that's not true because it would break existing VMMs that blindly copy. Capability it is...
On Tue, Apr 20, 2021, Sean Christopherson wrote: > On Tue, Apr 20, 2021, Paolo Bonzini wrote: > > On 20/04/21 22:16, Sean Christopherson wrote: > > > On Tue, Apr 20, 2021, Sean Christopherson wrote: > > > > On Tue, Apr 20, 2021, Paolo Bonzini wrote: > > > > > In this particular case, if userspace sets the bit in CPUID2 but doesn't > > > > > handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of > > > > > assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall. > > > > > > Oh! Almost forgot my hail mary idea. Instead of a new capability, can we > > > reject the hypercall if userspace has _not_ set KVM_CAP_ENFORCE_PV_FEATURE_CPUID? > > > > > > if (vcpu->arch.pv_cpuid.enforce && > > > !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS) > > > break; > > > > Couldn't userspace enable that capability and _still_ copy the supported > > CPUID blindly to the guest CPUID, without supporting the hypercall? > > Yes. I was going to argue that we get to define the behavior, but that's not > true because it would break existing VMMs that blindly copy. Capability it is... Hrm, that won't quite work though. If userspace blindly copies CPUID, but doesn't enable the capability, the guest will think the hypercall is supported. The guest hopefully won't freak out too much on the resulting -KVM_ENOSYS, but it does make the CPUID flag rather useless. We can make it work with: u64 gpa = a0, npages = a1, enc = a2; if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) break; if (!PAGE_ALIGNED(gpa) || !npages || gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) { ret = -EINVAL; break; } if (!vcpu->kvm->arch.hypercall_exit_enabled) { ret = 0; break; } vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; vcpu->run->hypercall.nr = KVM_HC_PAGE_ENC_STATUS; vcpu->run->hypercall.args[0] = gpa; vcpu->run->hypercall.args[1] = npages; vcpu->run->hypercall.args[2] = enc; vcpu->run->hypercall.longmode = op_64_bit; vcpu->arch.complete_userspace_io = complete_hypercall_exit; That's dancing pretty close to hypercall filtering, which I was hoping to avoid. I guess it's not reaaaally filtering since the exit check happens after the validity checks. > > > (BTW, it's better to return a bitmask of hypercalls that will exit to > > > userspace from KVM_CHECK_EXTENSION. Userspace can still reject with -ENOSYS > > > those that it doesn't know, but it's important that it knows in general how > > > to handle KVM_EXIT_HYPERCALL). Speaking of bitmasks, what about also accepting a bitmask for enabling the capability? (not sure if the above implies that). E.g. if (!(vcpu->kvm->arch.hypercall_exit_enabled & BIT_ULL(nr))) { ret = 0; break; }
On 21/04/21 01:20, Sean Christopherson wrote: > If userspace blindly copies CPUID, but doesn't > enable the capability, the guest will think the hypercall is supported. The > guest hopefully won't freak out too much on the resulting -KVM_ENOSYS, but it > does make the CPUID flag rather useless. Yes that's why the CPUID bit must *not* be in KVM_GET_SUPPORTED_CPUID. > The > guest hopefully won't freak out too much on the resulting -KVM_ENOSYS, but it > does make the CPUID flag rather useless. > > We can make it work with: > > u64 gpa = a0, npages = a1, enc = a2; > > if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) > break; > > if (!PAGE_ALIGNED(gpa) || !npages || > gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) { > ret = -EINVAL; > break; > } > > if (!vcpu->kvm->arch.hypercall_exit_enabled) { > ret = 0; > break; > } > > [...] The interaction with KVM_CAP_ENFORCE_PV_FEATURE_CPUID scares me. But I'll take it into account when posting v2. >>>> (BTW, it's better to return a bitmask of hypercalls that will exit to >>>> userspace from KVM_CHECK_EXTENSION. Userspace can still reject with -ENOSYS >>>> those that it doesn't know, but it's important that it knows in general how >>>> to handle KVM_EXIT_HYPERCALL). > > Speaking of bitmasks, what about also accepting a bitmask for enabling the > capability? (not sure if the above implies that). E.g. Yes, makes sense. Paolo
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 2ae061586677..d791d1f093ab 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -944,8 +944,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) break; /* Support memory encryption cpuid if host supports it */ case 0x8000001F: - if (!boot_cpu_has(X86_FEATURE_SEV)) + if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; + break; + } + cpuid_entry_override(entry, CPUID_8000_001F_EAX); break; /*Add support for Centaur's CPUID instruction*/ case 0xC0000000: diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 888e88b42e8d..e873a60a4830 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -99,6 +99,7 @@ static const struct cpuid_reg reverse_cpuid[] = { [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, [CPUID_7_1_EAX] = { 7, 1, CPUID_EAX}, [CPUID_12_EAX] = {0x00000012, 0, CPUID_EAX}, + [CPUID_8000_001F_EAX] = {0x8000001F, 0, CPUID_EAX}, }; /* diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index cd8c333ed2dc..acdb8457289e 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -923,6 +923,13 @@ static __init void svm_set_cpu_caps(void) if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || boot_cpu_has(X86_FEATURE_AMD_SSBD)) kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD); + + /* CPUID 0x8000001F */ + if (sev) { + kvm_cpu_cap_set(X86_FEATURE_SEV); + if (sev_es) + kvm_cpu_cap_set(X86_FEATURE_SEV_ES); + } } static __init int svm_hardware_setup(void) -- 2.26.2 From ef78673f78e3f2eedc498c1fbf9271146caa83cb Mon Sep 17 00:00:00 2001 From: Ashish Kalra <ashish.kalra@amd.com> Date: Thu, 15 Apr 2021 15:57:02 +0000 Subject: [PATCH 2/3] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall This hypercall is used by the SEV guest to notify a change in the page encryption status to the hypervisor. The hypercall should be invoked only when the encryption attribute is changed from encrypted -> decrypted and vice versa. By default all guest pages are considered encrypted. The hypercall exits to userspace to manage the guest shared regions and integrate with the userspace VMM's migration code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <93d7f2c2888315adc48905722574d89699edde33.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- Documentation/virt/kvm/api.rst | 11 +++++++++ Documentation/virt/kvm/cpuid.rst | 5 ++++ Documentation/virt/kvm/hypercalls.rst | 15 ++++++++++++ arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kvm/x86.c | 34 +++++++++++++++++++++++++++ include/uapi/linux/kvm.h | 1 + include/uapi/linux/kvm_para.h | 1 + 8 files changed, 70 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index fd4a84911355..2bc353d1f356 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6766,3 +6766,14 @@ they will get passed on to user space. So user space still has to have an implementation for these despite the in kernel acceleration. This capability is always enabled. + +8.32 KVM_CAP_EXIT_HYPERCALL +--------------------------- + +:Capability: KVM_CAP_EXIT_HYPERCALL +:Architectures: x86 +:Type: vm + +This capability, if enabled, will cause KVM to exit to userspace +with KVM_EXIT_HYPERCALL exit reason to process some hypercalls. +Right now, the only such hypercall is KVM_HC_PAGE_ENC_STATUS. diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst index cf62162d4be2..c9378d163b5a 100644 --- a/Documentation/virt/kvm/cpuid.rst +++ b/Documentation/virt/kvm/cpuid.rst @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID 15 guest checks this feature bit before using extended destination ID bits in MSI address bits 11-5. +KVM_FEATURE_HC_PAGE_ENC_STATUS 16 guest checks this feature bit before + using the page encryption state + hypercall to notify the page state + change + KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side per-cpu warps are expected in kvmclock diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst index ed4fddd364ea..234ed5188aef 100644 --- a/Documentation/virt/kvm/hypercalls.rst +++ b/Documentation/virt/kvm/hypercalls.rst @@ -169,3 +169,18 @@ a0: destination APIC ID :Usage example: When sending a call-function IPI-many to vCPUs, yield if any of the IPI target vCPUs was preempted. + + +8. KVM_HC_PAGE_ENC_STATUS +------------------------- +:Architecture: x86 +:Status: active +:Purpose: Notify the encryption status changes in guest page table (SEV guest) + +a0: the guest physical address of the start page +a1: the number of pages +a2: page encryption status + + Where: + * 1: Page is encrypted + * 0: Page is decrypted diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 6e195f7df4f0..45cb09543b56 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1047,6 +1047,8 @@ struct kvm_arch { bool bus_lock_detection_enabled; + bool hypercall_exit_enabled; + /* Guest can access the SGX PROVISIONKEY. */ bool sgx_provisioning_allowed; diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 950afebfba88..be49956b603f 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -33,6 +33,7 @@ #define KVM_FEATURE_PV_SCHED_YIELD 13 #define KVM_FEATURE_ASYNC_PF_INT 14 #define KVM_FEATURE_MSI_EXT_DEST_ID 15 +#define KVM_FEATURE_HC_PAGE_ENC_STATUS 16 #define KVM_HINTS_REALTIME 0 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c9ba6f2d9bcd..590cc811c99a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3809,6 +3809,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_SGX_ATTRIBUTE: #endif case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM: + case KVM_CAP_EXIT_HYPERCALL: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2: @@ -5413,6 +5414,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, break; } #endif + case KVM_CAP_EXIT_HYPERCALL: + kvm->arch.hypercall_exit_enabled = cap->args[0]; + r = 0; + break; case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM: r = -EINVAL; if (kvm_x86_ops.vm_copy_enc_context_from) @@ -8269,6 +8274,13 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id) return; } +static int complete_hypercall_exit(struct kvm_vcpu *vcpu) +{ + kvm_rax_write(vcpu, vcpu->run->hypercall.ret); + ++vcpu->stat.hypercalls; + return kvm_skip_emulated_instruction(vcpu); +} + int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) { unsigned long nr, a0, a1, a2, a3, ret; @@ -8334,6 +8346,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) kvm_sched_yield(vcpu, a0); ret = 0; break; + case KVM_HC_PAGE_ENC_STATUS: { + u64 gpa = a0, npages = a1, enc = a2; + + ret = -KVM_ENOSYS; + if (!vcpu->kvm->arch.hypercall_exit_enabled) + break; + + if (!PAGE_ALIGNED(gpa) || !npages || + gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) { + ret = -EINVAL; + break; + } + + vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; + vcpu->run->hypercall.nr = KVM_HC_PAGE_ENC_STATUS; + vcpu->run->hypercall.args[0] = gpa; + vcpu->run->hypercall.args[1] = npages; + vcpu->run->hypercall.args[2] = enc; + vcpu->run->hypercall.longmode = op_64_bit; + vcpu->arch.complete_userspace_io = complete_hypercall_exit; + return 0; + } default: ret = -KVM_ENOSYS; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index d76533498543..7dc1c217704f 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1081,6 +1081,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_SET_GUEST_DEBUG2 195 #define KVM_CAP_SGX_ATTRIBUTE 196 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197 +#define KVM_CAP_EXIT_HYPERCALL 198 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h index 8b86609849b9..847b83b75dc8 100644 --- a/include/uapi/linux/kvm_para.h +++ b/include/uapi/linux/kvm_para.h @@ -29,6 +29,7 @@ #define KVM_HC_CLOCK_PAIRING 9 #define KVM_HC_SEND_IPI 10 #define KVM_HC_SCHED_YIELD 11 +#define KVM_HC_PAGE_ENC_STATUS 12 /* * hypercalls use architecture specific -- 2.26.2 From b9109a4be9c33eb09658003d135abe77da4bbaa1 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini <pbonzini@redhat.com> Date: Tue, 20 Apr 2021 06:50:21 -0400 Subject: [PATCH 3/3] KVM: x86: add MSR_KVM_MIGRATION_CONTROL Add a new MSR that can be used to communicate whether the page encryption status bitmap is up to date and therefore whether live migration of an encrypted guest is possible. The MSR should be processed by userspace if it is going to live migrate the guest; the default implementation does nothing. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- Documentation/virt/kvm/cpuid.rst | 3 ++- Documentation/virt/kvm/msr.rst | 10 ++++++++++ arch/x86/include/uapi/asm/kvm_para.h | 3 +++ arch/x86/kvm/x86.c | 14 ++++++++++++++ 4 files changed, 29 insertions(+), 1 deletion(-) diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst index c9378d163b5a..15923857de56 100644 --- a/Documentation/virt/kvm/cpuid.rst +++ b/Documentation/virt/kvm/cpuid.rst @@ -99,7 +99,8 @@ KVM_FEATURE_MSI_EXT_DEST_ID 15 guest checks this feature bit KVM_FEATURE_HC_PAGE_ENC_STATUS 16 guest checks this feature bit before using the page encryption state hypercall to notify the page state - change + change, and before setting bit 0 of + MSR_KVM_MIGRATION_CONTROL KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side per-cpu warps are expected in diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst index e37a14c323d2..691d718df60a 100644 --- a/Documentation/virt/kvm/msr.rst +++ b/Documentation/virt/kvm/msr.rst @@ -376,3 +376,13 @@ data: write '1' to bit 0 of the MSR, this causes the host to re-scan its queue and check if there are more notifications pending. The MSR is available if KVM_FEATURE_ASYNC_PF_INT is present in CPUID. + +MSR_KVM_MIGRATION_CONTROL: + 0x4b564d08 + +data: + If the guest is running with encrypted memory and it is communicating + page encryption status to the host using the ``KVM_HC_PAGE_ENC_STATUS`` + hypercall, it can set bit 0 in this MSR to allow live migration of + the guest. The bit can be set if KVM_FEATURE_HC_PAGE_ENC_STATUS is + present in CPUID. diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index be49956b603f..c5ee419775d8 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -55,6 +55,7 @@ #define MSR_KVM_POLL_CONTROL 0x4b564d05 #define MSR_KVM_ASYNC_PF_INT 0x4b564d06 #define MSR_KVM_ASYNC_PF_ACK 0x4b564d07 +#define MSR_KVM_MIGRATION_CONTROL 0x4b564d08 struct kvm_steal_time { __u64 steal; @@ -91,6 +92,8 @@ struct kvm_clock_pairing { /* MSR_KVM_ASYNC_PF_INT */ #define KVM_ASYNC_PF_VEC_MASK GENMASK(7, 0) +/* MSR_KVM_MIGRATION_CONTROL */ +#define KVM_PAGE_ENC_STATUS_UPTODATE (1 << 0) /* Operations for KVM_HC_MMU_OP */ #define KVM_MMU_OP_WRITE_PTE 1 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 590cc811c99a..d696a9f13e33 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3258,6 +3258,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) vcpu->arch.msr_kvm_poll_control = data; break; + case MSR_KVM_MIGRATION_CONTROL: + if (data & ~KVM_PAGE_ENC_STATUS_UPTODATE) + return 1; + + if (data && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) + return 1; + break; + case MSR_IA32_MCG_CTL: case MSR_IA32_MCG_STATUS: case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1: @@ -3549,6 +3557,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF)) return 1; + msr_info->data = 0; + break; + case MSR_KVM_MIGRATION_CONTROL: + if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)) + return 1; + msr_info->data = 0; break; case MSR_KVM_STEAL_TIME: