[0/3] KVM: x86: guest interface for SEV live migration

Message ID	20210420112006.741541-1-pbonzini@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> From: Paolo Bonzini <pbonzini@redhat.com> To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, srutherford@google.com, joro@8bytes.org, brijesh.singh@amd.com, thomas.lendacky@amd.com, venu.busireddy@oracle.com, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>, Borislav Petkov <bp@suse.de>, x86@kernel.org, Ashish Kalra <ashish.kalra@amd.com> Subject: [PATCH 0/3] KVM: x86: guest interface for SEV live migration Date: Tue, 20 Apr 2021 07:20:06 -0400 Message-Id: <20210420112006.741541-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk

Paolo Bonzini April 20, 2021, 11:20 a.m. UTC

This is a reviewed version of the guest interface (hypercall+MSR)
for SEV live migration.  The differences lie mostly in the API
for userspace.  In particular:

- the CPUID feature is not exposed in KVM_GET_SUPPORTED_CPUID

- the hypercall must be enabled manually with KVM_ENABLE_CAP

- the MSR has sensible behavior if not filtered (reads as zero,
  writes must leave undefined bits as zero or they #GP)

Patch 1 is an unrelated cleanup, that was made moot by not
exposing the bit in KVM_GET_SUPPORTED_CPUID.

Paolo

Ashish Kalra (1):
  KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

Paolo Bonzini (2):
  KVM: SEV: mask CPUID[0x8000001F].eax according to supported features
  KVM: x86: add MSR_KVM_MIGRATION_CONTROL

 Documentation/virt/kvm/api.rst        | 11 ++++++
 Documentation/virt/kvm/cpuid.rst      |  6 ++++
 Documentation/virt/kvm/hypercalls.rst | 15 +++++++++
 Documentation/virt/kvm/msr.rst        | 10 ++++++
 arch/x86/include/asm/kvm_host.h       |  2 ++
 arch/x86/include/uapi/asm/kvm_para.h  |  4 +++
 arch/x86/kvm/cpuid.c                  |  5 ++-
 arch/x86/kvm/cpuid.h                  |  1 +
 arch/x86/kvm/svm/svm.c                |  7 ++++
 arch/x86/kvm/x86.c                    | 48 +++++++++++++++++++++++++++
 include/uapi/linux/kvm.h              |  1 +
 include/uapi/linux/kvm_para.h         |  1 +
 12 files changed, 110 insertions(+), 1 deletion(-)

Sean Christopherson April 20, 2021, 3:15 p.m. UTC | #1

On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> From 547d4d4edcd05fdfac6ce650d65db1d42bcd2807 Mon Sep 17 00:00:00 2001
> From: Paolo Bonzini <pbonzini@redhat.com>
> Date: Tue, 20 Apr 2021 05:49:11 -0400
> Subject: [PATCH 1/3] KVM: SEV: mask CPUID[0x8000001F].eax according to
>  supported features

Your mailer is obviously a bit wonky, took me a while to find this patch :-)
 
> Do not return the SEV-ES bit from KVM_GET_SUPPORTED_CPUID unless
> the corresponding module parameter is 1, and clear the memory encryption
> leaf completely if SEV is disabled.

Impeccable timing, I was planning on refreshing my SEV cleanup series[*] today.
There's going to be an annoying conflict with the svm_set_cpu_caps() change
(see below), any objecting to folding your unintentional feedback into my series?

[*] https://lkml.kernel.org/r/20210306015905.186698-1-seanjc@google.com

> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/cpuid.c   | 5 ++++-
>  arch/x86/kvm/cpuid.h   | 1 +
>  arch/x86/kvm/svm/svm.c | 7 +++++++
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 2ae061586677..d791d1f093ab 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -944,8 +944,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>  		break;
>  	/* Support memory encryption cpuid if host supports it */
>  	case 0x8000001F:
> -		if (!boot_cpu_has(X86_FEATURE_SEV))
> +		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
>  			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> +			break;
> +		}
> +		cpuid_entry_override(entry, CPUID_8000_001F_EAX);

I find this easier to read:

		if (!kvm_cpu_cap_has(X86_FEATURE_SEV))
			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
		else
			cpuid_entry_override(entry, CPUID_8000_001F_EAX);

>  		break;
>  	/*Add support for Centaur's CPUID instruction*/
>  	case 0xC0000000:
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 888e88b42e8d..e873a60a4830 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -99,6 +99,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
>  	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
>  	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
>  	[CPUID_12_EAX]        = {0x00000012, 0, CPUID_EAX},
> +	[CPUID_8000_001F_EAX] = {0x8000001F, 0, CPUID_EAX},
>  };
>  
>  /*
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cd8c333ed2dc..acdb8457289e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -923,6 +923,13 @@ static __init void svm_set_cpu_caps(void)
>  	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
>  	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
>  		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
> +
> +	/* CPUID 0x8000001F */
> +	if (sev) {
> +		kvm_cpu_cap_set(X86_FEATURE_SEV);
> +		if (sev_es)
> +			kvm_cpu_cap_set(X86_FEATURE_SEV_ES);

Gah, I completely spaced on the module params in my series, which is more
problematic than normal because it also moves "sev" and "sev_es" to sev.c.  The
easy solution is to add sev_set_cpu_caps().

On the other, this misses SME_COHERENT.  I also think it makes sense to call
kvm_cpu_cap_mask() for the leaf, even if it's just to crush KVM's caps to zero.
However, because of SME_COHERENT and other potential bits in the future, I think
I prefer starting with the bits carried over from boot_cpu_data.  E.g.

	kvm_cpu_cap_mask(CPUID_8000_001F_EAX,
		0 /* SME */ | F(SEV) | 0 /* VM_PAGE_FLUSH */ | F(SEV_ES) |
		F(SME_COHERENT));

and (with renamed module params):

	if (sev_enabled)
		kvm_cpu_cap_clear(X86_FEATURE_SEV);
	if (sev_es_enabled)
		kvm_cpu_cap_clear(X86_FEATURE_SEV_ES);

> +	}
>  }
>  
>  static __init int svm_hardware_setup(void)

Sean Christopherson April 20, 2021, 5:31 p.m. UTC | #2

On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> From ef78673f78e3f2eedc498c1fbf9271146caa83cb Mon Sep 17 00:00:00 2001
> From: Ashish Kalra <ashish.kalra@amd.com>
> Date: Thu, 15 Apr 2021 15:57:02 +0000
> Subject: [PATCH 2/3] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
> 
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
> 
> The hypercall exits to userspace to manage the guest shared regions and
> integrate with the userspace VMM's migration code.

...

> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index fd4a84911355..2bc353d1f356 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6766,3 +6766,14 @@ they will get passed on to user space. So user space still has to have
>  an implementation for these despite the in kernel acceleration.
>  
>  This capability is always enabled.
> +
> +8.32 KVM_CAP_EXIT_HYPERCALL
> +---------------------------
> +
> +:Capability: KVM_CAP_EXIT_HYPERCALL
> +:Architectures: x86
> +:Type: vm
> +
> +This capability, if enabled, will cause KVM to exit to userspace
> +with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
> +Right now, the only such hypercall is KVM_HC_PAGE_ENC_STATUS.
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index cf62162d4be2..c9378d163b5a 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
>                                                 before using extended destination
>                                                 ID bits in MSI address bits 11-5.
>  
> +KVM_FEATURE_HC_PAGE_ENC_STATUS     16          guest checks this feature bit before
> +                                               using the page encryption state
> +                                               hypercall to notify the page state
> +                                               change

...

>  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long nr, a0, a1, a2, a3, ret;
> @@ -8334,6 +8346,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  		kvm_sched_yield(vcpu, a0);
>  		ret = 0;
>  		break;
> +	case KVM_HC_PAGE_ENC_STATUS: {
> +		u64 gpa = a0, npages = a1, enc = a2;
> +
> +		ret = -KVM_ENOSYS;
> +		if (!vcpu->kvm->arch.hypercall_exit_enabled)

I don't follow, why does the hypercall need to be gated by a capability?  What
would break if this were changed to?

		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))

> +			break;
> +
> +		if (!PAGE_ALIGNED(gpa) || !npages ||
> +		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> +		vcpu->run->hypercall.nr       = KVM_HC_PAGE_ENC_STATUS;
> +		vcpu->run->hypercall.args[0]  = gpa;
> +		vcpu->run->hypercall.args[1]  = npages;
> +		vcpu->run->hypercall.args[2]  = enc;
> +		vcpu->run->hypercall.longmode = op_64_bit;
> +		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
> +		return 0;
> +	}
>  	default:
>  		ret = -KVM_ENOSYS;
>  		break;

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 590cc811c99a..d696a9f13e33 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3258,6 +3258,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		vcpu->arch.msr_kvm_poll_control = data;
>  		break;
>  
> +	case MSR_KVM_MIGRATION_CONTROL:
> +		if (data & ~KVM_PAGE_ENC_STATUS_UPTODATE)
> +			return 1;
> +
> +		if (data && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))

Why let the guest write '0'?  Letting the guest do WRMSR but not RDMSR is
bizarre.

> +			return 1;
> +		break;
> +
>  	case MSR_IA32_MCG_CTL:
>  	case MSR_IA32_MCG_STATUS:
>  	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> @@ -3549,6 +3557,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
>  			return 1;
>  
> +		msr_info->data = 0;
> +		break;
> +	case MSR_KVM_MIGRATION_CONTROL:
> +		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> +			return 1;
> +
>  		msr_info->data = 0;
>  		break;
>  	case MSR_KVM_STEAL_TIME:
> -- 
> 2.26.2
>

Kalra, Ashish April 20, 2021, 6:11 p.m. UTC | #3

On Tue, Apr 20, 2021 at 05:31:07PM +0000, Sean Christopherson wrote:
> On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> > From ef78673f78e3f2eedc498c1fbf9271146caa83cb Mon Sep 17 00:00:00 2001
> > From: Ashish Kalra <ashish.kalra@amd.com>
> > Date: Thu, 15 Apr 2021 15:57:02 +0000
> > Subject: [PATCH 2/3] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall
> > 
> > This hypercall is used by the SEV guest to notify a change in the page
> > encryption status to the hypervisor. The hypercall should be invoked
> > only when the encryption attribute is changed from encrypted -> decrypted
> > and vice versa. By default all guest pages are considered encrypted.
> > 
> > The hypercall exits to userspace to manage the guest shared regions and
> > integrate with the userspace VMM's migration code.
> 
> ...
> 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index fd4a84911355..2bc353d1f356 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6766,3 +6766,14 @@ they will get passed on to user space. So user space still has to have
> >  an implementation for these despite the in kernel acceleration.
> >  
> >  This capability is always enabled.
> > +
> > +8.32 KVM_CAP_EXIT_HYPERCALL
> > +---------------------------
> > +
> > +:Capability: KVM_CAP_EXIT_HYPERCALL
> > +:Architectures: x86
> > +:Type: vm
> > +
> > +This capability, if enabled, will cause KVM to exit to userspace
> > +with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
> > +Right now, the only such hypercall is KVM_HC_PAGE_ENC_STATUS.
> > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > index cf62162d4be2..c9378d163b5a 100644
> > --- a/Documentation/virt/kvm/cpuid.rst
> > +++ b/Documentation/virt/kvm/cpuid.rst
> > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> >                                                 before using extended destination
> >                                                 ID bits in MSI address bits 11-5.
> >  
> > +KVM_FEATURE_HC_PAGE_ENC_STATUS     16          guest checks this feature bit before
> > +                                               using the page encryption state
> > +                                               hypercall to notify the page state
> > +                                               change
> 
> ...
> 
> >  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >  {
> >  	unsigned long nr, a0, a1, a2, a3, ret;
> > @@ -8334,6 +8346,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >  		kvm_sched_yield(vcpu, a0);
> >  		ret = 0;
> >  		break;
> > +	case KVM_HC_PAGE_ENC_STATUS: {
> > +		u64 gpa = a0, npages = a1, enc = a2;
> > +
> > +		ret = -KVM_ENOSYS;
> > +		if (!vcpu->kvm->arch.hypercall_exit_enabled)
> 
> I don't follow, why does the hypercall need to be gated by a capability?  What
> would break if this were changed to?
> 
> 		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> 

But, the above indicates host support for page_enc_status_hc, so we want
to ensure that host supports and has enabled support for the hypercall
exit, i.e., hypercall has been enabled.

Thanks,
Ashish

> > +			break;
> > +
> > +		if (!PAGE_ALIGNED(gpa) || !npages ||
> > +		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> > +
> > +		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> > +		vcpu->run->hypercall.nr       = KVM_HC_PAGE_ENC_STATUS;
> > +		vcpu->run->hypercall.args[0]  = gpa;
> > +		vcpu->run->hypercall.args[1]  = npages;
> > +		vcpu->run->hypercall.args[2]  = enc;
> > +		vcpu->run->hypercall.longmode = op_64_bit;
> > +		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
> > +		return 0;
> > +	}
> >  	default:
> >  		ret = -KVM_ENOSYS;
> >  		break;
> 
> ...
> 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 590cc811c99a..d696a9f13e33 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3258,6 +3258,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  		vcpu->arch.msr_kvm_poll_control = data;
> >  		break;
> >  
> > +	case MSR_KVM_MIGRATION_CONTROL:
> > +		if (data & ~KVM_PAGE_ENC_STATUS_UPTODATE)
> > +			return 1;
> > +
> > +		if (data && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> 
> Why let the guest write '0'?  Letting the guest do WRMSR but not RDMSR is
> bizarre.
> 
> > +			return 1;
> > +		break;
> > +
> >  	case MSR_IA32_MCG_CTL:
> >  	case MSR_IA32_MCG_STATUS:
> >  	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> > @@ -3549,6 +3557,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >  		if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
> >  			return 1;
> >  
> > +		msr_info->data = 0;
> > +		break;
> > +	case MSR_KVM_MIGRATION_CONTROL:
> > +		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> > +			return 1;
> > +
> >  		msr_info->data = 0;
> >  		break;
> >  	case MSR_KVM_STEAL_TIME:
> > -- 
> > 2.26.2
> >

Paolo Bonzini April 20, 2021, 6:27 p.m. UTC | #4

On 20/04/21 17:15, Sean Christopherson wrote:
> On Tue, Apr 20, 2021, Paolo Bonzini wrote:
>> Do not return the SEV-ES bit from KVM_GET_SUPPORTED_CPUID unless
>> the corresponding module parameter is 1, and clear the memory encryption
>> leaf completely if SEV is disabled.
> 
> Impeccable timing, I was planning on refreshing my SEV cleanup series[*] today.
> There's going to be an annoying conflict with the svm_set_cpu_caps() change
> (see below), any objecting to folding your unintentional feedback into my series?

That's fine of course.

>> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
>> index 888e88b42e8d..e873a60a4830 100644
>> --- a/arch/x86/kvm/cpuid.h
>> +++ b/arch/x86/kvm/cpuid.h
>> @@ -99,6 +99,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
>>   	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
>>   	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
>>   	[CPUID_12_EAX]        = {0x00000012, 0, CPUID_EAX},
>> +	[CPUID_8000_001F_EAX] = {0x8000001F, 0, CPUID_EAX},
>>   };
>>   
>>   /*
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index cd8c333ed2dc..acdb8457289e 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -923,6 +923,13 @@ static __init void svm_set_cpu_caps(void)
>>   	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
>>   	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
>>   		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
>> +
>> +	/* CPUID 0x8000001F */
>> +	if (sev) {
>> +		kvm_cpu_cap_set(X86_FEATURE_SEV);
>> +		if (sev_es)
>> +			kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
> 
> Gah, I completely spaced on the module params in my series, which is more
> problematic than normal because it also moves "sev" and "sev_es" to sev.c.  The
> easy solution is to add sev_set_cpu_caps().

Sounds good.

Paolo

Paolo Bonzini April 20, 2021, 6:39 p.m. UTC | #5

On 20/04/21 19:31, Sean Christopherson wrote:
>> +	case KVM_HC_PAGE_ENC_STATUS: {
>> +		u64 gpa = a0, npages = a1, enc = a2;
>> +
>> +		ret = -KVM_ENOSYS;
>> +		if (!vcpu->kvm->arch.hypercall_exit_enabled)
> 
> I don't follow, why does the hypercall need to be gated by a capability?  What
> would break if this were changed to?
> 
> 		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))

The problem is that it's valid to take KVM_GET_SUPPORTED_CPUID and send 
it unmodified to KVM_SET_CPUID2.  For this reason, features that are 
conditional on other ioctls, or that require some kind of userspace 
support, must not be in KVM_GET_SUPPORTED_CPUID.  For example:

- TSC_DEADLINE because it is only implemented after KVM_CREATE_IRQCHIP 
(or after KVM_ENABLE_CAP of KVM_CAP_IRQCHIP_SPLIT)

- MONITOR only makes sense if userspace enables KVM_CAP_X86_DISABLE_EXITS

X2APIC is reported even though it shouldn't be.  Too late to fix that, I 
think.

In this particular case, if userspace sets the bit in CPUID2 but doesn't 
handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of 
assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall.

(I should document that, Jim asked for documentation around 
KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST many times).

Paolo

>> +			break;
>> +
>> +		if (!PAGE_ALIGNED(gpa) || !npages ||
>> +		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
>> +			ret = -EINVAL;
>> +			break;
>> +		}
>> +
>> +		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
>> +		vcpu->run->hypercall.nr       = KVM_HC_PAGE_ENC_STATUS;
>> +		vcpu->run->hypercall.args[0]  = gpa;
>> +		vcpu->run->hypercall.args[1]  = npages;
>> +		vcpu->run->hypercall.args[2]  = enc;
>> +		vcpu->run->hypercall.longmode = op_64_bit;
>> +		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
>> +		return 0;
>> +	}
>>   	default:
>>   		ret = -KVM_ENOSYS;
>>   		break;
> 
> ...
> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 590cc811c99a..d696a9f13e33 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3258,6 +3258,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   		vcpu->arch.msr_kvm_poll_control = data;
>>   		break;
>>   
>> +	case MSR_KVM_MIGRATION_CONTROL:
>> +		if (data & ~KVM_PAGE_ENC_STATUS_UPTODATE)
>> +			return 1;
>> +
>> +		if (data && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> 
> Why let the guest write '0'?  Letting the guest do WRMSR but not RDMSR is
> bizarre.

Because it was the simplest way to write the code, but returning 0 
unconditionally from RDMSR is actually simpler.

Paolo

>> +			return 1;
>> +		break;
>> +
>>   	case MSR_IA32_MCG_CTL:
>>   	case MSR_IA32_MCG_STATUS:
>>   	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
>> @@ -3549,6 +3557,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   		if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
>>   			return 1;
>>   
>> +		msr_info->data = 0;
>> +		break;
>> +	case MSR_KVM_MIGRATION_CONTROL:
>> +		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
>> +			return 1;
>> +
>>   		msr_info->data = 0;
>>   		break;
>>   	case MSR_KVM_STEAL_TIME:
>> -- 
>> 2.26.2
>>
>

Sean Christopherson April 20, 2021, 6:40 p.m. UTC | #6

On Tue, Apr 20, 2021, Ashish Kalra wrote:
> On Tue, Apr 20, 2021 at 05:31:07PM +0000, Sean Christopherson wrote:
> > On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> > > +	case KVM_HC_PAGE_ENC_STATUS: {
> > > +		u64 gpa = a0, npages = a1, enc = a2;
> > > +
> > > +		ret = -KVM_ENOSYS;
> > > +		if (!vcpu->kvm->arch.hypercall_exit_enabled)
> > 
> > I don't follow, why does the hypercall need to be gated by a capability?  What
> > would break if this were changed to?
> > 
> > 		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> > 
> 
> But, the above indicates host support for page_enc_status_hc, so we want
> to ensure that host supports and has enabled support for the hypercall
> exit, i.e., hypercall has been enabled.

I still don't see how parroting back KVM_GET_SUPPORTED_CPUID, i.e. "unintentionally"
setting KVM_FEATURE_HC_PAGE_ENC_STATUS, would break anything.  Sure, the guest
does unnecessary hypercalls, but they're eaten by KVM.  On the flip side, gating
the hypercall on the capability, and especially only the capability, creates
weird scenarios where the guest can observe KVM_FEATURE_HC_PAGE_ENC_STATUS=1
but fail the hypercall.  Those would be fairly clearcut VMM bugs, but at the
same time KVM is essentially going out of its way to manufacture the problem.

Sean Christopherson April 20, 2021, 6:58 p.m. UTC | #7

On Tue, Apr 20, 2021, Sean Christopherson wrote:
> On Tue, Apr 20, 2021, Ashish Kalra wrote:
> > On Tue, Apr 20, 2021 at 05:31:07PM +0000, Sean Christopherson wrote:
> > > On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> > > > +	case KVM_HC_PAGE_ENC_STATUS: {
> > > > +		u64 gpa = a0, npages = a1, enc = a2;
> > > > +
> > > > +		ret = -KVM_ENOSYS;
> > > > +		if (!vcpu->kvm->arch.hypercall_exit_enabled)
> > > 
> > > I don't follow, why does the hypercall need to be gated by a capability?  What
> > > would break if this were changed to?
> > > 
> > > 		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> > > 
> > 
> > But, the above indicates host support for page_enc_status_hc, so we want
> > to ensure that host supports and has enabled support for the hypercall
> > exit, i.e., hypercall has been enabled.
> 
> I still don't see how parroting back KVM_GET_SUPPORTED_CPUID, i.e. "unintentionally"
> setting KVM_FEATURE_HC_PAGE_ENC_STATUS, would break anything.  Sure, the guest
> does unnecessary hypercalls, but they're eaten by KVM.  On the flip side, gating
> the hypercall on the capability, and especially only the capability, creates
> weird scenarios where the guest can observe KVM_FEATURE_HC_PAGE_ENC_STATUS=1
> but fail the hypercall.  Those would be fairly clearcut VMM bugs, but at the
> same time KVM is essentially going out of its way to manufacture the problem.

Doh, I was thinking of the MSR behavior, not the hypercall.  I'll respond to
Paolo's mail, I have one more hail mary idea.

Sean Christopherson April 20, 2021, 7:02 p.m. UTC | #8

On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> On 20/04/21 19:31, Sean Christopherson wrote:
> > > +	case KVM_HC_PAGE_ENC_STATUS: {
> > > +		u64 gpa = a0, npages = a1, enc = a2;
> > > +
> > > +		ret = -KVM_ENOSYS;
> > > +		if (!vcpu->kvm->arch.hypercall_exit_enabled)
> > 
> > I don't follow, why does the hypercall need to be gated by a capability?  What
> > would break if this were changed to?
> > 
> > 		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> 
> The problem is that it's valid to take KVM_GET_SUPPORTED_CPUID and send it
> unmodified to KVM_SET_CPUID2.  For this reason, features that are
> conditional on other ioctls, or that require some kind of userspace support,
> must not be in KVM_GET_SUPPORTED_CPUID.  For example:
> 
> - TSC_DEADLINE because it is only implemented after KVM_CREATE_IRQCHIP (or
> after KVM_ENABLE_CAP of KVM_CAP_IRQCHIP_SPLIT)
> 
> - MONITOR only makes sense if userspace enables KVM_CAP_X86_DISABLE_EXITS
> 
> X2APIC is reported even though it shouldn't be.  Too late to fix that, I
> think.
> 
> In this particular case, if userspace sets the bit in CPUID2 but doesn't
> handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of
> assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall.

Gah, I was thinking of the MSR behavior and forgot that the hypercall exiting
behavior intentionally doesn't require extra filtering.

It's also worth noting that guest_pv_has() is particularly useless since it
will unconditionally return true for older VMMs that dont' enable
KVM_CAP_ENFORCE_PV_FEATURE_CPUID.

Bummer.

Sean Christopherson April 20, 2021, 8:16 p.m. UTC | #9

On Tue, Apr 20, 2021, Sean Christopherson wrote:
> On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> > On 20/04/21 19:31, Sean Christopherson wrote:
> > > > +	case KVM_HC_PAGE_ENC_STATUS: {
> > > > +		u64 gpa = a0, npages = a1, enc = a2;
> > > > +
> > > > +		ret = -KVM_ENOSYS;
> > > > +		if (!vcpu->kvm->arch.hypercall_exit_enabled)
> > > 
> > > I don't follow, why does the hypercall need to be gated by a capability?  What
> > > would break if this were changed to?
> > > 
> > > 		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> > 
> > The problem is that it's valid to take KVM_GET_SUPPORTED_CPUID and send it
> > unmodified to KVM_SET_CPUID2.  For this reason, features that are
> > conditional on other ioctls, or that require some kind of userspace support,
> > must not be in KVM_GET_SUPPORTED_CPUID.  For example:
> > 
> > - TSC_DEADLINE because it is only implemented after KVM_CREATE_IRQCHIP (or
> > after KVM_ENABLE_CAP of KVM_CAP_IRQCHIP_SPLIT)
> > 
> > - MONITOR only makes sense if userspace enables KVM_CAP_X86_DISABLE_EXITS
> > 
> > X2APIC is reported even though it shouldn't be.  Too late to fix that, I
> > think.
> > 
> > In this particular case, if userspace sets the bit in CPUID2 but doesn't
> > handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of
> > assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall.
> 
> Gah, I was thinking of the MSR behavior and forgot that the hypercall exiting
> behavior intentionally doesn't require extra filtering.
> 
> It's also worth noting that guest_pv_has() is particularly useless since it
> will unconditionally return true for older VMMs that dont' enable
> KVM_CAP_ENFORCE_PV_FEATURE_CPUID.
> 
> Bummer.

Oh!  Almost forgot my hail mary idea.  Instead of a new capability, can we
reject the hypercall if userspace has _not_ set KVM_CAP_ENFORCE_PV_FEATURE_CPUID?

			if (vcpu->arch.pv_cpuid.enforce &&
			    !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)
				break;

Paolo Bonzini April 20, 2021, 8:58 p.m. UTC | #10

On 20/04/21 22:16, Sean Christopherson wrote:
> On Tue, Apr 20, 2021, Sean Christopherson wrote:
>> On Tue, Apr 20, 2021, Paolo Bonzini wrote:
>>> In this particular case, if userspace sets the bit in CPUID2 but doesn't
>>> handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of
>>> assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall.
>
> Oh!  Almost forgot my hail mary idea.  Instead of a new capability, can we
> reject the hypercall if userspace has _not_ set KVM_CAP_ENFORCE_PV_FEATURE_CPUID?
> 
> 			if (vcpu->arch.pv_cpuid.enforce &&
> 			    !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)
> 				break;

Couldn't userspace enable that capability and _still_ copy the supported 
CPUID blindly to the guest CPUID, without supporting the hypercall?

(BTW, it's better to return a bitmask of hypercalls that will exit to 
userspace from KVM_CHECK_EXTENSION.  Userspace can still reject with 
-ENOSYS those that it doesn't know, but it's important that it knows in 
general how to handle KVM_EXIT_HYPERCALL).

Paolo

Sean Christopherson April 20, 2021, 10:49 p.m. UTC | #11

On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> On 20/04/21 22:16, Sean Christopherson wrote:
> > On Tue, Apr 20, 2021, Sean Christopherson wrote:
> > > On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> > > > In this particular case, if userspace sets the bit in CPUID2 but doesn't
> > > > handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of
> > > > assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall.
> > 
> > Oh!  Almost forgot my hail mary idea.  Instead of a new capability, can we
> > reject the hypercall if userspace has _not_ set KVM_CAP_ENFORCE_PV_FEATURE_CPUID?
> > 
> > 			if (vcpu->arch.pv_cpuid.enforce &&
> > 			    !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)
> > 				break;
> 
> Couldn't userspace enable that capability and _still_ copy the supported
> CPUID blindly to the guest CPUID, without supporting the hypercall?

Yes.  I was going to argue that we get to define the behavior, but that's not
true because it would break existing VMMs that blindly copy.  Capability it is...

Sean Christopherson April 20, 2021, 11:20 p.m. UTC | #12

On Tue, Apr 20, 2021, Sean Christopherson wrote:
> On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> > On 20/04/21 22:16, Sean Christopherson wrote:
> > > On Tue, Apr 20, 2021, Sean Christopherson wrote:
> > > > On Tue, Apr 20, 2021, Paolo Bonzini wrote:
> > > > > In this particular case, if userspace sets the bit in CPUID2 but doesn't
> > > > > handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of
> > > > > assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall.
> > > 
> > > Oh!  Almost forgot my hail mary idea.  Instead of a new capability, can we
> > > reject the hypercall if userspace has _not_ set KVM_CAP_ENFORCE_PV_FEATURE_CPUID?
> > > 
> > > 			if (vcpu->arch.pv_cpuid.enforce &&
> > > 			    !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS)
> > > 				break;
> > 
> > Couldn't userspace enable that capability and _still_ copy the supported
> > CPUID blindly to the guest CPUID, without supporting the hypercall?
> 
> Yes.  I was going to argue that we get to define the behavior, but that's not
> true because it would break existing VMMs that blindly copy.  Capability it is...

Hrm, that won't quite work though.  If userspace blindly copies CPUID, but doesn't
enable the capability, the guest will think the hypercall is supported.  The
guest hopefully won't freak out too much on the resulting -KVM_ENOSYS, but it
does make the CPUID flag rather useless.

We can make it work with:

		u64 gpa = a0, npages = a1, enc = a2;

		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
			break;

		if (!PAGE_ALIGNED(gpa) || !npages ||
		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
			ret = -EINVAL;
			break;
		}

		if (!vcpu->kvm->arch.hypercall_exit_enabled) {
			ret = 0;
			break;
		}

		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
		vcpu->run->hypercall.nr       = KVM_HC_PAGE_ENC_STATUS;
		vcpu->run->hypercall.args[0]  = gpa;
		vcpu->run->hypercall.args[1]  = npages;
		vcpu->run->hypercall.args[2]  = enc;
		vcpu->run->hypercall.longmode = op_64_bit;
		vcpu->arch.complete_userspace_io = complete_hypercall_exit;

That's dancing pretty close to hypercall filtering, which I was hoping to avoid.
I guess it's not reaaaally filtering since the exit check happens after the
validity checks.

> > > (BTW, it's better to return a bitmask of hypercalls that will exit to
> > > userspace from KVM_CHECK_EXTENSION.  Userspace can still reject with -ENOSYS
> > > those that it doesn't know, but it's important that it knows in general how
> > > to handle KVM_EXIT_HYPERCALL).

Speaking of bitmasks, what about also accepting a bitmask for enabling the
capability?  (not sure if the above implies that).  E.g.

		if (!(vcpu->kvm->arch.hypercall_exit_enabled & BIT_ULL(nr))) {
			ret = 0;
			break;
		}

Paolo Bonzini April 21, 2021, 12:03 p.m. UTC | #13

On 21/04/21 01:20, Sean Christopherson wrote:
> If userspace blindly copies CPUID, but doesn't
> enable the capability, the guest will think the hypercall is supported.  The
> guest hopefully won't freak out too much on the resulting -KVM_ENOSYS, but it
> does make the CPUID flag rather useless.

Yes that's why the CPUID bit must *not* be in KVM_GET_SUPPORTED_CPUID.

> The
> guest hopefully won't freak out too much on the resulting -KVM_ENOSYS, but it
> does make the CPUID flag rather useless.
> 
> We can make it work with:
> 
> 		u64 gpa = a0, npages = a1, enc = a2;
> 
> 		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
> 			break;
> 
> 		if (!PAGE_ALIGNED(gpa) || !npages ||
> 		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
> 			ret = -EINVAL;
> 			break;
> 		}
> 
> 		if (!vcpu->kvm->arch.hypercall_exit_enabled) {
> 			ret = 0;
> 			break;
> 		}
> 
> 		[...]

The interaction with KVM_CAP_ENFORCE_PV_FEATURE_CPUID scares me.  But 
I'll take it into account when posting v2.

>>>> (BTW, it's better to return a bitmask of hypercalls that will exit to
>>>> userspace from KVM_CHECK_EXTENSION.  Userspace can still reject with -ENOSYS
>>>> those that it doesn't know, but it's important that it knows in general how
>>>> to handle KVM_EXIT_HYPERCALL).
>
> Speaking of bitmasks, what about also accepting a bitmask for enabling the
> capability?  (not sure if the above implies that).  E.g.

Yes, makes sense.

Paolo

[0/3] KVM: x86: guest interface for SEV live migration

Commit Message

Comments

Patch