diff mbox series

[03/15] KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest

Message ID 20210504171734.1434054-4-seanjc@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: RDPID/RDTSCP fixes and uret MSR cleanups | expand

Commit Message

Sean Christopherson May 4, 2021, 5:17 p.m. UTC
Intercept RDTSCP to inject #UD if RDTSC is disabled in the guest.

Note, SVM does not support intercepting RDPID.  Unlike VMX's
ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
is a benign virtualization hole as the host kernel (incorrectly) sets
MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
the host's MSR_TSC_AUX to the guest.

But, when the kernel bug is fixed, KVM will start leaking the host's
MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
for whatever reason.  This leak will be remedied in a future commit.

Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

Comments

Jim Mattson May 4, 2021, 9:45 p.m. UTC | #1
On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Intercept RDTSCP to inject #UD if RDTSC is disabled in the guest.
>
> Note, SVM does not support intercepting RDPID.  Unlike VMX's
> ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
> is a benign virtualization hole as the host kernel (incorrectly) sets
> MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> the host's MSR_TSC_AUX to the guest.
>
> But, when the kernel bug is fixed, KVM will start leaking the host's
> MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> for whatever reason.  This leak will be remedied in a future commit.
>
> Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
...
> @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
>                              guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
>
> -       /* Check again if INVPCID interception if required */
> -       svm_check_invpcid(svm);
> +       svm_recalc_instruction_intercepts(vcpu, svm);

Does the right thing happen here if the vCPU is in guest mode when
userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
off?
Sean Christopherson May 4, 2021, 9:53 p.m. UTC | #2
On Tue, May 04, 2021, Jim Mattson wrote:
> On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Intercept RDTSCP to inject #UD if RDTSC is disabled in the guest.
> >
> > Note, SVM does not support intercepting RDPID.  Unlike VMX's
> > ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
> > is a benign virtualization hole as the host kernel (incorrectly) sets
> > MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> > into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> > the host's MSR_TSC_AUX to the guest.
> >
> > But, when the kernel bug is fixed, KVM will start leaking the host's
> > MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> > for whatever reason.  This leak will be remedied in a future commit.
> >
> > Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> ...
> > @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> >                              guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
> >
> > -       /* Check again if INVPCID interception if required */
> > -       svm_check_invpcid(svm);
> > +       svm_recalc_instruction_intercepts(vcpu, svm);
> 
> Does the right thing happen here if the vCPU is in guest mode when
> userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> off?

I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
you specifically mean running in L2?
Jim Mattson May 4, 2021, 9:56 p.m. UTC | #3
On Tue, May 4, 2021 at 2:53 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, May 04, 2021, Jim Mattson wrote:
> > On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > Intercept RDTSCP to inject #UD if RDTSC is disabled in the guest.
> > >
> > > Note, SVM does not support intercepting RDPID.  Unlike VMX's
> > > ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
> > > is a benign virtualization hole as the host kernel (incorrectly) sets
> > > MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> > > into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> > > the host's MSR_TSC_AUX to the guest.
> > >
> > > But, when the kernel bug is fixed, KVM will start leaking the host's
> > > MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> > > for whatever reason.  This leak will be remedied in a future commit.
> > >
> > > Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > ...
> > > @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > >         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> > >                              guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
> > >
> > > -       /* Check again if INVPCID interception if required */
> > > -       svm_check_invpcid(svm);
> > > +       svm_recalc_instruction_intercepts(vcpu, svm);
> >
> > Does the right thing happen here if the vCPU is in guest mode when
> > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > off?
>
> I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
> you specifically mean running in L2?

I mean is_guest_mode(vcpu) is true (i.e. running L2).
Paolo Bonzini May 4, 2021, 9:57 p.m. UTC | #4
On 04/05/21 23:53, Sean Christopherson wrote:
>> Does the right thing happen here if the vCPU is in guest mode when
>> userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
>> off?
> I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
> you specifically mean running in L2?
> 

Guest mode should mean L2.

(I wonder if we should have a capability that says "KVM_SET_CPUID2 can 
only be called prior to KVM_RUN").

Paolo
Jim Mattson May 4, 2021, 9:58 p.m. UTC | #5
On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 04/05/21 23:53, Sean Christopherson wrote:
> >> Does the right thing happen here if the vCPU is in guest mode when
> >> userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> >> off?
> > I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
> > you specifically mean running in L2?
> >
>
> Guest mode should mean L2.
>
> (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> only be called prior to KVM_RUN").

It would certainly make it easier to reason about potential security issues.
Sean Christopherson May 4, 2021, 10:10 p.m. UTC | #6
On Tue, May 04, 2021, Jim Mattson wrote:
> On Tue, May 4, 2021 at 2:53 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, May 04, 2021, Jim Mattson wrote:
> > > On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > Intercept RDTSCP to inject #UD if RDTSC is disabled in the guest.
> > > >
> > > > Note, SVM does not support intercepting RDPID.  Unlike VMX's
> > > > ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
> > > > is a benign virtualization hole as the host kernel (incorrectly) sets
> > > > MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> > > > into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> > > > the host's MSR_TSC_AUX to the guest.
> > > >
> > > > But, when the kernel bug is fixed, KVM will start leaking the host's
> > > > MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> > > > for whatever reason.  This leak will be remedied in a future commit.
> > > >
> > > > Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > ---
> > > ...
> > > > @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > >         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> > > >                              guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
> > > >
> > > > -       /* Check again if INVPCID interception if required */
> > > > -       svm_check_invpcid(svm);
> > > > +       svm_recalc_instruction_intercepts(vcpu, svm);
> > >
> > > Does the right thing happen here if the vCPU is in guest mode when
> > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > off?
> >
> > I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
> > you specifically mean running in L2?
> 
> I mean is_guest_mode(vcpu) is true (i.e. running L2).

No, it will not do the right thing, whatever "right thing" even means in this
context.  That's a pre-existing issue, e.g. INVCPID handling is also wrong.
I highly doubt VMX does, or even can, do the right thing either.

I'm pretty sure I lobbied in the past to disallow KVM_SET_CPUID* if the vCPU is
in guest mode since it's impossible to do the right thing without forcing an
exit to L1, e.g. changing MAXPHYSADDR will allow running L2 with an illegal
CR3, ditto for various CR4 bits.
Jim Mattson May 4, 2021, 10:24 p.m. UTC | #7
On Tue, May 4, 2021 at 3:10 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, May 04, 2021, Jim Mattson wrote:
> > On Tue, May 4, 2021 at 2:53 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Tue, May 04, 2021, Jim Mattson wrote:
> > > > On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <seanjc@google.com> wrote:
> > > > >
> > > > > Intercept RDTSCP to inject #UD if RDTSC is disabled in the guest.
> > > > >
> > > > > Note, SVM does not support intercepting RDPID.  Unlike VMX's
> > > > > ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
> > > > > is a benign virtualization hole as the host kernel (incorrectly) sets
> > > > > MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> > > > > into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> > > > > the host's MSR_TSC_AUX to the guest.
> > > > >
> > > > > But, when the kernel bug is fixed, KVM will start leaking the host's
> > > > > MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> > > > > for whatever reason.  This leak will be remedied in a future commit.
> > > > >
> > > > > Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> > > > > Cc: stable@vger.kernel.org
> > > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > > ---
> > > > ...
> > > > > @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > > >         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> > > > >                              guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
> > > > >
> > > > > -       /* Check again if INVPCID interception if required */
> > > > > -       svm_check_invpcid(svm);
> > > > > +       svm_recalc_instruction_intercepts(vcpu, svm);
> > > >
> > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > off?
> > >
> > > I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
> > > you specifically mean running in L2?
> >
> > I mean is_guest_mode(vcpu) is true (i.e. running L2).
>
> No, it will not do the right thing, whatever "right thing" even means in this
> context.  That's a pre-existing issue, e.g. INVCPID handling is also wrong.
> I highly doubt VMX does, or even can, do the right thing either.
>
> I'm pretty sure I lobbied in the past to disallow KVM_SET_CPUID* if the vCPU is
> in guest mode since it's impossible to do the right thing without forcing an
> exit to L1, e.g. changing MAXPHYSADDR will allow running L2 with an illegal
> CR3, ditto for various CR4 bits.

With that caveat understood,

Reviewed-by: Jim Mattson <jmattson@google.com>
Reiji Watanabe May 5, 2021, 4:26 a.m. UTC | #8
On Tue, May 4, 2021 at 10:17 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Intercept RDTSCP to inject #UD if RDTSC is disabled in the guest.
>
> Note, SVM does not support intercepting RDPID.  Unlike VMX's
> ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
> is a benign virtualization hole as the host kernel (incorrectly) sets
> MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> the host's MSR_TSC_AUX to the guest.
>
> But, when the kernel bug is fixed, KVM will start leaking the host's
> MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> for whatever reason.  This leak will be remedied in a future commit.
>
> Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Reiji Watanabe <reijiw@google.com>
Maxim Levitsky May 10, 2021, 8:08 a.m. UTC | #9
On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
> On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > On 04/05/21 23:53, Sean Christopherson wrote:
> > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > off?
> > > I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
> > > you specifically mean running in L2?
> > > 
> > 
> > Guest mode should mean L2.
> > 
> > (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> > only be called prior to KVM_RUN").
> 
> It would certainly make it easier to reason about potential security issues.
> 
I vote too for this.
Best regards,
	Maxim Levitsky
Maxim Levitsky May 10, 2021, 8:08 a.m. UTC | #10
On Tue, 2021-05-04 at 10:17 -0700, Sean Christopherson wrote:
> Intercept RDTSCP to inject #UD if RDTSC is disabled in the guest.
> 
> Note, SVM does not support intercepting RDPID.  Unlike VMX's
> ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
> is a benign virtualization hole as the host kernel (incorrectly) sets
> MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
> into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
> the host's MSR_TSC_AUX to the guest.
> 
> But, when the kernel bug is fixed, KVM will start leaking the host's
> MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
> for whatever reason.  This leak will be remedied in a future commit.
> 
> Fixes: 46896c73c1a4 ("KVM: svm: add support for RDTSCP")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/svm.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index a7271f31df47..8f2b184270c0 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1100,7 +1100,9 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
>  	return svm->vmcb->control.tsc_offset;
>  }
>  
> -static void svm_check_invpcid(struct vcpu_svm *svm)
> +/* Evaluate instruction intercepts that depend on guest CPUID features. */
> +static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
> +					      struct vcpu_svm *svm)
>  {
>  	/*
>  	 * Intercept INVPCID if shadow paging is enabled to sync/free shadow
> @@ -1113,6 +1115,13 @@ static void svm_check_invpcid(struct vcpu_svm *svm)
>  		else
>  			svm_clr_intercept(svm, INTERCEPT_INVPCID);
>  	}
> +
> +	if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP)) {
> +		if (guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> +			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> +		else
> +			svm_set_intercept(svm, INTERCEPT_RDTSCP);
> +	}
>  }
>  
>  static void init_vmcb(struct kvm_vcpu *vcpu)
> @@ -1248,7 +1257,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  		svm_clr_intercept(svm, INTERCEPT_PAUSE);
>  	}
>  
> -	svm_check_invpcid(svm);
> +	svm_recalc_instruction_intercepts(vcpu, svm);
>  
>  	/*
>  	 * If the host supports V_SPEC_CTRL then disable the interception
> @@ -3084,6 +3093,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[SVM_EXIT_STGI]				= stgi_interception,
>  	[SVM_EXIT_CLGI]				= clgi_interception,
>  	[SVM_EXIT_SKINIT]			= skinit_interception,
> +	[SVM_EXIT_RDTSCP]			= kvm_handle_invalid_op,
>  	[SVM_EXIT_WBINVD]                       = kvm_emulate_wbinvd,
>  	[SVM_EXIT_MONITOR]			= kvm_emulate_monitor,
>  	[SVM_EXIT_MWAIT]			= kvm_emulate_mwait,
> @@ -4007,8 +4017,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
>  			     guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
>  
> -	/* Check again if INVPCID interception if required */
> -	svm_check_invpcid(svm);
> +	svm_recalc_instruction_intercepts(vcpu, svm);
>  
>  	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
>  	if (sev_guest(vcpu->kvm)) {
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
Sean Christopherson May 10, 2021, 4:56 p.m. UTC | #11
On Mon, May 10, 2021, Maxim Levitsky wrote:
> On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
> > On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > On 04/05/21 23:53, Sean Christopherson wrote:
> > > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > > off?
> > > > I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
> > > > you specifically mean running in L2?
> > > > 
> > > 
> > > Guest mode should mean L2.
> > > 
> > > (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> > > only be called prior to KVM_RUN").
> > 
> > It would certainly make it easier to reason about potential security issues.
> > 
> I vote too for this.

Alternatively, what about adding KVM_VCPU_RESET to let userspace explicitly
pull RESET#, and defining that ioctl() to freeze the vCPU model?  I.e. after
userspace resets the vCPU, KVM_SET_CPUID (and any other relevant ioctls() is
disallowed.

Lack of proper RESET emulation is annoying, e.g. userspace has to manually stuff
EDX after vCPU creation to get the right value at RESET.  A dedicated ioctl()
would kill two birds with one stone, without having to add yet another "2"
ioctl().
Maxim Levitsky May 11, 2021, 12:34 p.m. UTC | #12
On Mon, 2021-05-10 at 16:56 +0000, Sean Christopherson wrote:
> On Mon, May 10, 2021, Maxim Levitsky wrote:
> > On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
> > > On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > On 04/05/21 23:53, Sean Christopherson wrote:
> > > > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > > > off?
> > > > > I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
> > > > > you specifically mean running in L2?
> > > > > 
> > > > 
> > > > Guest mode should mean L2.
> > > > 
> > > > (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> > > > only be called prior to KVM_RUN").
> > > 
> > > It would certainly make it easier to reason about potential security issues.
> > > 
> > I vote too for this.
> 
> Alternatively, what about adding KVM_VCPU_RESET to let userspace explicitly
> pull RESET#, and defining that ioctl() to freeze the vCPU model?  I.e. after
> userspace resets the vCPU, KVM_SET_CPUID (and any other relevant ioctls() is
> disallowed.

I vote even stronger for this!

I recently created what Paulo called an 'evil' test to stress KVM related
bugs in nested migration by simulating a nested migration with a vCPU reset,
followed by reload of all of its state including nested state.

It is ugly since as you say I have to  manually load the reset state, and thus 
using 'KVM_VCPU_RESET' ioctl instead would make this test much more 
cleaner and even more 'evil'.

Best regards,
	Maxim Levitsky


> 
> Lack of proper RESET emulation is annoying, e.g. userspace has to manually stuff
> EDX after vCPU creation to get the right value at RESET.  A dedicated ioctl()
> would kill two birds with one stone, without having to add yet another "2"
> ioctl().
>
Paolo Bonzini May 18, 2021, 10:59 a.m. UTC | #13
On 10/05/21 18:56, Sean Christopherson wrote:
> On Mon, May 10, 2021, Maxim Levitsky wrote:
>> On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
>>> On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> On 04/05/21 23:53, Sean Christopherson wrote:
>>>>>> Does the right thing happen here if the vCPU is in guest mode when
>>>>>> userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
>>>>>> off?
>>>>> I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
>>>>> you specifically mean running in L2?
>>>>>
>>>>
>>>> Guest mode should mean L2.
>>>>
>>>> (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
>>>> only be called prior to KVM_RUN").
>>>
>>> It would certainly make it easier to reason about potential security issues.
>>>
>> I vote too for this.
> 
> Alternatively, what about adding KVM_VCPU_RESET to let userspace explicitly
> pull RESET#, and defining that ioctl() to freeze the vCPU model?  I.e. after
> userspace resets the vCPU, KVM_SET_CPUID (and any other relevant ioctls() is
> disallowed.
> 
> Lack of proper RESET emulation is annoying, e.g. userspace has to manually stuff
> EDX after vCPU creation to get the right value at RESET.  A dedicated ioctl()
> would kill two birds with one stone, without having to add yet another "2"
> ioctl().

That has a disadvantage of opting into the more secure behavior, but we 
can do both (forbidding KVM_SET_CPUID2 after both KVM_RUN and KVM_RESET).

Paolo
Sean Christopherson May 18, 2021, 7:24 p.m. UTC | #14
On Tue, May 18, 2021, Paolo Bonzini wrote:
> On 10/05/21 18:56, Sean Christopherson wrote:
> > On Mon, May 10, 2021, Maxim Levitsky wrote:
> > > On Tue, 2021-05-04 at 14:58 -0700, Jim Mattson wrote:
> > > > On Tue, May 4, 2021 at 2:57 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > > On 04/05/21 23:53, Sean Christopherson wrote:
> > > > > > > Does the right thing happen here if the vCPU is in guest mode when
> > > > > > > userspace decides to toggle the CPUID.80000001H:EDX.RDTSCP bit on or
> > > > > > > off?
> > > > > > I hate our terminology.  By "guest mode", do you mean running the vCPU, or do
> > > > > > you specifically mean running in L2?
> > > > > > 
> > > > > 
> > > > > Guest mode should mean L2.
> > > > > 
> > > > > (I wonder if we should have a capability that says "KVM_SET_CPUID2 can
> > > > > only be called prior to KVM_RUN").
> > > > 
> > > > It would certainly make it easier to reason about potential security issues.
> > > > 
> > > I vote too for this.
> > 
> > Alternatively, what about adding KVM_VCPU_RESET to let userspace explicitly
> > pull RESET#, and defining that ioctl() to freeze the vCPU model?  I.e. after
> > userspace resets the vCPU, KVM_SET_CPUID (and any other relevant ioctls() is
> > disallowed.
> > 
> > Lack of proper RESET emulation is annoying, e.g. userspace has to manually stuff
> > EDX after vCPU creation to get the right value at RESET.  A dedicated ioctl()
> > would kill two birds with one stone, without having to add yet another "2"
> > ioctl().
> 
> That has a disadvantage of opting into the more secure behavior, but we can
> do both (forbidding KVM_SET_CPUID2 after both KVM_RUN and KVM_RESET).

Doesn't changing KVM_SET_CPUID2 need to be opt-in as well, e.g. if the strict
behavior is activated via a capability?
diff mbox series

Patch

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a7271f31df47..8f2b184270c0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1100,7 +1100,9 @@  static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	return svm->vmcb->control.tsc_offset;
 }
 
-static void svm_check_invpcid(struct vcpu_svm *svm)
+/* Evaluate instruction intercepts that depend on guest CPUID features. */
+static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
+					      struct vcpu_svm *svm)
 {
 	/*
 	 * Intercept INVPCID if shadow paging is enabled to sync/free shadow
@@ -1113,6 +1115,13 @@  static void svm_check_invpcid(struct vcpu_svm *svm)
 		else
 			svm_clr_intercept(svm, INTERCEPT_INVPCID);
 	}
+
+	if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP)) {
+		if (guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
+		else
+			svm_set_intercept(svm, INTERCEPT_RDTSCP);
+	}
 }
 
 static void init_vmcb(struct kvm_vcpu *vcpu)
@@ -1248,7 +1257,7 @@  static void init_vmcb(struct kvm_vcpu *vcpu)
 		svm_clr_intercept(svm, INTERCEPT_PAUSE);
 	}
 
-	svm_check_invpcid(svm);
+	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/*
 	 * If the host supports V_SPEC_CTRL then disable the interception
@@ -3084,6 +3093,7 @@  static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_STGI]				= stgi_interception,
 	[SVM_EXIT_CLGI]				= clgi_interception,
 	[SVM_EXIT_SKINIT]			= skinit_interception,
+	[SVM_EXIT_RDTSCP]			= kvm_handle_invalid_op,
 	[SVM_EXIT_WBINVD]                       = kvm_emulate_wbinvd,
 	[SVM_EXIT_MONITOR]			= kvm_emulate_monitor,
 	[SVM_EXIT_MWAIT]			= kvm_emulate_mwait,
@@ -4007,8 +4017,7 @@  static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
 			     guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
 
-	/* Check again if INVPCID interception if required */
-	svm_check_invpcid(svm);
+	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
 	if (sev_guest(vcpu->kvm)) {