
[1/4,v3] KVM: nSVM: Do not advance RIP following VMRUN completion if the latter is single-stepped

Message ID 20210223191958.24218-2-krish.sadhukhan@oracle.com (mailing list archive)
State New, archived
Series nSVM: Test host RFLAGS.TF on VMRUN

Commit Message

Krish Sadhukhan Feb. 23, 2021, 7:19 p.m. UTC
Currently, svm_vcpu_run() advances the RIP following VMRUN completion when
control returns to the host. This works fine if the trap flag is not set
on the VMRUN instruction, i.e., if VMRUN is not single-stepped. But if
VMRUN is single-stepped, this advancement of the RIP leads to an incorrect
RIP in the #DB handler invoked for the single-step trap. Therefore, check
whether the VMRUN instruction is single-stepped and, if so, do not advance
the RIP when the #DB intercept #VMEXIT happens.

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
---
 arch/x86/kvm/svm/svm.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Sean Christopherson Feb. 23, 2021, 10:42 p.m. UTC | #1
On Tue, Feb 23, 2021, Krish Sadhukhan wrote:
> Currently, svm_vcpu_run() advances the RIP following VMRUN completion when
> control returns to host. This works fine if there is no trap flag set
> on the VMRUN instruction i.e., if VMRUN is not single-stepped. But if
> VMRUN is single-stepped, this advancement of the RIP leads to an incorrect
> RIP in the #DB handler invoked for the single-step trap. Therefore, check
> if the VMRUN instruction is single-stepped and if so, do not advance the RIP
> when the #DB intercept #VMEXIT happens.

This really needs to clarify which VMRUN, i.e. L0 vs. L1.  AFAICT, you're
talking about both at separate times.  Is this an issue with L1 single-stepping
its VMRUN, L0 single-stepping its VMRUN, L0 single-stepping L1's VMRUN, ???
 
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> ---
>  arch/x86/kvm/svm/svm.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3442d44ca53b..427d32213f51 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3740,6 +3740,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  	instrumentation_end();
>  }
>  
> +static bool single_step_vmrun = false;

Sharing a global flag amongst all vCPUs isn't going to fare well...

> +
>  static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -3800,6 +3802,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	svm_vcpu_enter_exit(vcpu, svm);
>  
> +	if (svm->vmcb->control.exit_code == SVM_EXIT_VMRUN &&
> +	    (svm->vmcb->save.rflags & X86_EFLAGS_TF))
> +                single_step_vmrun = true;
> +
>  	/*
>  	 * We do not use IBRS in the kernel. If this vCPU has used the
>  	 * SPEC_CTRL MSR it may have left it on; save the value and
> @@ -3827,7 +3833,11 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>  		vcpu->arch.cr2 = svm->vmcb->save.cr2;
>  		vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
>  		vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
> -		vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
> +		if (single_step_vmrun && svm->vmcb->control.exit_code ==
> +		    SVM_EXIT_EXCP_BASE + DB_VECTOR)
> +			single_step_vmrun = false;

Even if you fix the global flag issue, this can't possibly work if userspace
changes state, if VMRUN fails and leaves a timebomb, and probably any number of
other conditions.

> +		else
> +			vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
>  	}
>  
>  	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
> -- 
> 2.27.0
>
Krish Sadhukhan Feb. 24, 2021, 9:18 p.m. UTC | #2
On 2/23/21 2:42 PM, Sean Christopherson wrote:
> On Tue, Feb 23, 2021, Krish Sadhukhan wrote:
>> Currently, svm_vcpu_run() advances the RIP following VMRUN completion when
>> control returns to host. This works fine if there is no trap flag set
>> on the VMRUN instruction i.e., if VMRUN is not single-stepped. But if
>> VMRUN is single-stepped, this advancement of the RIP leads to an incorrect
>> RIP in the #DB handler invoked for the single-step trap. Therefore, check
>> if the VMRUN instruction is single-stepped and if so, do not advance the RIP
>> when the #DB intercept #VMEXIT happens.
> This really needs to clarify which VMRUN, i.e. L0 vs. L1.  AFAICT, you're
> talking about both at separate times.  Is this an issue with L1 single-stepping
> its VMRUN, L0 single-stepping its VMRUN, L0 single-stepping L1's VMRUN, ???


The issue is seen when L1 single-steps its own VMRUN. I will fix the 
wording.

I don't know if there's a way to test L0 single-stepping its own or L1's 
VMRUN.

>   
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> ---
>>   arch/x86/kvm/svm/svm.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 3442d44ca53b..427d32213f51 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3740,6 +3740,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>>   	instrumentation_end();
>>   }
>>   
>> +static bool single_step_vmrun = false;
> Sharing a global flag amongst all vCPUs isn't going to fare well...


I will fix this.

>
>> +
>>   static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>   {
>>   	struct vcpu_svm *svm = to_svm(vcpu);
>> @@ -3800,6 +3802,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>   
>>   	svm_vcpu_enter_exit(vcpu, svm);
>>   
>> +	if (svm->vmcb->control.exit_code == SVM_EXIT_VMRUN &&
>> +	    (svm->vmcb->save.rflags & X86_EFLAGS_TF))
>> +                single_step_vmrun = true;
>> +
>>   	/*
>>   	 * We do not use IBRS in the kernel. If this vCPU has used the
>>   	 * SPEC_CTRL MSR it may have left it on; save the value and
>> @@ -3827,7 +3833,11 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>   		vcpu->arch.cr2 = svm->vmcb->save.cr2;
>>   		vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
>>   		vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
>> -		vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
>> +		if (single_step_vmrun && svm->vmcb->control.exit_code ==
>> +		    SVM_EXIT_EXCP_BASE + DB_VECTOR)
>> +			single_step_vmrun = false;
> Even if you fix the global flag issue, this can't possibly work if userspace
> changes state, if VMRUN fails and leaves a timebomb, and probably any number of
> other conditions.


 Are you saying that I need to adjust the RIP in cases where VMRUN fails?

>> +		else
>> +			vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
>>   	}
>>   
>>   	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
>> -- 
>> 2.27.0
>>
Sean Christopherson Feb. 24, 2021, 9:59 p.m. UTC | #3
On Wed, Feb 24, 2021, Krish Sadhukhan wrote:
> 
> On 2/23/21 2:42 PM, Sean Christopherson wrote:
> > On Tue, Feb 23, 2021, Krish Sadhukhan wrote:
> > > Currently, svm_vcpu_run() advances the RIP following VMRUN completion when
> > > control returns to host. This works fine if there is no trap flag set
> > > on the VMRUN instruction i.e., if VMRUN is not single-stepped. But if
> > > VMRUN is single-stepped, this advancement of the RIP leads to an incorrect
> > > RIP in the #DB handler invoked for the single-step trap. Therefore, check

Whose #DB handler?  L1's?

> > > if the VMRUN instruction is single-stepped and if so, do not advance the RIP
> > > when the #DB intercept #VMEXIT happens.
> > This really needs to clarify which VMRUN, i.e. L0 vs. L1.  AFAICT, you're
> > talking about both at separate times.  Is this an issue with L1 single-stepping
> > its VMRUN, L0 single-stepping its VMRUN, L0 single-stepping L1's VMRUN, ???
> 
> 
> The issue is seen when L1 single-steps its own VMRUN. I will fix the
> wording.

...

> > > @@ -3827,7 +3833,11 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
> > >   		vcpu->arch.cr2 = svm->vmcb->save.cr2;
> > >   		vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
> > >   		vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
> > > -		vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
> > > +		if (single_step_vmrun && svm->vmcb->control.exit_code ==
> > > +		    SVM_EXIT_EXCP_BASE + DB_VECTOR)
> > > +			single_step_vmrun = false;
> > Even if you fix the global flag issue, this can't possibly work if userspace
> > changes state, if VMRUN fails and leaves a timebomb, and probably any number of
> > other conditions.
> 
> 
>  Are you saying that I need to adjust the RIP in cases where VMRUN fails ?

If VMRUN fails, the #DB exit will never occur and single_step_vmrun will be left
set.  Ditto if a higher priority exit occurs, though I'm not sure that can cause
problems in practice.  Anyways, my point is that setting a flag that must be
consumed on an exact instruction is almost always fragile, there are just too
many corner cases that pop up.

Can you elaborate more on who/what incorrectly advances RIP?  The changelog says
"svm_vcpu_run() advances the RIP", but it's not advancing anything; it's just
grabbing RIP from the VMCB, which IIUC is L2's RIP.
Krish Sadhukhan March 22, 2021, 7 p.m. UTC | #4
On 2/24/21 1:59 PM, Sean Christopherson wrote:
> On Wed, Feb 24, 2021, Krish Sadhukhan wrote:
>> On 2/23/21 2:42 PM, Sean Christopherson wrote:
>>> On Tue, Feb 23, 2021, Krish Sadhukhan wrote:
>>>> Currently, svm_vcpu_run() advances the RIP following VMRUN completion when
>>>> control returns to host. This works fine if there is no trap flag set
>>>> on the VMRUN instruction i.e., if VMRUN is not single-stepped. But if
>>>> VMRUN is single-stepped, this advancement of the RIP leads to an incorrect
>>>> RIP in the #DB handler invoked for the single-step trap. Therefore, check
> Whose #DB handler?  L1's?


Yes

>>>> if the VMRUN instruction is single-stepped and if so, do not advance the RIP
>>>> when the #DB intercept #VMEXIT happens.
>>> This really needs to clarify which VMRUN, i.e. L0 vs. L1.  AFAICT, you're
>>> talking about both at separate times.  Is this an issue with L1 single-stepping
>>> its VMRUN, L0 single-stepping its VMRUN, L0 single-stepping L1's VMRUN, ???
>> The issue is seen when L1 single-steps its own VMRUN. I will fix the
>> wording.
> ...
>
>>>> @@ -3827,7 +3833,11 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>>>    		vcpu->arch.cr2 = svm->vmcb->save.cr2;
>>>>    		vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
>>>>    		vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
>>>> -		vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
>>>> +		if (single_step_vmrun && svm->vmcb->control.exit_code ==
>>>> +		    SVM_EXIT_EXCP_BASE + DB_VECTOR)
>>>> +			single_step_vmrun = false;
>>> Even if you fix the global flag issue, this can't possibly work if userspace
>>> changes state, if VMRUN fails and leaves a timebomb, and probably any number of
>>> other conditions.
>>   Are you saying that I need to adjust the RIP in cases where VMRUN fails ?
> If VMRUN fails, the #DB exit will never occur and single_step_vmrun will be left
> set.  Ditto if a higher priority exit occurs, though I'm not sure that can cause
> problems in practice.  Anyways, my point is that setting a flag that must be
> consumed on an exact instruction is almost always fragile, there are just too
> many corner cases that pop up.
>
> Can you elaborate more on who/what incorrectly advances RIP?


The RIP advances because KVM does not take action on the pending 
RFLAGS.TF when it runs L1 for the first time after L2 exits. The #DB 
intercept is triggered only after the instruction following VMRUN has 
executed in svm_vcpu_run(), and hence the #DB handler for L1 sees the 
wrong RIP.

I have just sent out v4 which fixes the problem in a better way.

>   The changelog says
> "svm_vcpu_run() advances the RIP", but it's not advancing anything it's just
> grabbing RIP from the VMCB, which IIUC is L2's RIP.

Patch

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3442d44ca53b..427d32213f51 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3740,6 +3740,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	instrumentation_end();
 }
 
+static bool single_step_vmrun = false;
+
 static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3800,6 +3802,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	svm_vcpu_enter_exit(vcpu, svm);
 
+	if (svm->vmcb->control.exit_code == SVM_EXIT_VMRUN &&
+	    (svm->vmcb->save.rflags & X86_EFLAGS_TF))
+                single_step_vmrun = true;
+
 	/*
 	 * We do not use IBRS in the kernel. If this vCPU has used the
 	 * SPEC_CTRL MSR it may have left it on; save the value and
@@ -3827,7 +3833,11 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 		vcpu->arch.cr2 = svm->vmcb->save.cr2;
 		vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax;
 		vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
-		vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
+		if (single_step_vmrun && svm->vmcb->control.exit_code ==
+		    SVM_EXIT_EXCP_BASE + DB_VECTOR)
+			single_step_vmrun = false;
+		else
+			vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
 	}
 
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))