diff mbox series

[v2,2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint

Message ID 20210114205449.8715-3-mlevitsk@redhat.com (mailing list archive)
State New, archived
Headers show
Series VMX: more nested fixes | expand

Commit Message

Maxim Levitsky Jan. 14, 2021, 8:54 p.m. UTC
This is very helpful for debugging nested VMX issues.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/trace.h      | 30 ++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/nested.c |  6 ++++++
 arch/x86/kvm/x86.c        |  1 +
 3 files changed, 37 insertions(+)

Comments

Sean Christopherson Jan. 15, 2021, 12:14 a.m. UTC | #1
On Thu, Jan 14, 2021, Maxim Levitsky wrote:
> This is very helpful for debugging nested VMX issues.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/trace.h      | 30 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/vmx/nested.c |  6 ++++++
>  arch/x86/kvm/x86.c        |  1 +
>  3 files changed, 37 insertions(+)
> 
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 2de30c20bc264..663d1b1d8bf64 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -554,6 +554,36 @@ TRACE_EVENT(kvm_nested_vmrun,
>  		__entry->npt ? "on" : "off")
>  );
>  
> +
> +/*
> + * Tracepoint for nested VMLAUNCH/VMRESUME

VM-Enter, as below.

> + */
> +TRACE_EVENT(kvm_nested_vmlaunch_resume,

s/vmlaunc_resume/vmenter, both for consistency with other code and so that it
can sanely be reused by SVM.  IMO, trace_kvm_entry is wrong :-).

> +	    TP_PROTO(__u64 rip, __u64 vmcs, __u64 nested_rip,
> +		     __u32 entry_intr_info),
> +	    TP_ARGS(rip, vmcs, nested_rip, entry_intr_info),
> +
> +	TP_STRUCT__entry(
> +		__field(	__u64,		rip		)
> +		__field(	__u64,		vmcs		)
> +		__field(	__u64,		nested_rip	)
> +		__field(	__u32,		entry_intr_info	)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->rip			= rip;
> +		__entry->vmcs			= vmcs;
> +		__entry->nested_rip		= nested_rip;
> +		__entry->entry_intr_info	= entry_intr_info;
> +	),
> +
> +	TP_printk("rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx "
> +		  "entry_intr_info: 0x%08x",
> +		__entry->rip, __entry->vmcs, __entry->nested_rip,
> +		__entry->entry_intr_info)
> +);
> +
> +
>  TRACE_EVENT(kvm_nested_intercepts,
>  	    TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
>  		     __u32 intercept1, __u32 intercept2, __u32 intercept3),
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 776688f9d1017..cd51b66480d52 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3327,6 +3327,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>  		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
>  
> +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),

Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
as is the case for the "true" nested VM-Enter path.

> +					 vmx->nested.current_vmptr,
> +					 vmcs12->guest_rip,
> +					 vmcs12->vm_entry_intr_info_field);

The placement is a bit funky.  I assume you put it here so that calls from
vmx_set_nested_state() also get traced.  But, that also means
vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
some nested VM-Enters that VM-Fail will get traced, but others will not.

Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
especially if the debugger looks up the RIP and sees RSM.  Ditto for the
migration case.

Not sure what would be a good answer.

> +
> +
>  	/*
>  	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
>  	 * nested early checks are disabled.  In the event of a "late" VM-Fail,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a480804ae27a3..7c6e94e32100e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11562,6 +11562,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmlaunch_resume);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
> -- 
> 2.26.2
>
Paolo Bonzini Jan. 15, 2021, 1:48 p.m. UTC | #2
On 15/01/21 01:14, Sean Christopherson wrote:
>> +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> as is the case for the "true" nested VM-Enter path.

It will be the previous RIP---might as well be 0xfffffff0 depending on 
what userspace does.  I don't think you can do much better than that, 
using vmcs12->host_rip would be confusing in the SMM case.

>> +					 vmx->nested.current_vmptr,
>> +					 vmcs12->guest_rip,
>> +					 vmcs12->vm_entry_intr_info_field);
> The placement is a bit funky.  I assume you put it here so that calls from
> vmx_set_nested_state() also get traced.  But, that also means
> vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> some nested VM-Enters that VM-Fail will get traced, but others will not.
> 
> Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> migration case.

Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes 
sense so I'm not worried about that.

Paolo
Sean Christopherson Jan. 15, 2021, 4:30 p.m. UTC | #3
On Fri, Jan 15, 2021, Paolo Bonzini wrote:
> On 15/01/21 01:14, Sean Christopherson wrote:
> > > +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> > Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> > as is the case for the "true" nested VM-Enter path.
> 
> It will be the previous RIP---might as well be 0xfffffff0 depending on what
> userspace does.  I don't think you can do much better than that, using
> vmcs12->host_rip would be confusing in the SMM case.
> 
> > > +					 vmx->nested.current_vmptr,
> > > +					 vmcs12->guest_rip,
> > > +					 vmcs12->vm_entry_intr_info_field);
> > The placement is a bit funky.  I assume you put it here so that calls from
> > vmx_set_nested_state() also get traced.  But, that also means
> > vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> > some nested VM-Enters that VM-Fail will get traced, but others will not.
> > 
> > Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> > especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> > migration case.
> 
> Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes
> sense so I'm not worried about that.

Ideally there would something in the tracepoint to differentiate the various
cases.  Not that the RSM/migration cases will pop up often, but I think it's an
easily solved problem that could avoid confusion.

What if we captured vmx->nested.smm.guest_mode and from_vmentry, and explicitly
record what triggered the entry?

	TP_printk("from: %s rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx intr_info: 0x%08x",
		  __entry->vmenter ? "VM-Enter" : __entry->smm ? "RSM" : "SET_STATE",
		  __entry->rip, __entry->vmcs, __entry->nested_rip,
		  __entry->entry_intr_info

Side topic, can we have an "official" ruling on whether KVM tracepoints should
use colons and/or commas? And probably same question for whether or not to
prepend zeros.  E.g. kvm_entry has "vcpu %u, rip 0x%lx" versus "rip: 0x%016llx
vmcs: 0x%016llx".  It bugs me that we're so inconsistent.
Maxim Levitsky Jan. 21, 2021, 4:58 p.m. UTC | #4
On Fri, 2021-01-15 at 14:48 +0100, Paolo Bonzini wrote:
> On 15/01/21 01:14, Sean Christopherson wrote:
> > > +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> > Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> > as is the case for the "true" nested VM-Enter path.

Actually in this case, the initial RIP of 0x000000000000fff0 will be printed
which isn't that bad.

A tracepoint in nested state load function would be very nice to add
to mark this explicitly. I'll do this later.

> 
> It will be the previous RIP---might as well be 0xfffffff0 depending on 
> what userspace does.  I don't think you can do much better than that, 
> using vmcs12->host_rip would be confusing in the SMM case.
> 
> > > +					 vmx->nested.current_vmptr,
> > > +					 vmcs12->guest_rip,
> > > +					 vmcs12->vm_entry_intr_info_field);
> > The placement is a bit funky.  I assume you put it here so that calls from
> > vmx_set_nested_state() also get traced.  But, that also means
> > vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> > some nested VM-Enters that VM-Fail will get traced, but others will not.
> > 
> > Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> > especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> > migration case.
> 
> Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes 
> sense so I'm not worried about that.
> 
> Paolo
> 

I agree with that and indeed this was my intention.

In fact I will change the svm's tracepoint to behave the same way
in the next patch series (I'll move it to enter_svm_guest_mode).

(When I wrote this patch I somehow thought that this is what SVM already does).

Best regards,
	Maxim Levitsky
Maxim Levitsky Jan. 21, 2021, 5 p.m. UTC | #5
On Fri, 2021-01-15 at 08:30 -0800, Sean Christopherson wrote:
> On Fri, Jan 15, 2021, Paolo Bonzini wrote:
> > On 15/01/21 01:14, Sean Christopherson wrote:
> > > > +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> > > Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> > > as is the case for the "true" nested VM-Enter path.
> > 
> > It will be the previous RIP---might as well be 0xfffffff0 depending on what
> > userspace does.  I don't think you can do much better than that, using
> > vmcs12->host_rip would be confusing in the SMM case.
> > 
> > > > +					 vmx->nested.current_vmptr,
> > > > +					 vmcs12->guest_rip,
> > > > +					 vmcs12->vm_entry_intr_info_field);
> > > The placement is a bit funky.  I assume you put it here so that calls from
> > > vmx_set_nested_state() also get traced.  But, that also means
> > > vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> > > some nested VM-Enters that VM-Fail will get traced, but others will not.
> > > 
> > > Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> > > especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> > > migration case.
> > 
> > Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes
> > sense so I'm not worried about that.
> 
> Ideally there would something in the tracepoint to differentiate the various
> cases.  Not that the RSM/migration cases will pop up often, but I think it's an
> easily solved problem that could avoid confusion.
> 
> What if we captured vmx->nested.smm.guest_mode and from_vmentry, and explicitly
> record what triggered the entry?
> 
> 	TP_printk("from: %s rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx intr_info: 0x%08x",
> 		  __entry->vmenter ? "VM-Enter" : __entry->smm ? "RSM" : "SET_STATE",
> 		  __entry->rip, __entry->vmcs, __entry->nested_rip,
> 		  __entry->entry_intr_info

I think that this is a good idea, but should be done in a separate patch.

> 
> Side topic, can we have an "official" ruling on whether KVM tracepoints should
> use colons and/or commas? And probably same question for whether or not to
> prepend zeros.  E.g. kvm_entry has "vcpu %u, rip 0x%lx" versus "rip: 0x%016llx
> vmcs: 0x%016llx".  It bugs me that we're so inconsistent.
> 

As I said the kvm tracing has a lot of things that can be imporoved, 
and as it is often the only way to figure out complex bugs as these I had to deal with recently,
I will do more improvements in this area as time permits.

Best regards,
	Maxim Levitsky
Maxim Levitsky Jan. 21, 2021, 5:02 p.m. UTC | #6
On Thu, 2021-01-14 at 16:14 -0800, Sean Christopherson wrote:
> On Thu, Jan 14, 2021, Maxim Levitsky wrote:
> > This is very helpful for debugging nested VMX issues.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/trace.h      | 30 ++++++++++++++++++++++++++++++
> >  arch/x86/kvm/vmx/nested.c |  6 ++++++
> >  arch/x86/kvm/x86.c        |  1 +
> >  3 files changed, 37 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> > index 2de30c20bc264..663d1b1d8bf64 100644
> > --- a/arch/x86/kvm/trace.h
> > +++ b/arch/x86/kvm/trace.h
> > @@ -554,6 +554,36 @@ TRACE_EVENT(kvm_nested_vmrun,
> >  		__entry->npt ? "on" : "off")
> >  );
> >  
> > +
> > +/*
> > + * Tracepoint for nested VMLAUNCH/VMRESUME
> 
> VM-Enter, as below.

Will do

> 
> > + */
> > +TRACE_EVENT(kvm_nested_vmlaunch_resume,
> 
> s/vmlaunc_resume/vmenter, both for consistency with other code and so that it
> can sanely be reused by SVM.  IMO, trace_kvm_entry is wrong :-).
SVM already has trace_kvm_nested_vmrun and it contains some SVM specific
stuff that doesn't exist on VMX and vise versa.
So I do want to keep these trace points separate.


> 
> > +	    TP_PROTO(__u64 rip, __u64 vmcs, __u64 nested_rip,
> > +		     __u32 entry_intr_info),
> > +	    TP_ARGS(rip, vmcs, nested_rip, entry_intr_info),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(	__u64,		rip		)
> > +		__field(	__u64,		vmcs		)
> > +		__field(	__u64,		nested_rip	)
> > +		__field(	__u32,		entry_intr_info	)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->rip			= rip;
> > +		__entry->vmcs			= vmcs;
> > +		__entry->nested_rip		= nested_rip;
> > +		__entry->entry_intr_info	= entry_intr_info;
> > +	),
> > +
> > +	TP_printk("rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx "
> > +		  "entry_intr_info: 0x%08x",
> > +		__entry->rip, __entry->vmcs, __entry->nested_rip,
> > +		__entry->entry_intr_info)
> > +);
> > +
> > +
> >  TRACE_EVENT(kvm_nested_intercepts,
> >  	    TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
> >  		     __u32 intercept1, __u32 intercept2, __u32 intercept3),
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 776688f9d1017..cd51b66480d52 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3327,6 +3327,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> >  		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
> >  		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
> >  
> > +	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> 
> Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> as is the case for the "true" nested VM-Enter path.

> 
> > +					 vmx->nested.current_vmptr,
> > +					 vmcs12->guest_rip,
> > +					 vmcs12->vm_entry_intr_info_field);
> 
> The placement is a bit funky.  I assume you put it here so that calls from
> vmx_set_nested_state() also get traced.  But, that also means
> vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> some nested VM-Enters that VM-Fail will get traced, but others will not.
> 
> Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> migration case.
> 
> Not sure what would be a good answer.
> 
> > +
> > +
> >  	/*
> >  	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
> >  	 * nested early checks are disabled.  In the event of a "late" VM-Fail,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a480804ae27a3..7c6e94e32100e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -11562,6 +11562,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
> > +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmlaunch_resume);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
> > -- 
> > 2.26.2
> >
diff mbox series

Patch

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 2de30c20bc264..663d1b1d8bf64 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -554,6 +554,36 @@  TRACE_EVENT(kvm_nested_vmrun,
 		__entry->npt ? "on" : "off")
 );
 
+
+/*
+ * Tracepoint for nested VMLAUNCH/VMRESUME
+ */
+TRACE_EVENT(kvm_nested_vmlaunch_resume,
+	    TP_PROTO(__u64 rip, __u64 vmcs, __u64 nested_rip,
+		     __u32 entry_intr_info),
+	    TP_ARGS(rip, vmcs, nested_rip, entry_intr_info),
+
+	TP_STRUCT__entry(
+		__field(	__u64,		rip		)
+		__field(	__u64,		vmcs		)
+		__field(	__u64,		nested_rip	)
+		__field(	__u32,		entry_intr_info	)
+	),
+
+	TP_fast_assign(
+		__entry->rip			= rip;
+		__entry->vmcs			= vmcs;
+		__entry->nested_rip		= nested_rip;
+		__entry->entry_intr_info	= entry_intr_info;
+	),
+
+	TP_printk("rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx "
+		  "entry_intr_info: 0x%08x",
+		__entry->rip, __entry->vmcs, __entry->nested_rip,
+		__entry->entry_intr_info)
+);
+
+
 TRACE_EVENT(kvm_nested_intercepts,
 	    TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
 		     __u32 intercept1, __u32 intercept2, __u32 intercept3),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 776688f9d1017..cd51b66480d52 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3327,6 +3327,12 @@  enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
 		!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
 		vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
 
+	trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
+					 vmx->nested.current_vmptr,
+					 vmcs12->guest_rip,
+					 vmcs12->vm_entry_intr_info_field);
+
+
 	/*
 	 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
 	 * nested early checks are disabled.  In the event of a "late" VM-Fail,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a480804ae27a3..7c6e94e32100e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11562,6 +11562,7 @@  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmlaunch_resume);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);