diff mbox series

[RFC,v2,2/3] perf/x86/intel/pt: Introduce and export pt_get_curr_event()

Message ID 20220921164521.2858932-3-xiaoyao.li@intel.com (mailing list archive)
State New, archived
Headers show
Series KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT | expand

Commit Message

Xiaoyao Li Sept. 21, 2022, 4:45 p.m. UTC
KVM supports PT_MODE_HOST_GUEST mode for Intel PT where host and guest
have separate Intel PT configurations and they work independently.

Currently, in this mode, when both host and guest enable PT, KVM manually
clears MSR_IA32_RTIT_CTL.TRACEEN to disable host PT so that it can
context switch the other PT MSRs. However, PT PMI can be delivered after
this point and before the VM-entry. As a result, the re-enabling of PT
leads to VM-entry failure of guest.

To solve the problem, introduce and export pt_get_curr_event() for KVM
to get current pt event. Along with perf_event_{dis, en}able_local(),
With them, KVM can avoid PT re-enabling in PT PMI handler.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/events/intel/pt.c        | 8 ++++++++
 arch/x86/include/asm/perf_event.h | 2 ++
 2 files changed, 10 insertions(+)

Comments

Xiaoyao Li Sept. 22, 2022, 5:14 a.m. UTC | #1
On 9/22/2022 12:45 AM, Xiaoyao Li wrote:
> KVM supports PT_MODE_HOST_GUEST mode for Intel PT where host and guest
> have separate Intel PT configurations and they work independently.
> 
> Currently, in this mode, when both host and guest enable PT, KVM manually
> clears MSR_IA32_RTIT_CTL.TRACEEN to disable host PT so that it can
> context switch the other PT MSRs. However, PT PMI can be delivered after
> this point and before the VM-entry. As a result, the re-enabling of PT
> leads to VM-entry failure of guest.
> 
> To solve the problem, introduce and export pt_get_curr_event() for KVM
> to get current pt event. Along with perf_event_{dis, en}able_local(),
> With them, KVM can avoid PT re-enabling in PT PMI handler.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>   arch/x86/events/intel/pt.c        | 8 ++++++++
>   arch/x86/include/asm/perf_event.h | 2 ++
>   2 files changed, 10 insertions(+)
> 
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 82ef87e9a897..62bfc45c11c9 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -1624,6 +1624,14 @@ static void pt_event_stop(struct perf_event *event, int mode)
>   	}
>   }
>   
> +struct perf_event *pt_get_curr_event(void)
> +{
> +	struct pt *pt = this_cpu_ptr(&pt_ctx);
> +
> +	return pt->handle.event;
> +}
> +EXPORT_SYMBOL_GPL(pt_get_curr_event);
> +
>   static long pt_event_snapshot_aux(struct perf_event *event,
>   				  struct perf_output_handle *handle,
>   				  unsigned long size)
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index f6fc8dd51ef4..7c3533392cf5 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -553,11 +553,13 @@ static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
>   
>   #ifdef CONFIG_CPU_SUP_INTEL
>    extern void intel_pt_handle_vmx(int on);
> + extern struct perf_event *pt_get_curr_event(void);
>   #else
>   static inline void intel_pt_handle_vmx(int on)
>   {
>   
>   }
> +struct perf_event *pt_get_curr_event(void) { return NULL; }

My fault, this needs to be

   static inline ...

>   #endif
>   
>   #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
Liang, Kan Sept. 22, 2022, 12:33 p.m. UTC | #2
On 2022-09-21 12:45 p.m., Xiaoyao Li wrote:
> KVM supports PT_MODE_HOST_GUEST mode for Intel PT where host and guest
> have separate Intel PT configurations and they work independently.
> 
> Currently, in this mode, when both host and guest enable PT, KVM manually
> clears MSR_IA32_RTIT_CTL.TRACEEN to disable host PT so that it can
> context switch the other PT MSRs. However, PT PMI can be delivered after
> this point and before the VM-entry. As a result, the re-enabling of PT
> leads to VM-entry failure of guest.
> 
> To solve the problem, introduce and export pt_get_curr_event() for KVM
> to get current pt event. 

I don't think the current pt event is created by KVM. IIUC, the patch
basically expose the event created by the other user to KVM. That
doesn't sounds correct.

I think it should be perf's responsibility to decide which events should
be disabled, and which MSRs should be switched. Because only perf can
see all the events. The users should only see the events they created.

Thanks,
Kan

> Along with perf_event_{dis, en}able_local(),
> With them, KVM can avoid PT re-enabling in PT PMI handler.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  arch/x86/events/intel/pt.c        | 8 ++++++++
>  arch/x86/include/asm/perf_event.h | 2 ++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 82ef87e9a897..62bfc45c11c9 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -1624,6 +1624,14 @@ static void pt_event_stop(struct perf_event *event, int mode)
>  	}
>  }
>  
> +struct perf_event *pt_get_curr_event(void)
> +{
> +	struct pt *pt = this_cpu_ptr(&pt_ctx);
> +
> +	return pt->handle.event;
> +}
> +EXPORT_SYMBOL_GPL(pt_get_curr_event);
> +
>  static long pt_event_snapshot_aux(struct perf_event *event,
>  				  struct perf_output_handle *handle,
>  				  unsigned long size)
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index f6fc8dd51ef4..7c3533392cf5 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -553,11 +553,13 @@ static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
>  
>  #ifdef CONFIG_CPU_SUP_INTEL
>   extern void intel_pt_handle_vmx(int on);
> + extern struct perf_event *pt_get_curr_event(void);
>  #else
>  static inline void intel_pt_handle_vmx(int on)
>  {
>  
>  }
> +struct perf_event *pt_get_curr_event(void) { return NULL; }
>  #endif
>  
>  #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
Wang, Wei W Sept. 22, 2022, 12:58 p.m. UTC | #3
On Thursday, September 22, 2022 8:34 PM, Liang, Kan wrote:
> > To solve the problem, introduce and export pt_get_curr_event() for KVM
> > to get current pt event.
> 
> I don't think the current pt event is created by KVM. IIUC, the patch basically
> expose the event created by the other user to KVM. That doesn't sounds
> correct.

Yes, that's the host PT event running on the current CPU. Not created by KVM.

> 
> I think it should be perf's responsibility to decide which events should be
> disabled, and which MSRs should be switched. Because only perf can see all the
> events. The users should only see the events they created.

For other pmu cases, yes. For PT, its management is simpler than other pmu
resources and PT PMU is much simpler. It doesn't have a scheduler to manage
events.

I think the usage here is similar to the CPU thread scheduling case. When we have
CPUs isolated from the CPU scheduler, i.e. no scheduler for those CPUs, basically
we rely on users to ping tasks to those CPUs (e.g. taskset).

For the commit log, probably we don't need those KVM details here. Simplify a bit:

Add a function to expose the current running PT event to users. One usage is in KVM,
it needs to get and disable the running host PT event before VMEnter to the guest and
resumes the event after VMexit to host.
Peter Zijlstra Sept. 22, 2022, 1:34 p.m. UTC | #4
On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote:

> Add a function to expose the current running PT event to users. One usage is in KVM,
> it needs to get and disable the running host PT event before VMEnter to the guest and
> resumes the event after VMexit to host.

You cannot just kill a host event like that. If there is a host event,
the guest looses out.
Wang, Wei W Sept. 22, 2022, 1:59 p.m. UTC | #5
On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra
> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote:
> 
> > Add a function to expose the current running PT event to users. One
> > usage is in KVM, it needs to get and disable the running host PT event
> > before VMEnter to the guest and resumes the event after VMexit to host.
> 
> You cannot just kill a host event like that. If there is a host event, the guest
> looses out.

OK. The intention was to pause the event (that only profiles host info) when switching to guest,
and resume when switching back to host.

I don't think it is proper to directly disable it via clear MSR_IA32_RTIT_CTL, e.g. there
may be dangling PT bit in the interrupt status register, this invokes the PT interrupt handler
to re-enable it (for more details about the issue please see patch 3 commit).
Would you have any suggestions to pause it? 

Thanks,
Wei
Peter Zijlstra Sept. 22, 2022, 2:09 p.m. UTC | #6
On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote:
> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra
> > On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote:
> > 
> > > Add a function to expose the current running PT event to users. One
> > > usage is in KVM, it needs to get and disable the running host PT event
> > > before VMEnter to the guest and resumes the event after VMexit to host.
> > 
> > You cannot just kill a host event like that. If there is a host event, the guest
> > looses out.
> 
> OK. The intention was to pause the event (that only profiles host info) when switching to guest,
> and resume when switching back to host.

If the even doesn't profile guest context, then yes. If it does profile
guest context, you can't.
Wang, Wei W Sept. 22, 2022, 2:42 p.m. UTC | #7
On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote:
> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote:
> > On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra
> > > On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote:
> > >
> > > > Add a function to expose the current running PT event to users.
> > > > One usage is in KVM, it needs to get and disable the running host
> > > > PT event before VMEnter to the guest and resumes the event after
> VMexit to host.
> > >
> > > You cannot just kill a host event like that. If there is a host
> > > event, the guest looses out.
> >
> > OK. The intention was to pause the event (that only profiles host
> > info) when switching to guest, and resume when switching back to host.
> 
> If the even doesn't profile guest context, then yes. If it does profile guest
> context, you can't.

Seems better to add this one:

+int perf_event_disable_local_exclude_guest(struct perf_event *event)
+{
+       struct perf_event_attr *attr = &event->attr;
+
+       if (!attr->exclude_guest)
+               return -EPERM;
+
+       event_function_local(event, __perf_event_disable, NULL);
+
+       return 0;
+}
+EXPORT_SYMBOL_GPL(perf_event_disable_local_exclude_guest);
Liang, Kan Sept. 26, 2022, 3:32 p.m. UTC | #8
On 2022-09-22 8:58 a.m., Wang, Wei W wrote:
> On Thursday, September 22, 2022 8:34 PM, Liang, Kan wrote:
>>> To solve the problem, introduce and export pt_get_curr_event() for KVM
>>> to get current pt event.
>>
>> I don't think the current pt event is created by KVM. IIUC, the patch basically
>> expose the event created by the other user to KVM. That doesn't sounds
>> correct.
> 
> Yes, that's the host PT event running on the current CPU. Not created by KVM.
> 
>>
>> I think it should be perf's responsibility to decide which events should be
>> disabled, and which MSRs should be switched. Because only perf can see all the
>> events. The users should only see the events they created.
> 
> For other pmu cases, yes. For PT, its management is simpler than other pmu
> resources and PT PMU is much simpler. It doesn't have a scheduler to manage
> events.
>

Right, but I think we'd better to create a simpler scheduler (just for
two events) in the PT driver, since you have two co-existing PT events
now, one is from the host and the other is from the guest. I don't think
it's KVM's responsibility to schedule events.

So I think the process should be something as below.

- Let KVM create a PT event if the guest request one.
- In VM-entry, just invoke the perf_event_enable_local(guest).
  The PT driver should schedule out the host event or whatever events
and schedule in the guest event.
- In VM-exit, just invoke the perf_event_disable_local(guest).
  The PT driver should schedule in the host event or whatever events and
schedule out the guest event.

I still don't think we want to expose the host event.

Thanks,
Kan

> I think the usage here is similar to the CPU thread scheduling case. When we have
> CPUs isolated from the CPU scheduler, i.e. no scheduler for those CPUs, basically
> we rely on users to ping tasks to those CPUs (e.g. taskset).
> 
> For the commit log, probably we don't need those KVM details here. Simplify a bit:
> 
> Add a function to expose the current running PT event to users. One usage is in KVM,
> it needs to get and disable the running host PT event before VMEnter to the guest and
> resumes the event after VMexit to host.
Liang, Kan Sept. 26, 2022, 3:48 p.m. UTC | #9
On 2022-09-22 10:42 a.m., Wang, Wei W wrote:
> On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote:
>> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote:
>>> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra
>>>> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote:
>>>>
>>>>> Add a function to expose the current running PT event to users.
>>>>> One usage is in KVM, it needs to get and disable the running host
>>>>> PT event before VMEnter to the guest and resumes the event after
>> VMexit to host.
>>>>
>>>> You cannot just kill a host event like that. If there is a host
>>>> event, the guest looses out.
>>>
>>> OK. The intention was to pause the event (that only profiles host
>>> info) when switching to guest, and resume when switching back to host.
>>
>> If the even doesn't profile guest context, then yes. If it does profile guest
>> context, you can't.
> 
> Seems better to add this one:

If the guest host mode is enabled, I think the PT driver should not
allow the perf tool to create a host event with !exclude_guest.

Thanks,
Kan
> 
> +int perf_event_disable_local_exclude_guest(struct perf_event *event)
> +{
> +       struct perf_event_attr *attr = &event->attr;
> +
> +       if (!attr->exclude_guest)
> +               return -EPERM;
> +
> +       event_function_local(event, __perf_event_disable, NULL);
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(perf_event_disable_local_exclude_guest);
>
Jim Mattson Sept. 26, 2022, 5:24 p.m. UTC | #10
On Mon, Sep 26, 2022 at 9:55 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2022-09-22 10:42 a.m., Wang, Wei W wrote:
> > On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote:
> >> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote:
> >>> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra
> >>>> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote:
> >>>>
> >>>>> Add a function to expose the current running PT event to users.
> >>>>> One usage is in KVM, it needs to get and disable the running host
> >>>>> PT event before VMEnter to the guest and resumes the event after
> >> VMexit to host.
> >>>>
> >>>> You cannot just kill a host event like that. If there is a host
> >>>> event, the guest looses out.
> >>>
> >>> OK. The intention was to pause the event (that only profiles host
> >>> info) when switching to guest, and resume when switching back to host.
> >>
> >> If the even doesn't profile guest context, then yes. If it does profile guest
> >> context, you can't.
> >
> > Seems better to add this one:
>
> If the guest host mode is enabled, I think the PT driver should not
> allow the perf tool to create a host event with !exclude_guest.

While I agree that guest events should generally have priority over
host events, this is not consistent with the way "normal" PMU events
are handled.
Liang, Kan Sept. 26, 2022, 6:08 p.m. UTC | #11
On 2022-09-26 1:24 p.m., Jim Mattson wrote:
> On Mon, Sep 26, 2022 at 9:55 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2022-09-22 10:42 a.m., Wang, Wei W wrote:
>>> On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote:
>>>> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote:
>>>>> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra
>>>>>> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote:
>>>>>>
>>>>>>> Add a function to expose the current running PT event to users.
>>>>>>> One usage is in KVM, it needs to get and disable the running host
>>>>>>> PT event before VMEnter to the guest and resumes the event after
>>>> VMexit to host.
>>>>>>
>>>>>> You cannot just kill a host event like that. If there is a host
>>>>>> event, the guest looses out.
>>>>>
>>>>> OK. The intention was to pause the event (that only profiles host
>>>>> info) when switching to guest, and resume when switching back to host.
>>>>
>>>> If the even doesn't profile guest context, then yes. If it does profile guest
>>>> context, you can't.
>>>
>>> Seems better to add this one:
>>
>> If the guest host mode is enabled, I think the PT driver should not
>> allow the perf tool to create a host event with !exclude_guest.
> 
> While I agree that guest events should generally have priority over
> host events, this is not consistent with the way "normal" PMU events
> are handled.
Only when the two events try to access a resource at the same time, we
have to decide whether priority an event or share between events. But I
don't think this is the case here.

From my understanding of the host-guest mode, the host PT event never
traces the guest no matter whether the guest enables PT. So when
VM-entry, there is only a guest PT event or no event. If so, I think the
perf tool should warn the user if they try to create a host event with
!exclude_guest, since the host event never traces a guest.


Thanks,
Kan
Wang, Wei W Sept. 27, 2022, 2:27 p.m. UTC | #12
On Tuesday, September 27, 2022 2:09 AM, Liang, Kan wrote:
> From my understanding of the host-guest mode, the host PT event never
> traces the guest no matter whether the guest enables PT. So when VM-entry,
> there is only a guest PT event or no event. If so, I think the perf tool should
> warn the user if they try to create a host event with !exclude_guest, since the
> host event never traces a guest.

Probably not from the perf side. It's actually an issue of the KVM's pt_mode:
How can KVM prevent host from profiling the guest when in host-guest mode?
Just a warning from the perf side wouldn't be enough. I think that's a wrong
assumption from KVM side. I had a plan to fix this one. Exposing the host event
can tell KVM if the host is profiling the guest or not (i.e. host-guest is allowed).
Liang, Kan Sept. 27, 2022, 4:52 p.m. UTC | #13
On 2022-09-27 10:27 a.m., Wang, Wei W wrote:
> On Tuesday, September 27, 2022 2:09 AM, Liang, Kan wrote:
>> From my understanding of the host-guest mode, the host PT event never
>> traces the guest no matter whether the guest enables PT. So when VM-entry,
>> there is only a guest PT event or no event. If so, I think the perf tool should
>> warn the user if they try to create a host event with !exclude_guest, since the
>> host event never traces a guest.
> 
> Probably not from the perf side. It's actually an issue of the KVM's pt_mode:
> How can KVM prevent host from profiling the guest when in host-guest mode?

I don't think it's KVM's job to prevent host from profiling the guest.
However, the current KVM implementation implicitly disables the guest
profiling from the host. If I understand correct, this patch series just
change it to an explicit way (Just for kernel. Perf tool still doesn't
know about it.). There is no substantial change.

I think it should be PT driver who decide which event should be
scheduled. So the first step is to let the PT driver understand the
host-guest mode. Then if a perf user in the host tries to profile with
!exclude_guest with the host-guest mode, the pt driver can deny the
request. The perf tool than throw a warning, e.g., "The VMM is running
with host-guest mode. Perf cannot profile guest. Please remove the guest
profiling setting or switch to the host-only mode.".

> Just a warning from the perf side wouldn't be enough. I think that's a wrong
> assumption from KVM side. I had a plan to fix this one. Exposing the host event
> can tell KVM if the host is profiling the guest or not (i.e. host-guest is allowed).

I don't think KVM should know such information.

Thanks,
Kan
diff mbox series

Patch

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 82ef87e9a897..62bfc45c11c9 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1624,6 +1624,14 @@  static void pt_event_stop(struct perf_event *event, int mode)
 	}
 }
 
+struct perf_event *pt_get_curr_event(void)
+{
+	struct pt *pt = this_cpu_ptr(&pt_ctx);
+
+	return pt->handle.event;
+}
+EXPORT_SYMBOL_GPL(pt_get_curr_event);
+
 static long pt_event_snapshot_aux(struct perf_event *event,
 				  struct perf_output_handle *handle,
 				  unsigned long size)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index f6fc8dd51ef4..7c3533392cf5 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -553,11 +553,13 @@  static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
 
 #ifdef CONFIG_CPU_SUP_INTEL
  extern void intel_pt_handle_vmx(int on);
+ extern struct perf_event *pt_get_curr_event(void);
 #else
 static inline void intel_pt_handle_vmx(int on)
 {
 
 }
+struct perf_event *pt_get_curr_event(void) { return NULL; }
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)