Message ID | 20220921164521.2858932-3-xiaoyao.li@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT | expand |
On 9/22/2022 12:45 AM, Xiaoyao Li wrote: > KVM supports PT_MODE_HOST_GUEST mode for Intel PT where host and guest > have separate Intel PT configurations and they work independently. > > Currently, in this mode, when both host and guest enable PT, KVM manually > clears MSR_IA32_RTIT_CTL.TRACEEN to disable host PT so that it can > context switch the other PT MSRs. However, PT PMI can be delivered after > this point and before the VM-entry. As a result, the re-enabling of PT > leads to VM-entry failure of guest. > > To solve the problem, introduce and export pt_get_curr_event() for KVM > to get current pt event. Along with perf_event_{dis, en}able_local(), > With them, KVM can avoid PT re-enabling in PT PMI handler. > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> > --- > arch/x86/events/intel/pt.c | 8 ++++++++ > arch/x86/include/asm/perf_event.h | 2 ++ > 2 files changed, 10 insertions(+) > > diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c > index 82ef87e9a897..62bfc45c11c9 100644 > --- a/arch/x86/events/intel/pt.c > +++ b/arch/x86/events/intel/pt.c > @@ -1624,6 +1624,14 @@ static void pt_event_stop(struct perf_event *event, int mode) > } > } > > +struct perf_event *pt_get_curr_event(void) > +{ > + struct pt *pt = this_cpu_ptr(&pt_ctx); > + > + return pt->handle.event; > +} > +EXPORT_SYMBOL_GPL(pt_get_curr_event); > + > static long pt_event_snapshot_aux(struct perf_event *event, > struct perf_output_handle *handle, > unsigned long size) > diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h > index f6fc8dd51ef4..7c3533392cf5 100644 > --- a/arch/x86/include/asm/perf_event.h > +++ b/arch/x86/include/asm/perf_event.h > @@ -553,11 +553,13 @@ static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr) > > #ifdef CONFIG_CPU_SUP_INTEL > extern void intel_pt_handle_vmx(int on); > + extern struct perf_event *pt_get_curr_event(void); > #else > static inline void intel_pt_handle_vmx(int on) > { > > } > +struct perf_event *pt_get_curr_event(void) { return NULL; } My fault, this needs to be static inline ... > #endif > > #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
On 2022-09-21 12:45 p.m., Xiaoyao Li wrote: > KVM supports PT_MODE_HOST_GUEST mode for Intel PT where host and guest > have separate Intel PT configurations and they work independently. > > Currently, in this mode, when both host and guest enable PT, KVM manually > clears MSR_IA32_RTIT_CTL.TRACEEN to disable host PT so that it can > context switch the other PT MSRs. However, PT PMI can be delivered after > this point and before the VM-entry. As a result, the re-enabling of PT > leads to VM-entry failure of guest. > > To solve the problem, introduce and export pt_get_curr_event() for KVM > to get current pt event. I don't think the current pt event is created by KVM. IIUC, the patch basically expose the event created by the other user to KVM. That doesn't sounds correct. I think it should be perf's responsibility to decide which events should be disabled, and which MSRs should be switched. Because only perf can see all the events. The users should only see the events they created. Thanks, Kan > Along with perf_event_{dis, en}able_local(), > With them, KVM can avoid PT re-enabling in PT PMI handler. > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> > --- > arch/x86/events/intel/pt.c | 8 ++++++++ > arch/x86/include/asm/perf_event.h | 2 ++ > 2 files changed, 10 insertions(+) > > diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c > index 82ef87e9a897..62bfc45c11c9 100644 > --- a/arch/x86/events/intel/pt.c > +++ b/arch/x86/events/intel/pt.c > @@ -1624,6 +1624,14 @@ static void pt_event_stop(struct perf_event *event, int mode) > } > } > > +struct perf_event *pt_get_curr_event(void) > +{ > + struct pt *pt = this_cpu_ptr(&pt_ctx); > + > + return pt->handle.event; > +} > +EXPORT_SYMBOL_GPL(pt_get_curr_event); > + > static long pt_event_snapshot_aux(struct perf_event *event, > struct perf_output_handle *handle, > unsigned long size) > diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h > index f6fc8dd51ef4..7c3533392cf5 100644 > --- a/arch/x86/include/asm/perf_event.h > +++ b/arch/x86/include/asm/perf_event.h > @@ -553,11 +553,13 @@ static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr) > > #ifdef CONFIG_CPU_SUP_INTEL > extern void intel_pt_handle_vmx(int on); > + extern struct perf_event *pt_get_curr_event(void); > #else > static inline void intel_pt_handle_vmx(int on) > { > > } > +struct perf_event *pt_get_curr_event(void) { return NULL; } > #endif > > #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
On Thursday, September 22, 2022 8:34 PM, Liang, Kan wrote: > > To solve the problem, introduce and export pt_get_curr_event() for KVM > > to get current pt event. > > I don't think the current pt event is created by KVM. IIUC, the patch basically > expose the event created by the other user to KVM. That doesn't sounds > correct. Yes, that's the host PT event running on the current CPU. Not created by KVM. > > I think it should be perf's responsibility to decide which events should be > disabled, and which MSRs should be switched. Because only perf can see all the > events. The users should only see the events they created. For other pmu cases, yes. For PT, its management is simpler than other pmu resources and PT PMU is much simpler. It doesn't have a scheduler to manage events. I think the usage here is similar to the CPU thread scheduling case. When we have CPUs isolated from the CPU scheduler, i.e. no scheduler for those CPUs, basically we rely on users to ping tasks to those CPUs (e.g. taskset). For the commit log, probably we don't need those KVM details here. Simplify a bit: Add a function to expose the current running PT event to users. One usage is in KVM, it needs to get and disable the running host PT event before VMEnter to the guest and resumes the event after VMexit to host.
On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > Add a function to expose the current running PT event to users. One usage is in KVM, > it needs to get and disable the running host PT event before VMEnter to the guest and > resumes the event after VMexit to host. You cannot just kill a host event like that. If there is a host event, the guest looses out.
On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra > On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > > > Add a function to expose the current running PT event to users. One > > usage is in KVM, it needs to get and disable the running host PT event > > before VMEnter to the guest and resumes the event after VMexit to host. > > You cannot just kill a host event like that. If there is a host event, the guest > looses out. OK. The intention was to pause the event (that only profiles host info) when switching to guest, and resume when switching back to host. I don't think it is proper to directly disable it via clear MSR_IA32_RTIT_CTL, e.g. there may be dangling PT bit in the interrupt status register, this invokes the PT interrupt handler to re-enable it (for more details about the issue please see patch 3 commit). Would you have any suggestions to pause it? Thanks, Wei
On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: > On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra > > On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > > > > > Add a function to expose the current running PT event to users. One > > > usage is in KVM, it needs to get and disable the running host PT event > > > before VMEnter to the guest and resumes the event after VMexit to host. > > > > You cannot just kill a host event like that. If there is a host event, the guest > > looses out. > > OK. The intention was to pause the event (that only profiles host info) when switching to guest, > and resume when switching back to host. If the even doesn't profile guest context, then yes. If it does profile guest context, you can't.
On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote: > On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: > > On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra > > > On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > > > > > > > Add a function to expose the current running PT event to users. > > > > One usage is in KVM, it needs to get and disable the running host > > > > PT event before VMEnter to the guest and resumes the event after > VMexit to host. > > > > > > You cannot just kill a host event like that. If there is a host > > > event, the guest looses out. > > > > OK. The intention was to pause the event (that only profiles host > > info) when switching to guest, and resume when switching back to host. > > If the even doesn't profile guest context, then yes. If it does profile guest > context, you can't. Seems better to add this one: +int perf_event_disable_local_exclude_guest(struct perf_event *event) +{ + struct perf_event_attr *attr = &event->attr; + + if (!attr->exclude_guest) + return -EPERM; + + event_function_local(event, __perf_event_disable, NULL); + + return 0; +} +EXPORT_SYMBOL_GPL(perf_event_disable_local_exclude_guest);
On 2022-09-22 8:58 a.m., Wang, Wei W wrote: > On Thursday, September 22, 2022 8:34 PM, Liang, Kan wrote: >>> To solve the problem, introduce and export pt_get_curr_event() for KVM >>> to get current pt event. >> >> I don't think the current pt event is created by KVM. IIUC, the patch basically >> expose the event created by the other user to KVM. That doesn't sounds >> correct. > > Yes, that's the host PT event running on the current CPU. Not created by KVM. > >> >> I think it should be perf's responsibility to decide which events should be >> disabled, and which MSRs should be switched. Because only perf can see all the >> events. The users should only see the events they created. > > For other pmu cases, yes. For PT, its management is simpler than other pmu > resources and PT PMU is much simpler. It doesn't have a scheduler to manage > events. > Right, but I think we'd better to create a simpler scheduler (just for two events) in the PT driver, since you have two co-existing PT events now, one is from the host and the other is from the guest. I don't think it's KVM's responsibility to schedule events. So I think the process should be something as below. - Let KVM create a PT event if the guest request one. - In VM-entry, just invoke the perf_event_enable_local(guest). The PT driver should schedule out the host event or whatever events and schedule in the guest event. - In VM-exit, just invoke the perf_event_disable_local(guest). The PT driver should schedule in the host event or whatever events and schedule out the guest event. I still don't think we want to expose the host event. Thanks, Kan > I think the usage here is similar to the CPU thread scheduling case. When we have > CPUs isolated from the CPU scheduler, i.e. no scheduler for those CPUs, basically > we rely on users to ping tasks to those CPUs (e.g. taskset). > > For the commit log, probably we don't need those KVM details here. Simplify a bit: > > Add a function to expose the current running PT event to users. One usage is in KVM, > it needs to get and disable the running host PT event before VMEnter to the guest and > resumes the event after VMexit to host.
On 2022-09-22 10:42 a.m., Wang, Wei W wrote: > On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote: >> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: >>> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra >>>> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: >>>> >>>>> Add a function to expose the current running PT event to users. >>>>> One usage is in KVM, it needs to get and disable the running host >>>>> PT event before VMEnter to the guest and resumes the event after >> VMexit to host. >>>> >>>> You cannot just kill a host event like that. If there is a host >>>> event, the guest looses out. >>> >>> OK. The intention was to pause the event (that only profiles host >>> info) when switching to guest, and resume when switching back to host. >> >> If the even doesn't profile guest context, then yes. If it does profile guest >> context, you can't. > > Seems better to add this one: If the guest host mode is enabled, I think the PT driver should not allow the perf tool to create a host event with !exclude_guest. Thanks, Kan > > +int perf_event_disable_local_exclude_guest(struct perf_event *event) > +{ > + struct perf_event_attr *attr = &event->attr; > + > + if (!attr->exclude_guest) > + return -EPERM; > + > + event_function_local(event, __perf_event_disable, NULL); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(perf_event_disable_local_exclude_guest); >
On Mon, Sep 26, 2022 at 9:55 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > > > On 2022-09-22 10:42 a.m., Wang, Wei W wrote: > > On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote: > >> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: > >>> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra > >>>> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: > >>>> > >>>>> Add a function to expose the current running PT event to users. > >>>>> One usage is in KVM, it needs to get and disable the running host > >>>>> PT event before VMEnter to the guest and resumes the event after > >> VMexit to host. > >>>> > >>>> You cannot just kill a host event like that. If there is a host > >>>> event, the guest looses out. > >>> > >>> OK. The intention was to pause the event (that only profiles host > >>> info) when switching to guest, and resume when switching back to host. > >> > >> If the even doesn't profile guest context, then yes. If it does profile guest > >> context, you can't. > > > > Seems better to add this one: > > If the guest host mode is enabled, I think the PT driver should not > allow the perf tool to create a host event with !exclude_guest. While I agree that guest events should generally have priority over host events, this is not consistent with the way "normal" PMU events are handled.
On 2022-09-26 1:24 p.m., Jim Mattson wrote: > On Mon, Sep 26, 2022 at 9:55 AM Liang, Kan <kan.liang@linux.intel.com> wrote: >> >> >> >> On 2022-09-22 10:42 a.m., Wang, Wei W wrote: >>> On Thursday, September 22, 2022 10:10 PM, Peter Zijlstra wrote: >>>> On Thu, Sep 22, 2022 at 01:59:53PM +0000, Wang, Wei W wrote: >>>>> On Thursday, September 22, 2022 9:35 PM, Peter Zijlstra >>>>>> On Thu, Sep 22, 2022 at 12:58:49PM +0000, Wang, Wei W wrote: >>>>>> >>>>>>> Add a function to expose the current running PT event to users. >>>>>>> One usage is in KVM, it needs to get and disable the running host >>>>>>> PT event before VMEnter to the guest and resumes the event after >>>> VMexit to host. >>>>>> >>>>>> You cannot just kill a host event like that. If there is a host >>>>>> event, the guest looses out. >>>>> >>>>> OK. The intention was to pause the event (that only profiles host >>>>> info) when switching to guest, and resume when switching back to host. >>>> >>>> If the even doesn't profile guest context, then yes. If it does profile guest >>>> context, you can't. >>> >>> Seems better to add this one: >> >> If the guest host mode is enabled, I think the PT driver should not >> allow the perf tool to create a host event with !exclude_guest. > > While I agree that guest events should generally have priority over > host events, this is not consistent with the way "normal" PMU events > are handled. Only when the two events try to access a resource at the same time, we have to decide whether priority an event or share between events. But I don't think this is the case here. From my understanding of the host-guest mode, the host PT event never traces the guest no matter whether the guest enables PT. So when VM-entry, there is only a guest PT event or no event. If so, I think the perf tool should warn the user if they try to create a host event with !exclude_guest, since the host event never traces a guest. Thanks, Kan
On Tuesday, September 27, 2022 2:09 AM, Liang, Kan wrote: > From my understanding of the host-guest mode, the host PT event never > traces the guest no matter whether the guest enables PT. So when VM-entry, > there is only a guest PT event or no event. If so, I think the perf tool should > warn the user if they try to create a host event with !exclude_guest, since the > host event never traces a guest. Probably not from the perf side. It's actually an issue of the KVM's pt_mode: How can KVM prevent host from profiling the guest when in host-guest mode? Just a warning from the perf side wouldn't be enough. I think that's a wrong assumption from KVM side. I had a plan to fix this one. Exposing the host event can tell KVM if the host is profiling the guest or not (i.e. host-guest is allowed).
On 2022-09-27 10:27 a.m., Wang, Wei W wrote: > On Tuesday, September 27, 2022 2:09 AM, Liang, Kan wrote: >> From my understanding of the host-guest mode, the host PT event never >> traces the guest no matter whether the guest enables PT. So when VM-entry, >> there is only a guest PT event or no event. If so, I think the perf tool should >> warn the user if they try to create a host event with !exclude_guest, since the >> host event never traces a guest. > > Probably not from the perf side. It's actually an issue of the KVM's pt_mode: > How can KVM prevent host from profiling the guest when in host-guest mode? I don't think it's KVM's job to prevent host from profiling the guest. However, the current KVM implementation implicitly disables the guest profiling from the host. If I understand correct, this patch series just change it to an explicit way (Just for kernel. Perf tool still doesn't know about it.). There is no substantial change. I think it should be PT driver who decide which event should be scheduled. So the first step is to let the PT driver understand the host-guest mode. Then if a perf user in the host tries to profile with !exclude_guest with the host-guest mode, the pt driver can deny the request. The perf tool than throw a warning, e.g., "The VMM is running with host-guest mode. Perf cannot profile guest. Please remove the guest profiling setting or switch to the host-only mode.". > Just a warning from the perf side wouldn't be enough. I think that's a wrong > assumption from KVM side. I had a plan to fix this one. Exposing the host event > can tell KVM if the host is profiling the guest or not (i.e. host-guest is allowed). I don't think KVM should know such information. Thanks, Kan
diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c index 82ef87e9a897..62bfc45c11c9 100644 --- a/arch/x86/events/intel/pt.c +++ b/arch/x86/events/intel/pt.c @@ -1624,6 +1624,14 @@ static void pt_event_stop(struct perf_event *event, int mode) } } +struct perf_event *pt_get_curr_event(void) +{ + struct pt *pt = this_cpu_ptr(&pt_ctx); + + return pt->handle.event; +} +EXPORT_SYMBOL_GPL(pt_get_curr_event); + static long pt_event_snapshot_aux(struct perf_event *event, struct perf_output_handle *handle, unsigned long size) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index f6fc8dd51ef4..7c3533392cf5 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -553,11 +553,13 @@ static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr) #ifdef CONFIG_CPU_SUP_INTEL extern void intel_pt_handle_vmx(int on); + extern struct perf_event *pt_get_curr_event(void); #else static inline void intel_pt_handle_vmx(int on) { } +struct perf_event *pt_get_curr_event(void) { return NULL; } #endif #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
KVM supports PT_MODE_HOST_GUEST mode for Intel PT where host and guest have separate Intel PT configurations and they work independently. Currently, in this mode, when both host and guest enable PT, KVM manually clears MSR_IA32_RTIT_CTL.TRACEEN to disable host PT so that it can context switch the other PT MSRs. However, PT PMI can be delivered after this point and before the VM-entry. As a result, the re-enabling of PT leads to VM-entry failure of guest. To solve the problem, introduce and export pt_get_curr_event() for KVM to get current pt event. Along with perf_event_{dis, en}able_local(), With them, KVM can avoid PT re-enabling in PT PMI handler. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> --- arch/x86/events/intel/pt.c | 8 ++++++++ arch/x86/include/asm/perf_event.h | 2 ++ 2 files changed, 10 insertions(+)