
[v2,6/8] perf/x86/intel/lbr: guest requesting KVM for lbr stack save/restore

Message ID 1536233456-12173-7-git-send-email-wei.w.wang@intel.com (mailing list archive)
State New, archived
Series Guest LBR Enabling

Commit Message

Wang, Wei W Sept. 6, 2018, 11:30 a.m. UTC
This patch adds an interface to enable a guest to request KVM to save
and restore the lbr stack on vCPU context switching.

KVM cannot capture whether the guest is actively using the lbr feature
via the lbr enable bit in the debugctl MSR, because that control bit is
frequently enabled and disabled by the guest, and in some cases it is
disabled even when the guest is actively using the lbr feature. For
example, perf_pmu_sched_task in the guest disables the bit before reading
out the lbr stack. In this case, the bit is disabled though the guest is
still using the lbr feature.

So a KVM-specific MSR, MSR_KVM_PV_LBR_CTRL, is used by the guest at a
proper place to tell KVM whether the LBR is actively in use. Basically,
the lbr user callstack mode needs the lbr stack to be saved/restored on
a context switch, so we set the ACTIVE bit of MSR_KVM_PV_LBR_CTRL only
when the user callstack mode is used. KVM will then save/restore the lbr
stack on vCPU switches while the ACTIVE bit is set.

Signed-off-by: Like Xu <like.xu@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/lbr.c          | 21 +++++++++++++++++++++
 arch/x86/include/asm/perf_event.h    |  3 +++
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 3 files changed, 26 insertions(+)

Comments

Andi Kleen Sept. 7, 2018, 3:27 a.m. UTC | #1
On Thu, Sep 06, 2018 at 07:30:54PM +0800, Wei Wang wrote:
> This patch adds an interface to enable a guest to request KVM to save
> and restore the lbr stack on vCPU context switching.
> 
> KVM couldn't capture the info about whether the guest is actively using
> the lbr feature via the lbr enable bit in the debugctl MSR, because that
> control bit is frequently enabled and disabled by the guest, and in some
> cases, it is disabled even when the guest is actively using the lbr
> feature. For example, perf_pmu_sched_task in the guest disables the bit
> before reading out the lbr stack. In this case, the bit is disabled though
> the guest is still using the lbr feature.
> 
> So, a KVM-specific MSR, MSR_KVM_PV_LBR_CTRL, is used by the guest at a
> proper place to tell KVM if the LBR is actively in use or not. Basically,
> the lbr user callstack mode needs the lbr stack to be saved/restored on a
> context switching, so we set the ACTIVE bit of MSR_KVM_PV_LBR_CTRL only
> when the user callstack mode is used. The KVM hypervisor will add the lbr
> stack save/restore support on vCPU switching after the ACTIVE bit is set.

PV is difficult because it requires changing all the users.

Maybe a better approach would be a lazy restore of the LBRs:

Don't restore the LBRs on context switch, but set the LBR MSRs to intercept.
Then on the first access restore the LBRs and allow direct access to the
MSRs again.

Also when the LBRs haven't been set to direct access the state doesn't
need to be saved.

-Andi
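The lazy-restore scheme suggested here can be modeled as a small state machine (a hypothetical sketch, not actual KVM code; all names are invented for illustration): LBR MSRs start intercepted after a switch-in, the first guest access restores the stack and switches to direct access, and state is saved on switch-out only if direct access was ever granted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the lazy-restore idea.  "Intercepted" means LBR
 * MSR accesses trap to the hypervisor; "direct" means they are passed
 * through to hardware. */
struct vcpu_lbr {
	bool intercepted;	/* MSR accesses trap to the hypervisor */
	bool state_loaded;	/* guest LBR values live in hardware */
};

static void vcpu_sched_in(struct vcpu_lbr *v)
{
	/* Don't restore eagerly; just re-arm interception. */
	v->intercepted = true;
	v->state_loaded = false;
}

static void guest_lbr_access(struct vcpu_lbr *v)
{
	if (v->intercepted) {
		/* First access after switch-in: restore the LBR stack,
		 * then allow direct access so later accesses don't exit. */
		v->state_loaded = true;
		v->intercepted = false;
	}
}

/* Returns true if the hypervisor must save LBR state on switch-out:
 * only when the MSRs were set to direct access, i.e. actually touched. */
static bool vcpu_sched_out_must_save(struct vcpu_lbr *v)
{
	return !v->intercepted;
}
```

In this model, a vCPU that never touches the LBR MSRs between switches costs nothing on switch-out, which is the point of the proposal.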
Wang, Wei W Sept. 7, 2018, 5:24 a.m. UTC | #2
On 09/07/2018 11:27 AM, Andi Kleen wrote:
> On Thu, Sep 06, 2018 at 07:30:54PM +0800, Wei Wang wrote:
>> This patch adds an interface to enable a guest to request KVM to save
>> and restore the lbr stack on vCPU context switching.
>>
>> KVM couldn't capture the info about whether the guest is actively using
>> the lbr feature via the lbr enable bit in the debugctl MSR, because that
>> control bit is frequently enabled and disabled by the guest, and in some
>> cases, it is disabled even when the guest is actively using the lbr
>> feature. For example, perf_pmu_sched_task in the guest disables the bit
>> before reading out the lbr stack. In this case, the bit is disabled though
>> the guest is still using the lbr feature.
>>
>> So, a KVM-specific MSR, MSR_KVM_PV_LBR_CTRL, is used by the guest at a
>> proper place to tell KVM if the LBR is actively in use or not. Basically,
>> the lbr user callstack mode needs the lbr stack to be saved/restored on a
>> context switching, so we set the ACTIVE bit of MSR_KVM_PV_LBR_CTRL only
>> when the user callstack mode is used. The KVM hypervisor will add the lbr
>> stack save/restore support on vCPU switching after the ACTIVE bit is set.
> PV is difficult because it requires changing all the users.

It needs changes to the guest driver, but remains transparent to guest 
user applications (e.g. the perf tool).
Btw, we tested it, and it works in the guest as well as on native Linux.

This was thought of as the hardest part of this work. Let me just 
clarify it a little bit:

The fundamental function we want to achieve is
#1 when the vCPU is actively using the LBR feature,  save/restore the 
lbr stack when the vCPU is scheduled out/in;
#2 when the vCPU is NOT actively using the LBR feature, DON'T 
save/restore the lbr stack when the vCPU is scheduled out/in;

The key problem we need to solve is: how does the host know if the guest 
is actively using the lbr feature or not?

>
> Maybe a better approach would be a lazy restore of the LBRs:
>
> Don't restore the LBRs on context switch, but set the LBR MSRs to intercept.
> Then on the first access restore the LBRs and allow direct access to the
> MSRs again.
>
> Also when the LBRs haven't been set to direct access the state doesn't
> need to be saved.

This could achieve #1 above, but how would it solve #2? That is, after 
the guest has used the lbr feature for a while and the lbr stack has 
been passed through, the guest stops using lbr, but the vCPU will still 
save/restore on every switch?
(The host cannot tell that the guest is not using lbr from debugctl[0]; 
the commit log above has some explanation of this.)

Best,
Wei
Andi Kleen Sept. 7, 2018, 2:10 p.m. UTC | #3
> This could achieve the above #1, but how would it solve #2 above? That is,
> after the guest uses the lbr feature for a while, the lbr stack has been
> passed through, then the guest doesn't use lbr any more, but the vCPU will
> still save/restore on switching?

If nothing accesses the MSR LBRs after a context switch in the guest
nothing gets saved/restored due to:

> > Also when the LBRs haven't been set to direct access the state doesn't
> > need to be saved.


-Andi
Wang, Wei W Sept. 7, 2018, 3:20 p.m. UTC | #4
On Friday, September 7, 2018 10:11 PM, Andi Kleen wrote:
> > This could achieve the above #1, but how would it solve #2 above? That
> > is, after the guest uses the lbr feature for a while, the lbr stack
> > has been passed through, then the guest doesn't use lbr any more, but
> > the vCPU will still save/restore on switching?
> 
> If nothing accesses the MSR LBRs after a context switch in the guest nothing
> gets saved/restored due to:
> 
> > > Also when the LBRs haven't been set to direct access the state
> > > doesn't need to be saved.

How would you realize the function of saving/restoring the lbr stack on the host?

Here, we create a perf event on the host (please see guest_lbr_event_create on patch 7), which essentially satisfies all the conditions (e.g. increases cpuc->lbr_users) that are required to have the lbr stack saved/restored on the vCPU switching.

If we want to stop the host side lbr stack save/restore for the vCPU, we accordingly need to call guest_lbr_event_release (in patch 7) to destroy that perf event (the host doesn't automatically stop saving the lbr stack for the vCPU while that perf event is still there).

When would you call that release function? (we all know that the lbr doesn't need to be saved when the guest is not using it, but we need to destroy that perf event to achieve "doesn't need to be saved")

Best,
Wei
Andi Kleen Sept. 7, 2018, 8:05 p.m. UTC | #5
> How would you realize the function of saving/restoring the lbr stack on the host?
> 
> Here, we create a perf event on the host (please see guest_lbr_event_create on patch 7), which essentially satisfies all the conditions (e.g. increases cpuc->lbr_users) that are required to have the lbr stack saved/restored on the vCPU switching.
> 
> If we want to stop the host side lbr stack save/restore for the vCPU, we need accordingly to call guest_lbr_event_release (in patch 7) to destroy that perf event (the host doesn't automatically stop saving the lbr stack for the vCPU if that perf event is still there).
> 
> When would you call that release function? (we all know that the lbr doesn't need to be saved when the guest is not using it, but we need to destroy that perf event to achieve "doesn't need to be saved")

Maybe set a timer on DEBUGCTL LBR=0? A timer would provide hysteresis, so that quick toggles
(like in a PMI handler) wouldn't do anything expensive.

It needs new interfaces for perf anyway, because we need to access the LBR state from
the last save.

-Andi
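The hysteresis idea can be sketched as a debounce on the release path (a hypothetical model with invented names; the actual perf-event release hook is guest_lbr_event_release from patch 7): arm a timer when DEBUGCTL.LBR goes to 0, cancel it if LBR is re-enabled before it fires, and destroy the host-side perf event only on expiry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the hysteresis timer: quick LBR=0/LBR=1 toggles
 * (e.g. around a PMI handler) cancel the pending release, so nothing
 * expensive happens; only a sustained LBR=0 releases the host event. */
struct lbr_release_timer {
	bool armed;
	unsigned long deadline;	/* absolute time, arbitrary units */
	bool event_alive;	/* host-side guest-LBR perf event exists */
};

#define LBR_RELEASE_DELAY 10	/* hysteresis window, e.g. ~10 ticks */

static void guest_lbr_disabled(struct lbr_release_timer *t, unsigned long now)
{
	t->armed = true;
	t->deadline = now + LBR_RELEASE_DELAY;
}

static void guest_lbr_enabled(struct lbr_release_timer *t)
{
	t->armed = false;	/* quick toggle: cancel the release */
}

static void timer_tick(struct lbr_release_timer *t, unsigned long now)
{
	if (t->armed && now >= t->deadline) {
		t->armed = false;
		t->event_alive = false;	/* destroy the host perf event */
	}
}
```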
Wang, Wei W Sept. 8, 2018, 1:34 a.m. UTC | #6
On Saturday, September 8, 2018 4:05 AM, Andi Kleen wrote:
> > How would you realize the function of saving/restoring the lbr stack on the
> host?
> >
> > Here, we create a perf event on the host (please see
> guest_lbr_event_create on patch 7), which essentially satisfies all the
> conditions (e.g. increases cpuc->lbr_users) that are required to have the lbr
> stack saved/restored on the vCPU switching.
> >
> > If we want to stop the host side lbr stack save/restore for the vCPU, we
> need accordingly to call guest_lbr_event_release (in patch 7) to destroy that
> perf event (the host doesn't automatically stop saving the lbr stack for the
> vCPU if that perf event is still there).
> >
> > When would you call that release function? (we all know that the lbr
> > doesn't need to be saved when the guest is not using it, but we need
> > to destroy that perf event to achieve "doesn't need to be saved")
> 
> Maybe set a timer on DEBUGCTL LBR=0 ? A timer would provide hysteresis,
> so that quick toggles (like in a PMI handler) wouldn't do anything expensive.

I'm not sure this would solve the key problem. At what frequency would the timer fire?

- Let's say every 10ms, for example. The guest is then disturbed by a timer interrupt (causing a VMExit) every 10ms, even though it no longer uses the lbr after the first use. The problem then becomes: when do we call the release function to cancel the timer, if we want to avoid that unnecessary disturbance to the guest?

- When the timer fires and finds DEBUGCTL LBR=0, it destroys the host side perf event, and the lbr stack is no longer saved when the vCPU is scheduled out. But, as mentioned in the commit log, perf_pmu_sched_task in the guest disables that bit before reading out the lbr stack (a PMI is another example). Now DEBUGCTL LBR=0, and before the guest reads out the lbr stack, the vCPU may happen to be scheduled out; another thread on the host is scheduled in and overwrites the lbr stack. So before the guest reads out the lbr stack, the stack has already been polluted in this case.

Best,
Wei
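The pollution scenario in the second bullet can be traced as a short event sequence (a hypothetical model, with invented names, of a host that stops save/restore the moment it sees DEBUGCTL.LBR=0):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the race: the guest clears DEBUGCTL.LBR before reading the
 * stack, so a host that keys save/restore off that bit skips both, and
 * another host thread can overwrite the still-unread LBR stack. */
enum { GUEST_DATA = 1, OTHER_THREAD_DATA = 2 };

struct lbr_hw { int stack; };			/* physical LBR stack */
struct vcpu   { int saved; bool lbr_enabled; };	/* DEBUGCTL.LBR view */

static void vcpu_switch_out(struct vcpu *v, struct lbr_hw *hw)
{
	if (v->lbr_enabled)	/* host only saves when it sees LBR=1 */
		v->saved = hw->stack;
}

static void vcpu_switch_in(struct vcpu *v, struct lbr_hw *hw)
{
	if (v->lbr_enabled)	/* ...and only restores when LBR=1 */
		hw->stack = v->saved;
}
```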

Patch

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 7c3958e..50921d3 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1,6 +1,7 @@ 
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/perf_event.h>
 #include <linux/types.h>
+#include <linux/kvm_para.h>
 
 #include <asm/perf_event.h>
 #include <asm/msr.h>
@@ -454,6 +455,24 @@  static inline bool branch_user_callstack(unsigned br_sel)
 	return (br_sel & X86_BR_USER) && (br_sel & X86_BR_CALL_STACK);
 }
 
+static inline void set_pv_lbr_ctrl_active(bool active)
+{
+	u64 lbr_ctrl_old, lbr_ctrl_new;
+
+	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
+	    !kvm_para_has_feature(KVM_FEATURE_PV_LBR_CTRL))
+		return;
+
+	rdmsrl(MSR_KVM_PV_LBR_CTRL, lbr_ctrl_old);
+	if (active)
+		lbr_ctrl_new = lbr_ctrl_old | KVM_PV_LBR_CTRL_ACTIVE;
+	else
+		lbr_ctrl_new = lbr_ctrl_old & ~KVM_PV_LBR_CTRL_ACTIVE;
+
+	if (lbr_ctrl_new != lbr_ctrl_old)
+		wrmsrl(MSR_KVM_PV_LBR_CTRL, lbr_ctrl_new);
+}
+
 void intel_pmu_lbr_add(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -467,6 +486,7 @@  void intel_pmu_lbr_add(struct perf_event *event)
 	if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) {
 		task_ctx = event->ctx->task_ctx_data;
 		task_ctx->lbr_callstack_users++;
+		set_pv_lbr_ctrl_active(true);
 	}
 
 	/*
@@ -505,6 +525,7 @@  void intel_pmu_lbr_del(struct perf_event *event)
 	    event->ctx->task_ctx_data) {
 		task_ctx = event->ctx->task_ctx_data;
 		task_ctx->lbr_callstack_users--;
+		set_pv_lbr_ctrl_active(false);
 	}
 
 	cpuc->lbr_users--;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 161165f..9fb0f7e 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -162,6 +162,9 @@  struct x86_pmu_capability {
  */
 #define INTEL_PMC_IDX_FIXED_BTS				(INTEL_PMC_IDX_FIXED + 16)
 
+/* Indicate to the hypervisor that the guest LBR is active */
+#define KVM_PV_LBR_CTRL_ACTIVE		(1UL << 0)
+
 #define GLOBAL_STATUS_COND_CHG				BIT_ULL(63)
 #define GLOBAL_STATUS_BUFFER_OVF			BIT_ULL(62)
 #define GLOBAL_STATUS_UNC_OVF				BIT_ULL(61)
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 19980ec..87764dd 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -29,6 +29,7 @@ 
 #define KVM_FEATURE_PV_TLB_FLUSH	9
 #define KVM_FEATURE_ASYNC_PF_VMEXIT	10
 #define KVM_FEATURE_PV_SEND_IPI	11
+#define KVM_FEATURE_PV_LBR_CTRL	12
 
 #define KVM_HINTS_REALTIME      0
 
@@ -47,6 +48,7 @@ 
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
+#define MSR_KVM_PV_LBR_CTRL    0x4b564d05
 
 struct kvm_steal_time {
 	__u64 steal;