[RFC,v3,10/58] perf: Add generic exclude_guest support

Message ID 20240801045907.4010984-11-mizhang@google.com (mailing list archive)
State New, archived
Series: Mediated Passthrough vPMU 3.0 for x86

Commit Message

Mingwei Zhang Aug. 1, 2024, 4:58 a.m. UTC
From: Kan Liang <kan.liang@linux.intel.com>

Only KVM knows exactly when a guest is entering or exiting. Expose
two interfaces that allow KVM to switch ownership of the PMU resources
at those points.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 include/linux/perf_event.h |  4 +++
 kernel/events/core.c       | 54 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
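
[Editor's note: not part of the patch. Below is a minimal sketch of the call
pattern the commit message implies: KVM's vcpu run path brackets VM entry/exit
with the new interfaces while interrupts are disabled, matching the
lockdep_assert_irqs_disabled() checks in the implementation. The function name
and the enter_guest() stand-in are hypothetical, for illustration only.]

	/*
	 * Hypothetical illustration only -- not from this series.
	 * The interfaces are expected to be called on the local CPU,
	 * with IRQs disabled, around the actual VM entry/exit.
	 */
	static void vcpu_run_once_example(struct kvm_vcpu *vcpu)
	{
		local_irq_disable();

		/* Schedule out host exclude_guest events; guest owns the PMU. */
		perf_guest_enter();

		enter_guest(vcpu);	/* stand-in for the real VM-entry sequence */

		/* Schedule host exclude_guest events back in after VM-exit. */
		perf_guest_exit();

		local_irq_enable();
	}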

Comments

Peter Zijlstra Oct. 14, 2024, 11:20 a.m. UTC | #1
On Thu, Aug 01, 2024 at 04:58:19AM +0000, Mingwei Zhang wrote:
> +void perf_guest_exit(void)
> +{
> +	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
> +
> +	lockdep_assert_irqs_disabled();
> +
> +	perf_ctx_lock(cpuctx, cpuctx->task_ctx);
> +
> +	if (WARN_ON_ONCE(!__this_cpu_read(perf_in_guest)))
> +		goto unlock;
> +
> +	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
> +	ctx_sched_in(&cpuctx->ctx, EVENT_GUEST);
> +	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
> +	if (cpuctx->task_ctx) {
> +		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
> +		ctx_sched_in(cpuctx->task_ctx, EVENT_GUEST);
> +		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
> +	}

Does this not violate the scheduling order of events? AFAICT this will
do:

  cpu pinned
  cpu flexible
  task pinned
  task flexible

as opposed to:

  cpu pinned
  task pinned
  cpu flexible
  task flexible

We have the perf_event_sched_in() helper for this.
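
[Editor's note: for reference, the existing helper in kernel/events/core.c
looks roughly like the sketch below (paraphrased from the upstream code around
the time of this series; the exact body may differ in the tree the patch is
based on). It is what encodes the cpu-pinned, task-pinned, cpu-flexible,
task-flexible order.]

	/*
	 * Roughly the upstream helper: pinned events in both contexts are
	 * scheduled in before any flexible events, and the CPU context is
	 * handled before the task context at each priority level.
	 */
	static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
					struct perf_event_context *ctx)
	{
		ctx_sched_in(&cpuctx->ctx, EVENT_PINNED);
		if (ctx)
			ctx_sched_in(ctx, EVENT_PINNED);
		ctx_sched_in(&cpuctx->ctx, EVENT_FLEXIBLE);
		if (ctx)
			ctx_sched_in(ctx, EVENT_FLEXIBLE);
	}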

> +
> +	__this_cpu_write(perf_in_guest, false);
> +unlock:
> +	perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
> +}
> +EXPORT_SYMBOL_GPL(perf_guest_exit);
Liang, Kan Oct. 14, 2024, 3:27 p.m. UTC | #2
On 2024-10-14 7:20 a.m., Peter Zijlstra wrote:
> On Thu, Aug 01, 2024 at 04:58:19AM +0000, Mingwei Zhang wrote:
>> +void perf_guest_exit(void)
>> +{
>> +	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
>> +
>> +	lockdep_assert_irqs_disabled();
>> +
>> +	perf_ctx_lock(cpuctx, cpuctx->task_ctx);
>> +
>> +	if (WARN_ON_ONCE(!__this_cpu_read(perf_in_guest)))
>> +		goto unlock;
>> +
>> +	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
>> +	ctx_sched_in(&cpuctx->ctx, EVENT_GUEST);
>> +	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
>> +	if (cpuctx->task_ctx) {
>> +		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
>> +		ctx_sched_in(cpuctx->task_ctx, EVENT_GUEST);
>> +		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
>> +	}
> 
> Does this not violate the scheduling order of events? AFAICT this will
> do:
> 
>   cpu pinned
>   cpu flexible
>   task pinned
>   task flexible
> 
> as opposed to:
> 
>   cpu pinned
>   task pinned
>   cpu flexible
>   task flexible
> 
> We have the perf_event_sched_in() helper for this.

Yes, we can avoid calling ctx_sched_in() directly with the EVENT_GUEST
flag and instead invoke the perf_event_sched_in() helper to do the real
scheduling. I will do more tests to double-check.

Thanks,
Kan
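
[Editor's note: a rough sketch of the direction described above, assuming
perf_event_sched_in() is extended to take an event_type argument so the
EVENT_GUEST filter can be passed down; that extension is an assumption here,
not something in this patch, and the final rework may look different.]

	/*
	 * Sketch only: assumes perf_event_sched_in() grows an event_type
	 * parameter so EVENT_GUEST can be passed through to ctx_sched_in().
	 */
	void perf_guest_exit(void)
	{
		struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);

		lockdep_assert_irqs_disabled();

		perf_ctx_lock(cpuctx, cpuctx->task_ctx);

		if (WARN_ON_ONCE(!__this_cpu_read(perf_in_guest)))
			goto unlock;

		perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
		if (cpuctx->task_ctx)
			perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);

		/* cpu pinned -> task pinned -> cpu flexible -> task flexible */
		perf_event_sched_in(cpuctx, cpuctx->task_ctx, EVENT_GUEST);

		if (cpuctx->task_ctx)
			perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
		perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);

		__this_cpu_write(perf_in_guest, false);
	unlock:
		perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
	}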
> 
>> +
>> +	__this_cpu_write(perf_in_guest, false);
>> +unlock:
>> +	perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
>> +}
>> +EXPORT_SYMBOL_GPL(perf_guest_exit);
>

Patch

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 81a5f8399cb8..75773f9890cc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1738,6 +1738,8 @@  extern int perf_event_period(struct perf_event *event, u64 value);
 extern u64 perf_event_pause(struct perf_event *event, bool reset);
 int perf_get_mediated_pmu(void);
 void perf_put_mediated_pmu(void);
+void perf_guest_enter(void);
+void perf_guest_exit(void);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
@@ -1831,6 +1833,8 @@  static inline int perf_get_mediated_pmu(void)
 }
 
 static inline void perf_put_mediated_pmu(void)			{ }
+static inline void perf_guest_enter(void)			{ }
+static inline void perf_guest_exit(void)			{ }
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 57648736e43e..57ff737b922b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5941,6 +5941,60 @@  void perf_put_mediated_pmu(void)
 }
 EXPORT_SYMBOL_GPL(perf_put_mediated_pmu);
 
+/* When entering a guest, schedule out all exclude_guest events. */
+void perf_guest_enter(void)
+{
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
+
+	lockdep_assert_irqs_disabled();
+
+	perf_ctx_lock(cpuctx, cpuctx->task_ctx);
+
+	if (WARN_ON_ONCE(__this_cpu_read(perf_in_guest)))
+		goto unlock;
+
+	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
+	ctx_sched_out(&cpuctx->ctx, EVENT_GUEST);
+	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
+	if (cpuctx->task_ctx) {
+		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
+		task_ctx_sched_out(cpuctx->task_ctx, EVENT_GUEST);
+		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
+	}
+
+	__this_cpu_write(perf_in_guest, true);
+
+unlock:
+	perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
+}
+EXPORT_SYMBOL_GPL(perf_guest_enter);
+
+void perf_guest_exit(void)
+{
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
+
+	lockdep_assert_irqs_disabled();
+
+	perf_ctx_lock(cpuctx, cpuctx->task_ctx);
+
+	if (WARN_ON_ONCE(!__this_cpu_read(perf_in_guest)))
+		goto unlock;
+
+	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
+	ctx_sched_in(&cpuctx->ctx, EVENT_GUEST);
+	perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST);
+	if (cpuctx->task_ctx) {
+		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
+		ctx_sched_in(cpuctx->task_ctx, EVENT_GUEST);
+		perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST);
+	}
+
+	__this_cpu_write(perf_in_guest, false);
+unlock:
+	perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
+}
+EXPORT_SYMBOL_GPL(perf_guest_exit);
+
 /*
  * Holding the top-level event's child_mutex means that any
  * descendant process that has inherited this event will block