From patchwork Mon Dec 12 12:58:37 2022
X-Patchwork-Submitter: Like Xu
X-Patchwork-Id: 13071099
From: Like Xu
To: Peter Zijlstra, Sean Christopherson
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Marc Zyngier, Fenghua Yu, kvmarm@lists.linux.dev, linux-perf-users@vger.kernel.org
Subject: [PATCH RFC 1/8] perf/core: Add *group_leader to perf_event_create_kernel_counter()
Date: Mon, 12 Dec 2022 20:58:37 +0800
Message-Id: <20221212125844.41157-2-likexu@tencent.com>
In-Reply-To: <20221212125844.41157-1-likexu@tencent.com>
References: <20221212125844.41157-1-likexu@tencent.com>

Like syscall users, kernel-space perf_event creators may also want to use
the counter-group abstraction to gain PMU functionality. An in-kernel
counter group behaves much like a normal 'single' counter, following the
group semantics. No functional changes at this time.

One example is KVM creating the Intel slots event as group leader and the
topdown metric events as group members in order to emulate the
MSR_PERF_METRICS PMU capability for guests.

Cc: Peter Zijlstra
Cc: Marc Zyngier
Cc: Fenghua Yu
Cc: kvmarm@lists.linux.dev
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Like Xu
---
 arch/arm64/kvm/pmu-emul.c                 | 4 ++--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 4 ++--
 arch/x86/kvm/pmu.c                        | 2 +-
 arch/x86/kvm/vmx/pmu_intel.c              | 2 +-
 include/linux/perf_event.h                | 1 +
 kernel/events/core.c                      | 4 +++-
 kernel/events/hw_breakpoint.c             | 4 ++--
 kernel/events/hw_breakpoint_test.c        | 2 +-
 kernel/watchdog_hld.c                     | 2 +-
 9 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 24908400e190..11c3386bc86b 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -624,7 +624,7 @@ static void kvm_pmu_create_perf_event(struct kvm_pmc *pmc)
 	attr.sample_period = compute_period(pmc, kvm_pmu_get_pmc_value(pmc));
 
-	event = perf_event_create_kernel_counter(&attr, -1, current,
+	event = perf_event_create_kernel_counter(&attr, -1, current, NULL,
 						 kvm_pmu_perf_overflow, pmc);
 
 	if (IS_ERR(event)) {
@@ -713,7 +713,7 @@ static struct arm_pmu *kvm_pmu_probe_armpmu(void)
 	attr.config = ARMV8_PMUV3_PERFCTR_CPU_CYCLES;
 	attr.sample_period = GENMASK(63, 0);
 
-	event = perf_event_create_kernel_counter(&attr, -1, current,
+	event = perf_event_create_kernel_counter(&attr, -1, current, NULL,
 						 kvm_pmu_perf_overflow, &attr);
 
 	if (IS_ERR(event)) {
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index d961ae3ed96e..43e54bb200cd 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -952,12 +952,12 @@ static int measure_residency_fn(struct perf_event_attr *miss_attr,
 	u64 tmp;
 
 	miss_event = perf_event_create_kernel_counter(miss_attr, plr->cpu,
-						      NULL, NULL, NULL);
+						      NULL, NULL, NULL, NULL);
 	if (IS_ERR(miss_event))
 		goto out;
 
 	hit_event = perf_event_create_kernel_counter(hit_attr, plr->cpu,
-						     NULL, NULL, NULL);
+						     NULL, NULL, NULL, NULL);
 	if (IS_ERR(hit_event))
 		goto out_miss;
 
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index eb594620dd75..f6c8180241d7 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -204,7 +204,7 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 		attr.precise_ip = 3;
 	}
 
-	event = perf_event_create_kernel_counter(&attr, -1, current,
+	event = perf_event_create_kernel_counter(&attr, -1, current, NULL,
 						 kvm_perf_overflow, pmc);
 	if (IS_ERR(event)) {
 		pr_debug_ratelimited("kvm_pmu: event creation failed %ld for pmc->idx = %d\n",
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index f951dc756456..b746381307c7 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -299,7 +299,7 @@ int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu)
 	}
 
 	event = perf_event_create_kernel_counter(&attr, -1,
-						current, NULL, NULL);
+						current, NULL, NULL, NULL);
 	if (IS_ERR(event)) {
 		pr_debug_ratelimited("%s: failed %ld\n",
				     __func__, PTR_ERR(event));
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0031f7b4d9ab..5f34e1d0bff8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1023,6 +1023,7 @@ extern struct perf_event *
 perf_event_create_kernel_counter(struct perf_event_attr *attr,
 				int cpu,
 				struct task_struct *task,
+				struct perf_event *group_leader,
 				perf_overflow_handler_t callback,
 				void *context);
 extern void perf_pmu_migrate_context(struct pmu *pmu,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7f04f995c975..f671b1a9a691 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12674,12 +12674,14 @@ SYSCALL_DEFINE5(perf_event_open,
  * @attr: attributes of the counter to create
  * @cpu: cpu in which the counter is bound
  * @task: task to profile (NULL for percpu)
+ * @group_leader: event group leader
  * @overflow_handler: callback to trigger when we hit the event
  * @context: context data could be used in overflow_handler callback
  */
 struct perf_event *
 perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 				 struct task_struct *task,
+				 struct perf_event *group_leader,
 				 perf_overflow_handler_t overflow_handler,
 				 void *context)
 {
@@ -12694,7 +12696,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	if (attr->aux_output)
 		return ERR_PTR(-EINVAL);
 
-	event = perf_event_alloc(attr, cpu, task, NULL, NULL,
+	event = perf_event_alloc(attr, cpu, task, group_leader, NULL,
 				 overflow_handler, context, -1);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index c3797701339c..65b5b1421e62 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -771,7 +771,7 @@ register_user_hw_breakpoint(struct perf_event_attr *attr,
 			    void *context,
 			    struct task_struct *tsk)
 {
-	return perf_event_create_kernel_counter(attr, -1, tsk, triggered,
+	return perf_event_create_kernel_counter(attr, -1, tsk, NULL, triggered,
 						context);
 }
 EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
@@ -881,7 +881,7 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
 	cpus_read_lock();
 	for_each_online_cpu(cpu) {
-		bp = perf_event_create_kernel_counter(attr, cpu, NULL,
+		bp = perf_event_create_kernel_counter(attr, cpu, NULL, NULL,
 						      triggered, context);
 		if (IS_ERR(bp)) {
 			err = PTR_ERR(bp);
diff --git a/kernel/events/hw_breakpoint_test.c b/kernel/events/hw_breakpoint_test.c
index c57610f52bb4..b3597df12284 100644
--- a/kernel/events/hw_breakpoint_test.c
+++ b/kernel/events/hw_breakpoint_test.c
@@ -39,7 +39,7 @@ static struct perf_event *register_test_bp(int cpu, struct task_struct *tsk, int
 	attr.bp_addr = (unsigned long)&break_vars[idx];
 	attr.bp_len = HW_BREAKPOINT_LEN_1;
 	attr.bp_type = HW_BREAKPOINT_RW;
-	return perf_event_create_kernel_counter(&attr, cpu, tsk, NULL, NULL);
+	return perf_event_create_kernel_counter(&attr, cpu, tsk, NULL, NULL, NULL);
 }
 
 static void unregister_test_bp(struct perf_event **bp)
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..bb755dadba54 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -173,7 +173,7 @@ static int hardlockup_detector_event_create(void)
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 
 	/* Try to register using hardware perf events */
-	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
+	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL, NULL,
 					       watchdog_overflow_callback, NULL);
 	if (IS_ERR(evt)) {
 		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
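A rough sketch (not part of the patch) of what the extended API enables:
an in-kernel user creates the first event with a NULL leader, then passes
it as group_leader for the siblings. The attribute values and function
name below are illustrative only; each event is created disabled and
enabled once the whole group exists, matching how the series uses it.

/* Sketch only: create a two-event group with the new parameter. */
static struct perf_event *leader, *sibling;

static int create_sample_group(void)
{
	struct perf_event_attr attr = {
		.type = PERF_TYPE_HARDWARE,
		.size = sizeof(attr),
		.config = PERF_COUNT_HW_CPU_CYCLES,
		.disabled = 1,	/* enable only after the whole group exists */
	};

	/* NULL keeps the old single-counter behavior for the leader. */
	leader = perf_event_create_kernel_counter(&attr, -1, current, NULL,
						  NULL, NULL);
	if (IS_ERR(leader))
		return PTR_ERR(leader);

	attr.config = PERF_COUNT_HW_INSTRUCTIONS;
	/* Passing the first event makes this one a group sibling. */
	sibling = perf_event_create_kernel_counter(&attr, -1, current, leader,
						   NULL, NULL);
	if (IS_ERR(sibling))
		return PTR_ERR(sibling);

	perf_event_enable(leader);
	perf_event_enable(sibling);
	return 0;
}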
From patchwork Mon Dec 12 12:58:38 2022
X-Patchwork-Submitter: Like Xu
X-Patchwork-Id: 13071100
From: Like Xu
To: Peter Zijlstra, Sean Christopherson
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: [PATCH RFC 2/8] perf: x86/core: Expose the available number of Topdown metrics
Date: Mon, 12 Dec 2022 20:58:38 +0800
Message-Id: <20221212125844.41157-3-likexu@tencent.com>
In-Reply-To: <20221212125844.41157-1-likexu@tencent.com>
References: <20221212125844.41157-1-likexu@tencent.com>

The Intel Sapphire Rapids server has 8 metrics events, while Ice Lake
only supports 4. The available number of Topdown metrics is model
specific, with no architectural hint. To support guest Topdown metrics,
KVM would otherwise have to rely on the CPU model alone to emulate the
correct number of metrics events per platform. It is nicer to have the
perf core tell KVM the available number of Topdown metrics, just like
x86_pmu.num_counters.

Cc: Peter Zijlstra
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Like Xu
---
 arch/x86/events/core.c            | 1 +
 arch/x86/include/asm/perf_event.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index b30b8bbcd1e2..d0d84c7a6876 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -3006,6 +3006,7 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 	 * which available for all cores.
 	 */
 	cap->num_counters_gp = x86_pmu.num_counters;
+	cap->num_topdown_events = x86_pmu.num_topdown_events;
 	cap->num_counters_fixed = x86_pmu.num_counters_fixed;
 	cap->bit_width_gp = x86_pmu.cntval_bits;
 	cap->bit_width_fixed = x86_pmu.cntval_bits;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 5d0f6891ae61..3e263d291595 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -219,6 +219,7 @@ struct x86_pmu_capability {
 	int version;
 	int num_counters_gp;
 	int num_counters_fixed;
+	int num_topdown_events;
 	int bit_width_gp;
 	int bit_width_fixed;
 	unsigned int events_mask;
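A minimal sketch of how an in-kernel consumer such as KVM could pick up
the new field; perf_get_x86_pmu_capability() already exists, only
num_topdown_events is new here, and the helper name is illustrative.

static unsigned int probe_topdown_events(void)
{
	struct x86_pmu_capability cap;

	perf_get_x86_pmu_capability(&cap);
	/* e.g. 4 on Ice Lake, 8 on Sapphire Rapids, 0 without Topdown */
	return cap.num_topdown_events;
}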
From patchwork Mon Dec 12 12:58:39 2022
X-Patchwork-Submitter: Like Xu
X-Patchwork-Id: 13071101
From: Like Xu
To: Peter Zijlstra, Sean Christopherson
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: [PATCH RFC 3/8] perf: x86/core: Sync PERF_METRICS bit together with fixed counter 3
Date: Mon, 12 Dec 2022 20:58:39 +0800
Message-Id: <20221212125844.41157-4-likexu@tencent.com>
In-Reply-To: <20221212125844.41157-1-likexu@tencent.com>
References: <20221212125844.41157-1-likexu@tencent.com>

When the guest uses Topdown (fixed counter 3 plus the PERF_METRICS MSR),
the sharing rule for the PERF_METRICS bit in the GLOBAL_CTRL MSR does not
change: it should be updated synchronously with fixed counter 3.
Considering that the guest Topdown feature has only just been enabled,
this is not strictly a bug fix.
Cc: Peter Zijlstra
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Like Xu
---
 arch/x86/events/intel/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1b92bf05fd65..e7897fd9f7ab 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2436,6 +2436,8 @@ static void intel_pmu_disable_fixed(struct perf_event *event)
 		 */
 		if (*(u64 *)cpuc->active_mask & INTEL_PMC_OTHER_TOPDOWN_BITS(idx))
 			return;
+
+		intel_clear_masks(event, GLOBAL_CTRL_EN_PERF_METRICS);
 		idx = INTEL_PMC_IDX_FIXED_SLOTS;
 	}
 
@@ -2729,6 +2731,7 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
 		if (*(u64 *)cpuc->active_mask & INTEL_PMC_OTHER_TOPDOWN_BITS(idx))
 			return;
 
+		intel_set_masks(event, GLOBAL_CTRL_EN_PERF_METRICS);
 		idx = INTEL_PMC_IDX_FIXED_SLOTS;
 	}
From patchwork Mon Dec 12 12:58:40 2022
X-Patchwork-Submitter: Like Xu
X-Patchwork-Id: 13071102
From: Like Xu
To: Peter Zijlstra, Sean Christopherson
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [PATCH RFC 4/8] KVM: x86/pmu: Add Intel CPUID-hinted Topdown Slots event
Date: Mon, 12 Dec 2022 20:58:40 +0800
Message-Id: <20221212125844.41157-5-likexu@tencent.com>
In-Reply-To: <20221212125844.41157-1-likexu@tencent.com>
References: <20221212125844.41157-1-likexu@tencent.com>

This event counts the total number of available slots for an unhalted
logical processor. Software can use this event as the denominator for
the top-level metrics of the Top-down Microarchitecture Analysis method.

Although the MSR_PERF_METRICS MSR required for Topdown events is not yet
available to the guest, relying on the data provided by the slots event
alone is sufficient for PMU users to perceive differences in CPU pipeline
machine-width across microarchitectures.

The standalone slots event, like the instructions event, can be counted
with a GP counter or with fixed counter 3 (if present). As the last of
the CPUID-hinted architectural performance events, its availability is
also controlled by CPUID.0AH.EBX. On Linux, perf users may encode
"-e cpu/event=0xa4,umask=0x01/" or "-e cpu/slots/" to count slots events.

Signed-off-by: Like Xu
---
 arch/x86/kvm/vmx/pmu_intel.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index b746381307c7..d86a6ba8c3f9 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -31,10 +31,11 @@
  * 4 - PERF_COUNT_HW_CACHE_MISSES
  * 5 - PERF_COUNT_HW_BRANCH_INSTRUCTIONS
  * 6 - PERF_COUNT_HW_BRANCH_MISSES
+ * 7 - CPUID-hinted Topdown Slots event (available on gp counter)
  *
  * the second part of hw_events is defined by the generic kernel perf:
  *
- * 7 - PERF_COUNT_HW_REF_CPU_CYCLES
+ * 8 - PERF_COUNT_HW_REF_CPU_CYCLES
  */
 static struct kvm_pmu_hw_event intel_arch_events[] = {
 	[0] = { 0x3c, 0x00 },
@@ -44,12 +45,13 @@ static struct kvm_pmu_hw_event intel_arch_events[] = {
 	[4] = { 0x2e, 0x41 },
 	[5] = { 0xc4, 0x00 },
 	[6] = { 0xc5, 0x00 },
+	[7] = { 0xa4, 0x01 },
 	/* The above index must match CPUID 0x0A.EBX bit vector */
-	[7] = { 0x00, 0x03 },
+	[8] = { 0x00, 0x03 },
 };
 
 /* mapping between fixed pmc index and intel_arch_events array */
-static int fixed_pmc_events[] = {1, 0, 7};
+static int fixed_pmc_events[] = {1, 0, 8};
 
 static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 {
@@ -109,7 +111,7 @@ static bool intel_hw_event_available(struct kvm_pmc *pmc)
 			continue;
 
 		/* disable event that reported as not present by cpuid */
-		if ((i < 7) && !(pmu->available_event_types & (1 << i)))
+		if (i < 8 && !(pmu->available_event_types & (1 << i)))
 			return false;
 
 		break;
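For reference, the new table entry { 0xa4, 0x01 } is the event-select /
umask pair quoted in the message above. As a sketch (not code from the
series), the equivalent raw perf_event_attr packs the umask into bits
15:8 of config:

/* Sketch: raw encoding of the Topdown slots event, config = 0x01a4. */
static struct perf_event_attr slots_attr = {
	.type	= PERF_TYPE_RAW,
	.size	= sizeof(slots_attr),
	.config	= (0x01 << 8) | 0xa4,	/* umask 0x01, event select 0xa4 */
};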
From patchwork Mon Dec 12 12:58:41 2022
X-Patchwork-Submitter: Like Xu
X-Patchwork-Id: 13071103
From: Like Xu
To: Peter Zijlstra, Sean Christopherson
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [PATCH RFC 5/8] KVM: x86/pmu: Add kernel-defined slots event to enable Fixed Counter3
Date: Mon, 12 Dec 2022 20:58:41 +0800
Message-Id: <20221212125844.41157-6-likexu@tencent.com>
In-Reply-To: <20221212125844.41157-1-likexu@tencent.com>
References: <20221212125844.41157-1-likexu@tencent.com>

The Topdown Slots event can be enabled on a GP counter or on fixed
counter 3, and it does not differ from other fixed counters in its use
of count and sampling modes (except for the hardware logic for event
accumulation).
According to commit 6017608936c1 ("perf/x86/intel: Add Icelake support"),
KVM or any perf in-kernel user needs to reprogram fixed counter 3 via the
kernel-defined Topdown Slots event to drive the real fixed counter 3 on
the host.

Opportunistically fix a typo, s/msrs_to_saved_all/msrs_to_save_all/.

Signed-off-by: Like Xu
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/vmx/pmu_intel.c    | 4 +++-
 arch/x86/kvm/x86.c              | 6 +++---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa4eb8cfcd7e..413f2e104543 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -513,7 +513,7 @@ struct kvm_pmc {
 #define KVM_INTEL_PMC_MAX_GENERIC	8
 #define MSR_ARCH_PERFMON_PERFCTR_MAX	(MSR_ARCH_PERFMON_PERFCTR0 + KVM_INTEL_PMC_MAX_GENERIC - 1)
 #define MSR_ARCH_PERFMON_EVENTSEL_MAX	(MSR_ARCH_PERFMON_EVENTSEL0 + KVM_INTEL_PMC_MAX_GENERIC - 1)
-#define KVM_PMC_MAX_FIXED	3
+#define KVM_PMC_MAX_FIXED	4
 #define KVM_AMD_PMC_MAX_GENERIC	6
 struct kvm_pmu {
 	unsigned nr_arch_gp_counters;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index d86a6ba8c3f9..637fd709f5f4 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -36,6 +36,7 @@
  * the second part of hw_events is defined by the generic kernel perf:
  *
  * 8 - PERF_COUNT_HW_REF_CPU_CYCLES
+ * 9 - Kernel-defined Topdown Slots event (available on fixed counter 3)
  */
 static struct kvm_pmu_hw_event intel_arch_events[] = {
 	[0] = { 0x3c, 0x00 },
@@ -48,10 +49,11 @@ static struct kvm_pmu_hw_event intel_arch_events[] = {
 	[7] = { 0xa4, 0x01 },
 	/* The above index must match CPUID 0x0A.EBX bit vector */
 	[8] = { 0x00, 0x03 },
+	[9] = { 0x00, 0x04 },
 };
 
 /* mapping between fixed pmc index and intel_arch_events array */
-static int fixed_pmc_events[] = {1, 0, 8};
+static int fixed_pmc_events[] = {1, 0, 8, 9};
 
 static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 312aea1854ae..0b61cb58c877 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1435,7 +1435,7 @@ static const u32 msrs_to_save_all[] = {
 	MSR_IA32_UMWAIT_CONTROL,
 
 	MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
-	MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
+	MSR_ARCH_PERFMON_FIXED_CTR0 + 2, MSR_ARCH_PERFMON_FIXED_CTR0 + 3,
 	MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
 	MSR_CORE_PERF_GLOBAL_CTRL, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
 	MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
@@ -7001,8 +7001,8 @@ static void kvm_init_msr_list(void)
 	u32 dummy[2];
 	unsigned i;
 
-	BUILD_BUG_ON_MSG(KVM_PMC_MAX_FIXED != 3,
-			 "Please update the fixed PMCs in msrs_to_saved_all[]");
+	BUILD_BUG_ON_MSG(KVM_PMC_MAX_FIXED != 4,
+			 "Please update the fixed PMCs in msrs_to_save_all[]");
 
 	num_msrs_to_save = 0;
 	num_emulated_msrs = 0;
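To make the mapping change concrete: when a fixed counter is programmed,
its index is translated through fixed_pmc_events[] into an
intel_arch_events[] entry, so fixed counter 3 now resolves to entry 9,
the kernel-defined slots pseudo-event. A simplified sketch of that lookup
(the helper name is illustrative; the real logic lives in
reprogram_fixed_counters()):

/* Sketch: fixed counter idx -> { eventsel, unit_mask } per the new tables. */
static u64 fixed_ctr_event_config(int idx)
{
	int event = fixed_pmc_events[idx];	/* {1, 0, 8, 9} after this patch */

	return intel_arch_events[event].eventsel |
	       ((u64)intel_arch_events[event].unit_mask << 8);
}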
From patchwork Mon Dec 12 12:58:42 2022
X-Patchwork-Submitter: Like Xu
X-Patchwork-Id: 13071104
From: Like Xu
To: Peter Zijlstra, Sean Christopherson
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [PATCH RFC 6/8] KVM: x86/pmu: Properly use the INTEL_PMC_FIXED_RDPMC_BASE macro
Date: Mon, 12 Dec 2022 20:58:42 +0800
Message-Id: <20221212125844.41157-7-likexu@tencent.com>
In-Reply-To: <20221212125844.41157-1-likexu@tencent.com>
References: <20221212125844.41157-1-likexu@tencent.com>

Use the INTEL_PMC_FIXED_RDPMC_BASE macro in the RDPMC context to improve
readability. No functional change intended.
Signed-off-by: Like Xu
---
 arch/x86/kvm/vmx/pmu_intel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 637fd709f5f4..b69d337d51d9 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -136,7 +136,7 @@ static bool intel_pmc_is_enabled(struct kvm_pmc *pmc)
 static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-	bool fixed = idx & (1u << 30);
+	bool fixed = idx & INTEL_PMC_FIXED_RDPMC_BASE;
 
 	idx &= ~(3u << 30);
 
@@ -148,7 +148,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 					    unsigned int idx, u64 *mask)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-	bool fixed = idx & (1u << 30);
+	bool fixed = idx & INTEL_PMC_FIXED_RDPMC_BASE;
 	struct kvm_pmc *counters;
 	unsigned int num_counters;
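As background for the macro swap: the guest's RDPMC index encodes the
counter class in the upper ECX bits, and the replaced literal shows that
INTEL_PMC_FIXED_RDPMC_BASE is the bit-30 fixed-counter class flag. A
sketch of the decoding (helper name illustrative):

/* Sketch: decode a guest RDPMC ECX value into (class, index). */
static void decode_rdpmc_idx(u32 ecx, bool *fixed, u32 *counter)
{
	*fixed = ecx & INTEL_PMC_FIXED_RDPMC_BASE;	/* (1u << 30) */
	*counter = ecx & ~(3u << 30);	/* strip class bits, keep the index */
}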
From patchwork Mon Dec 12 12:58:43 2022
X-Patchwork-Submitter: Like Xu
X-Patchwork-Id: 13071105
From: Like Xu
To: Peter Zijlstra, Sean Christopherson
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [PATCH RFC 7/8] KVM: x86/pmu: Use flex *event arrays to implement grouped events
Date: Mon, 12 Dec 2022 20:58:43 +0800
Message-Id: <20221212125844.41157-8-likexu@tencent.com>
In-Reply-To: <20221212125844.41157-1-likexu@tencent.com>
References: <20221212125844.41157-1-likexu@tencent.com>

The current vPMU expects that a pmc represents a single hardware event
and uses pmc->counter to track the event value. But from the perspective
of the counter designer, a hardware counter can sacrifice its maximum bit
width and instead reflect the values of multiple hardware events in
different fields of the same counter. For example, a 32-bit counter can
be divided into four 8-bit wide hardware counters, and these hardware
events keep incrementing independently in their own bitfields. Software
can read the counter value once to get the latest values of multiple
hardware events, reducing the number of software accesses to the hardware
interface and, under virtualization, reducing the number of vm-exits in
particular.

The most natural way to emulate this all-in-one hardware counter in KVM
is a 1-to-N mapping: a pmc can be associated with multiple perf_events,
and these events are created as a group, which the host perf schedules as
a group to obtain hardware resources and present them to the guest.

In implementation, when the guest accesses this all-in-one counter, its
pmc->max_nr_events changes according to the hardware definition,
triggering KVM's group event creation path; the events are created
centrally and then enabled in order, which eliminates code differences
from separate enablement (see the sketch after this message). The grouped
events are also released as a group. Which hardware events correspond to
each pmc, and how the available bitfields are divided, is predefined by
the hardware vendor; KVM does not combine them freely.

A many-to-one pmc is no different from a one-to-one pmc in many cases.
The first pmc->perf_event is always the source of the pmc->counter value,
which implies that most functions manipulating the state of
pmc->perf_event do not need to be modified. Note, the specific rules for
splicing multiple perf_event->count values into a single value are
defined by the hardware event, and do not and should not appear in this
generic change.
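A standalone sketch (not from the series) of the union trick used in
struct kvm_pmc below: the array view and the single-pointer view alias
the same first slot, so pmc->perf_event and pmc->perf_events[0] are
interchangeable and legacy single-counter code needs no change. The
kernel uses DECLARE_FLEX_ARRAY() for a true flexible array; a one-element
array stands in for it here.

struct kvm_pmc_sketch {
	u8 max_nr_events;	/* 1 == classic single-event pmc */
	union {
		struct perf_event *perf_event;		/* single-event view */
		struct perf_event *perf_events[1];	/* group view, [0] = leader */
	};
};

/*
 * pmc->perf_events[0] == pmc->perf_event always holds, so the first
 * element is the group leader and the sole source of pmc->counter.
 */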
Signed-off-by: Like Xu
---
 arch/x86/include/asm/kvm_host.h |  6 ++++-
 arch/x86/kvm/pmu.c              | 44 ++++++++++++++++++++++++++-------
 arch/x86/kvm/pmu.h              | 15 ++++++++---
 arch/x86/kvm/svm/pmu.c          |  1 +
 arch/x86/kvm/vmx/pmu_intel.c    |  2 ++
 5 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 413f2e104543..73740aa891d3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -494,12 +494,12 @@ enum pmc_type {
 struct kvm_pmc {
 	enum pmc_type type;
 	u8 idx;
+	u8 max_nr_events;
 	bool is_paused;
 	bool intr;
 	u64 counter;
 	u64 prev_counter;
 	u64 eventsel;
-	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
 	/*
 	 * only for creating or reusing perf_event,
@@ -507,6 +507,10 @@ struct kvm_pmc {
 	 * ctrl value for fixed counters.
 	 */
 	u64 current_config;
+	union {
+		struct perf_event *perf_event;
+		DECLARE_FLEX_ARRAY(struct perf_event *, perf_events);
+	};
 };
 
 /* More counters may conflict with other existing Architectural MSRs */
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index f6c8180241d7..ae53a8298dec 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -160,7 +160,7 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 				 bool intr)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
-	struct perf_event *event;
+	struct perf_event *event, *group_leader;
 	struct perf_event_attr attr = {
 		.type = type,
 		.size = sizeof(attr),
@@ -172,6 +172,7 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 		.config = config,
 	};
 	bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);
+	unsigned int i;
 
 	attr.sample_period = get_sample_period(pmc, pmc->counter);
 
@@ -204,18 +205,39 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 		attr.precise_ip = 3;
 	}
 
-	event = perf_event_create_kernel_counter(&attr, -1, current, NULL,
-						 kvm_perf_overflow, pmc);
-	if (IS_ERR(event)) {
-		pr_debug_ratelimited("kvm_pmu: event creation failed %ld for pmc->idx = %d\n",
-				     PTR_ERR(event), pmc->idx);
-		return PTR_ERR(event);
+	/*
+	 * To create grouped events, the first created perf_event doesn't
+	 * know it will be the group_leader and may move to an unexpected
+	 * enabling path, thus delay all enablement until after creation,
+	 * not affecting non-grouped events to save one perf interface call.
+	 */
+	if (pmc->max_nr_events > 1)
+		attr.disabled = 1;
+
+	for (i = 0; i < pmc->max_nr_events; i++) {
+		group_leader = i ? pmc->perf_event : NULL;
+		event = perf_event_create_kernel_counter(&attr, -1, current, group_leader,
+							 kvm_perf_overflow, pmc);
+		if (IS_ERR(event)) {
+			pr_debug_ratelimited("kvm_pmu: event creation failed %ld for pmc->idx = %d\n",
+					     PTR_ERR(event), pmc->idx);
+			return PTR_ERR(event);
+		}
+
+		pmc->perf_events[i] = event;
+		pmc_to_pmu(pmc)->event_count++;
 	}
 
-	pmc->perf_event = event;
-	pmc_to_pmu(pmc)->event_count++;
 	pmc->is_paused = false;
 	pmc->intr = intr || pebs;
+
+	if (!attr.disabled)
+		return 0;
+
+	/* Enable grouped events in order. */
+	for (i = 0; i < pmc->max_nr_events; i++)
+		perf_event_enable(pmc->perf_events[i]);
+
 	return 0;
 }
 
@@ -223,6 +245,10 @@ static void pmc_pause_counter(struct kvm_pmc *pmc)
 {
 	u64 counter = pmc->counter;
 
+	/* The perf_event_pause() is not suitable for grouped events. */
+	if (pmc->max_nr_events > 1)
+		pmc_stop_counter(pmc);
+
 	if (!pmc->perf_event || pmc->is_paused)
 		return;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index b9e29a199ab8..e4b738b7c208 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -64,12 +64,19 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
 
 static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
 {
-	if (pmc->perf_event) {
-		perf_event_release_kernel(pmc->perf_event);
-		pmc->perf_event = NULL;
-		pmc->current_config = 0;
+	unsigned int i;
+
+	for (i = 0; i < pmc->max_nr_events; i++) {
+		if (!pmc->perf_events[i])
+			continue;
+
+		perf_event_release_kernel(pmc->perf_events[i]);
+		pmc->perf_events[i] = NULL;
 		pmc_to_pmu(pmc)->event_count--;
 	}
+
+	pmc->current_config = 0;
+	pmc->max_nr_events = 1;
 }
 
 static inline void pmc_stop_counter(struct kvm_pmc *pmc)
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 0e313fbae055..09e81200f657 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -200,6 +200,7 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
 		pmu->gp_counters[i].vcpu = vcpu;
 		pmu->gp_counters[i].idx = i;
 		pmu->gp_counters[i].current_config = 0;
+		pmu->gp_counters[i].max_nr_events = 1;
 	}
 }
 
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index b69d337d51d9..8e1f679d4d9d 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -642,6 +642,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
 		pmu->gp_counters[i].vcpu = vcpu;
 		pmu->gp_counters[i].idx = i;
 		pmu->gp_counters[i].current_config = 0;
+		pmu->gp_counters[i].max_nr_events = 1;
 	}
 
 	for (i = 0; i < KVM_PMC_MAX_FIXED; i++) {
@@ -649,6 +650,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
 		pmu->fixed_counters[i].vcpu = vcpu;
 		pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
 		pmu->fixed_counters[i].current_config = 0;
+		pmu->fixed_counters[i].max_nr_events = 1;
 	}
 
 	lbr_desc->records.nr = 0;
From patchwork Mon Dec 12 12:58:44 2022
X-Patchwork-Submitter: Like Xu
X-Patchwork-Id: 13071106
From: Like Xu
To: Peter Zijlstra, Sean Christopherson
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [PATCH RFC 8/8] KVM: x86/pmu: Add MSR_PERF_METRICS MSR emulation to enable Topdown
Date: Mon, 12 Dec 2022 20:58:44 +0800
Message-Id: <20221212125844.41157-9-likexu@tencent.com>
In-Reply-To: <20221212125844.41157-1-likexu@tencent.com>
References: <20221212125844.41157-1-likexu@tencent.com>

The Ice Lake core PMU provides built-in support for the Top-down
Microarchitecture Analysis (TMA) method, level 1 metrics. These metrics
are always available to cross-validate performance observations, freeing
general-purpose counters to count other events in high counter
utilization scenarios.

A new MSR called MSR_PERF_METRICS, hinted by
PERF_CAPABILITIES.PERF_METRICS_AVAILABLE (bit 15), reports four (on ICX)
TMA level 1 metrics directly. The lower 32 bits are divided into four
8-bit fields, each of which is an integer fraction of 255. When
performance metrics use type 2000H (INTEL_PMC_FIXED_RDPMC_METRICS), RDPMC
can read the value of perf_metrics. Bit EN_PERF_METRICS[48] has also been
added to the following MSRs:

- MSR_CORE_PERF_GLOBAL_CTRL
- MSR_CORE_PERF_GLOBAL_STATUS
- MSR_CORE_PERF_GLOBAL_OVF_CTRL

In the KVM implementation, topdown mode is only enabled when the guest
starts both registers, PERF_METRICS and fixed counter 3, from zero. In
topdown mode, the vPMU creates a group of zero-sample-period events for
fixed counter 3. The first pmc->event is a group-leader slots event whose
event->count is used to emulate the counter value, setting bit 35 when it
generates an interrupt, as usual. (Note, fixed counter 3 can still be
used independently, for counting or sampling, but not in topdown mode at
the same time.)
The four (or more) sub-events for perf_metrics are each emulated with a
metric event (its event attr.config is kernel-defined, starting with
INTEL_TD_METRIC_RETIRING), which increments independently; they all share
the same bit, INTEL_PERF_METRICS_IDX, on overflow.

When pmu->perf_metrics is read, the vPMU collects event->count from the
metric events, but the original values have already been processed by the
host perf core (as defined in icl_get_metrics_event_value(), with a
division and a multiplication). This part of the calculation has to be
reversed in the vPMU to restore the real values and stitch them together
in turn to form pmu->perf_metrics (see the worked example after this
message).

pmc_read_counter() has been enhanced, moved out of the header file, and
wrapped with EXPORT_SYMBOL_GPL. A little trick is that the PMCs
corresponding to bits 35 and 48 (INTEL_PERF_METRICS_IDX) are both fixed
counter 3, which allows lazy release of all the grouped events above.
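A worked sketch of the reversal described above, assuming the host folds
a raw metric count out of an 8-bit field roughly as count = slots * field
/ 0xff (per icl_get_metrics_event_value()); the vPMU then recovers the
8-bit field delta from the raw count delta. The plain multiply below can
overflow where the kernel's mul_u64_u64_div_u64() cannot.

/* Sketch: recover an 8-bit PERF_METRICS field delta from raw counts. */
static u8 metric_field_delta(u64 count_delta, u64 slots_delta)
{
	/* mirrors: mul_u64_u64_div_u64(count_delta, 0xff, slots_delta) + 1 */
	return (u8)(count_delta * 0xff / slots_delta + 1);
}

/* e.g. 400 retiring counts out of 1000 slots -> 0x67, i.e. ~40% of 255. */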
Signed-off-by: Like Xu
---
 arch/x86/include/asm/kvm_host.h  |   6 ++
 arch/x86/include/asm/msr-index.h |   1 +
 arch/x86/kvm/pmu.c               | 105 ++++++++++++++++++++++++++++++-
 arch/x86/kvm/pmu.h               |  16 ++---
 arch/x86/kvm/vmx/pmu_intel.c     |  33 ++++++++++
 arch/x86/kvm/vmx/vmx.c           |   3 +
 arch/x86/kvm/x86.c               |   3 +
 7 files changed, 153 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 73740aa891d3..4f2e2ede09b6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -522,6 +522,7 @@ struct kvm_pmc {
 struct kvm_pmu {
 	unsigned nr_arch_gp_counters;
 	unsigned nr_arch_fixed_counters;
+	unsigned int nr_perf_metrics;
 	unsigned available_event_types;
 	u64 fixed_ctr_ctrl;
 	u64 fixed_ctr_ctrl_mask;
@@ -564,6 +565,11 @@ struct kvm_pmu {
 	 */
 	u64 host_cross_mapped_mask;
 
+	/* Intel Topdown Performance Metrics */
+	u64 perf_metrics;
+	u64 perf_metrics_mask;
+	bool metric_event_ovf;
+
 	/*
 	 * The gate to release perf_events not marked in
 	 * pmc_in_use only once in a vcpu time slice.
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4a2af82553e4..5dde0242ff28 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -246,6 +246,7 @@
 #define PERF_CAP_PEBS_BASELINE		BIT_ULL(14)
 #define PERF_CAP_PEBS_MASK	(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
 				 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)
+#define MSR_CAP_PERF_METRICS		BIT_ULL(15)
 
 #define MSR_IA32_RTIT_CTL		0x00000570
 #define RTIT_CTL_TRACEEN		BIT(0)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index ae53a8298dec..4bc888462b4f 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -96,6 +96,11 @@ static void kvm_pmi_trigger_fn(struct irq_work *irq_work)
 	kvm_pmu_deliver_pmi(vcpu);
 }
 
+static inline bool pmc_support_perf_metrics(struct kvm_pmc *pmc)
+{
+	return pmc->idx == (INTEL_PMC_IDX_FIXED + 3);
+}
+
 static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
@@ -115,6 +120,10 @@ static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 			skip_pmi = __test_and_set_bit(GLOBAL_STATUS_BUFFER_OVF_BIT,
 						      (unsigned long *)&pmu->global_status);
 		}
+	} else if (pmu->metric_event_ovf && pmc_support_perf_metrics(pmc)) {
+		/* At least one of PERF_METRICS sub-counter has overflowed */
+		__set_bit(INTEL_PERF_METRICS_IDX, (unsigned long *)&pmu->global_status);
+		pmu->metric_event_ovf = false;
 	} else {
 		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
 	}
@@ -155,6 +164,16 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
 	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 }
 
+static inline bool perf_metrics_is_enabled(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+
+	return !pmu->perf_metrics && !pmc->counter &&
+		pmc_support_perf_metrics(pmc) && pmc_speculative_in_use(pmc) &&
+		pmc_is_enabled(pmc) && test_bit(GLOBAL_CTRL_EN_PERF_METRICS,
+						(unsigned long *)&pmu->global_ctrl);
+}
+
 static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 				 bool exclude_user, bool exclude_kernel,
 				 bool intr)
@@ -172,6 +191,7 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 		.config = config,
 	};
 	bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);
+	bool topdown = perf_metrics_is_enabled(pmc);
 	unsigned int i;
 
 	attr.sample_period = get_sample_period(pmc, pmc->counter);
@@ -205,6 +225,10 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 		attr.precise_ip = 3;
 	}
 
+	/* A group of events is needed to emulate each perf_metric. */
+	if (topdown)
+		pmc->max_nr_events = pmu->nr_perf_metrics + 1;
+
 	/*
 	 * To create grouped events, the first created perf_event doesn't
 	 * know it will be the group_leader and may move to an unexpected
@@ -215,6 +239,17 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 		attr.disabled = 1;
 
 	for (i = 0; i < pmc->max_nr_events; i++) {
+		if (topdown) {
+			/*
+			 * According to perf core, the group_leader slots event must
+			 * not be a sampling event for topdown metric use, and the
+			 * topdown metric events don't support sampling.
+			 */
+			attr.sample_period = 0;
+			if (i)
+				attr.config = INTEL_TD_METRIC_RETIRING + 0x100 * (i - 1);
+		}
+
 		group_leader = i ? pmc->perf_event : NULL;
 		event = perf_event_create_kernel_counter(&attr, -1, current, group_leader,
							 kvm_perf_overflow, pmc);
pmc->perf_event : NULL; event = perf_event_create_kernel_counter(&attr, -1, current, group_leader, kvm_perf_overflow, pmc); @@ -229,7 +264,7 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config, } pmc->is_paused = false; - pmc->intr = intr || pebs; + pmc->intr = intr || pebs || topdown; if (!attr.disabled) return 0; @@ -263,6 +298,9 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc) if (!pmc->perf_event) return false; + if (perf_metrics_is_enabled(pmc) == (pmc->max_nr_events == 1)) + return false; + /* recalibrate sample period and check if it's accepted by perf core */ if (is_sampling_event(pmc->perf_event) && perf_event_period(pmc->perf_event, @@ -447,6 +485,60 @@ static int kvm_pmu_rdpmc_vmware(struct kvm_vcpu *vcpu, unsigned idx, u64 *data) return 0; } +void pmu_read_perf_metrics(struct kvm_pmu *pmu, u64 slots) +{ + struct kvm_pmc *pmc = &pmu->fixed_counters[3]; + u64 old_counter, enabled, running, delta; + union { + u8 field[KVM_PERF_METRICS_NUM_MAX]; + u64 data; + } perf_metrics; + unsigned int i; + + perf_metrics.data = pmu->perf_metrics; + for (i = 1; i < pmc->max_nr_events; i++) { + if (!pmc->perf_events[i]) + continue; + + old_counter = perf_metrics.field[i - 1]; + delta = perf_event_read_value(pmc->perf_events[i], &enabled, &running); + + /* + * Reverse the actual metric counter value out + * according to icl_get_metrics_event_value(). + */ + delta = mul_u64_u64_div_u64(delta, 0xff, slots) + 1; + perf_metrics.field[i - 1] = 0xff & (old_counter + delta); + + /* Check if any metric counter have been overflowed. */ + if (perf_metrics.field[i - 1] < old_counter) + pmu->metric_event_ovf = true; + } + + if (pmu->metric_event_ovf) + __kvm_perf_overflow(pmc, false); + + pmu->perf_metrics = perf_metrics.data & (~pmu->perf_metrics_mask); + __set_bit(INTEL_PERF_METRICS_IDX, pmu->pmc_in_use); +} + +u64 pmc_read_counter(struct kvm_pmc *pmc) +{ + u64 counter, enabled, running, delta; + + counter = pmc->counter; + if (pmc->perf_event && !pmc->is_paused) { + delta = perf_event_read_value(pmc->perf_event, &enabled, &running); + if (delta && pmc_support_perf_metrics(pmc)) + pmu_read_perf_metrics(pmc_to_pmu(pmc), delta); + counter += delta; + } + + /* FIXME: Scaling needed? 
+	return counter & pmc_bitmask(pmc);
+}
+EXPORT_SYMBOL_GPL(pmc_read_counter);
+
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 {
 	bool fast_mode = idx & (1u << 31);
@@ -469,7 +561,16 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	    (kvm_read_cr0(vcpu) & X86_CR0_PE))
 		return 1;
 
-	*data = pmc_read_counter(pmc) & mask;
+	if (idx & INTEL_PMC_FIXED_RDPMC_METRICS) {
+		if (!(vcpu->arch.perf_capabilities & MSR_CAP_PERF_METRICS))
+			return 1;
+
+		pmc_read_counter(&pmu->fixed_counters[3]);
+		*data = pmu->perf_metrics;
+	} else {
+		*data = pmc_read_counter(pmc) & mask;
+	}
+
 	return 0;
 }
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index e4b738b7c208..be800a0c5366 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -10,6 +10,8 @@
 
 #define MSR_IA32_MISC_ENABLE_PMU_RO_MASK (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL |	\
 					  MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
+#define INTEL_PERF_METRICS_IDX		GLOBAL_CTRL_EN_PERF_METRICS
+#define KVM_PERF_METRICS_NUM_MAX	8
 
 /* retrieve the 4 bits for EN and PMI out of IA32_FIXED_CTR_CTRL */
 #define fixed_ctrl_field(ctrl_reg, idx) (((ctrl_reg) >> ((idx)*4)) & 0xf)
@@ -42,6 +44,8 @@ struct kvm_pmu_ops {
 };
 
 void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops);
+void pmu_read_perf_metrics(struct kvm_pmu *pmu, u64 slots);
+u64 pmc_read_counter(struct kvm_pmc *pmc);
 
 static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
 {
@@ -50,18 +54,6 @@ static inline u64 pmc_bitmask(struct kvm_pmc *pmc)
 	return pmu->counter_bitmask[pmc->type];
 }
 
-static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
-{
-	u64 counter, enabled, running;
-
-	counter = pmc->counter;
-	if (pmc->perf_event && !pmc->is_paused)
-		counter += perf_event_read_value(pmc->perf_event,
-						 &enabled, &running);
-	/* FIXME: Scaling needed? */
-	return counter & pmc_bitmask(pmc);
-}
-
 static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
 {
 	unsigned int i;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 8e1f679d4d9d..52fef388fdb0 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -81,6 +81,8 @@ static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
 	if (pmc_idx < INTEL_PMC_IDX_FIXED) {
 		return get_gp_pmc(pmu, MSR_P6_EVNTSEL0 + pmc_idx,
 				  MSR_P6_EVNTSEL0);
+	} else if (pmc_idx == INTEL_PERF_METRICS_IDX) {
+		return get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR3);
 	} else {
 		u32 idx = pmc_idx - INTEL_PMC_IDX_FIXED;
 
@@ -160,6 +162,12 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 		counters = pmu->gp_counters;
 		num_counters = pmu->nr_arch_gp_counters;
 	}
+
+	if (idx & INTEL_PMC_FIXED_RDPMC_METRICS) {
+		fixed = true;
+		idx = 3;
+	}
+
 	if (idx >= num_counters)
 		return NULL;
 	*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
@@ -229,6 +237,9 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 		ret = (perf_capabilities & PERF_CAP_PEBS_BASELINE) &&
 			((perf_capabilities & PERF_CAP_PEBS_FORMAT) > 3);
 		break;
+	case MSR_PERF_METRICS:
+		ret = vcpu->arch.perf_capabilities & MSR_CAP_PERF_METRICS;
+		break;
 	default:
 		ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
 			get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
@@ -385,6 +396,10 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_PEBS_DATA_CFG:
 		msr_info->data = pmu->pebs_data_cfg;
 		return 0;
+	case MSR_PERF_METRICS:
+		pmc_read_counter(&pmu->fixed_counters[3]);
+		msr_info->data = pmu->perf_metrics;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -472,6 +487,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 0;
 		}
 		break;
+	case MSR_PERF_METRICS:
+		if (pmu->perf_metrics == data)
+			return 0;
+		if (!(data & pmu->perf_metrics_mask)) {
+			pmu->perf_metrics = data;
+			return 0;
+		}
+		break;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -545,6 +568,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->fixed_ctr_ctrl_mask = ~0ull;
 	pmu->pebs_enable_mask = ~0ull;
 	pmu->pebs_data_cfg_mask = ~0ull;
+	pmu->perf_metrics_mask = ~0ull;
 
 	entry = kvm_find_cpuid_entry(vcpu, 0xa);
 	if (!entry || !vcpu->kvm->arch.enable_pmu)
@@ -584,6 +608,13 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));
 	counter_mask = ~(((1ull << pmu->nr_arch_gp_counters) - 1) |
 		(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED));
+	if (vcpu->arch.perf_capabilities & MSR_CAP_PERF_METRICS) {
+		counter_mask &= ~(1ULL << INTEL_PERF_METRICS_IDX);
+		pmu->nr_perf_metrics = min_t(int, KVM_PERF_METRICS_NUM_MAX,
+					     kvm_pmu_cap.num_topdown_events);
+		for (i = 0; i < pmu->nr_perf_metrics; i++)
+			pmu->perf_metrics_mask &= ~(0xffull << (i * 8));
+	}
 	pmu->global_ctrl_mask = counter_mask;
 	pmu->global_ovf_ctrl_mask = pmu->global_ctrl_mask &
 			~(MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF |
@@ -656,6 +687,8 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
 	lbr_desc->records.nr = 0;
 	lbr_desc->event = NULL;
 	lbr_desc->msr_passthrough = false;
+
+	BUILD_BUG_ON(KVM_PERF_METRICS_NUM_MAX > INTEL_TD_METRIC_NUM);
 }
 
 static void intel_pmu_reset(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fe5615fd8295..57312b5a3d9d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7721,6 +7721,9 @@ static u64 vmx_get_perf_capabilities(void)
 			perf_cap &= ~PERF_CAP_PEBS_BASELINE;
 	}
 
+	if (kvm_pmu_cap.num_topdown_events)
+		perf_cap |= host_perf_cap & MSR_CAP_PERF_METRICS;
+
 	return perf_cap;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b61cb58c877..a50b3ad7294c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1439,6 +1439,7 @@ static const u32 msrs_to_save_all[] = {
 	MSR_CORE_PERF_FIXED_CTR_CTRL, MSR_CORE_PERF_GLOBAL_STATUS,
 	MSR_CORE_PERF_GLOBAL_CTRL, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
 	MSR_IA32_PEBS_ENABLE, MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
+	MSR_PERF_METRICS,
 
 	/* This part of MSRs should match KVM_INTEL_PMC_MAX_GENERIC. */
 	MSR_ARCH_PERFMON_PERFCTR0, MSR_ARCH_PERFMON_PERFCTR1,
@@ -3880,6 +3881,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_PEBS_ENABLE:
 	case MSR_IA32_DS_AREA:
 	case MSR_PEBS_DATA_CFG:
+	case MSR_PERF_METRICS:
 	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
@@ -3983,6 +3985,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_PEBS_ENABLE:
 	case MSR_IA32_DS_AREA:
 	case MSR_PEBS_DATA_CFG:
+	case MSR_PERF_METRICS:
 	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
 		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
 			return kvm_pmu_get_msr(vcpu, msr_info);
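From the guest's point of view, the net effect of the patch is that
MSR_PERF_METRICS (or RDPMC with the fixed-counter metrics encoding)
returns one 8-bit fraction per topdown metric, packed a byte per metric.
The following guest user-space sketch is illustrative only, not part of
the patch: it assumes the RDPMC metrics encoding is bit 29 of the index
(matching the host's INTEL_PMC_FIXED_RDPMC_METRICS definition) and that
the guest has CR4.PCE set and perf-metrics support exposed; the RDPMC
will fault otherwise.

#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdpmc(uint32_t idx)
{
	uint32_t lo, hi;

	__asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(idx));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	/* Bit 29 selects the PERF_METRICS view paired with fixed counter 3. */
	uint64_t metrics = rdpmc(1u << 29);
	static const char * const name[] = {
		"retiring", "bad-spec", "fe-bound", "be-bound",
	};

	/* Each metric is an 8-bit fraction of the 0xff budget. */
	for (int i = 0; i < 4; i++) {
		uint8_t frac = (metrics >> (i * 8)) & 0xff;

		printf("%-8s %5.1f%%\n", name[i], frac * 100.0 / 0xff);
	}
	return 0;
}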