From patchwork Mon Mar 24 17:30:49 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mingwei Zhang
X-Patchwork-Id: 14027616
Reply-To: Mingwei Zhang
Date: Mon, 24 Mar 2025 17:30:49 +0000
In-Reply-To: <20250324173121.1275209-1-mizhang@google.com>
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <20250324173121.1275209-1-mizhang@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250324173121.1275209-10-mizhang@google.com>
Subject: [PATCH v4 09/38] perf: Add switch_guest_ctx() interface
From: Mingwei Zhang
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Sean Christopherson, Paolo Bonzini
Cc: Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Kan Liang, "H. Peter Anvin", linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Mingwei Zhang, Yongwei Ma,
	Xiong Zhang, Dapeng Mi, Jim Mattson, Sandipan Das, Zide Chen,
	Eranian Stephane, Das Sandipan, Shukla Manali, Nikunj Dadhania

From: Kan Liang

When entering/exiting a guest, some context for the guest has to be
switched. For example, there is a dedicated interrupt vector for guests
on Intel platforms. When the PMI is switched to the guest vector, the
guest_lvtpc value needs to be reflected in the hardware as well: e.g.,
when the guest clears its PMI mask bit, the hardware PMI mask bit should
also be cleared so that PMIs can keep being generated for the guest.
Therefore, a guest_lvtpc parameter is added to perf_guest_enter() and
switch_guest_ctx().

Add a dedicated per-CPU list to track all PMUs with the
PERF_PMU_CAP_MEDIATED_VPMU capability, which may require switching the
guest context. This avoids walking the full pmus list.

Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Kan Liang
Signed-off-by: Mingwei Zhang
---
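[Note, not part of the patch: a minimal sketch of how an arch PMU driver
that sets PERF_PMU_CAP_MEDIATED_VPMU might wire up the new callback. The
function name x86_pmu_switch_guest_ctx and the LVTPC handling below are
illustrative assumptions only; the actual arch implementation is expected
in a later patch of this series.]

/*
 * Hypothetical sketch (not part of this patch): point the local APIC PMI
 * entry at the guest's LVTPC value on guest entry, and restore the host's
 * NMI-based PMI on guest exit.  The u32 passed to perf_guest_enter() is
 * forwarded here by perf_switch_guest_ctx() via the void *data argument.
 */
static void x86_pmu_switch_guest_ctx(bool enter, void *data)
{
	u32 guest_lvtpc = *(u32 *)data;

	if (enter)
		apic_write(APIC_LVTPC, guest_lvtpc);
	else
		apic_write(APIC_LVTPC, APIC_DM_NMI);
}

Such a driver would set .switch_guest_ctx = x86_pmu_switch_guest_ctx in
its struct pmu, and the hypervisor side would pass the guest's LVTPC
value to perf_guest_enter() before entering the guest.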
 include/linux/perf_event.h | 17 +++++++++++--
 kernel/events/core.c       | 51 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 37187ee8e226..58c1cf6939bf 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -584,6 +584,11 @@ struct pmu {
	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
	 */
	int (*check_period)		(struct perf_event *event, u64 value); /* optional */
+
+	/*
+	 * Switch guest context when a guest enter/exit, e.g., interrupt vectors.
+	 */
+	void (*switch_guest_ctx)	(bool enter, void *data); /* optional */
 };

 enum perf_addr_filter_action_t {
@@ -1030,6 +1035,11 @@ struct perf_event_context {
	local_t		nr_no_switch_fast;
 };

+struct mediated_pmus_list {
+	raw_spinlock_t		lock;
+	struct list_head	list;
+};
+
 struct perf_cpu_pmu_context {
	struct perf_event_pmu_context	epc;
	struct perf_event_pmu_context	*task_epc;
@@ -1044,6 +1054,9 @@ struct perf_cpu_pmu_context {
	struct hrtimer			hrtimer;
	ktime_t				hrtimer_interval;
	unsigned int			hrtimer_active;
+
+	/* Track the PMU with PERF_PMU_CAP_MEDIATED_VPMU cap */
+	struct list_head		mediated_entry;
 };

 /**
@@ -1822,7 +1835,7 @@ extern int perf_event_period(struct perf_event *event, u64 value);
 extern u64 perf_event_pause(struct perf_event *event, bool reset);
 int perf_get_mediated_pmu(void);
 void perf_put_mediated_pmu(void);
-void perf_guest_enter(void);
+void perf_guest_enter(u32 guest_lvtpc);
 void perf_guest_exit(void);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
@@ -1921,7 +1934,7 @@ static inline int perf_get_mediated_pmu(void)
 }
 static inline void perf_put_mediated_pmu(void)			{ }
-static inline void perf_guest_enter(void)			{ }
+static inline void perf_guest_enter(u32 guest_lvtpc)		{ }
 static inline void perf_guest_exit(void)			{ }
 #endif

diff --git a/kernel/events/core.c b/kernel/events/core.c
index d05487d465c9..406b86641f02 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -451,6 +451,7 @@ static inline bool is_include_guest_event(struct perf_event *event)
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
+static DEFINE_PER_CPU(struct mediated_pmus_list, mediated_pmus);
 static cpumask_var_t perf_online_mask;
 static cpumask_var_t perf_online_core_mask;
 static cpumask_var_t perf_online_die_mask;
@@ -6053,8 +6054,26 @@ static inline void perf_host_exit(struct perf_cpu_context *cpuctx)
	}
 }

+static void perf_switch_guest_ctx(bool enter, u32 guest_lvtpc)
+{
+	struct mediated_pmus_list *pmus = this_cpu_ptr(&mediated_pmus);
+	struct perf_cpu_pmu_context *cpc;
+	struct pmu *pmu;
+
+	lockdep_assert_irqs_disabled();
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(cpc, &pmus->list, mediated_entry) {
+		pmu = cpc->epc.pmu;
+
+		if (pmu->switch_guest_ctx)
+			pmu->switch_guest_ctx(enter, (void *)&guest_lvtpc);
+	}
+	rcu_read_unlock();
+}
+
 /* When entering a guest, schedule out all exclude_guest events. */
-void perf_guest_enter(void)
+void perf_guest_enter(u32 guest_lvtpc)
 {
	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);

@@ -6067,6 +6086,8 @@ void perf_guest_enter(void)

	perf_host_exit(cpuctx);

+	perf_switch_guest_ctx(true, guest_lvtpc);
+
	__this_cpu_write(perf_in_guest, true);

 unlock:
@@ -6098,6 +6119,8 @@ void perf_guest_exit(void)
	if (WARN_ON_ONCE(!__this_cpu_read(perf_in_guest)))
		goto unlock;

+	perf_switch_guest_ctx(false, 0);
+
	perf_host_enter(cpuctx);

	__this_cpu_write(perf_in_guest, false);
@@ -12104,6 +12127,15 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
		cpc = per_cpu_ptr(pmu->cpu_pmu_context, cpu);
		__perf_init_event_pmu_context(&cpc->epc, pmu);
		__perf_mux_hrtimer_init(cpc, cpu);
+
+		if (pmu->capabilities & PERF_PMU_CAP_MEDIATED_VPMU) {
+			struct mediated_pmus_list *pmus;
+
+			pmus = per_cpu_ptr(&mediated_pmus, cpu);
+			raw_spin_lock(&pmus->lock);
+			list_add_rcu(&cpc->mediated_entry, &pmus->list);
+			raw_spin_unlock(&pmus->lock);
+		}
	}

	if (!pmu->start_txn) {
@@ -12162,6 +12194,20 @@ void perf_pmu_unregister(struct pmu *pmu)
	mutex_lock(&pmus_lock);
	list_del_rcu(&pmu->entry);

+	if (pmu->capabilities & PERF_PMU_CAP_MEDIATED_VPMU) {
+		struct mediated_pmus_list *pmus;
+		struct perf_cpu_pmu_context *cpc;
+		int cpu;
+
+		for_each_possible_cpu(cpu) {
+			cpc = per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+			pmus = per_cpu_ptr(&mediated_pmus, cpu);
+			raw_spin_lock(&pmus->lock);
+			list_del_rcu(&cpc->mediated_entry);
+			raw_spin_unlock(&pmus->lock);
+		}
+	}
+
	/*
	 * We dereference the pmu list under both SRCU and regular RCU, so
	 * synchronize against both of those.
@@ -14252,6 +14298,9 @@ static void __init perf_event_init_all_cpus(void)

		INIT_LIST_HEAD(&per_cpu(sched_cb_list, cpu));

+		INIT_LIST_HEAD(&per_cpu(mediated_pmus.list, cpu));
+		raw_spin_lock_init(&per_cpu(mediated_pmus.lock, cpu));
+
		cpuctx = per_cpu_ptr(&perf_cpu_context, cpu);
		__perf_event_init_context(&cpuctx->ctx);
		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);