From patchwork Wed Jan 31 14:23:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Zhang, Rui" X-Patchwork-Id: 13539524 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8D328287A; Wed, 31 Jan 2024 14:23:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711033; cv=none; b=TpLseAR6GJDVKgvzordYkaqU2mLSm4FtWsyZApQe/5uX9GpswspBS5PvDpT8OMk0MFy/LuTMZ29ApuKPrCwgVoBbVGQBEF3l0S0dMQ8aciJYkgk7GvRV0aMmc7UE8ONdrHJ/xtAyvjBB683UtBUKfO01IGk0NxkB1MRDseTbjes= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711033; c=relaxed/simple; bh=ExThFueBF01BuVDK0GX8VUsqgRBQ4Ib5Eh3cgKUlZ7A=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ajEqjAc6/ZeDj1/d7TuArsfeRA+uuZTht4wTRL03A43kB1/dyddh5YBUYI5ywZGLJw9oaZkupj1x7EwwnK5IZp/s/yhEtckLyRrZzNA+30HFwkshxpgJOMsP6lNwLsJRumiCExYNINZmEqtTY0ysjRg0cQprvsTl6zG647AVDlE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Ou0OvIwy; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Ou0OvIwy" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706711032; x=1738247032; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ExThFueBF01BuVDK0GX8VUsqgRBQ4Ib5Eh3cgKUlZ7A=; b=Ou0OvIwyxtbk4ZJsRFz3z91toRLMryB7vvAgpKwHr3LfQahWIZwRqb1m 4VtwyvcN/wGViUv2jMt/MSamaswPXCNDH6ygOevL2903PIP7g7lawOLXQ Y4gep2OfqUzKf4OsVLFnOazoFyhZ5/x6bZ7koIecoRbxEY80Cy4ZkgO9r 1xzDCjAgAZRT7SwhtU3oZDcoqARVmhOp21JHo+6MQE5BWhg9YweiVt90k E6fWXrLfKmetRUobg+Inx2shsDMMpbTve4bUsHukz4E/VgP9nvQhzEEMI k7OguPLdwR7nfxeGr6xvj2vetQ4ZFrGq1X1Qwi926SRJqD4vY6kHJf+EW g==; X-IronPort-AV: E=McAfee;i="6600,9927,10969"; a="10995802" X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="10995802" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:23:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="4068683" Received: from puhongt-mobl2.ccr.corp.intel.com (HELO rzhang1-mobl7.ccr.corp.intel.com) ([10.255.29.147]) by orviesa005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:23:48 -0800 From: Zhang Rui To: rafael.j.wysocki@intel.com, peterz@infradead.org Cc: mingo@redhat.com, kan.liang@linux.intel.com, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, x86@kernel.org Subject: [PATCH 1/5] powercap: intel_rapl: Sort header files Date: Wed, 31 Jan 2024 22:23:31 +0800 Message-Id: <20240131142335.84218-2-rui.zhang@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240131142335.84218-1-rui.zhang@intel.com> References: <20240131142335.84218-1-rui.zhang@intel.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Sort header files alphabetically. Signed-off-by: Zhang Rui --- drivers/powercap/intel_rapl_common.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index aa627a6b12a4..315e304219e2 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -5,27 +5,27 @@ */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include #include +#include +#include +#include +#include #include -#include #include -#include -#include -#include #include -#include -#include -#include -#include +#include +#include #include -#include -#include #include -#include +#include +#include +#include +#include -#include #include #include +#include /* bitmasks for RAPL MSRs, used by primitive access functions */ #define ENERGY_STATUS_MASK 0xffffffff From patchwork Wed Jan 31 14:23:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Zhang, Rui" X-Patchwork-Id: 13539525 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18EAF83CB2; Wed, 31 Jan 2024 14:23:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711037; cv=none; b=RP9Xsf67HY6c+K6nsNZtWE9nlIEu25rXxiuMCOqxbCXDmpmuFS31baVFeKz/nlRZaHTEDppYqDJLjT/peDUAbnWAOdwKn9ZemhBM+NVg8DLCjFRxt3mk9amCNAJHZuwCwr7eOgtunl//1pegV/NTErQjhNfLtvDW39rB2UyFZjY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711037; c=relaxed/simple; bh=MNYyJaRmNaZjkli6XFAM+Od3XgEgAiNmNHWd8Irf9ls=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=P0IGF3Jxv/tWDnWUbrS2vhpapqWCGYVDO1Z+Hsua+Tk4qNrg5qrWCw7DYb3dunZz4VHml5wwdoJO9ecvxJNqyhSTDjAbYzeobEGIQCeZCrIy+N3/n41Rxh9LYXPXK0cutF+308Mdbe6S40OWKzy35Yjqp4kl4wijmAbraIVy91k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ReevSnKX; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ReevSnKX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706711035; x=1738247035; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MNYyJaRmNaZjkli6XFAM+Od3XgEgAiNmNHWd8Irf9ls=; b=ReevSnKXPOrOyTCW8XSqruGQ1dv53gOCDOuAmMBRZvcNUr/MW6rWnZ20 dRgnebrOZzlpiRKAfy3fTBI+q+n6ngTaAPFOUzC5W6+SVzfXb7EyYyXBE kZ3u5VPUV3knL2OtqmOEWFlIRAT8vN+XGX2KBcmkdhFGcWdMjKP3uOGO4 IsNNXimwIwPGWwzMsFgyeyjQSDt0Q+QbB3Z+qtJ7WO39exL07In7X9NZ+ RAANI0Wvp8bb1OwsVDozzflCZc12AU2BEZka6r0IPvbHJXZwVD3jhMcGf cw3ByMg80Cs/Z88U2577pre4abfLh9Wvl8997TgIXpGqA+0Us92iPH7Up w==; X-IronPort-AV: E=McAfee;i="6600,9927,10969"; a="10995813" X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="10995813" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:23:54 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="4068690" Received: from puhongt-mobl2.ccr.corp.intel.com (HELO rzhang1-mobl7.ccr.corp.intel.com) ([10.255.29.147]) by orviesa005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:23:51 -0800 From: Zhang Rui To: rafael.j.wysocki@intel.com, peterz@infradead.org Cc: mingo@redhat.com, kan.liang@linux.intel.com, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, x86@kernel.org Subject: [PATCH 2/5] powercap: intel_rapl: Add PMU support Date: Wed, 31 Jan 2024 22:23:32 +0800 Message-Id: <20240131142335.84218-3-rui.zhang@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240131142335.84218-1-rui.zhang@intel.com> References: <20240131142335.84218-1-rui.zhang@intel.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 MSR RAPL powercap sysfs is done in drivers/powercap/intel_rapl_msr.c. MSR RAPL PMU is done in arch/x86/events/rapl.c. They maintain two separate CPU model lists, describing the same feature available on the same set of hardware. This increases unnecessary maintenance burden a lot. Plus that, TPMI RAPL also needs PMU support, which shares similar code with MSR RAPL PMU. To unify all of these, introduce PMU support as part of RAPL framework. The idea is that, if a RAPL Package device is registered to RAPL framework, and is ready for energy reporting and control via sysfs, it is also ready for PMU. So register a PMU in RAPL framework that works for all registered RAPL Package devices with .enable_pmu flag set. Note the RAPL PMU does not distinguish RAPL Package devices from different Interface because MSR RAPL and TPMI RAPL do not co-exist on any platform, and MMIO RAPL does not set .enable_pmu flag. The RAPL PMU is fully compatible with current MSR RAPL PMU, including using the same PMU name and events name/id/unit/scale, etc. For example, on platforms use either MSR or TPMI, use the same command perf stat -e power/energy-pkg/ -e power/energy-ram/ -e power/energy-cores/ FOO to get the energy consumption when the events are in "perf list" output. Note that, MSR RAPL PMU is CPU model based and the events supported are known at driver initialization time. TPMI RAPL is probed dynamically, and the events supported by each TPMI RAPL device can be different. Thus, when a new RAPL Package device is registered with .enable_pmu set, 1. if PMU has not been registered yet, register the PMU 2. if current PMU already covers all events that the new RAPL Package device supports, do nothing (always true for MSR RAPL) 2. or else, unregister current PMU and re-register the PMU with events supported by all probed RAPL Package devices. For example, on a dual-package system using TPMI RAPL, it is possible that Package 1 behaves as TPMI domain root. In this case, register PMU without Psys event support when probing TPMI RAPL Package 0, and re-register the PMU with Psys event support when probing Package 1. Signed-off-by: Zhang Rui --- drivers/powercap/intel_rapl_common.c | 536 +++++++++++++++++++++++++++ include/linux/intel_rapl.h | 17 + 2 files changed, 553 insertions(+) diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 315e304219e2..318174c87249 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -15,6 +15,8 @@ #include #include #include +#include +#include #include #include #include @@ -1506,6 +1508,532 @@ static int rapl_detect_domains(struct rapl_package *rp) return 0; } +#ifdef CONFIG_PERF_EVENTS + +/* Support for RAPL PMU */ + +struct rapl_pmu { + struct pmu pmu; + u64 timer_ms; + cpumask_t cpu_mask; + unsigned long domain_map; /* Events supported by all registered RAPL packages */ + bool registered; +}; + +static struct rapl_pmu rapl_pmu; + +/* PMU helpers */ +static int get_pmu_cpu(struct rapl_package *rp) +{ + int cpu; + + if (!rp->priv->enable_pmu) + return nr_cpu_ids; + + /* MSR RAPL uses rp->lead_cpu for PMU */ + if (rp->lead_cpu >= 0) + return rp->lead_cpu; + + /* TPMI RAPL uses any CPU in the package for PMU */ + for_each_online_cpu(cpu) + if (topology_physical_package_id(cpu) == rp->id) + return cpu; + + return nr_cpu_ids; +} + +static bool is_rp_pmu_cpu(struct rapl_package *rp, int cpu) +{ + if (!rp->priv->enable_pmu) + return false; + + /* MSR RAPL uses rp->lead_cpu for PMU */ + if (rp->lead_cpu >= 0) + return cpu == rp->lead_cpu; + + /* TPMI RAPL uses any CPU in the package for PMU */ + return topology_physical_package_id(cpu) == rp->id; +} + +static struct rapl_package_pmu_data *event_to_pmu_data(struct perf_event *event) +{ + struct rapl_package *rp = event->pmu_private; + + return &rp->pmu_data; +} + +/* PMU event callbacks */ + +/* Return 0 for unsupported events or failed read */ +static u64 event_read_counter(struct perf_event *event) +{ + struct rapl_package *rp = event->pmu_private; + u64 val; + int ret; + + if (event->hw.idx < 0) + return 0; + + ret = rapl_read_data_raw(&rp->domains[event->hw.idx], ENERGY_COUNTER, false, &val); + return ret ? 0 : val; +} + +static void __rapl_pmu_event_start(struct perf_event *event) +{ + struct rapl_package_pmu_data *data = event_to_pmu_data(event); + + if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) + return; + + event->hw.state = 0; + + list_add_tail(&event->active_entry, &data->active_list); + + local64_set(&event->hw.prev_count, event_read_counter(event)); + if (++data->n_active == 1) + hrtimer_start(&data->hrtimer, data->timer_interval, + HRTIMER_MODE_REL_PINNED); +} + +static void rapl_pmu_event_start(struct perf_event *event, int mode) +{ + struct rapl_package_pmu_data *data = event_to_pmu_data(event); + unsigned long flags; + + raw_spin_lock_irqsave(&data->lock, flags); + __rapl_pmu_event_start(event); + raw_spin_unlock_irqrestore(&data->lock, flags); +} + +static u64 rapl_event_update(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + struct rapl_package_pmu_data *data = event_to_pmu_data(event); + u64 prev_raw_count, new_raw_count; + s64 delta, sdelta; + +again: + prev_raw_count = local64_read(&hwc->prev_count); + new_raw_count = event_read_counter(event); + + if (local64_cmpxchg(&hwc->prev_count, prev_raw_count, + new_raw_count) != prev_raw_count) + goto again; + + /* + * Now we have the new raw value and have updated the prev + * timestamp already. We can now calculate the elapsed delta + * (event-)time and add that to the generic event. + */ + delta = new_raw_count - prev_raw_count; + + /* + * Scale delta to smallest unit (2^-32) + * users must then scale back: count * 1/(1e9*2^32) to get Joules + * or use ldexp(count, -32). + * Watts = Joules/Time delta + */ + sdelta = delta * data->scale[event->hw.flags]; + + local64_add(sdelta, &event->count); + + return new_raw_count; +} + +static void rapl_pmu_event_stop(struct perf_event *event, int mode) +{ + struct rapl_package_pmu_data *data = event_to_pmu_data(event); + struct hw_perf_event *hwc = &event->hw; + unsigned long flags; + + raw_spin_lock_irqsave(&data->lock, flags); + + /* Mark event as deactivated and stopped */ + if (!(hwc->state & PERF_HES_STOPPED)) { + WARN_ON_ONCE(data->n_active <= 0); + if (--data->n_active == 0) + hrtimer_cancel(&data->hrtimer); + + list_del(&event->active_entry); + + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); + hwc->state |= PERF_HES_STOPPED; + } + + /* Check if update of sw counter is necessary */ + if ((mode & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) { + /* + * Drain the remaining delta count out of a event + * that we are disabling: + */ + rapl_event_update(event); + hwc->state |= PERF_HES_UPTODATE; + } + + raw_spin_unlock_irqrestore(&data->lock, flags); +} + +static int rapl_pmu_event_add(struct perf_event *event, int mode) +{ + struct rapl_package_pmu_data *data = event_to_pmu_data(event); + struct hw_perf_event *hwc = &event->hw; + unsigned long flags; + + raw_spin_lock_irqsave(&data->lock, flags); + + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED; + + if (mode & PERF_EF_START) + __rapl_pmu_event_start(event); + + raw_spin_unlock_irqrestore(&data->lock, flags); + + return 0; +} + +static void rapl_pmu_event_del(struct perf_event *event, int flags) +{ + rapl_pmu_event_stop(event, PERF_EF_UPDATE); +} + +/* + * RAPL energy status counters + */ +enum perf_rapl_events { + PERF_RAPL_PP0 = 0, /* all cores */ + PERF_RAPL_PKG, /* entire package */ + PERF_RAPL_RAM, /* DRAM */ + PERF_RAPL_PP1, /* gpu */ + PERF_RAPL_PSYS, /* psys */ +}; +#define RAPL_EVENT_MASK GENMASK(7, 0) + +static int event_to_domain(int event) +{ + switch (event) { + case PERF_RAPL_PP0: + return RAPL_DOMAIN_PP0; + case PERF_RAPL_PKG: + return RAPL_DOMAIN_PACKAGE; + case PERF_RAPL_RAM: + return RAPL_DOMAIN_DRAM; + case PERF_RAPL_PP1: + return RAPL_DOMAIN_PP1; + case PERF_RAPL_PSYS: + return RAPL_DOMAIN_PLATFORM; + default: + return RAPL_DOMAIN_MAX; + } +} + +static int rapl_pmu_event_init(struct perf_event *event) +{ + struct rapl_package *pos, *rp = NULL; + u64 cfg = event->attr.config & RAPL_EVENT_MASK; + int domain, idx; + + /* Only look at RAPL events */ + if (event->attr.type != event->pmu->type) + return -ENOENT; + + /* Check only supported bits are set */ + if (event->attr.config & ~RAPL_EVENT_MASK) + return -EINVAL; + + if (event->cpu < 0) + return -EINVAL; + + list_for_each_entry(pos, &rapl_packages, plist) { + if (is_rp_pmu_cpu(pos, event->cpu)) { + rp = pos; + break; + } + } + if (!rp) + return -ENODEV; + + domain = event_to_domain(cfg - 1); + if (domain >= RAPL_DOMAIN_MAX) + return -EINVAL; + + event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG; + event->pmu_private = rp; /* Which package */ + event->hw.flags = domain; /* Which domain */ + event->hw.idx = -1; /* Index in rp->domains[] to get domain pointer */ + for (idx = 0; idx < rp->nr_domains; idx++) { + if (rp->domains[idx].id == domain) { + event->hw.idx = idx; + break; + } + } + + return 0; +} + +static void rapl_pmu_event_read(struct perf_event *event) +{ + rapl_event_update(event); +} + +static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer) +{ + struct rapl_package_pmu_data *data = + container_of(hrtimer, struct rapl_package_pmu_data, hrtimer); + struct perf_event *event; + unsigned long flags; + + if (!data->n_active) + return HRTIMER_NORESTART; + + raw_spin_lock_irqsave(&data->lock, flags); + + list_for_each_entry(event, &data->active_list, active_entry) + rapl_event_update(event); + + raw_spin_unlock_irqrestore(&data->lock, flags); + + hrtimer_forward_now(hrtimer, data->timer_interval); + + return HRTIMER_RESTART; +} + +/* PMU sysfs attributes */ + +/* + * There are no default events, but we need to create "events" group (with + * empty attrs) before updating it with detected events. + */ +static struct attribute *attrs_empty[] = { + NULL, +}; + +static struct attribute_group pmu_events_group = { + .name = "events", + .attrs = attrs_empty, +}; + +static ssize_t cpumask_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct rapl_package *rp; + int cpu; + + cpus_read_lock(); + cpumask_clear(&rapl_pmu.cpu_mask); + + /* Choose a cpu for each RAPL Package */ + list_for_each_entry(rp, &rapl_packages, plist) { + cpu = get_pmu_cpu(rp); + if (cpu < nr_cpu_ids) + cpumask_set_cpu(cpu, &rapl_pmu.cpu_mask); + } + cpus_read_unlock(); + + return cpumap_print_to_pagebuf(true, buf, &rapl_pmu.cpu_mask); +} + +static DEVICE_ATTR_RO(cpumask); + +static struct attribute *pmu_cpumask_attrs[] = { + &dev_attr_cpumask.attr, + NULL +}; + +static struct attribute_group pmu_cpumask_group = { + .attrs = pmu_cpumask_attrs, +}; + +PMU_FORMAT_ATTR(event, "config:0-7"); +static struct attribute *pmu_format_attr[] = { + &format_attr_event.attr, + NULL +}; + +static struct attribute_group pmu_format_group = { + .name = "format", + .attrs = pmu_format_attr, +}; + +static const struct attribute_group *pmu_attr_groups[] = { + &pmu_events_group, + &pmu_cpumask_group, + &pmu_format_group, + NULL +}; + +#define RAPL_EVENT_ATTR_STR(_name, v, str) \ +static struct perf_pmu_events_attr event_attr_##v = { \ + .attr = __ATTR(_name, 0444, perf_event_sysfs_show, NULL), \ + .event_str = str, \ +} + +RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01"); +RAPL_EVENT_ATTR_STR(energy-pkg, rapl_pkg, "event=0x02"); +RAPL_EVENT_ATTR_STR(energy-ram, rapl_ram, "event=0x03"); +RAPL_EVENT_ATTR_STR(energy-gpu, rapl_gpu, "event=0x04"); +RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05"); + +RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_unit_cores, "Joules"); +RAPL_EVENT_ATTR_STR(energy-pkg.unit, rapl_unit_pkg, "Joules"); +RAPL_EVENT_ATTR_STR(energy-ram.unit, rapl_unit_ram, "Joules"); +RAPL_EVENT_ATTR_STR(energy-gpu.unit, rapl_unit_gpu, "Joules"); +RAPL_EVENT_ATTR_STR(energy-psys.unit, rapl_unit_psys, "Joules"); + +RAPL_EVENT_ATTR_STR(energy-cores.scale, rapl_scale_cores, "2.3283064365386962890625e-10"); +RAPL_EVENT_ATTR_STR(energy-pkg.scale, rapl_scale_pkg, "2.3283064365386962890625e-10"); +RAPL_EVENT_ATTR_STR(energy-ram.scale, rapl_scale_ram, "2.3283064365386962890625e-10"); +RAPL_EVENT_ATTR_STR(energy-gpu.scale, rapl_scale_gpu, "2.3283064365386962890625e-10"); +RAPL_EVENT_ATTR_STR(energy-psys.scale, rapl_scale_psys, "2.3283064365386962890625e-10"); + +#define RAPL_EVENT_GROUP(_name, domain) \ +static struct attribute *pmu_attr_##_name[] = { \ + &event_attr_rapl_##_name.attr.attr, \ + &event_attr_rapl_unit_##_name.attr.attr, \ + &event_attr_rapl_scale_##_name.attr.attr, \ + NULL \ +}; \ +static umode_t is_visible_##_name(struct kobject *kobj, struct attribute *attr, int event) \ +{ \ + return rapl_pmu.domain_map & BIT(domain) ? attr->mode : 0; \ +} \ +static struct attribute_group pmu_group_##_name = { \ + .name = "events", \ + .attrs = pmu_attr_##_name, \ + .is_visible = is_visible_##_name, \ +} + +RAPL_EVENT_GROUP(cores, RAPL_DOMAIN_PP0); +RAPL_EVENT_GROUP(pkg, RAPL_DOMAIN_PACKAGE); +RAPL_EVENT_GROUP(ram, RAPL_DOMAIN_DRAM); +RAPL_EVENT_GROUP(gpu, RAPL_DOMAIN_PP1); +RAPL_EVENT_GROUP(psys, RAPL_DOMAIN_PLATFORM); + +static const struct attribute_group *pmu_attr_update[] = { + &pmu_group_cores, + &pmu_group_pkg, + &pmu_group_ram, + &pmu_group_gpu, + &pmu_group_psys, + NULL +}; + +static int rapl_pmu_update(struct rapl_package *rp) +{ + int ret; + + /* Return if PMU already covers all events supported by current RAPL Package */ + if (rapl_pmu.registered && !(rp->domain_map & (~rapl_pmu.domain_map))) + return 0; + + /* Unregister previous registered PMU */ + if (rapl_pmu.registered) { + perf_pmu_unregister(&rapl_pmu.pmu); + memset(&rapl_pmu.pmu, 0, sizeof(struct pmu)); + } + + rapl_pmu.domain_map |= rp->domain_map; + + memset(&rapl_pmu.pmu, 0, sizeof(struct pmu)); + rapl_pmu.pmu.attr_groups = pmu_attr_groups; + rapl_pmu.pmu.attr_update = pmu_attr_update; + rapl_pmu.pmu.task_ctx_nr = perf_invalid_context; + rapl_pmu.pmu.event_init = rapl_pmu_event_init; + rapl_pmu.pmu.add = rapl_pmu_event_add; + rapl_pmu.pmu.del = rapl_pmu_event_del; + rapl_pmu.pmu.start = rapl_pmu_event_start; + rapl_pmu.pmu.stop = rapl_pmu_event_stop; + rapl_pmu.pmu.read = rapl_pmu_event_read; + rapl_pmu.pmu.module = THIS_MODULE; + rapl_pmu.pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE | PERF_PMU_CAP_NO_INTERRUPT; + ret = perf_pmu_register(&rapl_pmu.pmu, "power", -1); + if (ret) + pr_warn("Failed to register PMU\n"); + + rapl_pmu.registered = !ret; + + return ret; +} + +static int rapl_package_add_pmu(struct rapl_package *rp) +{ + struct rapl_package_pmu_data *data = &rp->pmu_data; + int idx; + + if (!rp->priv->enable_pmu) + return 0; + + for (idx = 0; idx < rp->nr_domains; idx++) { + struct rapl_domain *rd = &rp->domains[idx]; + int domain = rd->id; + u64 val; + + if (!test_bit(domain, &rp->domain_map)) + continue; + + /* + * The RAPL PMU granularity is 2^-32 Joules + * data->scale[]: times of 2^-32 Joules for each ENERGY COUNTER increase + */ + val = rd->energy_unit * (1ULL << 32); + do_div(val, ENERGY_UNIT_SCALE * 1000000); + data->scale[domain] = val; + + if (!rapl_pmu.timer_ms) { + struct rapl_primitive_info *rpi = get_rpi(rp, ENERGY_COUNTER); + + + /* + * Calculate the timer rate: + * Use reference of 200W for scaling the timeout to avoid counter + * overflows. + * + * max_count = rpi->mask >> rpi->shift + 1 + * max_energy_pj = max_count * rd->energy_unit + * max_time_sec = (max_energy_pj / 1000000000) / 200w + * + * rapl_pmu.timer_ms = max_time_sec * 1000 / 2 + */ + val = (rpi->mask >> rpi->shift) + 1; + val *= rd->energy_unit; + do_div(val, 1000000 * 200 * 2); + rapl_pmu.timer_ms = val; + + pr_info("%llu ms ovfl timer\n", rapl_pmu.timer_ms); + } + + pr_info("Domain %s: hw unit %lld * 2^-32 Joules\n", rd->name, data->scale[domain]); + } + + /* Initialize per package PMU data */ + raw_spin_lock_init(&data->lock); + INIT_LIST_HEAD(&data->active_list); + data->timer_interval = ms_to_ktime(rapl_pmu.timer_ms); + hrtimer_init(&data->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + data->hrtimer.function = rapl_hrtimer_handle; + + return rapl_pmu_update(rp); +} + +static void rapl_package_remove_pmu(struct rapl_package *rp) +{ + struct rapl_package *pos; + + if (!rp->priv->enable_pmu || !rapl_pmu.registered) + return; + + list_for_each_entry(pos, &rapl_packages, plist) { + if (pos->priv->enable_pmu) + return; + } + + perf_pmu_unregister(&rapl_pmu.pmu); + memset(&rapl_pmu, 0, sizeof(struct rapl_pmu)); +} +#else +static int rapl_package_add_pmu(struct rapl_package *rp) { return 0; } +static void rapl_package_remove_pmu(struct rapl_package *rp) { } +#endif + /* called from CPU hotplug notifier, hotplug lock held */ void rapl_remove_package_cpuslocked(struct rapl_package *rp) { @@ -1534,6 +2062,8 @@ void rapl_remove_package_cpuslocked(struct rapl_package *rp) powercap_unregister_zone(rp->priv->control_type, &rd_package->power_zone); list_del(&rp->plist); + /* Do PMU update after rp removed from rapl_packages list */ + rapl_package_remove_pmu(rp); kfree(rp); } EXPORT_SYMBOL_GPL(rapl_remove_package_cpuslocked); @@ -1613,6 +2143,12 @@ struct rapl_package *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *pr if (!ret) { INIT_LIST_HEAD(&rp->plist); list_add(&rp->plist, &rapl_packages); + /* + * PMU failure does not affect the registered RAPL packages. + * Thus it should not block the current RAPL package neither. + * Ignore the return value. + */ + rapl_package_add_pmu(rp); return rp; } diff --git a/include/linux/intel_rapl.h b/include/linux/intel_rapl.h index f3196f82fd8a..6f27380d4bab 100644 --- a/include/linux/intel_rapl.h +++ b/include/linux/intel_rapl.h @@ -138,6 +138,7 @@ struct reg_action { * @reg_unit: Register for getting energy/power/time unit. * @regs: Register sets for different RAPL Domains. * @limits: Number of power limits supported by each domain. + * @enable_pmu: Enable Perf PMU for Domain energy counters. * @read_raw: Callback for reading RAPL interface specific * registers. * @write_raw: Callback for writing RAPL interface specific @@ -152,12 +153,25 @@ struct rapl_if_priv { union rapl_reg reg_unit; union rapl_reg regs[RAPL_DOMAIN_MAX][RAPL_DOMAIN_REG_MAX]; int limits[RAPL_DOMAIN_MAX]; + bool enable_pmu; int (*read_raw)(int id, struct reg_action *ra); int (*write_raw)(int id, struct reg_action *ra); void *defaults; void *rpi; }; +#ifdef CONFIG_PERF_EVENTS +struct rapl_package_pmu_data { + u64 scale[RAPL_DOMAIN_MAX]; + raw_spinlock_t lock; + int n_active; + struct list_head active_list; + ktime_t timer_interval; + struct hrtimer hrtimer; + u8 domain_mask; +}; +#endif + /* maximum rapl package domain name: package-%d-die-%d */ #define PACKAGE_DOMAIN_NAME_LENGTH 30 @@ -176,6 +190,9 @@ struct rapl_package { struct cpumask cpumask; char name[PACKAGE_DOMAIN_NAME_LENGTH]; struct rapl_if_priv *priv; +#ifdef CONFIG_PERF_EVENTS + struct rapl_package_pmu_data pmu_data; +#endif }; struct rapl_package *rapl_find_package_domain_cpuslocked(int id, struct rapl_if_priv *priv, From patchwork Wed Jan 31 14:23:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Zhang, Rui" X-Patchwork-Id: 13539526 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 56A0983CC3; Wed, 31 Jan 2024 14:23:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711040; cv=none; b=nUY25QNYfEPQ0mizJbqrIS7aiXXCAGJwOXEeCCo/0ZE0LHBgIM9AMvTvLuFizHGEtmm4h7raBkmyYyenVPGr/z7jYmwN3Yy7uLmzoslD1cZpgRtxrLmul69D9Lo7bUVRqcZDsZKVMWI83cOqJr4rGdbUkFlnDN6o/SGIjZbsxlw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711040; c=relaxed/simple; bh=ux9RKxYXwHAgHtI+Px/ktM89Oy29Ds3+ltt+/Ms31jk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=JY3aD2OttR6qWkHsHHKI3Vfu4pyxGtSMc6bv7bvoObmtUcPR382BpKy+JK9sYzHUtwhX1rrV+TY19nHsTKhGY/cIQGKc0r+gSgcPkL0GmDSoLDAdscNoSXTJLbVyNZMm/t+Q3RJ3NY1aQTnA/t8MBPf/iYCm9UCUcwJ3N6YTQ8Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=IRcOUOHX; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="IRcOUOHX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706711038; x=1738247038; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ux9RKxYXwHAgHtI+Px/ktM89Oy29Ds3+ltt+/Ms31jk=; b=IRcOUOHX+nb7A9DEVhNemv8N7p8CrArezGXatJdkhZ7zKcS1YU+v0lZG strD7AauE95ISdCLTDxPusQJTsKtLzVc4DVqorK6PIbZIne/B3HQNQWv0 kJWPAxjkKyAyw8uETDbisVnjHJhOG7oAyDjz1Sro9cUErwrSwnw+U+CQj +50uE49o719x5/YqEi+KZRAJ/OAz/nHsG92kpFTLerDFpL4VUmCDgHIuI A3I1WxBGzPwwnFSLG+ydf4gYNmKhgGYRBJLJpuhikuBYmYZ+OtUMI37fv lfy+J81MOsyi5BEI2vFfVhlee3/NG2uVDbj4FumZMEHswiPjYLRQjqIBO g==; X-IronPort-AV: E=McAfee;i="6600,9927,10969"; a="10995829" X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="10995829" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:23:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="4068741" Received: from puhongt-mobl2.ccr.corp.intel.com (HELO rzhang1-mobl7.ccr.corp.intel.com) ([10.255.29.147]) by orviesa005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:23:55 -0800 From: Zhang Rui To: rafael.j.wysocki@intel.com, peterz@infradead.org Cc: mingo@redhat.com, kan.liang@linux.intel.com, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, x86@kernel.org Subject: [PATCH 3/5] powercap: intel_rapl_tpmi: Enable PMU support for TPMI RAPL Date: Wed, 31 Jan 2024 22:23:33 +0800 Message-Id: <20240131142335.84218-4-rui.zhang@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240131142335.84218-1-rui.zhang@intel.com> References: <20240131142335.84218-1-rui.zhang@intel.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Enable RAPL PMU support for TPMI RAPL driver. Signed-off-by: Zhang Rui --- drivers/powercap/intel_rapl_tpmi.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/powercap/intel_rapl_tpmi.c b/drivers/powercap/intel_rapl_tpmi.c index f6b7f085977c..f131a086c535 100644 --- a/drivers/powercap/intel_rapl_tpmi.c +++ b/drivers/powercap/intel_rapl_tpmi.c @@ -285,6 +285,7 @@ static int intel_rapl_tpmi_probe(struct auxiliary_device *auxdev, trp->priv.read_raw = tpmi_rapl_read_raw; trp->priv.write_raw = tpmi_rapl_write_raw; trp->priv.control_type = tpmi_control_type; + trp->priv.enable_pmu = true; /* RAPL TPMI I/F is per physical package */ trp->rp = rapl_find_package_domain(info->package_id, &trp->priv, false); From patchwork Wed Jan 31 14:23:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Zhang, Rui" X-Patchwork-Id: 13539527 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1AA8E84A29; Wed, 31 Jan 2024 14:24:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711043; cv=none; b=p7G271R7BRvV38B0Jfou10JxtDx2vISYm+LbE4nMjEu3dZl95nJ4bzlHKasxKaQO57fxvz+RghJJYR9JxOZh9aqlGGJt6nauTPiAQi9YTjtJYtUGFrKr9DorG6u5J+mWt2o0U1ZzqrXn7cFh25kWniq5AkK6EbMWoza/t4RzCnY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711043; c=relaxed/simple; bh=P/1acTucceqt/grzvkOvrapX+WjE/maAY0kvtIA7xMU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=iIAL5BOhOT6e/Q3Kqo1D6pCIRTpZZR06lG/2mOkAUBMaJs23Zenpe6nEn72IihNUicvxJVOMwpySv+U5lu3TzwsJNMdABBmweI18iz8q1Zi4DIqlyYcP/O3I4uj8gOUuyYzuTuuXFChHzK3pwi5wRkUVT3Cs7avO0qFD82YeVBs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=cvFqLAnp; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="cvFqLAnp" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706711042; x=1738247042; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=P/1acTucceqt/grzvkOvrapX+WjE/maAY0kvtIA7xMU=; b=cvFqLAnp4Lci5GIF4NcpbU+t5OrEEgxSYox3S8S9KcV8ovom6CtTuc3+ 1l19x043Ep3O43NwLLJhLjwIRAVbRtS535/uggzynZW6CkneUNkCsqe0I mKFjl2zCaR9GH4ChOh31UDLC/BgQIQhOQkXWiT8t8tLqVGRJxSSNFgGYw csciAnwrVr+Qfq3KThjh5qPm/UPhkS2CCZJfcBMuI8TsF46/Cx237UMkr 3nDxE0HZwdPg8OiEjq2kGvIBk3cA51xIJ+jtzTDKq1Pc/r+lVFMIXs639 qJq/WqR3TnC5Ze0jEuikSM8VmAELZOz1reeCY1OpdFsmPlYtSJ5CqJQ3a g==; X-IronPort-AV: E=McAfee;i="6600,9927,10969"; a="10995839" X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="10995839" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:24:01 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="4068756" Received: from puhongt-mobl2.ccr.corp.intel.com (HELO rzhang1-mobl7.ccr.corp.intel.com) ([10.255.29.147]) by orviesa005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:23:58 -0800 From: Zhang Rui To: rafael.j.wysocki@intel.com, peterz@infradead.org Cc: mingo@redhat.com, kan.liang@linux.intel.com, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, x86@kernel.org Subject: [PATCH 4/5] powercap: intel_rapl: Fix BROADWELL_D Dram energy unit Date: Wed, 31 Jan 2024 22:23:34 +0800 Message-Id: <20240131142335.84218-5-rui.zhang@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240131142335.84218-1-rui.zhang@intel.com> References: <20240131142335.84218-1-rui.zhang@intel.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 BROADWELL_D uses a fixed energy unit for Dram Domain like the other Xeon servers, e.g. BROADWELL_X. Signed-off-by: Zhang Rui --- drivers/powercap/intel_rapl_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 318174c87249..094e226b8bd3 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -1235,7 +1235,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL(BROADWELL, &rapl_defaults_core), X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_G, &rapl_defaults_core), - X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_D, &rapl_defaults_core), + X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_D, &rapl_defaults_hsw_server), X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, &rapl_defaults_hsw_server), X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE, &rapl_defaults_core), From patchwork Wed Jan 31 14:23:35 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Zhang, Rui" X-Patchwork-Id: 13539528 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 55A7784A47; Wed, 31 Jan 2024 14:24:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711050; cv=none; b=CmwgRphocTUoRu7Ble+vmiE3mpB/Wgh27ZvWeQwYFTcArWyarqd/zBOvDPNXCHhOB9c7qM1tGZ4vZHH8sBaUqJi3cUIE4FirfM05S2H9sOVik6hn1Y1BoVoQQ7y9Fq6AHmPFdfceUX/Eg/40qO27+RnLOFybLdTP2ovzZiK5zEI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706711050; c=relaxed/simple; bh=zFbrDHjMoBgUim1lK6ftm8LbXXs0k4K2Fz9zYDcRkQM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=k0Aoh4cCC3OYtVWcDewxpFSnynFOOzIQy9/XrSaer7SyGt6uu1Xqn0V1bHAqwSEExGlCSwzDrhrbN9DqLGHClksnsPLw78S0DCMIhY2Y9l68iWWZvj5tNXhCXE6zW9sRNb+j1Z1RlFV2O0fiip0dXdgq6UMy6Qb17krQHDUu+10= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=D39H2NH7; arc=none smtp.client-ip=192.198.163.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="D39H2NH7" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706711047; x=1738247047; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zFbrDHjMoBgUim1lK6ftm8LbXXs0k4K2Fz9zYDcRkQM=; b=D39H2NH7wH7HFtMb9pm3nttDfz4UiQlFUo+dX7WlHKyTgZUDg3Azga8V PDrOS/1as7+iiT/3mnCDL+pD9PlhBHUe/o7dSYQ5aMSS2DSM6ZD4+H0V7 ARPpc8nt15+T1+88SmtiTJ2+3ZrA5BJKcd2KzsEpDOAaRKPZJIMTXU0/m udOxSOeGOw+ma8hSjw9gKag4f8VuF3zdP6OsvrdQbTi847YB0bpduF9dC 6HnI1wkATh24qp7i18Lh1AlI5anXGVjI5QeUmFKL/RGtnkHbtRErlcSrZ AdR9GZlJpIkSqW9N3/CLlFqFwrAC5kumobtWphfiqHi6D0Le5cMnwt05Y g==; X-IronPort-AV: E=McAfee;i="6600,9927,10969"; a="10995860" X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="10995860" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:24:07 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,231,1701158400"; d="scan'208";a="4068779" Received: from puhongt-mobl2.ccr.corp.intel.com (HELO rzhang1-mobl7.ccr.corp.intel.com) ([10.255.29.147]) by orviesa005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Jan 2024 06:24:02 -0800 From: Zhang Rui To: rafael.j.wysocki@intel.com, peterz@infradead.org Cc: mingo@redhat.com, kan.liang@linux.intel.com, linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, x86@kernel.org Subject: [PATCH 5/5] powercap: intel_rapl_msr: Enable PMU support for MSR RAPL Date: Wed, 31 Jan 2024 22:23:35 +0800 Message-Id: <20240131142335.84218-6-rui.zhang@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240131142335.84218-1-rui.zhang@intel.com> References: <20240131142335.84218-1-rui.zhang@intel.com> Precedence: bulk X-Mailing-List: linux-pm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Enable PMU support in intel_rapl_msr driver and remove previous MSR RAPL PMU support in arch/x86/events/rapl.c. Note that, there are some divergence between the CPU model lists of these two drivers, switching from arch/x86/events/rapl.c CPU model list to the intel_rapl driver CPU model list indeed brings some functional changes including 1. Enable PMU for some Intel platforms, which were missing in arch/x86/events/rapl.c. This includes ICELAKE_NNPI ROCKETLAKE LUNARLAKE_M LAKEFIELD ATOM_SILVERMONT ATOM_SILVERMONT_MID ATOM_AIRMONT ATOM_AIRMONT_MID ATOM_TREMONT ATOM_TREMONT_D ATOM_TREMONT_L 2. Change the logic for enumerating AMD/HYGON platforms Previously, it was X86_MATCH_FEATURE(X86_FEATURE_RAPL, &model_amd_hygon) And now it is X86_MATCH_VENDOR_FAM(AMD, 0x17, &rapl_defaults_amd) X86_MATCH_VENDOR_FAM(AMD, 0x19, &rapl_defaults_amd) X86_MATCH_VENDOR_FAM(HYGON, 0x18, &rapl_defaults_amd) Signed-off-by: Zhang Rui --- Given that both of the CPU model lists describe the same feature that is available on the same set of hardwares, any bug during this transition suggests a gap in either of the previous code. So IMO, the risk is acceptable. --- arch/x86/events/Kconfig | 8 - arch/x86/events/Makefile | 1 - arch/x86/events/rapl.c | 871 ------------------------------ drivers/powercap/intel_rapl_msr.c | 2 + 4 files changed, 2 insertions(+), 880 deletions(-) delete mode 100644 arch/x86/events/rapl.c diff --git a/arch/x86/events/Kconfig b/arch/x86/events/Kconfig index dabdf3d7bf84..21f7ea817dcf 100644 --- a/arch/x86/events/Kconfig +++ b/arch/x86/events/Kconfig @@ -9,14 +9,6 @@ config PERF_EVENTS_INTEL_UNCORE Include support for Intel uncore performance events. These are available on NehalemEX and more modern processors. -config PERF_EVENTS_INTEL_RAPL - tristate "Intel/AMD rapl performance events" - depends on PERF_EVENTS && (CPU_SUP_INTEL || CPU_SUP_AMD) && PCI - default y - help - Include support for Intel and AMD rapl performance events for power - monitoring on modern processors. - config PERF_EVENTS_INTEL_CSTATE tristate "Intel cstate performance events" depends on PERF_EVENTS && CPU_SUP_INTEL && PCI diff --git a/arch/x86/events/Makefile b/arch/x86/events/Makefile index 86a76efa8bb6..b9d51fa83b06 100644 --- a/arch/x86/events/Makefile +++ b/arch/x86/events/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0-only obj-y += core.o probe.o utils.o -obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL) += rapl.o obj-y += amd/ obj-$(CONFIG_X86_LOCAL_APIC) += msr.o obj-$(CONFIG_CPU_SUP_INTEL) += intel/ diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c deleted file mode 100644 index 8d98d468b976..000000000000 --- a/arch/x86/events/rapl.c +++ /dev/null @@ -1,871 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * Support Intel/AMD RAPL energy consumption counters - * Copyright (C) 2013 Google, Inc., Stephane Eranian - * - * Intel RAPL interface is specified in the IA-32 Manual Vol3b - * section 14.7.1 (September 2013) - * - * AMD RAPL interface for Fam17h is described in the public PPR: - * https://bugzilla.kernel.org/show_bug.cgi?id=206537 - * - * RAPL provides more controls than just reporting energy consumption - * however here we only expose the 3 energy consumption free running - * counters (pp0, pkg, dram). - * - * Each of those counters increments in a power unit defined by the - * RAPL_POWER_UNIT MSR. On SandyBridge, this unit is 1/(2^16) Joules - * but it can vary. - * - * Counter to rapl events mappings: - * - * pp0 counter: consumption of all physical cores (power plane 0) - * event: rapl_energy_cores - * perf code: 0x1 - * - * pkg counter: consumption of the whole processor package - * event: rapl_energy_pkg - * perf code: 0x2 - * - * dram counter: consumption of the dram domain (servers only) - * event: rapl_energy_dram - * perf code: 0x3 - * - * gpu counter: consumption of the builtin-gpu domain (client only) - * event: rapl_energy_gpu - * perf code: 0x4 - * - * psys counter: consumption of the builtin-psys domain (client only) - * event: rapl_energy_psys - * perf code: 0x5 - * - * We manage those counters as free running (read-only). They may be - * use simultaneously by other tools, such as turbostat. - * - * The events only support system-wide mode counting. There is no - * sampling support because it does not make sense and is not - * supported by the RAPL hardware. - * - * Because we want to avoid floating-point operations in the kernel, - * the events are all reported in fixed point arithmetic (32.32). - * Tools must adjust the counts to convert them to Watts using - * the duration of the measurement. Tools may use a function such as - * ldexp(raw_count, -32); - */ - -#define pr_fmt(fmt) "RAPL PMU: " fmt - -#include -#include -#include -#include -#include -#include -#include "perf_event.h" -#include "probe.h" - -MODULE_LICENSE("GPL"); - -/* - * RAPL energy status counters - */ -enum perf_rapl_events { - PERF_RAPL_PP0 = 0, /* all cores */ - PERF_RAPL_PKG, /* entire package */ - PERF_RAPL_RAM, /* DRAM */ - PERF_RAPL_PP1, /* gpu */ - PERF_RAPL_PSYS, /* psys */ - - PERF_RAPL_MAX, - NR_RAPL_DOMAINS = PERF_RAPL_MAX, -}; - -static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = { - "pp0-core", - "package", - "dram", - "pp1-gpu", - "psys", -}; - -/* - * event code: LSB 8 bits, passed in attr->config - * any other bit is reserved - */ -#define RAPL_EVENT_MASK 0xFFULL -#define RAPL_CNTR_WIDTH 32 - -#define RAPL_EVENT_ATTR_STR(_name, v, str) \ -static struct perf_pmu_events_attr event_attr_##v = { \ - .attr = __ATTR(_name, 0444, perf_event_sysfs_show, NULL), \ - .id = 0, \ - .event_str = str, \ -}; - -struct rapl_pmu { - raw_spinlock_t lock; - int n_active; - int cpu; - struct list_head active_list; - struct pmu *pmu; - ktime_t timer_interval; - struct hrtimer hrtimer; -}; - -struct rapl_pmus { - struct pmu pmu; - unsigned int maxdie; - struct rapl_pmu *pmus[] __counted_by(maxdie); -}; - -enum rapl_unit_quirk { - RAPL_UNIT_QUIRK_NONE, - RAPL_UNIT_QUIRK_INTEL_HSW, - RAPL_UNIT_QUIRK_INTEL_SPR, -}; - -struct rapl_model { - struct perf_msr *rapl_msrs; - unsigned long events; - unsigned int msr_power_unit; - enum rapl_unit_quirk unit_quirk; -}; - - /* 1/2^hw_unit Joule */ -static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly; -static struct rapl_pmus *rapl_pmus; -static cpumask_t rapl_cpu_mask; -static unsigned int rapl_cntr_mask; -static u64 rapl_timer_ms; -static struct perf_msr *rapl_msrs; - -static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu) -{ - unsigned int dieid = topology_logical_die_id(cpu); - - /* - * The unsigned check also catches the '-1' return value for non - * existent mappings in the topology map. - */ - return dieid < rapl_pmus->maxdie ? rapl_pmus->pmus[dieid] : NULL; -} - -static inline u64 rapl_read_counter(struct perf_event *event) -{ - u64 raw; - rdmsrl(event->hw.event_base, raw); - return raw; -} - -static inline u64 rapl_scale(u64 v, int cfg) -{ - if (cfg > NR_RAPL_DOMAINS) { - pr_warn("Invalid domain %d, failed to scale data\n", cfg); - return v; - } - /* - * scale delta to smallest unit (1/2^32) - * users must then scale back: count * 1/(1e9*2^32) to get Joules - * or use ldexp(count, -32). - * Watts = Joules/Time delta - */ - return v << (32 - rapl_hw_unit[cfg - 1]); -} - -static u64 rapl_event_update(struct perf_event *event) -{ - struct hw_perf_event *hwc = &event->hw; - u64 prev_raw_count, new_raw_count; - s64 delta, sdelta; - int shift = RAPL_CNTR_WIDTH; - - prev_raw_count = local64_read(&hwc->prev_count); - do { - rdmsrl(event->hw.event_base, new_raw_count); - } while (!local64_try_cmpxchg(&hwc->prev_count, - &prev_raw_count, new_raw_count)); - - /* - * Now we have the new raw value and have updated the prev - * timestamp already. We can now calculate the elapsed delta - * (event-)time and add that to the generic event. - * - * Careful, not all hw sign-extends above the physical width - * of the count. - */ - delta = (new_raw_count << shift) - (prev_raw_count << shift); - delta >>= shift; - - sdelta = rapl_scale(delta, event->hw.config); - - local64_add(sdelta, &event->count); - - return new_raw_count; -} - -static void rapl_start_hrtimer(struct rapl_pmu *pmu) -{ - hrtimer_start(&pmu->hrtimer, pmu->timer_interval, - HRTIMER_MODE_REL_PINNED); -} - -static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer) -{ - struct rapl_pmu *pmu = container_of(hrtimer, struct rapl_pmu, hrtimer); - struct perf_event *event; - unsigned long flags; - - if (!pmu->n_active) - return HRTIMER_NORESTART; - - raw_spin_lock_irqsave(&pmu->lock, flags); - - list_for_each_entry(event, &pmu->active_list, active_entry) - rapl_event_update(event); - - raw_spin_unlock_irqrestore(&pmu->lock, flags); - - hrtimer_forward_now(hrtimer, pmu->timer_interval); - - return HRTIMER_RESTART; -} - -static void rapl_hrtimer_init(struct rapl_pmu *pmu) -{ - struct hrtimer *hr = &pmu->hrtimer; - - hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL); - hr->function = rapl_hrtimer_handle; -} - -static void __rapl_pmu_event_start(struct rapl_pmu *pmu, - struct perf_event *event) -{ - if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) - return; - - event->hw.state = 0; - - list_add_tail(&event->active_entry, &pmu->active_list); - - local64_set(&event->hw.prev_count, rapl_read_counter(event)); - - pmu->n_active++; - if (pmu->n_active == 1) - rapl_start_hrtimer(pmu); -} - -static void rapl_pmu_event_start(struct perf_event *event, int mode) -{ - struct rapl_pmu *pmu = event->pmu_private; - unsigned long flags; - - raw_spin_lock_irqsave(&pmu->lock, flags); - __rapl_pmu_event_start(pmu, event); - raw_spin_unlock_irqrestore(&pmu->lock, flags); -} - -static void rapl_pmu_event_stop(struct perf_event *event, int mode) -{ - struct rapl_pmu *pmu = event->pmu_private; - struct hw_perf_event *hwc = &event->hw; - unsigned long flags; - - raw_spin_lock_irqsave(&pmu->lock, flags); - - /* mark event as deactivated and stopped */ - if (!(hwc->state & PERF_HES_STOPPED)) { - WARN_ON_ONCE(pmu->n_active <= 0); - pmu->n_active--; - if (pmu->n_active == 0) - hrtimer_cancel(&pmu->hrtimer); - - list_del(&event->active_entry); - - WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); - hwc->state |= PERF_HES_STOPPED; - } - - /* check if update of sw counter is necessary */ - if ((mode & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) { - /* - * Drain the remaining delta count out of a event - * that we are disabling: - */ - rapl_event_update(event); - hwc->state |= PERF_HES_UPTODATE; - } - - raw_spin_unlock_irqrestore(&pmu->lock, flags); -} - -static int rapl_pmu_event_add(struct perf_event *event, int mode) -{ - struct rapl_pmu *pmu = event->pmu_private; - struct hw_perf_event *hwc = &event->hw; - unsigned long flags; - - raw_spin_lock_irqsave(&pmu->lock, flags); - - hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED; - - if (mode & PERF_EF_START) - __rapl_pmu_event_start(pmu, event); - - raw_spin_unlock_irqrestore(&pmu->lock, flags); - - return 0; -} - -static void rapl_pmu_event_del(struct perf_event *event, int flags) -{ - rapl_pmu_event_stop(event, PERF_EF_UPDATE); -} - -static int rapl_pmu_event_init(struct perf_event *event) -{ - u64 cfg = event->attr.config & RAPL_EVENT_MASK; - int bit, ret = 0; - struct rapl_pmu *pmu; - - /* only look at RAPL events */ - if (event->attr.type != rapl_pmus->pmu.type) - return -ENOENT; - - /* check only supported bits are set */ - if (event->attr.config & ~RAPL_EVENT_MASK) - return -EINVAL; - - if (event->cpu < 0) - return -EINVAL; - - event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG; - - if (!cfg || cfg >= NR_RAPL_DOMAINS + 1) - return -EINVAL; - - cfg = array_index_nospec((long)cfg, NR_RAPL_DOMAINS + 1); - bit = cfg - 1; - - /* check event supported */ - if (!(rapl_cntr_mask & (1 << bit))) - return -EINVAL; - - /* unsupported modes and filters */ - if (event->attr.sample_period) /* no sampling */ - return -EINVAL; - - /* must be done before validate_group */ - pmu = cpu_to_rapl_pmu(event->cpu); - if (!pmu) - return -EINVAL; - event->cpu = pmu->cpu; - event->pmu_private = pmu; - event->hw.event_base = rapl_msrs[bit].msr; - event->hw.config = cfg; - event->hw.idx = bit; - - return ret; -} - -static void rapl_pmu_event_read(struct perf_event *event) -{ - rapl_event_update(event); -} - -static ssize_t rapl_get_attr_cpumask(struct device *dev, - struct device_attribute *attr, char *buf) -{ - return cpumap_print_to_pagebuf(true, buf, &rapl_cpu_mask); -} - -static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL); - -static struct attribute *rapl_pmu_attrs[] = { - &dev_attr_cpumask.attr, - NULL, -}; - -static struct attribute_group rapl_pmu_attr_group = { - .attrs = rapl_pmu_attrs, -}; - -RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01"); -RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02"); -RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03"); -RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04"); -RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05"); - -RAPL_EVENT_ATTR_STR(energy-cores.unit, rapl_cores_unit, "Joules"); -RAPL_EVENT_ATTR_STR(energy-pkg.unit , rapl_pkg_unit, "Joules"); -RAPL_EVENT_ATTR_STR(energy-ram.unit , rapl_ram_unit, "Joules"); -RAPL_EVENT_ATTR_STR(energy-gpu.unit , rapl_gpu_unit, "Joules"); -RAPL_EVENT_ATTR_STR(energy-psys.unit, rapl_psys_unit, "Joules"); - -/* - * we compute in 0.23 nJ increments regardless of MSR - */ -RAPL_EVENT_ATTR_STR(energy-cores.scale, rapl_cores_scale, "2.3283064365386962890625e-10"); -RAPL_EVENT_ATTR_STR(energy-pkg.scale, rapl_pkg_scale, "2.3283064365386962890625e-10"); -RAPL_EVENT_ATTR_STR(energy-ram.scale, rapl_ram_scale, "2.3283064365386962890625e-10"); -RAPL_EVENT_ATTR_STR(energy-gpu.scale, rapl_gpu_scale, "2.3283064365386962890625e-10"); -RAPL_EVENT_ATTR_STR(energy-psys.scale, rapl_psys_scale, "2.3283064365386962890625e-10"); - -/* - * There are no default events, but we need to create - * "events" group (with empty attrs) before updating - * it with detected events. - */ -static struct attribute *attrs_empty[] = { - NULL, -}; - -static struct attribute_group rapl_pmu_events_group = { - .name = "events", - .attrs = attrs_empty, -}; - -PMU_FORMAT_ATTR(event, "config:0-7"); -static struct attribute *rapl_formats_attr[] = { - &format_attr_event.attr, - NULL, -}; - -static struct attribute_group rapl_pmu_format_group = { - .name = "format", - .attrs = rapl_formats_attr, -}; - -static const struct attribute_group *rapl_attr_groups[] = { - &rapl_pmu_attr_group, - &rapl_pmu_format_group, - &rapl_pmu_events_group, - NULL, -}; - -static struct attribute *rapl_events_cores[] = { - EVENT_PTR(rapl_cores), - EVENT_PTR(rapl_cores_unit), - EVENT_PTR(rapl_cores_scale), - NULL, -}; - -static struct attribute_group rapl_events_cores_group = { - .name = "events", - .attrs = rapl_events_cores, -}; - -static struct attribute *rapl_events_pkg[] = { - EVENT_PTR(rapl_pkg), - EVENT_PTR(rapl_pkg_unit), - EVENT_PTR(rapl_pkg_scale), - NULL, -}; - -static struct attribute_group rapl_events_pkg_group = { - .name = "events", - .attrs = rapl_events_pkg, -}; - -static struct attribute *rapl_events_ram[] = { - EVENT_PTR(rapl_ram), - EVENT_PTR(rapl_ram_unit), - EVENT_PTR(rapl_ram_scale), - NULL, -}; - -static struct attribute_group rapl_events_ram_group = { - .name = "events", - .attrs = rapl_events_ram, -}; - -static struct attribute *rapl_events_gpu[] = { - EVENT_PTR(rapl_gpu), - EVENT_PTR(rapl_gpu_unit), - EVENT_PTR(rapl_gpu_scale), - NULL, -}; - -static struct attribute_group rapl_events_gpu_group = { - .name = "events", - .attrs = rapl_events_gpu, -}; - -static struct attribute *rapl_events_psys[] = { - EVENT_PTR(rapl_psys), - EVENT_PTR(rapl_psys_unit), - EVENT_PTR(rapl_psys_scale), - NULL, -}; - -static struct attribute_group rapl_events_psys_group = { - .name = "events", - .attrs = rapl_events_psys, -}; - -static bool test_msr(int idx, void *data) -{ - return test_bit(idx, (unsigned long *) data); -} - -/* Only lower 32bits of the MSR represents the energy counter */ -#define RAPL_MSR_MASK 0xFFFFFFFF - -static struct perf_msr intel_rapl_msrs[] = { - [PERF_RAPL_PP0] = { MSR_PP0_ENERGY_STATUS, &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK }, - [PERF_RAPL_PKG] = { MSR_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK }, - [PERF_RAPL_RAM] = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group, test_msr, false, RAPL_MSR_MASK }, - [PERF_RAPL_PP1] = { MSR_PP1_ENERGY_STATUS, &rapl_events_gpu_group, test_msr, false, RAPL_MSR_MASK }, - [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group, test_msr, false, RAPL_MSR_MASK }, -}; - -static struct perf_msr intel_rapl_spr_msrs[] = { - [PERF_RAPL_PP0] = { MSR_PP0_ENERGY_STATUS, &rapl_events_cores_group, test_msr, false, RAPL_MSR_MASK }, - [PERF_RAPL_PKG] = { MSR_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK }, - [PERF_RAPL_RAM] = { MSR_DRAM_ENERGY_STATUS, &rapl_events_ram_group, test_msr, false, RAPL_MSR_MASK }, - [PERF_RAPL_PP1] = { MSR_PP1_ENERGY_STATUS, &rapl_events_gpu_group, test_msr, false, RAPL_MSR_MASK }, - [PERF_RAPL_PSYS] = { MSR_PLATFORM_ENERGY_STATUS, &rapl_events_psys_group, test_msr, true, RAPL_MSR_MASK }, -}; - -/* - * Force to PERF_RAPL_MAX size due to: - * - perf_msr_probe(PERF_RAPL_MAX) - * - want to use same event codes across both architectures - */ -static struct perf_msr amd_rapl_msrs[] = { - [PERF_RAPL_PP0] = { 0, &rapl_events_cores_group, NULL, false, 0 }, - [PERF_RAPL_PKG] = { MSR_AMD_PKG_ENERGY_STATUS, &rapl_events_pkg_group, test_msr, false, RAPL_MSR_MASK }, - [PERF_RAPL_RAM] = { 0, &rapl_events_ram_group, NULL, false, 0 }, - [PERF_RAPL_PP1] = { 0, &rapl_events_gpu_group, NULL, false, 0 }, - [PERF_RAPL_PSYS] = { 0, &rapl_events_psys_group, NULL, false, 0 }, -}; - -static int rapl_cpu_offline(unsigned int cpu) -{ - struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu); - int target; - - /* Check if exiting cpu is used for collecting rapl events */ - if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask)) - return 0; - - pmu->cpu = -1; - /* Find a new cpu to collect rapl events */ - target = cpumask_any_but(topology_die_cpumask(cpu), cpu); - - /* Migrate rapl events to the new target */ - if (target < nr_cpu_ids) { - cpumask_set_cpu(target, &rapl_cpu_mask); - pmu->cpu = target; - perf_pmu_migrate_context(pmu->pmu, cpu, target); - } - return 0; -} - -static int rapl_cpu_online(unsigned int cpu) -{ - struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu); - int target; - - if (!pmu) { - pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu)); - if (!pmu) - return -ENOMEM; - - raw_spin_lock_init(&pmu->lock); - INIT_LIST_HEAD(&pmu->active_list); - pmu->pmu = &rapl_pmus->pmu; - pmu->timer_interval = ms_to_ktime(rapl_timer_ms); - rapl_hrtimer_init(pmu); - - rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu; - } - - /* - * Check if there is an online cpu in the package which collects rapl - * events already. - */ - target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu)); - if (target < nr_cpu_ids) - return 0; - - cpumask_set_cpu(cpu, &rapl_cpu_mask); - pmu->cpu = cpu; - return 0; -} - -static int rapl_check_hw_unit(struct rapl_model *rm) -{ - u64 msr_rapl_power_unit_bits; - int i; - - /* protect rdmsrl() to handle virtualization */ - if (rdmsrl_safe(rm->msr_power_unit, &msr_rapl_power_unit_bits)) - return -1; - for (i = 0; i < NR_RAPL_DOMAINS; i++) - rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL; - - switch (rm->unit_quirk) { - /* - * DRAM domain on HSW server and KNL has fixed energy unit which can be - * different than the unit from power unit MSR. See - * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2 - * of 2. Datasheet, September 2014, Reference Number: 330784-001 " - */ - case RAPL_UNIT_QUIRK_INTEL_HSW: - rapl_hw_unit[PERF_RAPL_RAM] = 16; - break; - /* SPR uses a fixed energy unit for Psys domain. */ - case RAPL_UNIT_QUIRK_INTEL_SPR: - rapl_hw_unit[PERF_RAPL_PSYS] = 0; - break; - default: - break; - } - - - /* - * Calculate the timer rate: - * Use reference of 200W for scaling the timeout to avoid counter - * overflows. 200W = 200 Joules/sec - * Divide interval by 2 to avoid lockstep (2 * 100) - * if hw unit is 32, then we use 2 ms 1/200/2 - */ - rapl_timer_ms = 2; - if (rapl_hw_unit[0] < 32) { - rapl_timer_ms = (1000 / (2 * 100)); - rapl_timer_ms *= (1ULL << (32 - rapl_hw_unit[0] - 1)); - } - return 0; -} - -static void __init rapl_advertise(void) -{ - int i; - - pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n", - hweight32(rapl_cntr_mask), rapl_timer_ms); - - for (i = 0; i < NR_RAPL_DOMAINS; i++) { - if (rapl_cntr_mask & (1 << i)) { - pr_info("hw unit of domain %s 2^-%d Joules\n", - rapl_domain_names[i], rapl_hw_unit[i]); - } - } -} - -static void cleanup_rapl_pmus(void) -{ - int i; - - for (i = 0; i < rapl_pmus->maxdie; i++) - kfree(rapl_pmus->pmus[i]); - kfree(rapl_pmus); -} - -static const struct attribute_group *rapl_attr_update[] = { - &rapl_events_cores_group, - &rapl_events_pkg_group, - &rapl_events_ram_group, - &rapl_events_gpu_group, - &rapl_events_psys_group, - NULL, -}; - -static int __init init_rapl_pmus(void) -{ - int maxdie = topology_max_packages() * topology_max_die_per_package(); - size_t size; - - size = sizeof(*rapl_pmus) + maxdie * sizeof(struct rapl_pmu *); - rapl_pmus = kzalloc(size, GFP_KERNEL); - if (!rapl_pmus) - return -ENOMEM; - - rapl_pmus->maxdie = maxdie; - rapl_pmus->pmu.attr_groups = rapl_attr_groups; - rapl_pmus->pmu.attr_update = rapl_attr_update; - rapl_pmus->pmu.task_ctx_nr = perf_invalid_context; - rapl_pmus->pmu.event_init = rapl_pmu_event_init; - rapl_pmus->pmu.add = rapl_pmu_event_add; - rapl_pmus->pmu.del = rapl_pmu_event_del; - rapl_pmus->pmu.start = rapl_pmu_event_start; - rapl_pmus->pmu.stop = rapl_pmu_event_stop; - rapl_pmus->pmu.read = rapl_pmu_event_read; - rapl_pmus->pmu.module = THIS_MODULE; - rapl_pmus->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE; - return 0; -} - -static struct rapl_model model_snb = { - .events = BIT(PERF_RAPL_PP0) | - BIT(PERF_RAPL_PKG) | - BIT(PERF_RAPL_PP1), - .msr_power_unit = MSR_RAPL_POWER_UNIT, - .rapl_msrs = intel_rapl_msrs, -}; - -static struct rapl_model model_snbep = { - .events = BIT(PERF_RAPL_PP0) | - BIT(PERF_RAPL_PKG) | - BIT(PERF_RAPL_RAM), - .msr_power_unit = MSR_RAPL_POWER_UNIT, - .rapl_msrs = intel_rapl_msrs, -}; - -static struct rapl_model model_hsw = { - .events = BIT(PERF_RAPL_PP0) | - BIT(PERF_RAPL_PKG) | - BIT(PERF_RAPL_RAM) | - BIT(PERF_RAPL_PP1), - .msr_power_unit = MSR_RAPL_POWER_UNIT, - .rapl_msrs = intel_rapl_msrs, -}; - -static struct rapl_model model_hsx = { - .events = BIT(PERF_RAPL_PP0) | - BIT(PERF_RAPL_PKG) | - BIT(PERF_RAPL_RAM), - .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW, - .msr_power_unit = MSR_RAPL_POWER_UNIT, - .rapl_msrs = intel_rapl_msrs, -}; - -static struct rapl_model model_knl = { - .events = BIT(PERF_RAPL_PKG) | - BIT(PERF_RAPL_RAM), - .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW, - .msr_power_unit = MSR_RAPL_POWER_UNIT, - .rapl_msrs = intel_rapl_msrs, -}; - -static struct rapl_model model_skl = { - .events = BIT(PERF_RAPL_PP0) | - BIT(PERF_RAPL_PKG) | - BIT(PERF_RAPL_RAM) | - BIT(PERF_RAPL_PP1) | - BIT(PERF_RAPL_PSYS), - .msr_power_unit = MSR_RAPL_POWER_UNIT, - .rapl_msrs = intel_rapl_msrs, -}; - -static struct rapl_model model_spr = { - .events = BIT(PERF_RAPL_PP0) | - BIT(PERF_RAPL_PKG) | - BIT(PERF_RAPL_RAM) | - BIT(PERF_RAPL_PSYS), - .unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR, - .msr_power_unit = MSR_RAPL_POWER_UNIT, - .rapl_msrs = intel_rapl_spr_msrs, -}; - -static struct rapl_model model_amd_hygon = { - .events = BIT(PERF_RAPL_PKG), - .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT, - .rapl_msrs = amd_rapl_msrs, -}; - -static const struct x86_cpu_id rapl_model_match[] __initconst = { - X86_MATCH_FEATURE(X86_FEATURE_RAPL, &model_amd_hygon), - X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE, &model_snb), - X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE_X, &model_snbep), - X86_MATCH_INTEL_FAM6_MODEL(IVYBRIDGE, &model_snb), - X86_MATCH_INTEL_FAM6_MODEL(IVYBRIDGE_X, &model_snbep), - X86_MATCH_INTEL_FAM6_MODEL(HASWELL, &model_hsw), - X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, &model_hsx), - X86_MATCH_INTEL_FAM6_MODEL(HASWELL_L, &model_hsw), - X86_MATCH_INTEL_FAM6_MODEL(HASWELL_G, &model_hsw), - X86_MATCH_INTEL_FAM6_MODEL(BROADWELL, &model_hsw), - X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_G, &model_hsw), - X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, &model_hsx), - X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_D, &model_hsx), - X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &model_knl), - X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &model_knl), - X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_L, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &model_hsx), - X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE_L, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(CANNONLAKE_L, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &model_hsw), - X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_D, &model_hsw), - X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_PLUS, &model_hsw), - X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_L, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(ICELAKE, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &model_hsx), - X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &model_hsx), - X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(ATOM_GRACEMONT, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &model_spr), - X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, &model_spr), - X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_S, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE, &model_skl), - X86_MATCH_INTEL_FAM6_MODEL(METEORLAKE_L, &model_skl), - {}, -}; -MODULE_DEVICE_TABLE(x86cpu, rapl_model_match); - -static int __init rapl_pmu_init(void) -{ - const struct x86_cpu_id *id; - struct rapl_model *rm; - int ret; - - id = x86_match_cpu(rapl_model_match); - if (!id) - return -ENODEV; - - rm = (struct rapl_model *) id->driver_data; - - rapl_msrs = rm->rapl_msrs; - - rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX, - false, (void *) &rm->events); - - ret = rapl_check_hw_unit(rm); - if (ret) - return ret; - - ret = init_rapl_pmus(); - if (ret) - return ret; - - /* - * Install callbacks. Core will call them for each online cpu. - */ - ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE, - "perf/x86/rapl:online", - rapl_cpu_online, rapl_cpu_offline); - if (ret) - goto out; - - ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1); - if (ret) - goto out1; - - rapl_advertise(); - return 0; - -out1: - cpuhp_remove_state(CPUHP_AP_PERF_X86_RAPL_ONLINE); -out: - pr_warn("Initialization failed (%d), disabled\n", ret); - cleanup_rapl_pmus(); - return ret; -} -module_init(rapl_pmu_init); - -static void __exit intel_rapl_exit(void) -{ - cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE); - perf_pmu_unregister(&rapl_pmus->pmu); - cleanup_rapl_pmus(); -} -module_exit(intel_rapl_exit); diff --git a/drivers/powercap/intel_rapl_msr.c b/drivers/powercap/intel_rapl_msr.c index b4b6930cacb0..564623471a68 100644 --- a/drivers/powercap/intel_rapl_msr.c +++ b/drivers/powercap/intel_rapl_msr.c @@ -53,6 +53,7 @@ static struct rapl_if_priv rapl_msr_priv_intel = { .regs[RAPL_DOMAIN_PLATFORM][RAPL_DOMAIN_REG_STATUS].msr = MSR_PLATFORM_ENERGY_STATUS, .limits[RAPL_DOMAIN_PACKAGE] = BIT(POWER_LIMIT2), .limits[RAPL_DOMAIN_PLATFORM] = BIT(POWER_LIMIT2), + .enable_pmu = true, }; static struct rapl_if_priv rapl_msr_priv_amd = { @@ -60,6 +61,7 @@ static struct rapl_if_priv rapl_msr_priv_amd = { .reg_unit.msr = MSR_AMD_RAPL_POWER_UNIT, .regs[RAPL_DOMAIN_PACKAGE][RAPL_DOMAIN_REG_STATUS].msr = MSR_AMD_PKG_ENERGY_STATUS, .regs[RAPL_DOMAIN_PP0][RAPL_DOMAIN_REG_STATUS].msr = MSR_AMD_CORE_ENERGY_STATUS, + .enable_pmu = true, }; /* Handles CPU hotplug on multi-socket systems.