From patchwork Sat Jun 24 05:11:52 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Len Brown
X-Patchwork-Id: 9807585
X-Patchwork-Delegate: rjw@sisk.pl
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 60ACE6086C
 for ; Sat, 24 Jun 2017 05:12:11 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4F3942854A
 for ; Sat, 24 Jun 2017 05:12:11 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 431F628558; Sat, 24 Jun 2017 05:12:11 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DKIM_SIGNED,
 RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM, T_DKIM_INVALID autolearn=unavailable version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D8F7D28555
 for ; Sat, 24 Jun 2017 05:12:09 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1751264AbdFXFMF (ORCPT ); Sat, 24 Jun 2017 01:12:05 -0400
Received: from mail-qt0-f196.google.com ([209.85.216.196]:36174 "EHLO mail-qt0-f196.google.com" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1751184AbdFXFMD (ORCPT );
 Sat, 24 Jun 2017 01:12:03 -0400
Received: by mail-qt0-f196.google.com with SMTP id v31so3721650qtb.3;
 Fri, 23 Jun 2017 22:12:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=sender:from:to:cc:subject:date:message-id:in-reply-to:references
 :in-reply-to:references:reply-to:organization;
 bh=jzNGKc07Kpg/Q4lTsjeIw4PoFCwVHo7/80WBdWvmcxI=;
 b=ukfZtU8VRfFVGMoKWqNZw7Pq0hoAhHAKwTdHVDJg6mRx2Y5XY0Sl8N3azTAUi00CE8
 cV/ZVdJt7UmWW/bp8G3l4QwQMitcWrXmlzd73VrTHQ2uhJZfZeLyHJ2QJ8BQj4ZW6glK
 wFCg+McCE0V7DOZ1TE9f4XZSpl43soy66CXjDXazIXFIDO6F9r7A+74hHEeNM3uiPOT+
 jHTUPLlM3K+daZEj9LZX2ndmvQxLjTZpK5JTO2kF9daQzpPnbKOlpUl2Z1c7vHSsaUyc
 B7a+PJbF3vhNvlH0XiE2tXJdPyFXdu5U27UO5bXF5L9KAGor6YAnKVpp46pWOOwGqwoR
 9lAA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025;
 h=x-gm-message-state:sender:from:to:cc:subject:date:message-id
 :in-reply-to:references:in-reply-to:references:reply-to:organization;
 bh=jzNGKc07Kpg/Q4lTsjeIw4PoFCwVHo7/80WBdWvmcxI=;
 b=nkFxVbVtFekQ/nzasH9H7Fed+RodfW1oWu026CExaH30GHngkUPuSImeaJ6gdc6QN3
 vXVrXLiNrjO7dNSaCluHlKInLbKhVXr5RQ9BjjvZy63XZ/qXoT/YTlxGLu6wep+MZqIC
 ltOUZ9rNuYaA5cAyURUlrSp4n0ZvPYcJkJQN0/1r/oOJJkv4S3Pucp1kqbImhIWQOBcm
 Jv1oSzA9mmiA1+F91BpuYGORAmqU1eq/PdVpU2GqlXN4OUiXeMiChQ1X3NQgHNGh3OXI
 BounMl7gEmJILhGfZNb2411Vxc6jTCOKjyOfNwX8bdoVwpMc1ffHZgX3Vuq+1hVw/3Wf
 7CAg==
X-Gm-Message-State: AKS2vOxyI8tj0jFitCVUmK8wL9NYncaWcWAxD/laKFpvzIMGyi5brSoL
 sZhEQmTCPPCq5mi3
X-Received: by 10.237.32.202 with SMTP id 68mr14142059qtb.128.1498281122096;
 Fri, 23 Jun 2017 22:12:02 -0700 (PDT)
Received: from localhost.localdomain (pool-173-48-65-169.bstnma.fios.verizon.net. [173.48.65.169])
 by smtp.gmail.com with ESMTPSA id k6sm5060704qtk.10.2017.06.23.22.12.01
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
 Fri, 23 Jun 2017 22:12:01 -0700 (PDT)
From: Len Brown
To: rafael@kernel.org, tglx@linutronix.de
Cc: x86@kernel.org, srinivas.pandruvada@linux.intel.com,
 peterz@infradead.org, linux-pm@vger.kernel.org,
 linux-kernel@vger.kernel.org, Len Brown
Subject: [PATCH 2/4 v2] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
Date: Fri, 23 Jun 2017 22:11:52 -0700
Message-Id: <7c8784a63cdab6ff5ff7756060be6a77f5fe5915.1498280509.git.len.brown@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1498281114-3868-1-git-send-email-lenb@kernel.org>
References: <1498281114-3868-1-git-send-email-lenb@kernel.org>
In-Reply-To:
References:
Reply-To: Len Brown
Organization: Intel Open Source Technology Center
Sender: linux-pm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

From: Len Brown

The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.

Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.

Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver-specific cpufreq_driver.get() routine.

MHz is computed like so:

MHz = base_MHz * delta_APERF / delta_MPERF

MHz is the average frequency of the busy processor
over a measurement interval.  The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to
happen on-demand when users read the sysfs attribute
cpufreq/scaling_cur_freq.

As with previous methods of calculating MHz,
idle time is excluded.

base_MHz above is from TSC calibration global "cpu_khz".
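
For illustration only (this is not part of the patch), the formula above can be sketched in user-space C roughly as follows. The helper name avg_busy_mhz(), its parameters, and the sample counter values are hypothetical; in the kernel the counters come from the APERF and MPERF MSRs and the base frequency from cpu_khz.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: average busy frequency in
 * MHz over one interval, MHz = base_MHz * delta_APERF / delta_MPERF.
 */
uint64_t avg_busy_mhz(uint64_t base_mhz,
		      uint64_t aperf_old, uint64_t aperf_new,
		      uint64_t mperf_old, uint64_t mperf_new)
{
	/* Unsigned subtraction gives the right delta even if a counter wrapped once. */
	uint64_t delta_aperf = aperf_new - aperf_old;
	uint64_t delta_mperf = mperf_new - mperf_old;

	/* Sketch-only precondition: the product below must fit in 64 bits. */
	assert(delta_aperf == 0 || base_mhz <= UINT64_MAX / delta_aperf);

	/* MPERF did not advance, so no ratio can be formed for this interval. */
	if (delta_mperf == 0)
		return 0;

	return base_mhz * delta_aperf / delta_mperf;
}
```

With base_mhz = 2400, delta_APERF = 1500000 and delta_MPERF = 1000000, the sketch yields 3600, meaning the CPU averaged 1.5 times its base frequency while it was busy.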

This x86 native method to calculate MHz returns a meaningful result
regardless of whether P-states are controlled by hardware or firmware,
and whether or not the Linux cpufreq sub-system is installed.

When this routine is invoked more frequently, the measurement
interval becomes shorter.  However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.

Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.

Signed-off-by: Len Brown
Reviewed-by: Thomas Gleixner
---
 arch/x86/kernel/cpu/Makefile     |  1 +
 arch/x86/kernel/cpu/aperfmperf.c | 79 ++++++++++++++++++++++++++++++++++++++++
 drivers/cpufreq/cpufreq.c        | 12 +++++-
 include/linux/cpufreq.h          |  2 +
 4 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/aperfmperf.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 5200001..cdf8249 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -21,6 +21,7 @@ obj-y += common.o
 obj-y += rdrand.o
 obj-y += match.o
 obj-y += bugs.o
+obj-$(CONFIG_CPU_FREQ) += aperfmperf.o
 
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
new file mode 100644
index 0000000..d869c86
--- /dev/null
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -0,0 +1,79 @@
+/*
+ * x86 APERF/MPERF KHz calculation for
+ * /sys/.../cpufreq/scaling_cur_freq
+ *
+ * Copyright (C) 2017 Intel Corp.
+ * Author: Len Brown
+ *
+ * This file is licensed under GPLv2.
+ */
+
+#include
+#include
+#include
+#include
+
+struct aperfmperf_sample {
+	unsigned int khz;
+	unsigned long jiffies;
+	u64 aperf;
+	u64 mperf;
+};
+
+static DEFINE_PER_CPU(struct aperfmperf_sample, samples);
+
+/*
+ * aperfmperf_snapshot_khz()
+ * On the current CPU, snapshot APERF, MPERF, and jiffies
+ * unless we already did it within 10ms
+ * calculate kHz, save snapshot
+ */
+static void aperfmperf_snapshot_khz(void *dummy)
+{
+	u64 aperf, aperf_delta;
+	u64 mperf, mperf_delta;
+	struct aperfmperf_sample *s = this_cpu_ptr(&samples);
+
+	/* Don't bother re-computing within 10 ms */
+	if (time_before(jiffies, s->jiffies + HZ/100))
+		return;
+
+	rdmsrl(MSR_IA32_APERF, aperf);
+	rdmsrl(MSR_IA32_MPERF, mperf);
+
+	aperf_delta = aperf - s->aperf;
+	mperf_delta = mperf - s->mperf;
+
+	/*
+	 * There is no architectural guarantee that MPERF
+	 * increments faster than we can read it.
+	 */
+	if (mperf_delta == 0)
+		return;
+
+	/*
+	 * if (cpu_khz * aperf_delta) fits into ULLONG_MAX, then
+	 * khz = (cpu_khz * aperf_delta) / mperf_delta
+	 */
+	if (div64_u64(ULLONG_MAX, cpu_khz) > aperf_delta)
+		s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
+	else /* khz = aperf_delta / (mperf_delta / cpu_khz) */
+		s->khz = div64_u64(aperf_delta,
+			div64_u64(mperf_delta, cpu_khz));
+	s->jiffies = jiffies;
+	s->aperf = aperf;
+	s->mperf = mperf;
+}
+
+unsigned int arch_freq_get_on_cpu(int cpu)
+{
+	if (!cpu_khz)
+		return 0;
+
+	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+		return 0;
+
+	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
+
+	return per_cpu(samples.khz, cpu);
+}
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 26b643d..6e7424d 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -632,11 +632,21 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
 show_one(scaling_min_freq, min);
 show_one(scaling_max_freq, max);
 
+__weak unsigned int arch_freq_get_on_cpu(int cpu)
+{
+	return 0;
+}
+
 static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
 {
 	ssize_t ret;
+	unsigned int freq;
 
-	if (cpufreq_driver && cpufreq_driver->setpolicy && cpufreq_driver->get)
+	freq = arch_freq_get_on_cpu(policy->cpu);
+	if (freq)
+		ret = sprintf(buf, "%u\n", freq);
+	else if (cpufreq_driver && cpufreq_driver->setpolicy &&
+			cpufreq_driver->get)
 		ret = sprintf(buf, "%u\n", cpufreq_driver->get(policy->cpu));
 	else
 		ret = sprintf(buf, "%u\n", policy->cur);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index a5ce0bbe..905117b 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -883,6 +883,8 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
 }
 #endif
 
+extern unsigned int arch_freq_get_on_cpu(int cpu);
+
 /* the following are really really optional */
 extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;
 extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs;
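
As a reading aid for the two division paths in aperfmperf_snapshot_khz(), here is a minimal user-space sketch. It is an approximation, not the kernel code. The name scaled_khz() is hypothetical, UINT64_MAX and plain '/' stand in for ULLONG_MAX and div64_u64(), and the zero-divisor guard in the slow path is an assumption added for this sketch that the patch itself does not contain.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space model (hypothetical name) of the kHz computation:
 * multiply first when the product fits in 64 bits, otherwise
 * divide first and accept the loss of precision.
 */
uint64_t scaled_khz(uint64_t cpu_khz, uint64_t aperf_delta, uint64_t mperf_delta)
{
	uint64_t divisor;

	/* The patch returns 0 from arch_freq_get_on_cpu() before getting here. */
	assert(cpu_khz != 0);

	/* Ratio undefined; the patch keeps the previous sample in this case. */
	if (mperf_delta == 0)
		return 0;

	/* Fast path: cpu_khz * aperf_delta cannot overflow 64 bits. */
	if (UINT64_MAX / cpu_khz > aperf_delta)
		return cpu_khz * aperf_delta / mperf_delta;

	/*
	 * Slow path: khz = aperf_delta / (mperf_delta / cpu_khz).
	 * Sketch-only assumption: bail out if the inner quotient is 0,
	 * which happens when mperf_delta < cpu_khz.
	 */
	divisor = mperf_delta / cpu_khz;
	if (divisor == 0)
		return 0;

	return aperf_delta / divisor;
}
```

For cpu_khz = 2400000, small deltas such as 1500000 and 1000000 take the fast path, while an aperf_delta of 12000000000000 exceeds UINT64_MAX / cpu_khz and takes the slow path.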