From patchwork Fri Apr 1 12:40:15 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Rafael J. Wysocki" X-Patchwork-Id: 8723461 X-Patchwork-Delegate: rjw@sisk.pl Return-Path: X-Original-To: patchwork-linux-pm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 16728C0553 for ; Fri, 1 Apr 2016 12:38:14 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id E9A22203B8 for ; Fri, 1 Apr 2016 12:38:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C8BEA20398 for ; Fri, 1 Apr 2016 12:38:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752972AbcDAMhx (ORCPT ); Fri, 1 Apr 2016 08:37:53 -0400 Received: from cloudserver094114.home.net.pl ([79.96.170.134]:44457 "HELO cloudserver094114.home.net.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752102AbcDAMhw convert rfc822-to-8bit (ORCPT ); Fri, 1 Apr 2016 08:37:52 -0400 Received: from cmv128.neoplus.adsl.tpnet.pl (83.31.149.128) (HELO vostro.rjw.lan) by serwer1319399.home.pl (79.96.170.134) with SMTP (IdeaSmtpServer v0.80.1) id 1dd7cba1ba62be42; Fri, 1 Apr 2016 14:37:50 +0200 From: "Rafael J. Wysocki" To: =?ISO-8859-1?Q?J=F6rg?= Otte Cc: "Rafael J. Wysocki" , Linux Kernel Mailing List , Linux PM list , Srinivas Pandruvada , Doug Smythies Subject: Re: [intel-pstate driver regression] processor frequency very high even if in idle Date: Fri, 01 Apr 2016 14:40:15 +0200 Message-ID: <3623107.tlAuqH4F7s@vostro.rjw.lan> User-Agent: KMail/4.11.5 (Linux/4.5.0-rc1+; KDE/4.11.5; x86_64; ; ) In-Reply-To: References: <2727017.UmaUvtBLeX@vostro.rjw.lan> MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Spam-Status: No, score=-7.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Friday, April 01, 2016 11:20:42 AM Jörg Otte wrote: > 2016-03-31 17:43 GMT+02:00 Rafael J. Wysocki : > > On Thursday, March 31, 2016 05:25:18 PM Jörg Otte wrote: > >> 2016-03-31 13:42 GMT+02:00 Rafael J. Wysocki : > >> > On Thursday, March 31, 2016 11:05:56 AM Jörg Otte wrote: > >> > > >> > [cut] > >> > > >> >> > > >> >> > >> >> Yes, works for me. > >> >> > >> >> CPUID(7): No-SGX > >> >> CPU Avg_MHz Busy% Bzy_MHz TSC_MHz > >> >> - 11 0.66 1682 2494 > >> >> 0 11 0.60 1856 2494 > >> >> 1 6 0.34 1898 2494 > >> >> 2 13 0.82 1628 2494 > >> >> 3 13 0.87 1528 2494 > >> >> CPU Avg_MHz Busy% Bzy_MHz TSC_MHz > >> >> - 6 0.58 963 2494 > >> >> 0 8 0.83 957 2494 > >> >> 1 1 0.08 984 2494 > >> >> 2 10 1.04 975 2494 > >> >> 3 3 0.35 934 2494 > >> >> > >> > > > > > [cut] > > > >> > > >> > >> No, this patch doesn't help. > > > > Well, more work to do then. > > > > I've just noticed a bug in this patch, which is not relevant for the results, > > but below is a new version. > > > >> CPUID(7): No-SGX > >> CPU Avg_MHz Busy% Bzy_MHz TSC_MHz > >> - 8 0.32 2507 2495 > >> 0 13 0.53 2505 2495 > >> 1 3 0.11 2523 2495 > >> 2 1 0.06 2555 2495 > >> 3 15 0.59 2500 2495 > >> CPU Avg_MHz Busy% Bzy_MHz TSC_MHz > >> - 8 0.33 2486 2495 > >> 0 12 0.50 2482 2495 > >> 1 5 0.22 2489 2495 > >> 2 1 0.04 2492 2495 > >> 3 15 0.59 2487 2495 > > [cut] > > here they are. > Thanks! First of all, the sampling mechanics works as expected in the failing case, which is the most important thing I wanted to know. However, there are anomalies in the failing case trace. The core_busy column is clearly suspicious and it looks like CPUs 2 and 3 never really go idle. I guess we'll need to find out why they don't go idle to get to the bottom of this, but it firmly falls into the weird stuff territory already. In the meantime, below is one more patch to test, on top of the previous one (that is, https://patchwork.kernel.org/patch/8714401/). Again, this is a change I'd like to make regardless, so it would be good to know if anything more has to be done before we go further. --- From: Rafael J. Wysocki Subject: [PATCH] intel_pstate: Avoid extra invocation of intel_pstate_sample() The initialization of intel_pstate for a given CPU involves populating the fields of its struct cpudata that represent the previous sample, but currently that is done in a problematic way. Namely, intel_pstate_init_cpu() makes an extra call to intel_pstate_sample() so it reads the current register values that will be used to populate the "previous sample" record during the next invocation of intel_pstate_sample(). However, after commit a4675fbc4a7a (cpufreq: intel_pstate: Replace timers with utilization update callbacks) that doesn't work for last_sample_time, because the time value is passed to intel_pstate_sample() as an argument now. Passing 0 to it from intel_pstate_init_cpu() is problematic, because that causes cpu->last_sample_time == 0 to be visible in get_target_pstate_use_performance() (and hence the extra cpu->last_sample_time > 0 check in there) and effectively allows the first invocation of intel_pstate_sample() from intel_pstate_update_util() to happen immediately after the initialization which may lead to a significant "turn on" effect in the governor algorithm. To mitigate that issue, rework the initialization to avoid the extra intel_pstate_sample() call from intel_pstate_init_cpu(). Instead, make intel_pstate_sample() return false if it has been called with cpu->sample.time equal to zero, which will make intel_pstate_update_util() skip the sample in that case, and reset cpu->sample.time from intel_pstate_set_update_util_hook() to make the algorithm start properly every time the hook is set. Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Index: linux-pm/drivers/cpufreq/intel_pstate.c =================================================================== --- linux-pm.orig/drivers/cpufreq/intel_pstate.c +++ linux-pm/drivers/cpufreq/intel_pstate.c @@ -910,7 +910,14 @@ static inline bool intel_pstate_sample(s cpu->prev_aperf = aperf; cpu->prev_mperf = mperf; cpu->prev_tsc = tsc; - return true; + /* + * First time this function is invoked in a given cycle, all of the + * previous sample data fields are equal to zero or stale and they must + * be populated with meaningful numbers for things to work, so assume + * that sample.time will always be reset before setting the utilization + * update hook and make the caller skip the sample then. + */ + return !!cpu->last_sample_time; } static inline int32_t get_avg_frequency(struct cpudata *cpu) @@ -984,8 +991,7 @@ static inline int32_t get_target_pstate_ * enough period of time to adjust our busyness. */ duration_ns = cpu->sample.time - cpu->last_sample_time; - if ((s64)duration_ns > pid_params.sample_rate_ns * 3 - && cpu->last_sample_time > 0) { + if ((s64)duration_ns > pid_params.sample_rate_ns * 3) { sample_ratio = div_fp(int_tofp(pid_params.sample_rate_ns), int_tofp(duration_ns)); core_busy = mul_fp(core_busy, sample_ratio); @@ -1100,7 +1106,6 @@ static int intel_pstate_init_cpu(unsigne intel_pstate_get_cpu_pstates(cpu); intel_pstate_busy_pid_reset(cpu); - intel_pstate_sample(cpu, 0); cpu->update_util.func = intel_pstate_update_util; @@ -1121,9 +1126,13 @@ static unsigned int intel_pstate_get(uns return get_avg_frequency(cpu); } -static void intel_pstate_set_update_util_hook(unsigned int cpu) +static void intel_pstate_set_update_util_hook(unsigned int cpu_num) { - cpufreq_set_update_util_data(cpu, &all_cpu_data[cpu]->update_util); + struct cpudata *cpu = all_cpu_data[cpu_num]; + + /* Prevent intel_pstate_update_util() from using stale data. */ + cpu->sample.time = 0; + cpufreq_set_update_util_data(cpu_num, &cpu->update_util); } static void intel_pstate_clear_update_util_hook(unsigned int cpu)