[v6,7/7,Update] cpufreq: schedutil: New governor based on scheduler utilization data

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] cpufreq: schedutil: New governor based on scheduler utilization data

Add a new cpufreq scaling governor, called "schedutil", that uses
scheduler-provided CPU utilization information as input for making
its decisions.

Doing that is possible after commit 34e2c555f3e1 (cpufreq: Add
mechanism for registering utilization update callbacks) that
introduced cpufreq_update_util() called by the scheduler on
utilization changes (from CFS) and RT/DL task status updates.
In particular, CPU frequency scaling decisions may be based on
the utilization data passed to cpufreq_update_util() by CFS.

The new governor is relatively simple.

The frequency selection formula used by it depends on whether or not
the utilization is frequency-invariant.  In the frequency-invariant
case the new CPU frequency is given by

	next_freq = 1.25 * max_freq * util / max

where util and max are the last two arguments of cpufreq_update_util().
In turn, if util is not frequency-invariant, the maximum frequency in
the above formula is replaced with the current frequency of the CPU:

	next_freq = 1.25 * curr_freq * util / max

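The coefficient 1.25 corresponds to the frequency tipping point at
(util / max) = 0.8, that is, the formula yields the full max_freq
(or curr_freq) once the utilization reaches 80% of max.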
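
As an illustration only, the two formulas above can be written as one
integer helper.  This is a user-space sketch, not code from the patch:
sketch_next_freq() and its base_freq parameter are hypothetical names,
where base_freq stands for max_freq in the frequency-invariant case and
for curr_freq otherwise.  The 1.25 factor is assumed to be computed as
base_freq + base_freq / 4 to stay in integer arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patch).
 * base_freq: max_freq if util is frequency-invariant, curr_freq otherwise.
 * util, max: utilization and its upper bound; max must be nonzero.
 */
static unsigned long sketch_next_freq(unsigned long base_freq,
				      unsigned long util,
				      unsigned long max)
{
	/* 1.25 * base_freq, done as base_freq + base_freq / 4 */
	uint64_t scaled = (uint64_t)base_freq + (base_freq >> 2);

	/* 64-bit intermediate so that scaled * util cannot wrap on 32-bit */
	return (unsigned long)(scaled * util / max);
}
```

With base_freq = 2000000 and util / max = 0.8 this returns 2000000,
which is the tipping point mentioned above.  Larger ratios give values
above base_freq, so the caller is assumed to clamp the result.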

All of the computations are carried out in the utilization update
handlers provided by the new governor.  One of those handlers is
used for cpufreq policies shared between multiple CPUs and the other
one is for policies with one CPU only (and therefore it doesn't need
to use any extra synchronization means).
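
This description does not spell out how the handler for shared policies
combines the data from the individual CPUs.  One plausible approach,
shown here purely as an assumption, is to pick the CPU with the largest
util / max ratio and feed that pair into the formula.  The structure
and function names below are hypothetical, and the locking that a real
shared handler would need is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU sample: last util and max seen for that CPU */
struct sketch_cpu_sample {
	unsigned long util;
	unsigned long max;
};

/*
 * Return the index of the sample with the largest util / max ratio.
 * n must be at least 1.  Ratios are compared by cross-multiplication
 * to avoid divisions; util and max are assumed to be small enough
 * (e.g. on the order of 1024) for the products to fit in 64 bits.
 */
static size_t sketch_busiest(const struct sketch_cpu_sample *s, size_t n)
{
	size_t best = 0;
	size_t i;

	for (i = 1; i < n; i++) {
		/* s[i].util / s[i].max > s[best].util / s[best].max ? */
		if ((uint64_t)s[i].util * s[best].max >
		    (uint64_t)s[best].util * s[i].max)
			best = i;
	}
	return best;
}
```

A policy with a single CPU needs none of this, which is why its handler
can be simpler and lock-free.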

The governor supports fast frequency switching if that is supported
by the cpufreq driver in use and possible for the given policy.
In the fast switching case, all operations of the governor take
place in its utilization update handlers.  If fast switching cannot
be used, the frequency switch operations are carried out with the
help of a work item which only calls __cpufreq_driver_target()
(under a mutex) to trigger a frequency update (to a value already
computed beforehand in one of the utilization update handlers).
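
The split between the two paths can be modeled in user space roughly as
follows.  This is a simplified sketch under stated assumptions, not the
code from the patch: struct sketch_policy, sketch_commit() and
sketch_work() are hypothetical, and the direct assignment to cur_freq
stands in for whatever the driver does to program the hardware.

```c
#include <stdbool.h>

/* Hypothetical model of the per-policy governor state */
struct sketch_policy {
	bool fast_switch_enabled;	/* driver can switch from the handler */
	unsigned long cur_freq;		/* frequency currently programmed */
	unsigned long next_freq;	/* value computed by an update handler */
	bool work_in_progress;		/* deferred switch queued, not yet run */
};

/* Called from a utilization update handler with the computed frequency */
static void sketch_commit(struct sketch_policy *p, unsigned long freq)
{
	p->next_freq = freq;
	if (p->fast_switch_enabled) {
		/* Fast path: the switch happens right here */
		p->cur_freq = freq;
	} else {
		/* Slow path: only record the request, work item runs later */
		p->work_in_progress = true;
	}
}

/* Stand-in for the work item: applies the value computed beforehand */
static void sketch_work(struct sketch_policy *p)
{
	p->cur_freq = p->next_freq;
	p->work_in_progress = false;
}
```

In the model, as in the description above, the work item computes
nothing itself and only applies next_freq.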

Currently, the governor treats all of the RT and DL tasks as
"unknown utilization" and sets the frequency to the allowed
maximum when updated from the RT or DL sched classes.  That
heavy-handed approach should be replaced with something more
subtle and specifically targeted at RT and DL tasks.
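
A minimal sketch of that special case, assuming that updates from the
RT and DL classes are marked with util == ULONG_MAX (as the changelog
below indicates).  The function and parameter names are hypothetical,
and the final clamp to policy_max is an assumption added for the
example rather than something taken from the patch.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patch).
 * policy_max: highest frequency allowed for the policy.
 * base_freq:  max_freq or curr_freq, depending on frequency invariance.
 */
static unsigned long sketch_pick_freq(unsigned long policy_max,
				      unsigned long base_freq,
				      unsigned long util,
				      unsigned long max)
{
	uint64_t freq;

	/* RT/DL update: utilization unknown, go to the allowed maximum */
	if (util == ULONG_MAX)
		return policy_max;

	/* CFS update: 1.25 * base_freq * util / max */
	freq = ((uint64_t)base_freq + (base_freq >> 2)) * util / max;

	/* Assumed clamp, so the result never exceeds the allowed maximum */
	return freq > policy_max ? policy_max : (unsigned long)freq;
}
```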

The governor shares some tunables management code with the
"ondemand" and "conservative" governors and uses some common
definitions from cpufreq_governor.h, but apart from that it
is stand-alone.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

Addressing comments from Peter and Juri, fixes.

Changes from v5:
- Fixed sugov_update_commit() to set sg_policy->next_freq properly
  in the "work item" branch.
- Used smp_processor_id() in sugov_irq_work() and restored work_in_progress.

Changes from v4:
- Use TICK_NSEC in sugov_next_freq_shared().
- Use schedule_work_on() to schedule work items and replace
  work_in_progress with work_cpu (which is used both for scheduling
  work items and as a "work in progress" marker).
- Rearrange sugov_update_commit() to only check policy->min/max if
  fast switching is enabled.
- Replace util > max checks with util == ULONG_MAX checks to make
  it clear that they are about a special case (RT/DL).

Changes from v3:
- The "next frequency" formula based on
  http://marc.info/?l=linux-acpi&m=145756618321500&w=4 and
  http://marc.info/?l=linux-kernel&m=145760739700716&w=4
- The governor goes into kernel/sched/ (again).

Changes from v2:
- The governor goes into drivers/cpufreq/.
- The "next frequency" formula has an additional 1.1 factor to allow
  more util/max values to map onto the top-most frequency in case the
  distance between that and the previous one is disproportionately small.
- sugov_update_commit() traces CPU frequency even if the new one is
  the same as the previous one (otherwise, if the system is 100% loaded
  for long enough, powertop starts to report that all CPUs are 100% idle).

---
 drivers/cpufreq/Kconfig          |   26 +
 kernel/sched/Makefile            |    1 
 kernel/sched/cpufreq_schedutil.c |  528 +++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h             |    8 
 4 files changed, 563 insertions(+)


Message ID	1614814.usHvZ58O6A@vostro.rjw.lan (mailing list archive)
State	Superseded, archived
Delegated to:	Rafael Wysocki
Headers:
From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Linux PM list <linux-pm@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>, Juri Lelli <juri.lelli@arm.com>
Cc: Steve Muckle <steve.muckle@linaro.org>, ACPI Devel Maling List <linux-acpi@vger.kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>, Viresh Kumar <viresh.kumar@linaro.org>, Vincent Guittot <vincent.guittot@linaro.org>, Michael Turquette <mturquette@baylibre.com>, Ingo Molnar <mingo@kernel.org>
Subject: [PATCH v6 7/7][Update] cpufreq: schedutil: New governor based on scheduler utilization data
Date: Thu, 17 Mar 2016 17:01:59 +0100
In-Reply-To: <1711281.bPmSjlBT7c@vostro.rjw.lan>
References: <2495375.dFbdlAZmA6@vostro.rjw.lan> <4088601.C2vItRYpQn@vostro.rjw.lan> <1711281.bPmSjlBT7c@vostro.rjw.lan>
