[v4,01/16] sched/core: uclamp: extend sched_setattr to support utilization clamping

The SCHED_DEADLINE scheduling class provides an advanced and formal
model to define tasks requirements which can be translated into proper
decisions for both task placements and frequencies selections.
Other classes have a more simplified model which is essentially based on
the relatively simple concept of POSIX priorities.

Such a simple priority based model however does not allow to exploit
some of the most advanced features of the Linux scheduler like, for
example, driving frequencies selection via the schedutil cpufreq
governor. However, also for non SCHED_DEADLINE tasks, it's still
interesting to define tasks properties which can be used to better
support certain scheduler decisions.

Utilization clamping aims at exposing to user-space a new set of
per-task attributes which can be used to provide the scheduler with some
hints about the expected/required utilization for a task.
This will allow to implement a more advanced per-task frequency control
mechanism which is not based just on a "passive" measured task
utilization but on a more "active" approach. For example, it could be
possible to boost interactive tasks, thus getting better performance, or
cap background tasks, thus being more energy efficient.
Ultimately, such a mechanism can be considered similar to the cpufreq's
powersave, performance and userspace governor but with a much fine
grained and per-task control.

Let's introduce a new API to set utilization clamping values for a
specified task by extending sched_setattr, a syscall which already
allows to define task specific properties for different scheduling
classes.
Specifically, a new pair of attributes allows to specify a minimum and
maximum utilization which the scheduler should consider for a task.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-api@vger.kernel.org

---
Changes in v4:
 Message-ID: <87897157-0b49-a0be-f66c-81cc2942b4dd@infradead.org>
 - remove not required default setting
 - fixed some tabs/spaces
 Message-ID: <20180807095905.GB2288@localhost.localdomain>
 - replace/rephrase "bandwidth" references to use "capacity"
 - better stress that this do not enforce any bandwidth requirement
   but "just" give hints to the scheduler
 - fixed some typos
 Others:
 - add support for SCHED_FLAG_RESET_ON_FORK
   default clamps are now set for init_task and inherited/reset at
   fork time (when then flag is set for the parent)
 - rebased on v4.19-rc1

Changes in v3:
 Message-ID: <CAJuCfpF6=L=0LrmNnJrTNPazT4dWKqNv+thhN0dwpKCgUzs9sg@mail.gmail.com>
 - removed UCLAMP_NONE not used by this patch
 Others:
 - rebased on tip/sched/core
Changes in v2:
 - rebased on v4.18-rc4
 - move at the head of the series

As discussed at OSPM, using a [0..SCHED_CAPACITY_SCALE] range seems to
be acceptable. However, an additional patch has been added at the end of
the series which introduces a simple abstraction to use a more
generic [0..100] range.

At OSPM we also discarded the idea to "recycle" the usage of
sched_runtime and sched_period which would have made the API too
much complex for limited benefits.
---
 include/linux/sched.h            | 13 +++++++
 include/uapi/linux/sched.h       |  4 +-
 include/uapi/linux/sched/types.h | 66 +++++++++++++++++++++++++++-----
 init/Kconfig                     | 21 ++++++++++
 init/init_task.c                 |  5 +++
 kernel/sched/core.c              | 39 +++++++++++++++++++
 6 files changed, 138 insertions(+), 10 deletions(-)

Message ID	20180828135324.21976-2-patrick.bellasi@arm.com (mailing list archive)
State	Changes Requested, archived
Headers	show Return-Path: <linux-pm-owner@kernel.org> From: Patrick Bellasi <patrick.bellasi@arm.com> To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>, "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>, Viresh Kumar <viresh.kumar@linaro.org>, Vincent Guittot <vincent.guittot@linaro.org>, Paul Turner <pjt@google.com>, Quentin Perret <quentin.perret@arm.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Morten Rasmussen <morten.rasmussen@arm.com>, Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>, Joel Fernandes <joelaf@google.com>, Steve Muckle <smuckle@google.com>, Suren Baghdasaryan <surenb@google.com>, Randy Dunlap <rdunlap@infradead.org>, linux-api@vger.kernel.org Subject: [PATCH v4 01/16] sched/core: uclamp: extend sched_setattr to support utilization clamping Date: Tue, 28 Aug 2018 14:53:09 +0100 Message-Id: <20180828135324.21976-2-patrick.bellasi@arm.com> In-Reply-To: <20180828135324.21976-1-patrick.bellasi@arm.com> References: <20180828135324.21976-1-patrick.bellasi@arm.com> Sender: linux-pm-owner@vger.kernel.org Precedence: bulk
Series	Add utilization clamping support \| expand [v4,00/16] Add utilization clamping support [v4,01/16] sched/core: uclamp: extend sched_setattr to support utilization clamping [v4,02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups [v4,03/16] sched/core: uclamp: add CPU's clamp groups accounting [v4,04/16] sched/core: uclamp: update CPU's refcount on clamp changes [v4,05/16] sched/core: uclamp: enforce last task UCLAMP_MAX [v4,06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks [v4,07/16] sched/core: uclamp: extend cpu's cgroup controller [v4,08/16] sched/core: uclamp: propagate parent clamps [v4,09/16] sched/core: uclamp: map TG's clamp values into CPU's clamp groups [v4,10/16] sched/core: uclamp: use TG's clamps to restrict Task's clamps [v4,11/16] sched/core: uclamp: add system default clamps [v4,12/16] sched/core: uclamp: update CPU's refcount on TG's clamp changes [v4,13/16] sched/core: uclamp: use percentage clamp values [v4,14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default [v4,15/16] sched/core: uclamp: add clamp group discretization support [v4,16/16] sched/cpufreq: uclamp: add utilization clamping for RT tasks

[v4,01/16] sched/core: uclamp: extend sched_setattr to support utilization clamping

Commit Message

Comments

Patch