
[v3,0/2] Exposing nice CPU usage to userspace

Message ID 20240923142006.3592304-1-joshua.hahnjy@gmail.com

Message

Joshua Hahn Sept. 23, 2024, 2:20 p.m. UTC
From: Joshua Hahn <joshua.hahn6@gmail.com>

v2 -> v3: Signed-off-by & renamed subject for clarity.
v1 -> v2: Edited commit messages for clarity.

Niced CPU usage is a metric reported in the host-level /proc/stat, but it
is not reported in the cgroup-level statistics in cpu.stat. When a host
runs multiple tasks across different workloads, it becomes difficult to
gauge how much time each workload spends on niced processes from
/proc/stat alone, since host-level metrics do not provide this
cgroup-level granularity.

Exposing this metric will allow users to accurately probe niced CPU time
for each workload, and to make more informed decisions when directing
higher-priority tasks.
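To make the intended use concrete, here is a hedged reader-side sketch of
how such a per-cgroup metric might be consumed. This is not part of the
series; the `nice_usec` key name and the sample numbers are assumptions
about what the patches expose in cpu.stat:

```python
# Sketch: compute the share of a cgroup's CPU time spent in niced tasks
# from the flat key/value text of its cpu.stat file. The "nice_usec"
# key name is an assumption; the actual field is defined by patch 1.

def nice_share(cpu_stat_text):
    """Return niced CPU time as a fraction of total cgroup CPU usage."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    usage = stats.get("usage_usec", 0)
    nice = stats.get("nice_usec", 0)  # assumed key name
    return nice / usage if usage else 0.0

# Made-up example contents of /sys/fs/cgroup/<workload>/cpu.stat:
sample = "usage_usec 1000000\nuser_usec 600000\nsystem_usec 400000\nnice_usec 250000"
print(nice_share(sample))  # 0.25
```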

Joshua Hahn (2):
  Tracking cgroup-level niced CPU time
  Selftests for niced CPU statistics

 include/linux/cgroup-defs.h               |  1 +
 kernel/cgroup/rstat.c                     | 16 ++++-
 tools/testing/selftests/cgroup/test_cpu.c | 72 +++++++++++++++++++++++
 3 files changed, 86 insertions(+), 3 deletions(-)

Comments

Michal Koutný Sept. 26, 2024, 6:10 p.m. UTC | #1
On Mon, Sep 23, 2024 at 07:20:04AM GMT, Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> From: Joshua Hahn <joshua.hahn6@gmail.com>
> 
> v2 -> v3: Signed-off-by & renamed subject for clarity.
> v1 -> v2: Edited commit messages for clarity.

Thanks for the version changelog, appreciated!

...
> Exposing this metric will allow users to accurately probe the niced CPU
> metric for each workload, and make more informed decisions when
> directing higher priority tasks.

Possibly an example of how this value (combined with some other?) is
used for decisions could shed some light on this and justify adding this
attribute.

Thanks,
Michal

(I'll respond here to Tejun's message from v2 thread.)

On Tue, Sep 10, 2024 at 11:01:07AM GMT, Tejun Heo <tj@kernel.org> wrote:
> I think it's as useful as system-wide nice metric is.

Exactly -- and I don't understand how that system-wide value (without
any cgroups) is useful.
If I don't know how many niced and non-niced tasks there are and what
their runnable patterns are, the aggregated nice time can have ambiguous
interpretations.

> I think there are benefits to mirroring system wide metrics, at least
> ones as widely spread as nice.

I agree with the benefits of mirroring some system-wide metrics when they
are useful <del>but not all of them, because it's difficult/impossible to take
them away once they're exposed</del>. Actually, readers _should_ handle
missing keys gracefully, so this may be just fine.
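The point about readers handling missing keys can be sketched as follows
(illustrative only; the `nice_usec` key name is an assumption). A reader
written this way works unchanged on kernels with and without the new key:

```python
# Sketch: look up one key in cpu.stat's flat key/value text, falling
# back to a default when the key is absent (e.g. on older kernels).

def read_stat(text, key, default=0):
    for line in text.splitlines():
        k, _, v = line.partition(" ")
        if k == key:
            return int(v)
    return default

old = "usage_usec 500\nuser_usec 300\nsystem_usec 200"
new = old + "\nnice_usec 100"
print(read_stat(old, "nice_usec"))  # 0 (key missing, default used)
print(read_stat(new, "nice_usec"))  # 100
```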

(Is this nice time widely spread? (I remember the field from `top`, still
not sure how to use it.) Are other proc_stat(5) fields different?

I see how this can be the global analog on leaf cgroups, but how should
one interpret middle cgroups whose children have different cpu.weights?)

Tejun Heo Sept. 26, 2024, 6:47 p.m. UTC | #2
Hello, Michal.

On Thu, Sep 26, 2024 at 08:10:35PM +0200, Michal Koutný wrote:
...
> On Tue, Sep 10, 2024 at 11:01:07AM GMT, Tejun Heo <tj@kernel.org> wrote:
> > I think it's as useful as system-wide nice metric is.
> 
> Exactly -- and I don't understand how that system-wide value (without
> any cgroups) is useful.
> If I don't know how many niced and non-niced tasks there are and what
> their runnable patterns are, the aggregated nice time can have ambiguous
> interpretations.
> 
> > I think there are benefits to mirroring system wide metrics, at least
> > ones as widely spread as nice.
> 
> I agree with the benefits of mirroring some system-wide metrics when they
> are useful <del>but not all of them, because it's difficult/impossible to take
> them away once they're exposed</del>. Actually, readers _should_ handle
> missing keys gracefully, so this may be just fine.
> 
> (Is this nice time widely spread? (I remember the field from `top`, still
> not sure how to use it.) Are other proc_stat(5) fields different?

A personal anecdote: I usually run compile jobs with nice and look at the
nice utilization to see what the system is doing. I think it'd be similar
for most folks. Because the number has always been there and is ubiquitous
across many monitoring tools, people end up using it for something. It's
not a great metric, but it's a long-standing and widely available one, so
it ends up with usages.

BTW, there are numbers which are actively silly - e.g. iowait, especially
due to how it gets aggregated across multiple CPUs. That one we do want to
actively drop, especially as the pressure metrics are the better
substitute. I don't think nice is in that category. It's not the best
metric there is, but it's not useless or misleading.

> I see how this can be the global analog on leaf cgroups, but how should
> one interpret middle cgroups whose children have different cpu.weights?)

I think aggregating per-thread numbers is the right thing to do. It's just
the sum of CPU cycles spent by threads that got niced.
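The aggregation semantics described here can be sketched as a simple
recursive sum over the cgroup tree: a cgroup's nice time is the time
accrued by its own niced threads plus that of all its descendants,
independent of cpu.weight. The structure and field names below are
illustrative only, not the kernel's representation:

```python
# Sketch: a cgroup's reported nice time is just the sum of its own
# threads' niced CPU time plus its children's totals; cpu.weight plays
# no role in the accounting itself.

def aggregate_nice(cgroup):
    """cgroup: dict with 'nice_self' (usec accrued by this cgroup's own
    niced threads) and 'children' (list of child cgroup dicts)."""
    return cgroup["nice_self"] + sum(
        aggregate_nice(child) for child in cgroup["children"]
    )

tree = {
    "nice_self": 0,
    "children": [
        {"nice_self": 300, "children": []},        # e.g. a niced batch job
        {"nice_self": 0, "children": [
            {"nice_self": 200, "children": []},    # nested niced work
        ]},
    ],
}
print(aggregate_nice(tree))  # 500
```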

Thanks.