Message ID | 20240830141939.723729-1-joshua.hahnjy@gmail.com (mailing list archive)
---|---
Series | Exposing nice CPU usage to userspace
Hello Joshua.

On Fri, Aug 30, 2024 at 07:19:37AM GMT, Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> Exposing this metric will allow users to accurately probe the niced CPU
> metric for each workload, and make more informed decisions when
> directing higher priority tasks.

I'm afraid I still can't appreciate exposing this value:

- It makes (some) sense only on leaf cgroups (where variously nice'd
  tasks are competing against each other). Not so much on inner node
  cgroups (where it's a mere sum, but sibling cgroups could have
  different weights, so the absolute times would contribute differently).

- When all tasks have nice > 0 (or nice <= 0), it loses any information
  it could have had.

(Thus I don't know whether to commit to exposing that value via cgroups.)

I wonder, wouldn't your use case be equally served by some
post-processing [1] of /sys/kernel/debug/sched/debug info, which is
already available?

Regards,
Michal

[1] Your metric is supposed to represent

        Σ_i^tasks ∫_t is_nice(i, t) dt

If I try to address the second remark by looking at

        Σ_i^tasks ∫_t nice(i, t) dt

that resembles (nice=0 ~ weight=1024)

        Σ_i^tasks ∫_t weight(i, t) dt

Swapping the sum and the integral gives

        ∫_t Σ_i^tasks weight(i, t) dt

where Σ_i^tasks weight(i, t) can be taken from
/sys/kernel/debug/sched/debug:cfs_rq[0].load_avg

The above is only for CPU nr=0, so post-processing would mean sampling
that file over all CPUs and time.
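Michal's footnote describes the sampling loop in words; below is a minimal sketch of that post-processing, assuming the `.load_avg : <value>` line format of /sys/kernel/debug/sched/debug (a debugfs file, not a stable ABI, so the layout varies by kernel version and config). It naively sums every cfs_rq's load_avg, i.e. across all CPUs and all cgroup levels; a real tool would filter for the cgroup paths of interest. Needs root and a mounted debugfs.

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const unsigned int interval_ms = 100;
	double integral = 0.0;
	int i;

	/* take 50 samples ~100ms apart; a real tool would run longer */
	for (i = 0; i < 50; i++) {
		FILE *f = fopen("/sys/kernel/debug/sched/debug", "r");
		char line[512];
		long sum = 0, v;

		if (!f) {
			perror("fopen");
			return 1;
		}
		while (fgets(line, sizeof(line), f)) {
			/* match "  .load_avg ... : N" lines inside the
			 * per-CPU cfs_rq[N] blocks; the leading-space and
			 * literal-prefix match skips .removed.load_avg,
			 * .tg_load_avg and similar fields */
			if (sscanf(line, " .load_avg : %ld", &v) == 1)
				sum += v;
		}
		fclose(f);

		/* discrete approximation of the time integral of the
		 * summed weights over the sampling interval */
		integral += (double)sum * interval_ms / 1000.0;
		usleep(interval_ms * 1000);
	}
	printf("integrated load_avg: %.1f weight-seconds\n", integral);
	return 0;
}
```

Sampled this way, the integral is only as accurate as the sampling interval is short, which underlines Michal's point: this route means continuously polling a debug file rather than reading one accumulated counter.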
Hello, Michal.

On Mon, Sep 02, 2024 at 05:45:39PM +0200, Michal Koutný wrote:
> - It makes (some) sense only on leaf cgroups (where variously nice'd
>   tasks are competing against each other). Not so much on inner node
>   cgroups (where it's a mere sum but sibling cgroups could have
>   different weights, so the absolute times would contribute differently).
>
> - When all tasks have nice > 0 (or nice <= 0), it loses any information
>   it could have had.

I think it's as useful as the system-wide nice metric is. It's not a
versatile metric, but it is widely available and understood, and people
use it. Maybe a workload is split across a sub-hierarchy and they wanna
collect how much lowpri threads are consuming. cpu.stat is available
without cpu control being enabled, and people use it as a way to just
aggregate metrics across a portion of the system.

> (Thus I don't know whether to commit to exposing that value via cgroups.)
>
> I wonder, wouldn't your use case be equally served by some
> post-processing [1] of /sys/kernel/debug/sched/debug info which is
> already available?
...
> above is only for CPU nr=0. So processing would mean sampling that file
> over all CPUs and time.

I think there are benefits to mirroring system-wide metrics, at least
ones as widely spread as nice.

Thanks.
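To make the aggregation use concrete: a minimal reader for an inner node's cpu.stat is sketched below. The cgroup path is hypothetical, and the name of the proposed niced-time key is an assumption (nice_usec, by analogy with the existing usage_usec/user_usec/system_usec keys); the file itself is real and populated even with the cpu controller disabled, and at an inner node it already covers the whole sub-hierarchy.

```c
#include <stdio.h>

int main(int argc, char **argv)
{
	/* default path is a hypothetical inner-node cgroup */
	const char *dir = argc > 1 ? argv[1] : "/sys/fs/cgroup/workload";
	char path[4096], key[64];
	unsigned long long val;
	FILE *f;

	snprintf(path, sizeof(path), "%s/cpu.stat", dir);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	/* cpu.stat is flat "key value" pairs, one per line:
	 * usage_usec, user_usec, system_usec, ... plus the niced-time
	 * key this series proposes (assumed nice_usec here) */
	while (fscanf(f, "%63s %llu", key, &val) == 2)
		printf("%-16s %llu\n", key, val);
	fclose(f);
	return 0;
}
```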
From: Joshua Hahn <joshua.hahn6@gmail.com>

v1 -> v2: Edited commit messages for clarity.

Niced CPU usage is a metric reported in the host-level /proc/stat, but it
is not reported in the cgroup-level statistics in cpu.stat. However, when
a host contains multiple tasks across different workloads, it becomes
difficult to gauge how much CPU time each workload spends on niced
processes based on /proc/stat alone, since host-level metrics do not
provide this cgroup-level granularity.

Exposing this metric will allow users to accurately probe the niced CPU
metric for each workload, and make more informed decisions when
directing higher priority tasks.

Joshua Hahn (2):
  Tracking cgroup-level niced CPU time
  Selftests for niced CPU statistics

 include/linux/cgroup-defs.h               |  1 +
 kernel/cgroup/rstat.c                     | 16 ++++-
 tools/testing/selftests/cgroup/test_cpu.c | 72 +++++++++++++++++++++++
 3 files changed, 86 insertions(+), 3 deletions(-)
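The actual selftest lives in tools/testing/selftests/cgroup/test_cpu.c and is not reproduced in this cover letter; below is a rough userspace sketch of the check it implies: renice the current task to a positive value, burn CPU, and verify the cgroup's niced time grew. The cgroup path and the nice_usec key name are assumptions, as above, and the program is expected to run inside the cgroup it reads.

```c
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <time.h>

/* scan a cgroup's cpu.stat for the (assumed) nice_usec key */
static unsigned long long read_nice_usec(const char *dir)
{
	char path[4096], key[64];
	unsigned long long val = 0, out = 0;
	FILE *f;

	snprintf(path, sizeof(path), "%s/cpu.stat", dir);
	f = fopen(path, "r");
	if (!f)
		return 0;
	while (fscanf(f, "%63s %llu", key, &val) == 2)
		if (!strcmp(key, "nice_usec"))
			out = val;
	fclose(f);
	return out;
}

int main(void)
{
	const char *cg = "/sys/fs/cgroup/test";	/* hypothetical cgroup */
	unsigned long long before = read_nice_usec(cg);
	time_t end = time(NULL) + 2;

	/* renice ourselves to nice 10, then spin for ~2 seconds */
	if (setpriority(PRIO_PROCESS, 0, 10))
		perror("setpriority");
	while (time(NULL) < end)
		;

	printf("nice_usec grew by %llu us\n", read_nice_usec(cg) - before);
	return 0;
}
```

If the accounting works, the printed delta should be close to the two seconds of spinning (in microseconds), since all of that CPU time was consumed at nice > 0.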