Message ID | 7be82809-79d3-f6a1-dfe8-dd14d2b35219@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Series | NUMA Balancer Suite |
On Mon, Apr 22, 2019 at 10:12:20AM +0800, 王贇 wrote:
> This patch introduced numa execution information, to imply the numa
> efficiency.
>
> By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
> see new output line heading with 'exectime', like:
>
>   exectime 24399843 27865444
>
> which means the tasks of this cgroup executed 24399843 ticks on node 0,
> and 27865444 ticks on node 1.

I think we stopped reporting time in HZ to userspace a long long time
ago. Please don't do that.
On 2019/4/23 4:52 PM, Peter Zijlstra wrote:
> On Mon, Apr 22, 2019 at 10:12:20AM +0800, 王贇 wrote:
>> This patch introduced numa execution information, to imply the numa
>> efficiency.
>>
>> By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
>> see new output line heading with 'exectime', like:
>>
>>   exectime 24399843 27865444
>>
>> which means the tasks of this cgroup executed 24399843 ticks on node 0,
>> and 27865444 ticks on node 1.
>
> I think we stopped reporting time in HZ to userspace a long long time
> ago. Please don't do that.

Ah I see, let's make it us maybe?

Regards,
Michael Wang

>
On Tue, Apr 23, 2019 at 05:36:25PM +0800, 王贇 wrote:
>
>
> On 2019/4/23 4:52 PM, Peter Zijlstra wrote:
> > On Mon, Apr 22, 2019 at 10:12:20AM +0800, 王贇 wrote:
> >> This patch introduced numa execution information, to imply the numa
> >> efficiency.
> >>
> >> By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
> >> see new output line heading with 'exectime', like:
> >>
> >>   exectime 24399843 27865444
> >>
> >> which means the tasks of this cgroup executed 24399843 ticks on node 0,
> >> and 27865444 ticks on node 1.
> >
> > I think we stopped reporting time in HZ to userspace a long long time
> > ago. Please don't do that.
>
> Ah I see, let's make it us maybe?

ms might be best I think.
On 2019/4/23 5:46 PM, Peter Zijlstra wrote:
> On Tue, Apr 23, 2019 at 05:36:25PM +0800, 王贇 wrote:
>>
>>
>> On 2019/4/23 4:52 PM, Peter Zijlstra wrote:
>>> On Mon, Apr 22, 2019 at 10:12:20AM +0800, 王贇 wrote:
>>>> This patch introduced numa execution information, to imply the numa
>>>> efficiency.
>>>>
>>>> By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
>>>> see new output line heading with 'exectime', like:
>>>>
>>>>   exectime 24399843 27865444
>>>>
>>>> which means the tasks of this cgroup executed 24399843 ticks on node 0,
>>>> and 27865444 ticks on node 1.
>>>
>>> I think we stopped reporting time in HZ to userspace a long long time
>>> ago. Please don't do that.
>>
>> Ah I see, let's make it us maybe?
>
> ms might be best I think.

Will be in next version.

Regards,
Michael Wang

>
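The unit change agreed on in the replies above comes down to simple arithmetic, which can be illustrated outside the kernel. The sketch below is not taken from the next version of the patch (that version is not part of this thread); the function name `ticks_to_ms` and the `hz` parameter are hypothetical and only illustrate converting a tick count to milliseconds. In-kernel code would more likely use an existing helper such as `jiffies_to_msecs()`.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): convert a tick count to
 * milliseconds for a tick rate of `hz` ticks per second.
 * Splitting the count into whole seconds and a remainder avoids
 * overflowing the intermediate product ticks * 1000 for large counters.
 * Returns 0 when hz is 0.
 */
static uint64_t ticks_to_ms(uint64_t ticks, uint32_t hz)
{
	uint64_t secs, rem;

	if (hz == 0)
		return 0;

	secs = ticks / hz;	/* whole seconds */
	rem = ticks % hz;	/* leftover ticks, always < hz */

	return secs * 1000 + (rem * 1000) / hz;
}
```

Assuming a tick rate of 250 (an assumption; the thread does not state the HZ value), the node 0 sample value of 24399843 ticks corresponds to 97599372 ms.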
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bb62e6294484..e784d6252d5e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -197,6 +197,7 @@ enum memcg_numa_locality_interval {
 
 struct memcg_stat_numa {
 	u64 locality[NR_NL_INTERVAL];
+	u64 exectime;
 };
 
 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b810d4e9c906..91bcd71fc38a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3409,6 +3409,18 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
 		seq_printf(m, " %llu", sum);
 	}
 	seq_putc(m, '\n');
+
+	seq_puts(m, "exectime");
+	for_each_online_node(nr) {
+		int cpu;
+		u64 sum = 0;
+
+		for_each_cpu(cpu, cpumask_of_node(nr))
+			sum += per_cpu(memcg->stat_numa->exectime, cpu);
+
+		seq_printf(m, " %llu", sum);
+	}
+	seq_putc(m, '\n');
 #endif
 
 	return 0;
@@ -3437,6 +3449,7 @@ void memcg_stat_numa_update(struct task_struct *p)
 	memcg = mem_cgroup_from_task(p);
 	if (idx != -1)
 		this_cpu_inc(memcg->stat_numa->locality[idx]);
+	this_cpu_inc(memcg->stat_numa->exectime);
 	rcu_read_unlock();
 }
 #endif
This patch introduces NUMA execution information, to indicate the NUMA
efficiency.

By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
see a new output line starting with 'exectime', like:

  exectime 24399843 27865444

which means the tasks of this cgroup executed 24399843 ticks on node 0,
and 27865444 ticks on node 1.

Combined with the memory node info, we can estimate the NUMA efficiency.
For example, if memory.numa_stat shows:

  total=4613257 N0=6849 N1=3928327
  ...
  exectime 24399843 27865444

there could be unmovable or cache pages on N1, in which case good
locality could mean nothing since we are not tracing these types of
pages. Binding the workloads to the CPUs of N1 is thus worth a try, in
order to achieve the maximum performance bonus.

Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 13 +++++++++++++
 2 files changed, 14 insertions(+)
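The estimate described in the commit message can be sketched in userspace as follows. This is only an illustration, assuming the 'exectime' line has the format shown above (the key followed by one space-separated counter per node); the helper names `parse_exectime` and `exectime_share` are hypothetical and not part of the patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse one "exectime <n0> <n1> ..." line into vals[].
 * Returns the number of per-node values read (at most `max`), or -1 if
 * the line does not start with the "exectime" key.
 */
static int parse_exectime(const char *line, uint64_t *vals, int max)
{
	static const char key[] = "exectime";
	const size_t klen = sizeof(key) - 1;
	const char *p;
	int n = 0;

	if (strncmp(line, key, klen) != 0)
		return -1;
	/* The key must be followed by a space or the end of the line. */
	if (line[klen] != ' ' && line[klen] != '\n' && line[klen] != '\0')
		return -1;

	p = line + klen;
	while (n < max) {
		char *end;
		unsigned long long v = strtoull(p, &end, 10);

		if (end == p)	/* no further number on this line */
			break;
		vals[n++] = v;
		p = end;
	}
	return n;
}

/*
 * Hypothetical helper: percentage of the cgroup's total execution time
 * spent on `node`. Returns 0.0 for an invalid node or an all-zero line.
 */
static double exectime_share(const uint64_t *vals, int n, int node)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < n; i++)
		total += vals[i];
	if (total == 0 || node < 0 || node >= n)
		return 0.0;

	return 100.0 * (double)vals[node] / (double)total;
}
```

With the sample line from the commit message, roughly 53% of the execution time falls on node 1, while most of the pages (3928327 of 4613257) sit on N1. That gap between where the tasks run and where the memory lives is the kind of mismatch the message suggests addressing by binding the workload to the CPUs of N1.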