diff mbox series

[RFC,2/5] numa: append per-node execution info in memory.numa_stat

Message ID 7be82809-79d3-f6a1-dfe8-dd14d2b35219@linux.alibaba.com (mailing list archive)
State New, archived
Series NUMA Balancer Suite

Commit Message

王贇 April 22, 2019, 2:12 a.m. UTC
This patch introduces per-node execution time accounting, as an
indicator of NUMA efficiency.

By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
now see a new output line starting with 'exectime', like:

  exectime 24399843 27865444

which means the tasks of this cgroup executed for 24399843 ticks on
node 0, and for 27865444 ticks on node 1.

Combined with the per-node memory info, we can estimate the NUMA
efficiency. For example, memory.numa_stat may show:

  total=4613257 N0=6849 N1=3928327
  ...
  exectime 24399843 27865444

Here almost all of the traced pages are on N1, yet execution time is
split roughly evenly between the nodes. There could also be unmovable
or cache pages on N1 which are not covered by locality tracing, so good
locality alone may mean little; binding the workload to the CPUs of N1
is worth a try, in order to achieve the maximum performance bonus.

Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 13 +++++++++++++
 2 files changed, 14 insertions(+)

Comments

Peter Zijlstra April 23, 2019, 8:52 a.m. UTC | #1
On Mon, Apr 22, 2019 at 10:12:20AM +0800, 王贇 wrote:
> This patch introduced numa execution information, to imply the numa
> efficiency.
> 
> By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
> see new output line heading with 'exectime', like:
> 
>   exectime 24399843 27865444
> 
> which means the tasks of this cgroup executed 24399843 ticks on node 0,
> and 27865444 ticks on node 1.

I think we stopped reporting time in HZ to userspace a long long time
ago. Please don't do that.
王贇 April 23, 2019, 9:36 a.m. UTC | #2
On 2019/4/23 4:52 PM, Peter Zijlstra wrote:
> On Mon, Apr 22, 2019 at 10:12:20AM +0800, 王贇 wrote:
>> This patch introduced numa execution information, to imply the numa
>> efficiency.
>>
>> By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
>> see new output line heading with 'exectime', like:
>>
>>   exectime 24399843 27865444
>>
>> which means the tasks of this cgroup executed 24399843 ticks on node 0,
>> and 27865444 ticks on node 1.
> 
> I think we stopped reporting time in HZ to userspace a long long time
> ago. Please don't do that.

Ah I see, let's make it us (microseconds) maybe?

Regards,
Michael Wang

>
Peter Zijlstra April 23, 2019, 9:46 a.m. UTC | #3
On Tue, Apr 23, 2019 at 05:36:25PM +0800, 王贇 wrote:
> 
> 
> On 2019/4/23 4:52 PM, Peter Zijlstra wrote:
> > On Mon, Apr 22, 2019 at 10:12:20AM +0800, 王贇 wrote:
> >> This patch introduced numa execution information, to imply the numa
> >> efficiency.
> >>
> >> By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
> >> see new output line heading with 'exectime', like:
> >>
> >>   exectime 24399843 27865444
> >>
> >> which means the tasks of this cgroup executed 24399843 ticks on node 0,
> >> and 27865444 ticks on node 1.
> > 
> > I think we stopped reporting time in HZ to userspace a long long time
> > ago. Please don't do that.
> 
> Ah I see, let's make it us maybe?

ms might be best I think.
王贇 April 23, 2019, 10:01 a.m. UTC | #4
On 2019/4/23 5:46 PM, Peter Zijlstra wrote:
> On Tue, Apr 23, 2019 at 05:36:25PM +0800, 王贇 wrote:
>>
>>
>> On 2019/4/23 4:52 PM, Peter Zijlstra wrote:
>>> On Mon, Apr 22, 2019 at 10:12:20AM +0800, 王贇 wrote:
>>>> This patch introduced numa execution information, to imply the numa
>>>> efficiency.
>>>>
>>>> By doing 'cat /sys/fs/cgroup/memory/CGROUP_PATH/memory.numa_stat', we
>>>> see new output line heading with 'exectime', like:
>>>>
>>>>   exectime 24399843 27865444
>>>>
>>>> which means the tasks of this cgroup executed 24399843 ticks on node 0,
>>>> and 27865444 ticks on node 1.
>>>
>>> I think we stopped reporting time in HZ to userspace a long long time
>>> ago. Please don't do that.
>>
>> Ah I see, let's make it us maybe?
> 
> ms might be best I think.

Will be changed in the next version.

Regards,
Michael Wang

>

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bb62e6294484..e784d6252d5e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -197,6 +197,7 @@  enum memcg_numa_locality_interval {

 struct memcg_stat_numa {
 	u64 locality[NR_NL_INTERVAL];
+	u64 exectime;
 };

 #endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b810d4e9c906..91bcd71fc38a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3409,6 +3409,18 @@  static int memcg_numa_stat_show(struct seq_file *m, void *v)
 		seq_printf(m, " %llu", sum);
 	}
 	seq_putc(m, '\n');
+
+	seq_puts(m, "exectime");
+	for_each_online_node(nr) {
+		int cpu;
+		u64 sum = 0;
+
+		for_each_cpu(cpu, cpumask_of_node(nr))
+			sum += per_cpu(memcg->stat_numa->exectime, cpu);
+
+		seq_printf(m, " %llu", sum);
+	}
+	seq_putc(m, '\n');
 #endif

 	return 0;
@@ -3437,6 +3449,7 @@  void memcg_stat_numa_update(struct task_struct *p)
 	memcg = mem_cgroup_from_task(p);
 	if (idx != -1)
 		this_cpu_inc(memcg->stat_numa->locality[idx]);
+	this_cpu_inc(memcg->stat_numa->exectime);
 	rcu_read_unlock();
 }
 #endif