diff mbox series

mm/memcontrol: add memory.peak in cgroup root

Message ID 20230221143421.10385-1-matthew.chae@axis.com (mailing list archive)
State New
Headers show
Series mm/memcontrol: add memory.peak in cgroup root | expand

Commit Message

Matthew Chae Feb. 21, 2023, 2:34 p.m. UTC
The kernel currently doesn't provide any method to show the overall
system's peak memory usage recorded. Instead, only each slice's peak
memory usage recorded except for cgroup root is shown through each
memory.peak.

Each slice might consume their peak memory at different time. This is
stored at memory.peak in each own slice. The sum of every memory.peak
doesn't mean the total system's peak memory usage recorded. The sum at
certain point without having a peak memory usage in their slice can have
the largest value.

       time |  slice1  |  slice2  |   sum
      =======================================
        t1  |    50    |   200    |   250
      ---------------------------------------
        t2  |   150    |   150    |   300
      ---------------------------------------
        t3  |   180    |    20    |   200
      ---------------------------------------
        t4  |    80    |    20    |   100

memory.peak value of slice1 is 180 and memory.peak value of slice2 is 200.
Only these information are provided through memory.peak value from each
slice without providing the overall system's peak memory usage. The total
sum of these two value is 380, but this doesn't represent the real peak
memory usage of the overall system. The peak value what we want to get is
shown in t2 as 300, which doesn't have any biggest number even in one
slice. Therefore the proper way to show the system's overall peak memory
usage recorded needs to be provided.

Hence, expose memory.peak in the cgrop root in order to allow this.

Co-developed-by: Christopher Wong <christopher.wong@axis.com>
Signed-off-by: Christopher Wong <christopher.wong@axis.com>
Signed-off-by: Matthew Chae <matthew.chae@axis.com>
---
 mm/memcontrol.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Michal Hocko Feb. 21, 2023, 3:07 p.m. UTC | #1
On Tue 21-02-23 15:34:20, Matthew Chae wrote:
> The kernel currently doesn't provide any method to show the overall
> system's peak memory usage recorded. Instead, only each slice's peak
> memory usage recorded except for cgroup root is shown through each
> memory.peak.
> 
> Each slice might consume their peak memory at different time. This is
> stored at memory.peak in each own slice. The sum of every memory.peak
> doesn't mean the total system's peak memory usage recorded. The sum at
> certain point without having a peak memory usage in their slice can have
> the largest value.
> 
>        time |  slice1  |  slice2  |   sum
>       =======================================
>         t1  |    50    |   200    |   250
>       ---------------------------------------
>         t2  |   150    |   150    |   300
>       ---------------------------------------
>         t3  |   180    |    20    |   200
>       ---------------------------------------
>         t4  |    80    |    20    |   100
> 
> memory.peak value of slice1 is 180 and memory.peak value of slice2 is 200.
> Only these information are provided through memory.peak value from each
> slice without providing the overall system's peak memory usage. The total
> sum of these two value is 380, but this doesn't represent the real peak
> memory usage of the overall system. The peak value what we want to get is
> shown in t2 as 300, which doesn't have any biggest number even in one
> slice. Therefore the proper way to show the system's overall peak memory
> usage recorded needs to be provided.

The problem I can see is that the root's peak value doesn't really
represent the system peak memory usage because it only reflects memcg
accounted memory. So there is plenty of memory consumption which is not
covered. On top of that a lot of memory contributed to the root memcg is
not accounted at all (see try_charge and its callers) so the cumulative
hierarchical value is incomplete and I believe misleading as well.
Christopher Wong Feb. 21, 2023, 4:13 p.m. UTC | #2
Hi Michal,

Thanks for the quick response! I think we are just trying to get the same value that was available for us in cgroup v1 memory.max_usage_in_bytes. I guess this value also is incomplete for representing the system memory usage. Is it due the incompleteness that the memory.peak has been left out in the root of cgroup v2?

Best regards,
Christopher Wong
Axis Communications AB
Michal Hocko Feb. 21, 2023, 4:16 p.m. UTC | #3
On Tue 21-02-23 16:13:14, Christopher Wong wrote:
> Hi Michal,
> 
> Thanks for the quick response! I think we are just trying to
> get the same value that was available for us in cgroup v1
> memory.max_usage_in_bytes. I guess this value also is incomplete for
> representing the system memory usage.

Correct.

> Is it due the incompleteness that the memory.peak has been left out in
> the root of cgroup v2?

I think so but I do not remember 100%. You might want to look into email
archives.
Michal Koutný Feb. 23, 2023, 2:20 p.m. UTC | #4
Hello Matthew.

On Tue, Feb 21, 2023 at 03:34:20PM +0100, Matthew Chae <matthew.chae@axis.com> wrote:
> The kernel currently doesn't provide any method to show the overall
> system's peak memory usage recorded. Instead, only each slice's peak
> memory usage recorded except for cgroup root is shown through each
> memory.peak.

The memory.peak value is useful as a calibration insight when you want to
configure memcg limit.
But there is no global (memcg) limit on memory. So what would be this
(not clearly) defined value good for? Or better then userspace sampling
of chosen available metric?

Thanks,
Michal
diff mbox series

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 73afff8062f9..974fc044a7e7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6646,7 +6646,6 @@  static struct cftype memory_files[] = {
 	},
 	{
 		.name = "peak",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = memory_peak_read,
 	},
 	{