[v2] Port hierarchical_{memory,swap}_limit cgroup1->cgroup2

Message ID ZcmaPqZ9HzoN0GFM@host1.jankratochvil.net
State New

Commit Message

Jan Kratochvil Feb. 12, 2024, 4:10 a.m. UTC
Hello,

cgroup1's "memory.stat" (by function memcg1_stat_format) already contains two lines
	hierarchical_memory_limit %llu
	hierarchical_memsw_limit %llu

which let userland find the effective cgroup limits easily and cheaply.
Otherwise userland has to open+read+close the file "memory.max" and/or
"memory.swap.max" in each parent directory of a nested cgroup.

For cgroup1 it was implemented by:
	memcg: show real limit under hierarchy mode
	https://github.com/torvalds/linux/commit/fee7b548e6f2bd4bfd03a1a45d3afd593de7d5e9
	Date:   Wed Jan 7 18:08:26 2009 -0800

For cgroup2 this has been missing so far. This patch is a copy-paste of the
cgroup1 code with s/memsw/swap/ applied, since cgroup1 tracks memory+swap
while cgroup2 tracks swap separately. I have added the new lines to the end of
"memory.stat" to prevent possible compatibility problems with existing code
parsing that file.


Jan Kratochvil


Signed-off-by: Jan Kratochvil (Azul) <jkratochvil@azul.com>

 mm/memcontrol.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Michal Koutný Feb. 12, 2024, 3 p.m. UTC | #1
Hello.

Something like this would come in quite handy.

On Mon, Feb 12, 2024 at 12:10:38PM +0800, "Jan Kratochvil (Azul)" <jkratochvil@azul.com> wrote:
> which are useful for userland to easily and performance-wise find out the
> effective cgroup limits being applied.

And it is the only way to figure them out from inside a cgroupns.

> But for cgroup2 it has been missing so far, this is just a copy-paste of the
> cgroup1 code while changing s/memsw/swap/ as that is what cgroup1 vs. cgroup2
> tracks. I have added it to the end of "memory.stat" to prevent possible
> compatibility problems with existing code parsing that file.

I was thinking of memory.max.effective (and others).

- no need to (possibly) flush stats, as happens when reading memory.stat
- can be generalized also for pids controller (and other "limiting" controllers)
- analogous to precedent of cpuset.cpus.effective

Whereas using the v1 approach in v2:
- memory.stat mixes true stats and limits,
- memory.stat is hierarchical by default, no need for the prefix.

What do you think of the separate .effective file(s)?

Thanks
Michal

Waiman Long Feb. 12, 2024, 3:26 p.m. UTC | #2
On 2/12/24 10:00, Michal Koutný wrote:
> Hello.
>
> Something like this would come quite handy.
>
> On Mon, Feb 12, 2024 at 12:10:38PM +0800, "Jan Kratochvil (Azul)" <jkratochvil@azul.com> wrote:
>> which are useful for userland to easily and performance-wise find out the
>> effective cgroup limits being applied.
> And the only way to figure out inside cgroupns.
>
>> But for cgroup2 it has been missing so far, this is just a copy-paste of the
>> cgroup1 code while changing s/memsw/swap/ as that is what cgroup1 vs. cgroup2
>> tracks. I have added it to the end of "memory.stat" to prevent possible
>> compatibility problems with existing code parsing that file.
> I was thinking of memory.max.effective (and others).
>
> - no need to (possibly flush) stats when reading memory.stat
> - can be generalized also for pids controller (and other "limiting" controllers)
> - analogous to precedent of cpuset.cpus.effective
>
> Whereas, using v1 approach in v2:
> - memory.stat mixes true stats and limits,
> - memory.stat is hierarchical by default, no need for the prefix.
>
> What do you think of the separate .effective file(s)?

This is certainly a good alternative.

Cheers,
Longman

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46d8d0211..2631dd810 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1636,6 +1636,8 @@  static inline unsigned long memcg_page_state_local_output(
 static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 {
 	int i;
+	unsigned long memory, swap;
+	struct mem_cgroup *mi;
 
 	/*
 	 * Provide statistics on the state of the memory subsystem as
@@ -1682,6 +1684,17 @@  static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
 			       memcg_events(memcg, memcg_vm_event_stat[i]));
 	}
 
+	/* Hierarchical information */
+	memory = swap = PAGE_COUNTER_MAX;
+	for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) {
+		memory = min(memory, READ_ONCE(mi->memory.max));
+		swap = min(swap, READ_ONCE(mi->swap.max));
+	}
+	seq_buf_printf(s, "hierarchical_memory_limit %llu\n",
+		       (u64)memory * PAGE_SIZE);
+	seq_buf_printf(s, "hierarchical_swap_limit %llu\n",
+		       (u64)swap * PAGE_SIZE);
+
 	/* The above should easily fit into one page */
 	WARN_ON_ONCE(seq_buf_has_overflowed(s));
 }
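
With the patch above applied, userland could pick the two new keys out of a
single read of "memory.stat". The following is a minimal sketch of such a
lookup, assuming the usual "name value" line format of that file. The helper
name stat_lookup and the sample values are hypothetical and for illustration
only.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up `key` in memory.stat-style text (one "name value" pair per line).
 * Returns 0 and stores the value on success, -1 if the key is absent or
 * its value cannot be parsed. stat_lookup is a hypothetical helper.
 */
static int stat_lookup(const char *text, const char *key, uint64_t *value)
{
	size_t key_len = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key, followed by the separating space. */
		if (strncmp(line, key, key_len) == 0 && line[key_len] == ' ')
			return sscanf(line + key_len + 1, "%" SCNu64,
				      value) == 1 ? 0 : -1;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller would read the whole file into a buffer once and then call
stat_lookup(buf, "hierarchical_memory_limit", &limit) and
stat_lookup(buf, "hierarchical_swap_limit", &limit), regardless of how deeply
the cgroup is nested.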