
[RFC,v5,1/3] memcg: Add memory.max.effective attribute

Message ID 20240606152232.20253-2-mkoutny@suse.com (mailing list archive)
State New
Series Add memory.max.effective for application's allocators

Commit Message

Michal Koutný June 6, 2024, 3:22 p.m. UTC
Some applications size their own memory use according to memory cgroup
limits. Reading memory.max of the cgroup the process belongs to is not
sufficient because an ancestor may impose a tighter limit. The
application could walk up the hierarchy to find the tightest limit, but
this does not work inside a cgroup namespace, where the view of the
hierarchy is incomplete and the limit may come from outside the namespace.
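
For illustration, the userspace walk described above might look like the
sketch below. It is not part of the patch. The helper names, the
LIMIT_UNLIMITED sentinel and the buffer sizes are hypothetical, and the
caller is assumed to pass the cgroup v2 mount point (commonly
/sys/fs/cgroup) as root. The walk stops at the mount root, so inside a
cgroup namespace it only sees limits set within the namespace.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LIMIT_UNLIMITED UINT64_MAX	/* hypothetical stand-in for "max" */
#define PATH_BUF 4096			/* assumed upper bound on path length */

/* Parse the content of a memory.max file: "max" or a byte count. */
static int parse_limit(const char *s, uint64_t *out)
{
	char *end;

	if (strcmp(s, "max") == 0 || strcmp(s, "max\n") == 0) {
		*out = LIMIT_UNLIMITED;
		return 0;
	}
	if (*s < '0' || *s > '9')
		return -1;
	*out = strtoull(s, &end, 10);
	return (*end == '\0' || *end == '\n') ? 0 : -1;
}

/* Read <dir>/memory.max; returns -1 if it is missing or malformed. */
static int read_limit(const char *dir, uint64_t *out)
{
	char path[PATH_BUF + 16], line[64];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "%s/memory.max", dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_limit(line, out);
	fclose(f);
	return ret;
}

/*
 * Walk from dir up to (but excluding) root and return the smallest
 * memory.max found. Ancestors above root are invisible to this walk.
 */
static uint64_t tightest_visible_limit(const char *root, const char *dir)
{
	char path[PATH_BUF];
	char *slash;
	uint64_t tightest = LIMIT_UNLIMITED, cur;
	size_t root_len = strlen(root);

	snprintf(path, sizeof(path), "%s", dir);
	while (strlen(path) > root_len &&
	       strncmp(path, root, root_len) == 0) {
		if (read_limit(path, &cur) == 0 && cur < tightest)
			tightest = cur;
		slash = strrchr(path, '/');
		if (!slash)
			break;
		*slash = '\0';	/* step to the parent directory */
	}
	return tightest;
}
```

A caller could use tightest_visible_limit("/sys/fs/cgroup",
"/sys/fs/cgroup/app/worker"), where the cgroup path is only an example.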

(cgroup v1 used memory.stat:hierarchical_memory_limit to report the
value but there's no such counterpart in cgroup v2 memory.stat.)

Introduce a new memcg attribute file that contains the effective value
of the memory limit for the given cgroup (following the
cpuset.cpus.effective pattern).
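
As a rough sketch of how an application might consume the proposed file
(assuming a kernel with this RFC applied, cgroup v2 mounted at
/sys/fs/cgroup, and hypothetical helper names), it could locate its own
cgroup through /proc/self/cgroup and read memory.max.effective from
there. The value should have the same format as memory.max ("max" or a
byte count), since the patch prints it with seq_puts_memcg_tunable().

```c
#include <stdio.h>
#include <string.h>

/* Assumed cgroup v2 mount point; adjust if it is mounted elsewhere. */
#define CGROUP_MOUNT "/sys/fs/cgroup"

/*
 * Build the path of memory.max.effective from the cgroup v2 line of
 * /proc/self/cgroup ("0::/some/path"). Returns 0 on success, -1 if the
 * line is not a v2 entry or buf is too small.
 */
static int effective_limit_path(const char *line, char *buf, size_t len)
{
	const char *rel;
	size_t rel_len;
	int n;

	if (strncmp(line, "0::/", 4) != 0)
		return -1;
	rel = line + 3;			/* keep the leading '/' */
	rel_len = strcspn(rel, "\n");
	if (rel_len == 1)		/* "/" alone: avoid a double slash */
		rel_len = 0;
	n = snprintf(buf, len, "%s%.*s/memory.max.effective",
		     CGROUP_MOUNT, (int)rel_len, rel);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Copy the content of this process's memory.max.effective into out,
 * without the trailing newline. Returns -1 if the file is unavailable,
 * e.g. on a kernel without this patch.
 */
static int read_effective_limit(char *out, size_t len)
{
	char line[4096], path[4096 + 64];
	FILE *f;
	int ret = -1;

	f = fopen("/proc/self/cgroup", "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (effective_limit_path(line, path, sizeof(path)) == 0) {
			ret = 0;
			break;
		}
	}
	fclose(f);
	if (ret)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	ret = fgets(out, (int)len, f) ? 0 : -1;
	fclose(f);
	if (ret == 0)
		out[strcspn(out, "\n")] = '\0';
	return ret;
}
```

When read_effective_limit() fails, an allocator could fall back to
reading plain memory.max, accepting that ancestral limits are then
ignored.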

Signed-off-by: Jan Kratochvil (Azul) <jkratochvil@azul.com>
[ mkoutny: rewrite commit message, split out memory.swap.max]
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  6 ++++++
 mm/memcontrol.c                         | 18 ++++++++++++++++++
 2 files changed, 24 insertions(+)

Patch

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8fbb0519d556..988f26264054 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1293,6 +1293,12 @@  PAGE_SIZE multiple when read back.
 	Caller could retry them differently, return into userspace
 	as -ENOMEM or silently ignore in cases like disk readahead.
 
+  memory.max.effective
+	A read-only file that provides the effective value of the cgroup's
+	hard usage limit.  It incorporates the limits of all ancestors, even
+	those not visible in the cgroup namespace.  A change of the value in
+	this file generates a file modified event.
+
   memory.reclaim
 	A write-only nested-keyed file which exists for all cgroups.
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7fad15b2290c..86bcec84fe7b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7065,6 +7065,19 @@  static ssize_t memory_max_write(struct kernfs_open_file *of,
 	return nbytes;
 }
 
+static int memory_max_effective_show(struct seq_file *m, void *v)
+{
+	unsigned long memory;
+	struct mem_cgroup *mi;
+
+	/* Tightest memory.max among this cgroup and all its ancestors */
+	memory = PAGE_COUNTER_MAX;
+	for (mi = mem_cgroup_from_seq(m); mi; mi = parent_mem_cgroup(mi))
+		memory = min(memory, READ_ONCE(mi->memory.max));
+
+	return seq_puts_memcg_tunable(m, memory);
+}
+
 /*
  * Note: don't forget to update the 'samples/cgroup/memcg_event_listener'
  * if any new events become available.
@@ -7259,6 +7272,11 @@  static struct cftype memory_files[] = {
 		.seq_show = memory_max_show,
 		.write = memory_max_write,
 	},
+	{
+		.name = "max.effective",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = memory_max_effective_show,
+	},
 	{
 		.name = "events",
 		.flags = CFTYPE_NOT_ON_ROOT,