diff mbox series

[net-next,v2] net-memcg: Fix scope of sockmem pressure indicators

Message ID 20230814070923.35769-1-wuyun.abel@bytedance.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v2] net-memcg: Fix scope of sockmem pressure indicators | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8009 this patch: 8009
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 2382 this patch: 2382
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8705 this patch: 8705
netdev/checkpatch warning WARNING: 'memroy' may be misspelled - perhaps 'memory'?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Abel Wu Aug. 14, 2023, 7:09 a.m. UTC
Now there are two indicators of socket memory pressure sit inside
struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating
memory reclaim pressure in memcg->memory and ->tcpmem respectively.

When in legacy mode (cgroupv1), the socket memory is charged into
->tcpmem which is independent of ->memory, so socket_pressure has
nothing to do with socket's pressure at all. Things could be worse
by taking socket_pressure into consideration in legacy mode, as a
pressure in ->memory can lead to premature reclamation/throttling
in socket.

While for the default mode (cgroupv2), the socket memory is charged
into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used.

So {socket,tcpmem}_pressure are only used in default/legacy mode
respectively for indicating socket memory pressure. This patch fixes
the pieces of code that make mixed use of both.

Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
v2:
 - Add a comment to clarify the scope of indicators (Roman)
 - Drop code cleanup on tcpmem_pressure (Roman)
 - Fix the lack of a Fixes: tag (Eric)
---
 include/linux/memcontrol.h | 9 +++++++--
 mm/vmpressure.c            | 8 ++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

Comments

Shakeel Butt Aug. 14, 2023, 8:18 p.m. UTC | #1
On Mon, Aug 14, 2023 at 12:09 AM Abel Wu <wuyun.abel@bytedance.com> wrote:
>
> Now there are two indicators of socket memory pressure sit inside
> struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating
> memory reclaim pressure in memcg->memory and ->tcpmem respectively.
>
> When in legacy mode (cgroupv1), the socket memory is charged into
> ->tcpmem which is independent of ->memory, so socket_pressure has
> nothing to do with socket's pressure at all. Things could be worse
> by taking socket_pressure into consideration in legacy mode, as a
> pressure in ->memory can lead to premature reclamation/throttling
> in socket.
>
> While for the default mode (cgroupv2), the socket memory is charged
> into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used.
>
> So {socket,tcpmem}_pressure are only used in default/legacy mode
> respectively for indicating socket memory pressure. This patch fixes
> the pieces of code that make mixed use of both.
>
> Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>

So, this is undoing the unintended exposure of v2 functionality for
the v1. I wonder if someone might have started depending upon that
behavior but I am more convinced that no one is using v1's tcpmem
accounting due to performance impact. So, this looks good to me.

Acked-by: Shakeel Butt <shakeelb@google.com>

I do think we should start the deprecation process of v1's tcpmem accounting.
Abel Wu Aug. 15, 2023, 2:53 a.m. UTC | #2
On 8/15/23 4:18 AM, Shakeel Butt wrote:
> On Mon, Aug 14, 2023 at 12:09 AM Abel Wu <wuyun.abel@bytedance.com> wrote:
>>
>> Now there are two indicators of socket memory pressure sit inside
>> struct mem_cgroup, socket_pressure and tcpmem_pressure, indicating
>> memory reclaim pressure in memcg->memory and ->tcpmem respectively.
>>
>> When in legacy mode (cgroupv1), the socket memory is charged into
>> ->tcpmem which is independent of ->memory, so socket_pressure has
>> nothing to do with socket's pressure at all. Things could be worse
>> by taking socket_pressure into consideration in legacy mode, as a
>> pressure in ->memory can lead to premature reclamation/throttling
>> in socket.
>>
>> While for the default mode (cgroupv2), the socket memory is charged
>> into ->memory, and ->tcpmem/->tcpmem_pressure are simply not used.
>>
>> So {socket,tcpmem}_pressure are only used in default/legacy mode
>> respectively for indicating socket memory pressure. This patch fixes
>> the pieces of code that make mixed use of both.
>>
>> Fixes: 8e8ae645249b ("mm: memcontrol: hook up vmpressure to socket pressure")
>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> 
> So, this is undoing the unintended exposure of v2 functionality for

Exactly.

> the v1. I wonder if someone might have started depending upon that
> behavior but I am more convinced that no one is using v1's tcpmem
> accounting due to performance impact. So, this looks good to me.

Agreed. The performance impact is not negligible. While not accounting
tcpmem is also undesired for Resource Manager to do provision properly.
So we have to migrate to cgroupv2, and now we encountered a new issue.
Some discussion with Roman can be found here:

https://lore.kernel.org/netdev/29de901f-ae4c-a900-a553-17ec4f096f0e@bytedance.com/

It would be great if you can shed some light on this!

> 
> Acked-by: Shakeel Butt <shakeelb@google.com>

Thanks!
	Abel

> 
> I do think we should start the deprecation process of v1's tcpmem accounting.
diff mbox series

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5818af8eca5a..dbf26bc89dd4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -284,6 +284,11 @@  struct mem_cgroup {
 	atomic_long_t		memory_events[MEMCG_NR_MEMORY_EVENTS];
 	atomic_long_t		memory_events_local[MEMCG_NR_MEMORY_EVENTS];
 
+	/*
+	 * Hint of reclaim pressure for socket memroy management. Note
+	 * that this indicator should NOT be used in legacy cgroup mode
+	 * where socket memory is accounted/charged separately.
+	 */
 	unsigned long		socket_pressure;
 
 	/* Legacy tcp memory accounting */
@@ -1727,8 +1732,8 @@  void mem_cgroup_sk_alloc(struct sock *sk);
 void mem_cgroup_sk_free(struct sock *sk);
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure)
-		return true;
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		return !!memcg->tcpmem_pressure;
 	do {
 		if (time_before(jiffies, READ_ONCE(memcg->socket_pressure)))
 			return true;
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index b52644771cc4..22c6689d9302 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -244,6 +244,14 @@  void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
 	if (mem_cgroup_disabled())
 		return;
 
+	/*
+	 * The in-kernel users only care about the reclaim efficiency
+	 * for this @memcg rather than the whole subtree, and there
+	 * isn't and won't be any in-kernel user in a legacy cgroup.
+	 */
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !tree)
+		return;
+
 	vmpr = memcg_to_vmpressure(memcg);
 
 	/*