[v2,1/2] memcg: Don't generate low/min events if either low/min or elow/emin is 0

Message ID 20250404012435.656045-1-longman@redhat.com (mailing list archive)
State New

Commit Message

Waiman Long April 4, 2025, 1:24 a.m. UTC
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test because two of its test child cgroups, which have a
memory.low of 0 or an effective memory.low of 0, still have low events
generated for them, since mem_cgroup_below_low() uses the ">="
operator when comparing to elow.

The two failed use cases are as follows:

1) memory.low is set to 0, but low events can still be triggered and
   so the cgroup may have a non-zero low event count. I doubt users are
   looking for that as they didn't set memory.low at all.

2) memory.low is set to a non-zero value but the cgroup has no task in
   it so that it has an effective low value of 0. Again it may have a
   non-zero low event count if memory reclaim happens. This is probably
   not a result expected by the users and it is really doubtful that
   users will check an empty cgroup with no task in it and expect
   some non-zero event counts.

The simple and naive fix of changing the operator to ">", however,
changes the memory reclaim behavior which can lead to other failures
as low events are needed to facilitate memory reclaim.  So we can't do
that without some relatively riskier changes in memory reclaim.

A simpler alternative is to not report a cgroup as below low if either
memory.low or its effective equivalent is 0. That is what this patch
does, specifically for the two failed use cases above.

With this patch applied, the test_memcg_low sub-test finishes
successfully in most cases, though both the test_memcg_low and
test_memcg_min sub-tests may still fail occasionally if the
memory.current values fall outside of the expected ranges.

To be consistent, a similar change is applied to mem_cgroup_below_min()
to avoid the same two failed use cases with low replaced by min.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/memcontrol.h | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
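
For readers comparing the options discussed in the commit message, the three
candidate predicates can be modeled in user space roughly as below. This is
only a sketch: struct memcg_model and the below_low_*() names are hypothetical
stand-ins, the mem_cgroup_unprotected() check is left out, and plain fields
replace READ_ONCE() and page_counter_read().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the values the helpers read. */
struct memcg_model {
	unsigned long low;   /* configured memory.low, in pages */
	unsigned long elow;  /* effective low protection, in pages */
	unsigned long usage; /* stand-in for page_counter_read() */
};

/* Existing check: usage <= elow, so 0 <= 0 counts as "below low". */
static bool below_low_current(const struct memcg_model *m)
{
	return m->elow >= m->usage;
}

/* The "simple" alternative: strict comparison. */
static bool below_low_strict(const struct memcg_model *m)
{
	return m->elow > m->usage;
}

/* This patch: never below low when either low or elow is 0. */
static bool below_low_patched(const struct memcg_model *m)
{
	if (!m->elow || !m->low)
		return false;
	return m->usage <= m->elow;
}

/* Illustrative values only; they are not taken from the selftest. */
static void check_examples(void)
{
	struct memcg_model empty = { .low = 100, .elow = 0, .usage = 0 };
	struct memcg_model exact = { .low = 100, .elow = 100, .usage = 100 };

	/* Empty cgroup: only the existing check reports "below low". */
	assert(below_low_current(&empty));
	assert(!below_low_strict(&empty));
	assert(!below_low_patched(&empty));

	/* Usage exactly at elow: the strict variant differs here. */
	assert(below_low_current(&exact));
	assert(!below_low_strict(&exact));
	assert(below_low_patched(&exact));
}
```

In this model the patched predicate agrees with the existing one whenever both
low and elow are non-zero, and differs from the strict variant only when usage
equals elow exactly.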

Comments

Tejun Heo April 4, 2025, 5:12 p.m. UTC | #1
Hello,

On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long wrote:
...
> The simple and naive fix of changing the operator to ">", however,
> changes the memory reclaim behavior which can lead to other failures
> as low events are needed to facilitate memory reclaim.  So we can't do
> that without some relatively riskier changes in memory reclaim.

I'm doubtful using ">" would change reclaim behavior in a meaningful way and
that'd be more straightforward. What do mm people think?

Thanks.
Waiman Long April 4, 2025, 5:25 p.m. UTC | #2
On 4/4/25 1:12 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long wrote:
> ...
>> The simple and naive fix of changing the operator to ">", however,
>> changes the memory reclaim behavior which can lead to other failures
>> as low events are needed to facilitate memory reclaim.  So we can't do
>> that without some relatively riskier changes in memory reclaim.
> I'm doubtful using ">" would change reclaim behavior in a meaningful way and
> that'd be more straightforward. What do mm people think?

I haven't looked deeply into why that is the case, but 
test_memcg_low/min tests had other failures when I made this change.

Cheers,
Longman

>
> Thanks.
>
Johannes Weiner April 4, 2025, 6:13 p.m. UTC | #3
On Fri, Apr 04, 2025 at 01:25:33PM -0400, Waiman Long wrote:
> 
> On 4/4/25 1:12 PM, Tejun Heo wrote:
> > Hello,
> >
> > On Thu, Apr 03, 2025 at 09:24:34PM -0400, Waiman Long wrote:
> > ...
> >> The simple and naive fix of changing the operator to ">", however,
> >> changes the memory reclaim behavior which can lead to other failures
> >> as low events are needed to facilitate memory reclaim.  So we can't do
> >> that without some relatively riskier changes in memory reclaim.
> > I'm doubtful using ">" would change reclaim behavior in a meaningful way and
> > that'd be more straightforward. What do mm people think?

The knob documentation uses "within low" and "above low" to
distinguish whether you are protected or not, so at least from a code
clarity pov, >= makes more sense to me: if your protection is N and
you use exactly N, you're considered protected.

That also means that by definition an empty cgroup is protected. It's
not in excess of its protection. The test result isn't wrong.

The real weirdness is issuing a "low reclaim" event when no reclaim is
going to happen*.

The patch effectively special cases "empty means in excess" to avoid
the event and fall through to reclaim, which then does nothing as a
result of its own scan target calculations. That seems convoluted.

Why not skip empty cgroups before running inapplicable checks?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..260ab238ec22 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5963,6 +5963,9 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		if (!mem_cgroup_usage(memcg, false))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.

> I haven't looked deeply into why that is the case, but 
> test_memcg_low/min tests had other failures when I made this change.

It surprises me as well that it makes any practical difference.

* Waiman points out that the weirdness is seeing low events without
  having a low configured. Eh, this isn't really true with recursive
  propagation; you may or may not have an elow depending on parental
  configuration and sibling behavior.
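
The effect of checking for an empty cgroup before the protection checks can be
illustrated with a small user-space model. This is a sketch under stated
assumptions, not the vmscan.c code: struct memcg_sim and walk_memcgs() are
hypothetical names, the min check is omitted, and an event is recorded every
time the ">=" comparison holds, whereas the kernel raises the low event only
once low reclaim is being forced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one cgroup visited by the reclaim walk. */
struct memcg_sim {
	unsigned long elow;       /* effective low protection, in pages */
	unsigned long usage;      /* current usage, in pages */
	unsigned long low_events; /* events recorded by this model */
};

/* Same ">=" comparison as the existing mem_cgroup_below_low(). */
static bool sim_below_low(const struct memcg_sim *m)
{
	return m->elow >= m->usage;
}

/*
 * Walk all groups and record a low event for each protected one.
 * With skip_empty set, groups with zero usage are skipped before
 * the protection check runs. Returns the number of events recorded
 * during this walk.
 */
static size_t walk_memcgs(struct memcg_sim *groups, size_t n, bool skip_empty)
{
	size_t events = 0;

	for (size_t i = 0; i < n; i++) {
		struct memcg_sim *m = &groups[i];

		if (skip_empty && !m->usage)
			continue;

		if (sim_below_low(m)) {
			m->low_events++;
			events++;
		}
	}
	return events;
}
```

With one empty group (elow 0, usage 0) and one group using less than its
protection, the walk records two events without the early skip and one with
it, while the comparison operator itself stays untouched.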
Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 53364526d877..4d4a1f159eaa 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -601,21 +601,31 @@  static inline bool mem_cgroup_unprotected(struct mem_cgroup *target,
 static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
 					struct mem_cgroup *memcg)
 {
+	unsigned long elow;
+
 	if (mem_cgroup_unprotected(target, memcg))
 		return false;
 
-	return READ_ONCE(memcg->memory.elow) >=
-		page_counter_read(&memcg->memory);
+	elow = READ_ONCE(memcg->memory.elow);
+	if (!elow || !READ_ONCE(memcg->memory.low))
+		return false;
+
+	return page_counter_read(&memcg->memory) <= elow;
 }
 
 static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
 					struct mem_cgroup *memcg)
 {
+	unsigned long emin;
+
 	if (mem_cgroup_unprotected(target, memcg))
 		return false;
 
-	return READ_ONCE(memcg->memory.emin) >=
-		page_counter_read(&memcg->memory);
+	emin = READ_ONCE(memcg->memory.emin);
+	if (!emin || !READ_ONCE(memcg->memory.min))
+		return false;
+
+	return page_counter_read(&memcg->memory) <= emin;
 }
 
 int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);