Message ID | 20250414021249.3232315-2-longman@redhat.com (mailing list archive) |
---|---|
State | New |
Series | memcg: Fix test_memcg_min/low test failures |
On Sun, Apr 13, 2025 at 10:12:48PM -0400, Waiman Long <longman@redhat.com> wrote:
> 2) memory.low is set to a non-zero value but the cgroup has no task in
>    it so that it has an effective low value of 0. Again it may have a
>    non-zero low event count if memory reclaim happens. This is probably
>    not a result expected by the users and it is really doubtful that
>    users will check an empty cgroup with no task in it and expecting
>    some non-zero event counts.

I think you want to distinguish "no tasks" vs "no usage" in this
paragraph.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -5963,6 +5963,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>
>  		mem_cgroup_calculate_protection(target_memcg, memcg);
>
> +		/* Skip memcg with no usage */
> +		if (!mem_cgroup_usage(memcg, false))
> +			continue;
> +
>  		if (mem_cgroup_below_min(target_memcg, memcg)) {

As I think more about this -- the idea expressed by the diff makes
sense. But is it really a change?
For non-root memcgs, they'll be skipped because 0 >= 0 (in
mem_cgroup_below_min()) and root memcg would hardly be skipped.

> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> @@ -380,10 +380,10 @@ static bool reclaim_until(const char *memcg, long goal);
>   *
>   * Then it checks actual memory usages and expects that:
>   * A/B memory.current ~= 50M
> - * A/B/C memory.current ~= 29M
> - * A/B/D memory.current ~= 21M
> - * A/B/E memory.current ~= 0
> - * A/B/F memory.current = 0
> + * A/B/C memory.current ~= 29M [memory.events:low > 0]
> + * A/B/D memory.current ~= 21M [memory.events:low > 0]
> + * A/B/E memory.current ~= 0 [memory.events:low == 0 if !memory_recursiveprot, > 0 otherwise]

Please note the subtlety in my suggestion -- I want the test with
memory_recursiveprot _not_ to check events count at all. Because:
	a) it forces single interpretation of low events wrt effective
	   low limit
	b) effective low limit should still be 0 in E in this testcase
	   (there should be no unclaimed protection of C and D).

> + * A/B/F memory.current = 0 [memory.events:low == 0]

Thanks,
Michal

On 4/14/25 8:42 AM, Michal Koutný wrote:
> On Sun, Apr 13, 2025 at 10:12:48PM -0400, Waiman Long <longman@redhat.com> wrote:
>> 2) memory.low is set to a non-zero value but the cgroup has no task in
>>    it so that it has an effective low value of 0. Again it may have a
>>    non-zero low event count if memory reclaim happens. This is probably
>>    not a result expected by the users and it is really doubtful that
>>    users will check an empty cgroup with no task in it and expecting
>>    some non-zero event counts.
> I think you want to distinguish "no tasks" vs "no usage" in this
> paragraph.

Good point. Will update it if I need to send a new version.

>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -5963,6 +5963,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>>
>>  		mem_cgroup_calculate_protection(target_memcg, memcg);
>>
>> +		/* Skip memcg with no usage */
>> +		if (!mem_cgroup_usage(memcg, false))
>> +			continue;
>> +
>>  		if (mem_cgroup_below_min(target_memcg, memcg)) {
> As I think more about this -- the idea expressed by the diff makes
> sense. But is it really a change?
> For non-root memcgs, they'll be skipped because 0 >= 0 (in
> mem_cgroup_below_min()) and root memcg would hardly be skipped.

I did see some low events in the no usage case because of the ">="
comparison used in mem_cgroup_below_min(). I was originally planning to
guard against the elow == 0 case but Johannes advised against it.

>> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
>> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
>> @@ -380,10 +380,10 @@ static bool reclaim_until(const char *memcg, long goal);
>>   *
>>   * Then it checks actual memory usages and expects that:
>>   * A/B memory.current ~= 50M
>> - * A/B/C memory.current ~= 29M
>> - * A/B/D memory.current ~= 21M
>> - * A/B/E memory.current ~= 0
>> - * A/B/F memory.current = 0
>> + * A/B/C memory.current ~= 29M [memory.events:low > 0]
>> + * A/B/D memory.current ~= 21M [memory.events:low > 0]
>> + * A/B/E memory.current ~= 0 [memory.events:low == 0 if !memory_recursiveprot, > 0 otherwise]
> Please note the subtlety in my suggestion -- I want the test with
> memory_recursiveprot _not_ to check events count at all. Because:
> 	a) it forces single interpretation of low events wrt effective
> 	   low limit
> 	b) effective low limit should still be 0 in E in this testcase
> 	   (there should be no unclaimed protection of C and D).

Yes, low event count for E is 0 in the !memory_recursiveprot case, but
C/D still have low events and setting no_low_events_index to -1 will
fail the test and it is not the same as not checking low event counts
at all.

Cheers,
Longman

On Mon, Apr 14, 2025 at 09:15:57AM -0400, Waiman Long <llong@redhat.com> wrote:
> I did see some low event in the no usage case because of the ">=" comparison
> used in mem_cgroup_below_min().

Do you refer to A/B/E or A/B/F from the test?
It's OK to see some events if there was non-zero usage initially.

Nevertheless, which situation this patch changes that is not handled by
mem_cgroup_below_min() already?

> Yes, low event count for E is 0 in the !memory_recursiveprot case, but C/D
> still have low events and setting no_low_events_index to -1 will fail the
> test and it is not the same as not checking low event counts at all.

I added yet another ignore_low_events_index variable (in my original
proposal) not to fail the test. But feel free to come up with another
implementation, I wanted to point out the "not specified" expectation
for E with memory_recursiveprot.

Michal

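One way to read the "not specified" expectation is as a third state next to "must be non-zero" and "must be zero". The following is only a sketch of that idea, not the code from Michal's original proposal (which is not shown in this thread). The enum and both helpers are hypothetical names, and the convention that children up to and including no_low_events_index are expected to record low events is inferred from the patch's own comment about child 2.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of this exists in test_memcontrol.c. */
enum low_check {
	LOW_MUST_BE_NONZERO,
	LOW_MUST_BE_ZERO,
	LOW_NOT_CHECKED,
};

/*
 * Children 0..no_low_events_index are expected to record low events,
 * later children are expected to record none. A child whose index equals
 * ignore_low_events_index is exempt; pass -1 to exempt nobody.
 */
static enum low_check low_check_for(int i, int no_low_events_index,
				    int ignore_low_events_index)
{
	if (i == ignore_low_events_index)
		return LOW_NOT_CHECKED;
	return i <= no_low_events_index ? LOW_MUST_BE_NONZERO : LOW_MUST_BE_ZERO;
}

/* Returns true when the observed memory.events:low count is acceptable. */
static bool low_events_ok(long low, enum low_check check)
{
	switch (check) {
	case LOW_MUST_BE_NONZERO:
		return low > 0;
	case LOW_MUST_BE_ZERO:
		return low == 0;
	case LOW_NOT_CHECKED:
		return true;
	}
	return false;
}
```

With no_low_events_index = 1 and ignore_low_events_index = 2, children 0 and 1 (C and D) must show low events, child 2 (E) passes with any count, and child 3 (F) must show none. That differs from setting no_low_events_index to -1, which would instead require C and D to have zero low events.
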
On Mon, Apr 14, 2025 at 03:55:39PM +0200, Michal Koutný wrote:
> On Mon, Apr 14, 2025 at 09:15:57AM -0400, Waiman Long <llong@redhat.com> wrote:
> > I did see some low event in the no usage case because of the ">=" comparison
> > used in mem_cgroup_below_min().
>
> Do you refer to A/B/E or A/B/F from the test?
> It's OK to see some events if there was non-zero usage initially.
>
> Nevertheless, which situation this patch changes that is not handled by
> mem_cgroup_below_min() already?

It's not a functional change to the protection semantics or the
reclaim behavior.

The problem is if we go into low_reclaim and encounter an empty group,
we'll issue "low-protected group is being reclaimed" events, which is
kind of absurd (nothing will be reclaimed) and thus confusing to users
(I didn't even configure any protection!)

I suggested, instead of redefining the protection definitions for that
special case, to bypass all the checks and the scan count calculations
when we already know the group is empty and none of this applies.

https://lore.kernel.org/linux-mm/20250404181308.GA300138@cmpxchg.org/

On Mon, Apr 14, 2025 at 12:47:21PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> It's not a functional change to the protection semantics or the
> reclaim behavior.

Yes, that's how I understand it, therefore I'm wondering what does it
change.

If this is taken:
	if (!mem_cgroup_usage(memcg, false))
		continue;

this would've been taken too:
	if (mem_cgroup_below_min(target_memcg, memcg))
		continue;
(unless target_memcg == memcg but that's not interesting for the events
here)

> The problem is if we go into low_reclaim and encounter an empty group,
> we'll issue "low-protected group is being reclaimed" events,

How can this happen when
	page_counter_read(&memcg->memory) <= memcg->memory.emin
? (I.e. in this case 0 <= emin and emin >= 0.)

> which is kind of absurd (nothing will be reclaimed) and thus confusing
> to users (I didn't even configure any protection!)

Yes.

> I suggested, instead of redefining the protection definitions for that
> special case, to bypass all the checks and the scan count calculations
> when we already know the group is empty and none of this applies.
>
> https://lore.kernel.org/linux-mm/20250404181308.GA300138@cmpxchg.org/

Is this non-functional change to make shrink_node_memcgs() robust
against possible future redefinitions of mem_cgroup_below_*()?

Michal

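The argument above can be checked with a small userspace model. This is a sketch only: struct model_memcg and the model_* helpers are made-up stand-ins, not kernel code, and they deliberately ignore the target_memcg == memcg and root cases that the real helpers treat specially. The comparisons follow the ">=" form quoted in this thread.

```c
#include <stdbool.h>

/* Simplified stand-in for the relevant counters; not the kernel struct. */
struct model_memcg {
	unsigned long usage;	/* pages charged, cf. page_counter_read() */
	unsigned long emin;	/* effective memory.min */
	unsigned long elow;	/* effective memory.low */
};

/* Same ">=" comparison as discussed for mem_cgroup_below_min(). */
static bool model_below_min(const struct model_memcg *m)
{
	return m->emin >= m->usage;
}

/* Same ">=" comparison as discussed for mem_cgroup_below_low(). */
static bool model_below_low(const struct model_memcg *m)
{
	return m->elow >= m->usage;
}

/* True if the proposed "!usage" check would skip this group. */
static bool model_skipped_by_usage_check(const struct model_memcg *m)
{
	return m->usage == 0;
}
```

Because emin is unsigned, model_below_min() holds whenever usage is 0, so in this model every group the new check skips would already have been skipped by the min check. A group with usage 10 and elow 10 passes the min check and then hits the low branch, which is where a low event would be expected.
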
On Mon, Apr 14, 2025 at 08:01:42PM +0200, Michal Koutný wrote:
> On Mon, Apr 14, 2025 at 12:47:21PM -0400, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > It's not a functional change to the protection semantics or the
> > reclaim behavior.
>
> Yes, that's how I understand it, therefore I'm wondering what does it
> change.
>
> If this is taken:
> 	if (!mem_cgroup_usage(memcg, false))
> 		continue;
>
> this would've been taken too:
> 	if (mem_cgroup_below_min(target_memcg, memcg))
> 		continue;
> (unless target_memcg == memcg but that's not interesting for the events
> here)

D'oh.

> > The problem is if we go into low_reclaim and encounter an empty group,
> > we'll issue "low-protected group is being reclaimed" events,
>
> How can this happen when
> 	page_counter_read(&memcg->memory) <= memcg->memory.emin
> ? (I.e. in this case 0 <= emin and emin >= 0.)
>
> > which is kind of absurd (nothing will be reclaimed) and thus confusing
> > to users (I didn't even configure any protection!)
>
> Yes.
>
> > I suggested, instead of redefining the protection definitions for that
> > special case, to bypass all the checks and the scan count calculations
> > when we already know the group is empty and none of this applies.
> >
> > https://lore.kernel.org/linux-mm/20250404181308.GA300138@cmpxchg.org/
>
> Is this non-functional change to make shrink_node_memcgs() robust
> against possible future redefinitions of mem_cgroup_below_*()?

No, this was really just aimed to stop low events on empty groups.

But as you rightfully point out, they should not get past the min
check in the first place. So something seems missing here.

diff --git a/mm/internal.h b/mm/internal.h
index 50c2f590b2d0..c06fb0e8d75c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1535,6 +1535,15 @@ void __meminit __init_page_from_nid(unsigned long pfn, int nid);
 unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
 			  int priority);
 
+#ifdef CONFIG_MEMCG
+unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
+#else
+static inline unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
+{
+	return 1UL;
+}
+#endif
+
 #ifdef CONFIG_SHRINKER_DEBUG
 static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
 			struct shrinker *shrinker, const char *fmt, va_list ap)
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 6358464bb416..e92b21af92b1 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -22,8 +22,6 @@
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
-unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
-
 void drain_all_stock(struct mem_cgroup *root_memcg);
 
 unsigned long memcg_events(struct mem_cgroup *memcg, int event);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..a771a0145a12 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5963,6 +5963,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		/* Skip memcg with no usage */
+		if (!mem_cgroup_usage(memcg, false))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..5a5dcbe57b56 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -380,10 +380,10 @@ static bool reclaim_until(const char *memcg, long goal);
  *
  * Then it checks actual memory usages and expects that:
  * A/B memory.current ~= 50M
- * A/B/C memory.current ~= 29M
- * A/B/D memory.current ~= 21M
- * A/B/E memory.current ~= 0
- * A/B/F memory.current = 0
+ * A/B/C memory.current ~= 29M [memory.events:low > 0]
+ * A/B/D memory.current ~= 21M [memory.events:low > 0]
+ * A/B/E memory.current ~= 0 [memory.events:low == 0 if !memory_recursiveprot, > 0 otherwise]
+ * A/B/F memory.current = 0 [memory.events:low == 0]
  * (for origin of the numbers, see model in memcg_protection.m.)
  *
  * After that it tries to allocate more than there is
@@ -525,8 +525,14 @@ static int test_memcg_protection(const char *root, bool min)
 		goto cleanup;
 	}
 
+	/*
+	 * Child 2 has memory.low=0, but some low protection is still being
+	 * distributed down from its parent with memory.low=50M if cgroup2
+	 * memory_recursiveprot mount option is enabled. So the low event
+	 * count will be non-zero in this case.
+	 */
 	for (i = 0; i < ARRAY_SIZE(children); i++) {
-		int no_low_events_index = 1;
+		int no_low_events_index = has_recursiveprot ? 2 : 1;
 		long low, oom;
 
 		oom = cg_read_key_long(children[i], "memory.events", "oom ");
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test due to the fact that two of its test child cgroups which
have a memory.low of 0 or an effective memory.low of 0 still have low
events generated for them since mem_cgroup_below_low() uses the ">="
operator when comparing to elow.

The two failed use cases are as follows:

1) memory.low is set to 0, but low events can still be triggered and
   so the cgroup may have a non-zero low event count.

2) memory.low is set to a non-zero value but the cgroup has no task in
   it so that it has an effective low value of 0. Again it may have a
   non-zero low event count if memory reclaim happens. This is probably
   not a result expected by the users and it is really doubtful that
   users will check an empty cgroup with no task in it and expecting
   some non-zero event counts.

In the first case, even though memory.low isn't set, it may still have
some low protection if memory.low is set in the parent and the cgroup2
memory_recursiveprot mount option is enabled. So low event may still be
recorded. The test_memcontrol.c test has to be modified to account for
that.

For the second case, it really doesn't make sense to have non-zero low
event if the cgroup has 0 usage. So we need to skip this corner case in
shrink_node_memcgs() by skipping the !usage case.

With this patch applied, the test_memcg_low sub-test finishes
successfully without failure in most cases, though both test_memcg_low
and test_memcg_min sub-tests may still fail occasionally if the
memory.current values fall outside of the expected ranges.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/internal.h                                    |  9 +++++++++
 mm/memcontrol-v1.h                               |  2 --
 mm/vmscan.c                                      |  4 ++++
 tools/testing/selftests/cgroup/test_memcontrol.c | 16 +++++++++++-----
 4 files changed, 24 insertions(+), 7 deletions(-)
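
The low event count discussed in this commit message is a keyed value in the cgroup's memory.events file, which the selftest reads with cg_read_key_long() using a key with a trailing space (for example "oom "). As a rough illustration of that lookup, the sketch below parses an in-memory copy of such a file. events_value() is a hypothetical helper, not the selftest's implementation, and any sample contents passed to it are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up "key" in an in-memory copy of a
 * memory.events-style file and return the number that follows it,
 * or -1 if the key is absent. The key is expected to include its
 * trailing space so that "oom " does not match "oom_kill ".
 */
static long events_value(const char *buf, const char *key)
{
	const char *p = strstr(buf, key);

	if (!p)
		return -1;
	return atol(p + strlen(key));
}
```

Given a buffer such as "low 3\nhigh 0\nmax 0\noom 0\noom_kill 0\n", looking up "low " yields the count the test compares against zero. Since strstr() does plain substring matching, this sketch relies on keys being distinct once the trailing space is included.
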