
[v3,1/2] mm/vmscan: Skip memcg with !usage in shrink_node_memcgs()

Message ID 20250406024010.1177927-2-longman@redhat.com (mailing list archive)
State New
Series memcg: Fix test_memcg_min/low test failures

Commit Message

Waiman Long April 6, 2025, 2:40 a.m. UTC
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test because two of its child test cgroups, one with memory.low
set to 0 and one with an effective memory.low of 0, still get low
events generated for them: mem_cgroup_below_low() uses the ">="
operator when comparing usage against elow.
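
For reference, the check in question looks roughly like this
(paraphrased and simplified from include/linux/memcontrol.h; not a
verbatim quote):

	static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
						struct mem_cgroup *memcg)
	{
		if (mem_cgroup_unprotected(target, memcg))
			return false;

		/*
		 * With ">=", a cgroup whose usage and elow are both 0 is
		 * treated as below low, so MEMCG_LOW events can fire even
		 * though no effective protection exists.
		 */
		return READ_ONCE(memcg->memory.elow) >=
		       page_counter_read(&memcg->memory);
	}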

The two failed use cases are as follows:

1) memory.low is set to 0, but low events can still be triggered,
   so the cgroup may have a non-zero low event count. Users are
   unlikely to expect this, as they never set memory.low at all.

2) memory.low is set to a non-zero value, but the cgroup has no task
   in it, so its effective low value is 0. It may still show a
   non-zero low event count if memory reclaim happens. This is
   probably not what users expect; it is doubtful that anyone would
   check an empty cgroup with no tasks and expect a non-zero low
   event count.

In the first case, even though memory.low isn't set, the cgroup may
still receive some low protection distributed down from a parent
that has memory.low set, so low events may legitimately be recorded.
The test_memcontrol.c test has to be modified to account for that.

For the second case, it makes no sense to record low events for a
cgroup with zero usage, so this corner case needs to be skipped in
shrink_node_memcgs() (see the first hunk below).

With this patch applied, the test_memcg_low sub-test finishes
successfully in most cases, though both the test_memcg_low and
test_memcg_min sub-tests may still fail occasionally if the
memory.current values fall outside of the expected ranges.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/vmscan.c                                      | 4 ++++
 tools/testing/selftests/cgroup/test_memcontrol.c | 7 ++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

Comments

kernel test robot April 6, 2025, 4:27 a.m. UTC | #1
Hi Waiman,

kernel test robot noticed the following build errors:

[auto build test ERROR on tj-cgroup/for-next]
[also build test ERROR on akpm-mm/mm-everything linus/master v6.14 next-20250404]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/mm-vmscan-Skip-memcg-with-usage-in-shrink_node_memcgs/20250406-104208
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
patch link:    https://lore.kernel.org/r/20250406024010.1177927-2-longman%40redhat.com
patch subject: [PATCH v3 1/2] mm/vmscan: Skip memcg with !usage in shrink_node_memcgs()
config: arc-randconfig-002-20250406 (https://download.01.org/0day-ci/archive/20250406/202504061257.GMkEJUOs-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 11.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250406/202504061257.GMkEJUOs-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504061257.GMkEJUOs-lkp@intel.com/

All errors (new ones prefixed by >>):

   mm/vmscan.c: In function 'shrink_node_memcgs':
>> mm/vmscan.c:5929:46: error: invalid use of undefined type 'struct mem_cgroup'
    5929 |                 if (!page_counter_read(&memcg->memory))
         |                                              ^~


vim +5929 mm/vmscan.c

  5890	
  5891	static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
  5892	{
  5893		struct mem_cgroup *target_memcg = sc->target_mem_cgroup;
  5894		struct mem_cgroup_reclaim_cookie reclaim = {
  5895			.pgdat = pgdat,
  5896		};
  5897		struct mem_cgroup_reclaim_cookie *partial = &reclaim;
  5898		struct mem_cgroup *memcg;
  5899	
  5900		/*
  5901		 * In most cases, direct reclaimers can do partial walks
  5902		 * through the cgroup tree, using an iterator state that
  5903		 * persists across invocations. This strikes a balance between
  5904		 * fairness and allocation latency.
  5905		 *
  5906		 * For kswapd, reliable forward progress is more important
  5907		 * than a quick return to idle. Always do full walks.
  5908		 */
  5909		if (current_is_kswapd() || sc->memcg_full_walk)
  5910			partial = NULL;
  5911	
  5912		memcg = mem_cgroup_iter(target_memcg, NULL, partial);
  5913		do {
  5914			struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
  5915			unsigned long reclaimed;
  5916			unsigned long scanned;
  5917	
  5918			/*
  5919			 * This loop can become CPU-bound when target memcgs
  5920			 * aren't eligible for reclaim - either because they
  5921			 * don't have any reclaimable pages, or because their
  5922			 * memory is explicitly protected. Avoid soft lockups.
  5923			 */
  5924			cond_resched();
  5925	
  5926			mem_cgroup_calculate_protection(target_memcg, memcg);
  5927	
  5928			/* Skip memcg with no usage */
> 5929			if (!page_counter_read(&memcg->memory))
  5930				continue;
  5931	
  5932			if (mem_cgroup_below_min(target_memcg, memcg)) {
  5933				/*
  5934				 * Hard protection.
  5935				 * If there is no reclaimable memory, OOM.
  5936				 */
  5937				continue;
  5938			} else if (mem_cgroup_below_low(target_memcg, memcg)) {
  5939				/*
  5940				 * Soft protection.
  5941				 * Respect the protection only as long as
  5942				 * there is an unprotected supply
  5943				 * of reclaimable memory from other cgroups.
  5944				 */
  5945				if (!sc->memcg_low_reclaim) {
  5946					sc->memcg_low_skipped = 1;
  5947					continue;
  5948				}
  5949				memcg_memory_event(memcg, MEMCG_LOW);
  5950			}
  5951	
  5952			reclaimed = sc->nr_reclaimed;
  5953			scanned = sc->nr_scanned;
  5954	
  5955			shrink_lruvec(lruvec, sc);
  5956	
  5957			shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
  5958				    sc->priority);
  5959	
  5960			/* Record the group's reclaim efficiency */
  5961			if (!sc->proactive)
  5962				vmpressure(sc->gfp_mask, memcg, false,
  5963					   sc->nr_scanned - scanned,
  5964					   sc->nr_reclaimed - reclaimed);
  5965	
  5966			/* If partial walks are allowed, bail once goal is reached */
  5967			if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
  5968				mem_cgroup_iter_break(target_memcg, memcg);
  5969				break;
  5970			}
  5971		} while ((memcg = mem_cgroup_iter(target_memcg, memcg, partial)));
  5972	}
  5973
kernel test robot April 6, 2025, 5:08 a.m. UTC | #2
Hi Waiman,

kernel test robot noticed the following build errors:

[auto build test ERROR on tj-cgroup/for-next]
[also build test ERROR on akpm-mm/mm-everything linus/master v6.14 next-20250404]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/mm-vmscan-Skip-memcg-with-usage-in-shrink_node_memcgs/20250406-104208
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
patch link:    https://lore.kernel.org/r/20250406024010.1177927-2-longman%40redhat.com
patch subject: [PATCH v3 1/2] mm/vmscan: Skip memcg with !usage in shrink_node_memcgs()
config: arm-randconfig-001-20250406 (https://download.01.org/0day-ci/archive/20250406/202504061254.DqfqHfM7-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project 92c93f5286b9ff33f27ff694d2dc33da1c07afdd)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250406/202504061254.DqfqHfM7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504061254.DqfqHfM7-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/vmscan.c:5929:32: error: incomplete definition of type 'struct mem_cgroup'
    5929 |                 if (!page_counter_read(&memcg->memory))
         |                                         ~~~~~^
   include/linux/mm_types.h:33:8: note: forward declaration of 'struct mem_cgroup'
      33 | struct mem_cgroup;
         |        ^
   1 error generated.
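
Both robot reports point to the same root cause: struct mem_cgroup is
only fully defined when CONFIG_MEMCG=y, so common vmscan code cannot
dereference memcg->memory directly. One conventional way to keep the
check out of !CONFIG_MEMCG builds is to hide the read behind a helper
in include/linux/memcontrol.h with a no-op stub. The sketch below uses
a hypothetical helper name and is not necessarily the fix that was
ultimately applied:

	#ifdef CONFIG_MEMCG
	/* Hypothetical helper; name and placement are illustrative only. */
	static inline bool mem_cgroup_usage_empty(struct mem_cgroup *memcg)
	{
		return !page_counter_read(&memcg->memory);
	}
	#else
	static inline bool mem_cgroup_usage_empty(struct mem_cgroup *memcg)
	{
		return false;	/* no per-memcg accounting, never skip */
	}
	#endif

shrink_node_memcgs() would then call mem_cgroup_usage_empty(memcg)
instead of open-coding the page_counter_read().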



Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..2a2957b9dc99 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5963,6 +5963,10 @@  static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		/* Skip memcg with no usage */
+		if (!page_counter_read(&memcg->memory))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..bab826b6b7b0 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -525,8 +525,13 @@  static int test_memcg_protection(const char *root, bool min)
 		goto cleanup;
 	}
 
+	/*
+	 * Child 2 has memory.low=0, but some low protection is still being
+	 * distributed down from its parent with memory.low=50M. So the low
+	 * event count will be non-zero.
+	 */
 	for (i = 0; i < ARRAY_SIZE(children); i++) {
-		int no_low_events_index = 1;
+		int no_low_events_index = 2;
 		long low, oom;
 
 		oom = cg_read_key_long(children[i], "memory.events", "oom ");
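
For context, the remainder of this loop (truncated above) reads each
child's low event count and compares the child's index against
no_low_events_index. A paraphrased sketch of its shape, simplified
and not guaranteed to match the tree exactly:

	low = cg_read_key_long(children[i], "memory.events", "low ");
	if (oom)
		goto cleanup;	/* no child should hit OOM */
	if (i <= no_low_events_index && low <= 0)
		goto cleanup;	/* protected children need low events */
	if (i > no_low_events_index && low)
		goto cleanup;	/* unprotected children must have none */

Bumping no_low_events_index from 1 to 2 thus makes the test accept a
non-zero low event count for child 2, which inherits protection from
its parent even though its own memory.low is 0.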