[1/1] mm: vmscan: Reduce throttling due to a failure to make progress

Message ID 20211202131842.9217-1-mgorman@techsingularity.net (mailing list archive)
State New, archived

Commit Message

Mel Gorman Dec. 2, 2021, 1:18 p.m. UTC
Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time.
In Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before the OOM killer triggers. In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being
made") introduced the problem although commit a19594ca4a8b ("mm/vmscan:
increase the timeout if page reclaim is not making progress") made it
worse. Systems at or near an OOM state that cannot be recovered must
reach OOM quickly and memcg should kill tasks if a memcg is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0, kswapd is still active and there are
excessive pages pending for writeback. If kswapd has stopped reclaiming due
to excessive failures, do not stall at all so that OOM triggers relatively
quickly. Similarly, if an LRU is simply congested, only lightly throttle
similar to NOPROGRESS.

Alexey's original case was the most straightforward:

	for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily. After the patch, the test
completes in a few seconds, similar to 5.15.

Alexey's second test case added watching a YouTube video while tail runs
10 times. On 5.15, playback only jitters slightly, while 5.16-rc1 stalls a
lot, with many missing frames and numerous audio glitches. With this patch
applied, the video plays similarly to 5.15.

Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net

Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv>
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org>
Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/mmzone.h        |  1 +
 include/trace/events/vmscan.h |  4 ++-
 mm/vmscan.c                   | 64 ++++++++++++++++++++++++++++++-----
 3 files changed, 59 insertions(+), 10 deletions(-)

Comments

kernel test robot Dec. 2, 2021, 2:51 p.m. UTC | #1
Hi Mel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on hnaz-mm/master]

url:    https://github.com/0day-ci/linux/commits/Mel-Gorman/mm-vmscan-Reduce-throttling-due-to-a-failure-to-make-progress/20211202-212004
base:   https://github.com/hnaz/linux-mm master
config: um-i386_defconfig (https://download.01.org/0day-ci/archive/20211202/202112022232.gARZWe0c-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/01dada07590ae9c69a9415ba9af96d5ae184d861
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mel-Gorman/mm-vmscan-Reduce-throttling-due-to-a-failure-to-make-progress/20211202-212004
        git checkout 01dada07590ae9c69a9415ba9af96d5ae184d861
        # save the config file to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=um SUBARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> mm/vmscan.c:1024:6: warning: no previous prototype for 'skip_throttle_noprogress' [-Wmissing-prototypes]
    1024 | bool skip_throttle_noprogress(pg_data_t *pgdat)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~


vim +/skip_throttle_noprogress +1024 mm/vmscan.c

  1023	
> 1024	bool skip_throttle_noprogress(pg_data_t *pgdat)
  1025	{
  1026		int reclaimable = 0, write_pending = 0;
  1027		int i;
  1028	
  1029		/*
  1030		 * If kswapd is disabled, reschedule if necessary but do not
  1031		 * throttle as the system is likely near OOM.
  1032		 */
  1033		if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
  1034			return true;
  1035	
  1036		/*
  1037		 * If there are a lot of dirty/writeback pages then do not
  1038		 * throttle as throttling will occur when the pages cycle
  1039		 * towards the end of the LRU if still under writeback.
  1040		 */
  1041		for (i = 0; i < MAX_NR_ZONES; i++) {
  1042			struct zone *zone = pgdat->node_zones + i;
  1043	
  1044			if (!populated_zone(zone))
  1045				continue;
  1046	
  1047			reclaimable += zone_reclaimable_pages(zone);
  1048			write_pending += zone_page_state_snapshot(zone,
  1049							  NR_ZONE_WRITE_PENDING);
  1050		}
  1051		if (2 * write_pending <= reclaimable)
  1052			return true;
  1053	
  1054		return false;
  1055	}
  1056	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

kernel test robot Dec. 2, 2021, 4:02 p.m. UTC | #2
Hi Mel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on hnaz-mm/master]

url:    https://github.com/0day-ci/linux/commits/Mel-Gorman/mm-vmscan-Reduce-throttling-due-to-a-failure-to-make-progress/20211202-212004
base:   https://github.com/hnaz/linux-mm master
config: x86_64-buildonly-randconfig-r001-20211202 (https://download.01.org/0day-ci/archive/20211203/202112030001.HUiErCyK-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 4b553297ef3ee4dc2119d5429adf3072e90fac38)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/01dada07590ae9c69a9415ba9af96d5ae184d861
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mel-Gorman/mm-vmscan-Reduce-throttling-due-to-a-failure-to-make-progress/20211202-212004
        git checkout 01dada07590ae9c69a9415ba9af96d5ae184d861
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> mm/vmscan.c:1024:6: warning: no previous prototype for function 'skip_throttle_noprogress' [-Wmissing-prototypes]
   bool skip_throttle_noprogress(pg_data_t *pgdat)
        ^
   mm/vmscan.c:1024:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   bool skip_throttle_noprogress(pg_data_t *pgdat)
   ^
   static 
   1 warning generated.


vim +/skip_throttle_noprogress +1024 mm/vmscan.c

  1023	
> 1024	bool skip_throttle_noprogress(pg_data_t *pgdat)
  1025	{
  1026		int reclaimable = 0, write_pending = 0;
  1027		int i;
  1028	
  1029		/*
  1030		 * If kswapd is disabled, reschedule if necessary but do not
  1031		 * throttle as the system is likely near OOM.
  1032		 */
  1033		if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
  1034			return true;
  1035	
  1036		/*
  1037		 * If there are a lot of dirty/writeback pages then do not
  1038		 * throttle as throttling will occur when the pages cycle
  1039		 * towards the end of the LRU if still under writeback.
  1040		 */
  1041		for (i = 0; i < MAX_NR_ZONES; i++) {
  1042			struct zone *zone = pgdat->node_zones + i;
  1043	
  1044			if (!populated_zone(zone))
  1045				continue;
  1046	
  1047			reclaimable += zone_reclaimable_pages(zone);
  1048			write_pending += zone_page_state_snapshot(zone,
  1049							  NR_ZONE_WRITE_PENDING);
  1050		}
  1051		if (2 * write_pending <= reclaimable)
  1052			return true;
  1053	
  1054		return false;
  1055	}
  1056	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..936dc0b6c226 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -277,6 +277,7 @@  enum vmscan_throttle_state {
 	VMSCAN_THROTTLE_WRITEBACK,
 	VMSCAN_THROTTLE_ISOLATED,
 	VMSCAN_THROTTLE_NOPROGRESS,
+	VMSCAN_THROTTLE_CONGESTED,
 	NR_VMSCAN_THROTTLE,
 };
 
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index f25a6149d3ba..ca2e9009a651 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -30,12 +30,14 @@ 
 #define _VMSCAN_THROTTLE_WRITEBACK	(1 << VMSCAN_THROTTLE_WRITEBACK)
 #define _VMSCAN_THROTTLE_ISOLATED	(1 << VMSCAN_THROTTLE_ISOLATED)
 #define _VMSCAN_THROTTLE_NOPROGRESS	(1 << VMSCAN_THROTTLE_NOPROGRESS)
+#define _VMSCAN_THROTTLE_CONGESTED	(1 << VMSCAN_THROTTLE_CONGESTED)
 
 #define show_throttle_flags(flags)						\
 	(flags) ? __print_flags(flags, "|",					\
 		{_VMSCAN_THROTTLE_WRITEBACK,	"VMSCAN_THROTTLE_WRITEBACK"},	\
 		{_VMSCAN_THROTTLE_ISOLATED,	"VMSCAN_THROTTLE_ISOLATED"},	\
-		{_VMSCAN_THROTTLE_NOPROGRESS,	"VMSCAN_THROTTLE_NOPROGRESS"}	\
+		{_VMSCAN_THROTTLE_NOPROGRESS,	"VMSCAN_THROTTLE_NOPROGRESS"},	\
+		{_VMSCAN_THROTTLE_CONGESTED,	"VMSCAN_THROTTLE_CONGESTED"}	\
 		) : "VMSCAN_THROTTLE_NONE"
 
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fb9584641ac7..e3f2dd1e8cd9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1021,6 +1021,39 @@  static void handle_write_error(struct address_space *mapping,
 	unlock_page(page);
 }
 
+bool skip_throttle_noprogress(pg_data_t *pgdat)
+{
+	int reclaimable = 0, write_pending = 0;
+	int i;
+
+	/*
+	 * If kswapd is disabled, reschedule if necessary but do not
+	 * throttle as the system is likely near OOM.
+	 */
+	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+		return true;
+
+	/*
+	 * If there are a lot of dirty/writeback pages then do not
+	 * throttle as throttling will occur when the pages cycle
+	 * towards the end of the LRU if still under writeback.
+	 */
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		struct zone *zone = pgdat->node_zones + i;
+
+		if (!populated_zone(zone))
+			continue;
+
+		reclaimable += zone_reclaimable_pages(zone);
+		write_pending += zone_page_state_snapshot(zone,
+						  NR_ZONE_WRITE_PENDING);
+	}
+	if (2 * write_pending <= reclaimable)
+		return true;
+
+	return false;
+}
+
 void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 {
 	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
@@ -1056,8 +1089,16 @@  void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 		}
 
 		break;
+	case VMSCAN_THROTTLE_CONGESTED:
+		fallthrough;
 	case VMSCAN_THROTTLE_NOPROGRESS:
-		timeout = HZ/2;
+		if (skip_throttle_noprogress(pgdat)) {
+			cond_resched();
+			return;
+		}
+
+		timeout = 1;
+
 		break;
 	case VMSCAN_THROTTLE_ISOLATED:
 		timeout = HZ/50;
@@ -3321,7 +3362,7 @@  static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	if (!current_is_kswapd() && current_may_throttle() &&
 	    !sc->hibernation_mode &&
 	    test_bit(LRUVEC_CONGESTED, &target_lruvec->flags))
-		reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+		reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED);
 
 	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
 				    sc))
@@ -3386,16 +3427,16 @@  static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 	}
 
 	/*
-	 * Do not throttle kswapd on NOPROGRESS as it will throttle on
-	 * VMSCAN_THROTTLE_WRITEBACK if there are too many pages under
-	 * writeback and marked for immediate reclaim at the tail of
-	 * the LRU.
+	 * Do not throttle kswapd or cgroup reclaim on NOPROGRESS as it will
+	 * throttle on VMSCAN_THROTTLE_WRITEBACK if there are too many pages
+	 * under writeback and marked for immediate reclaim at the tail of the
+	 * LRU.
 	 */
-	if (current_is_kswapd())
+	if (current_is_kswapd() || cgroup_reclaim(sc))
 		return;
 
 	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2)
+	if (sc->priority == 1 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
@@ -3415,6 +3456,7 @@  static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	unsigned long nr_soft_scanned;
 	gfp_t orig_mask;
 	pg_data_t *last_pgdat = NULL;
+	pg_data_t *first_pgdat = NULL;
 
 	/*
 	 * If the number of buffer_heads in the machine exceeds the maximum
@@ -3478,14 +3520,18 @@  static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			/* need some check for avoid more shrink_zone() */
 		}
 
+		if (!first_pgdat)
+			first_pgdat = zone->zone_pgdat;
+
 		/* See comment about same check for global reclaim above */
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
 		shrink_node(zone->zone_pgdat, sc);
-		consider_reclaim_throttle(zone->zone_pgdat, sc);
 	}
 
+	consider_reclaim_throttle(first_pgdat, sc);
+
 	/*
 	 * Restore to original mask to avoid the impact on the caller if we
 	 * promoted it to __GFP_HIGHMEM.