mm: terminate the reclaim early when direct reclaiming

Message ID 1532683165-19416-1-git-send-email-zhaoyang.huang@spreadtrum.com (mailing list archive)
State New, archived
Series mm: terminate the reclaim early when direct reclaiming

Commit Message

Zhaoyang Huang July 27, 2018, 9:19 a.m. UTC
This patch tries to let direct reclaim finish earlier than it used
to. The problem comes from our observation that direct reclaim took
a long time to finish when memcg is enabled. By debugging, we found
that the reason is that the softlimit is too low to meet the loop
end criteria. So we add two checks to judge whether enough memory
has been reclaimed, using the same criteria as in shrink_lruvec:
1. in each memcg softlimit reclaim iteration.
2. before starting the global reclaim in shrink_zone.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@spreadtrum.com>
---
 include/linux/memcontrol.h |  3 ++-
 mm/memcontrol.c            |  3 +++
 mm/vmscan.c                | 24 ++++++++++++++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

Comments

Johannes Weiner July 27, 2018, 7:58 p.m. UTC | #1
Hi Zhaoyang,

On Fri, Jul 27, 2018 at 05:19:25PM +0800, Zhaoyang Huang wrote:
> This patch tries to let direct reclaim finish earlier than it used
> to. The problem comes from our observation that direct reclaim took
> a long time to finish when memcg is enabled. By debugging, we found
> that the reason is that the softlimit is too low to meet the loop
> end criteria. So we add two checks to judge whether enough memory
> has been reclaimed, using the same criteria as in shrink_lruvec:
> 1. in each memcg softlimit reclaim iteration.
> 2. before starting the global reclaim in shrink_zone.

Yes, the soft limit reclaim cycle is fairly aggressive and can
introduce quite some allocation latency into the system. Let me say
right up front, though, that we've spent hours in conference sessions
and phone calls trying to fix this and could never agree on
anything. You might have better luck trying cgroup2 which implements
memory.low in a more scalable manner. (Due to the default value of 0
instead of infinity, it can use a smoother 2-pass reclaim cycle.)

On your patch specifically:

should_continue_reclaim() is for compacting higher order pages. It
assumes you have already made a full reclaim cycle and returns false
for most allocations without checking any sort of reclaim progress.

You may end up in a situation where soft limit reclaim finds nothing,
and you still abort without trying a regular reclaim cycle. That can
trigger the OOM killer while there is still plenty of reclaimable
memory in other groups.

So if you want to fix this, you'd have to look for a different
threshold for soft limit reclaim. Maybe something like this
already works:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ee91e8cbeb5a..5b2388fa6bc4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2786,7 +2786,8 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 						&nr_soft_scanned);
 			sc->nr_reclaimed += nr_soft_reclaimed;
 			sc->nr_scanned += nr_soft_scanned;
-			/* need some check for avoid more shrink_zone() */
+			if (nr_soft_reclaimed)
+				continue;
 		}
 
 		/* See comment about same check for global reclaim above */
Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6c6fb11..cdf5de6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -325,7 +325,8 @@  void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg,
 void mem_cgroup_uncharge_list(struct list_head *page_list);
 
 void mem_cgroup_migrate(struct page *oldpage, struct page *newpage);
-
+bool direct_reclaim_reach_sflimit(pg_data_t *pgdat, unsigned long nr_reclaimed,
+			unsigned long nr_scanned, gfp_t gfp_mask, int order);
 static struct mem_cgroup_per_node *
 mem_cgroup_nodeinfo(struct mem_cgroup *memcg, int nid)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8c0280b..4e38223 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2577,6 +2577,9 @@  unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 			(next_mz == NULL ||
 			loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
 			break;
+		if (direct_reclaim_reach_sflimit(pgdat, nr_reclaimed,
+					*total_scanned, gfp_mask, order))
+			break;
 	} while (!nr_reclaimed);
 	if (next_mz)
 		css_put(&next_mz->memcg->css);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f8..77fcda4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2518,12 +2518,36 @@  static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
 		(memcg && memcg_congested(pgdat, memcg));
 }
 
+bool direct_reclaim_reach_sflimit(pg_data_t *pgdat, unsigned long nr_reclaimed,
+		unsigned long nr_scanned, gfp_t gfp_mask,
+		int order)
+{
+	struct scan_control sc = {
+		.gfp_mask = gfp_mask,
+		.order = order,
+		.priority = DEF_PRIORITY,
+		.nr_reclaimed = nr_reclaimed,
+		.nr_scanned = nr_scanned,
+	};
+	if (!current_is_kswapd() && !should_continue_reclaim(pgdat,
+				sc.nr_reclaimed, sc.nr_scanned, &sc))
+		return true;
+	return false;
+}
+EXPORT_SYMBOL(direct_reclaim_reach_sflimit);
+
 static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 {
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	unsigned long nr_reclaimed, nr_scanned;
 	bool reclaimable = false;
 
+	if (!current_is_kswapd() && !should_continue_reclaim(pgdat,
+		sc->nr_reclaimed, sc->nr_scanned, sc)) {
+
+		return !!sc->nr_reclaimed;
+	}
+
 	do {
 		struct mem_cgroup *root = sc->target_mem_cgroup;
 		struct mem_cgroup_reclaim_cookie reclaim = {