[1/3] mm, reclaim: make should_continue_reclaim perform dryrun detection

Message ID 20190802223930.30971-2-mike.kravetz@oracle.com (mailing list archive)
State New, archived
Series: address hugetlb page allocation stalls

Commit Message

Mike Kravetz Aug. 2, 2019, 10:39 p.m. UTC
From: Hillf Danton <hdanton@sina.com>

Address the issue of should_continue_reclaim() returning true too often
for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned, i.e.
when nothing was reclaimed even though pages were scanned. This could
happen during hugetlb page allocation, causing stalls for minutes or
hours.

We can stop reclaiming pages if compaction reports it can make progress.
A code reshuffle is needed to do that. And it has side-effects, however,
with allocation latencies in other cases but that would come at the cost
of potential premature reclaim which has consequences of itself.

We can also bail out of reclaiming pages if we know that there are not
enough inactive lru pages left to satisfy the costly allocation.

We can also give up reclaiming pages if we see a dryrun occur, even with
plenty of inactive pages left. IOW, once a dryrun is detected, we are
sure we have reclaimed as many pages as we could.

Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)
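
For readers skimming the thread, here is a condensed sketch of the three stop
conditions the commit message describes, in the order the patched function
checks them. This is not the literal kernel code: compaction_ready_for_any_zone()
is a made-up stand-in for the per-zone compaction_suitable() loop in the diff
below, and the pre-existing gfp-dependent early exits are omitted (the follow-up
cleanup later in this thread removes them anyway).

/*
 * Sketch of should_continue_reclaim() after this patch: keep reclaiming
 * only while more reclaim still looks useful for the pending compaction.
 */
static bool should_continue_reclaim_sketch(struct pglist_data *pgdat,
					   unsigned long nr_reclaimed,
					   unsigned long nr_scanned,
					   struct scan_control *sc)
{
	unsigned long pages_for_compaction, inactive_lru_pages;

	/* 1. Stop once compaction reports it could go ahead in some zone. */
	if (compaction_ready_for_any_zone(pgdat, sc))	/* hypothetical helper */
		return false;

	/* 2. Stop if too few inactive pages are left to feed compaction. */
	pages_for_compaction = compact_gap(sc->order);
	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
	if (get_nr_swap_pages() > 0)
		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
	if (inactive_lru_pages <= pages_for_compaction)
		return false;

	/* 3. Stop on a dryrun: the last cycle scanned or reclaimed nothing. */
	return nr_scanned && nr_reclaimed;
}

The caller is the retry loop in shrink_node(), so each false return above ends
one reclaim round and lets the allocation attempt fall through to compaction.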

Comments

Vlastimil Babka Aug. 5, 2019, 8:42 a.m. UTC | #1
On 8/3/19 12:39 AM, Mike Kravetz wrote:
> From: Hillf Danton <hdanton@sina.com>
> 
> Address the issue of should_continue_reclaim continuing true too often
> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
> This could happen during hugetlb page allocation causing stalls for
> minutes or hours.
> 
> We can stop reclaiming pages if compaction reports it can make a progress.
> A code reshuffle is needed to do that.

> And it has side-effects, however,
> with allocation latencies in other cases but that would come at the cost
> of potential premature reclaim which has consequences of itself.

Based on Mel's longer explanation, can we clarify the wording here? e.g.:

There might be side-effect for other high-order allocations that would
potentially benefit from more reclaim before compaction for them to be
faster and less likely to stall, but the consequences of
premature/over-reclaim are considered worse.

> We can also bail out of reclaiming pages if we know that there are not
> enough inactive lru pages left to satisfy the costly allocation.
> 
> We can give up reclaiming pages too if we see dryrun occur, with the
> certainty of plenty of inactive pages. IOW with dryrun detected, we are
> sure we have reclaimed as many pages as we could.
> 
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Hillf Danton <hdanton@sina.com>
> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
> Acked-by: Mel Gorman <mgorman@suse.de>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
I will send some followup cleanup.

There should be also Mike's SOB?



> ---
>  mm/vmscan.c | 28 +++++++++++++++-------------
>  1 file changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 47aa2158cfac..a386c5351592 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2738,18 +2738,6 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
>  			return false;
>  	}
>  
> -	/*
> -	 * If we have not reclaimed enough pages for compaction and the
> -	 * inactive lists are large enough, continue reclaiming
> -	 */
> -	pages_for_compaction = compact_gap(sc->order);
> -	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
> -	if (get_nr_swap_pages() > 0)
> -		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
> -	if (sc->nr_reclaimed < pages_for_compaction &&
> -			inactive_lru_pages > pages_for_compaction)
> -		return true;
> -
>  	/* If compaction would go ahead or the allocation would succeed, stop */
>  	for (z = 0; z <= sc->reclaim_idx; z++) {
>  		struct zone *zone = &pgdat->node_zones[z];
> @@ -2765,7 +2753,21 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
>  			;
>  		}
>  	}
> -	return true;
> +
> +	/*
> +	 * If we have not reclaimed enough pages for compaction and the
> +	 * inactive lists are large enough, continue reclaiming
> +	 */
> +	pages_for_compaction = compact_gap(sc->order);
> +	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
> +	if (get_nr_swap_pages() > 0)
> +		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
> +
> +	return inactive_lru_pages > pages_for_compaction &&
> +		/*
> +		 * avoid dryrun with plenty of inactive pages
> +		 */
> +		nr_scanned && nr_reclaimed;
>  }
>  
>  static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
>
Vlastimil Babka Aug. 5, 2019, 10:57 a.m. UTC | #2
On 8/5/19 10:42 AM, Vlastimil Babka wrote:
> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>> From: Hillf Danton <hdanton@sina.com>
>>
>> Address the issue of should_continue_reclaim continuing true too often
>> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
>> This could happen during hugetlb page allocation causing stalls for
>> minutes or hours.
>>
>> We can stop reclaiming pages if compaction reports it can make a progress.
>> A code reshuffle is needed to do that.
> 
>> And it has side-effects, however,
>> with allocation latencies in other cases but that would come at the cost
>> of potential premature reclaim which has consequences of itself.
> 
> Based on Mel's longer explanation, can we clarify the wording here? e.g.:
> 
> There might be side-effect for other high-order allocations that would
> potentially benefit from more reclaim before compaction for them to be
> faster and less likely to stall, but the consequences of
> premature/over-reclaim are considered worse.
> 
>> We can also bail out of reclaiming pages if we know that there are not
>> enough inactive lru pages left to satisfy the costly allocation.
>>
>> We can give up reclaiming pages too if we see dryrun occur, with the
>> certainty of plenty of inactive pages. IOW with dryrun detected, we are
>> sure we have reclaimed as many pages as we could.
>>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: Hillf Danton <hdanton@sina.com>
>> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Acked-by: Mel Gorman <mgorman@suse.de>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> I will send some followup cleanup.

How about this?
----8<----
From 0040b32462587171ad22395a56699cc036ad483f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Mon, 5 Aug 2019 12:49:40 +0200
Subject: [PATCH] mm, reclaim: cleanup should_continue_reclaim()

After commit "mm, reclaim: make should_continue_reclaim perform dryrun
detection", a closer look at the function shows that nr_reclaimed == 0 means
the function will always return false. And since non-zero nr_reclaimed implies
non-zero nr_scanned, testing nr_scanned serves no purpose, and neither does the
test for __GFP_RETRY_MAYFAIL.

This patch thus cleans up the function to test only !nr_reclaimed upfront, and
removes the __GFP_RETRY_MAYFAIL test and the nr_scanned parameter completely.
The comment is also updated, explaining that approximating "full LRU list has
been scanned" with nr_scanned == 0 didn't really work.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/vmscan.c | 43 ++++++++++++++-----------------------------
 1 file changed, 14 insertions(+), 29 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ad498b76e492..db3c9e06a888 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2582,7 +2582,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
  */
 static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 					unsigned long nr_reclaimed,
-					unsigned long nr_scanned,
 					struct scan_control *sc)
 {
 	unsigned long pages_for_compaction;
@@ -2593,28 +2592,18 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	if (!in_reclaim_compaction(sc))
 		return false;
 
-	/* Consider stopping depending on scan and reclaim activity */
-	if (sc->gfp_mask & __GFP_RETRY_MAYFAIL) {
-		/*
-		 * For __GFP_RETRY_MAYFAIL allocations, stop reclaiming if the
-		 * full LRU list has been scanned and we are still failing
-		 * to reclaim pages. This full LRU scan is potentially
-		 * expensive but a __GFP_RETRY_MAYFAIL caller really wants to succeed
-		 */
-		if (!nr_reclaimed && !nr_scanned)
-			return false;
-	} else {
-		/*
-		 * For non-__GFP_RETRY_MAYFAIL allocations which can presumably
-		 * fail without consequence, stop if we failed to reclaim
-		 * any pages from the last SWAP_CLUSTER_MAX number of
-		 * pages that were scanned. This will return to the
-		 * caller faster at the risk reclaim/compaction and
-		 * the resulting allocation attempt fails
-		 */
-		if (!nr_reclaimed)
-			return false;
-	}
+	/*
+	 * Stop if we failed to reclaim any pages from the last SWAP_CLUSTER_MAX
+	 * number of pages that were scanned. This will return to the caller
+	 * with the risk reclaim/compaction and the resulting allocation attempt
+	 * fails. In the past we have tried harder for __GFP_RETRY_MAYFAIL
+	 * allocations through requiring that the full LRU list has been scanned
+	 * first, by assuming that zero delta of sc->nr_scanned means full LRU
+	 * scan, but that approximation was wrong, and there were corner cases
+	 * where always a non-zero amount of pages were scanned.
+	 */
+	if (!nr_reclaimed)
+		return false;
 
 	/* If compaction would go ahead or the allocation would succeed, stop */
 	for (z = 0; z <= sc->reclaim_idx; z++) {
@@ -2641,11 +2630,7 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	if (get_nr_swap_pages() > 0)
 		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
 
-	return inactive_lru_pages > pages_for_compaction &&
-		/*
-		 * avoid dryrun with plenty of inactive pages
-		 */
-		nr_scanned && nr_reclaimed;
+	return inactive_lru_pages > pages_for_compaction;
 }
 
 static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
@@ -2810,7 +2795,7 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 			wait_iff_congested(BLK_RW_ASYNC, HZ/10);
 
 	} while (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
-					 sc->nr_scanned - nr_scanned, sc));
+					 sc));
 
 	/*
 	 * Kswapd gives up on balancing particular nodes after too
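
To spell out the implication this cleanup relies on (an editorial sketch, not
part of the patch): after the dryrun-detection patch the function ends with

	return inactive_lru_pages > pages_for_compaction &&
		nr_scanned && nr_reclaimed;

so when nr_reclaimed == 0 the function returns false regardless of the early
checks, and when nr_reclaimed > 0 a page can only have been reclaimed after
being scanned, hence nr_scanned > 0 as well. Both of the old gfp-dependent
branches therefore collapse to the single upfront test

	if (!nr_reclaimed)
		return false;

which is exactly what the hunk above introduces.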
Mike Kravetz Aug. 5, 2019, 4:54 p.m. UTC | #3
On 8/5/19 1:42 AM, Vlastimil Babka wrote:
> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>> From: Hillf Danton <hdanton@sina.com>
>>
>> Address the issue of should_continue_reclaim continuing true too often
>> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
>> This could happen during hugetlb page allocation causing stalls for
>> minutes or hours.
>>
>> We can stop reclaiming pages if compaction reports it can make a progress.
>> A code reshuffle is needed to do that.
> 
>> And it has side-effects, however,
>> with allocation latencies in other cases but that would come at the cost
>> of potential premature reclaim which has consequences of itself.
> 
> Based on Mel's longer explanation, can we clarify the wording here? e.g.:
> 
> There might be side-effect for other high-order allocations that would
> potentially benefit from more reclaim before compaction for them to be
> faster and less likely to stall, but the consequences of
> premature/over-reclaim are considered worse.
> 
>> We can also bail out of reclaiming pages if we know that there are not
>> enough inactive lru pages left to satisfy the costly allocation.
>>
>> We can give up reclaiming pages too if we see dryrun occur, with the
>> certainty of plenty of inactive pages. IOW with dryrun detected, we are
>> sure we have reclaimed as many pages as we could.
>>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: Hillf Danton <hdanton@sina.com>
>> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
>> Acked-by: Mel Gorman <mgorman@suse.de>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> I will send some followup cleanup.
> 
> There should be also Mike's SOB?

Will do.
My apologies, the process of handling patches created by others is new
to me.

Also, will incorporate Mel's explanation.
Mike Kravetz Aug. 5, 2019, 4:58 p.m. UTC | #4
On 8/5/19 3:57 AM, Vlastimil Babka wrote:
> On 8/5/19 10:42 AM, Vlastimil Babka wrote:
>> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>>> From: Hillf Danton <hdanton@sina.com>
>>>
>>> Address the issue of should_continue_reclaim continuing true too often
>>> for __GFP_RETRY_MAYFAIL attempts when !nr_reclaimed and nr_scanned.
>>> This could happen during hugetlb page allocation causing stalls for
>>> minutes or hours.
>>>
>>> We can stop reclaiming pages if compaction reports it can make a progress.
>>> A code reshuffle is needed to do that.
>>
>>> And it has side-effects, however,
>>> with allocation latencies in other cases but that would come at the cost
>>> of potential premature reclaim which has consequences of itself.
>>
>> Based on Mel's longer explanation, can we clarify the wording here? e.g.:
>>
>> There might be side-effect for other high-order allocations that would
>> potentially benefit from more reclaim before compaction for them to be
>> faster and less likely to stall, but the consequences of
>> premature/over-reclaim are considered worse.
>>
>>> We can also bail out of reclaiming pages if we know that there are not
>>> enough inactive lru pages left to satisfy the costly allocation.
>>>
>>> We can give up reclaiming pages too if we see dryrun occur, with the
>>> certainty of plenty of inactive pages. IOW with dryrun detected, we are
>>> sure we have reclaimed as many pages as we could.
>>>
>>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>>> Cc: Mel Gorman <mgorman@suse.de>
>>> Cc: Michal Hocko <mhocko@kernel.org>
>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Signed-off-by: Hillf Danton <hdanton@sina.com>
>>> Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
>>> Acked-by: Mel Gorman <mgorman@suse.de>
>>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> I will send some followup cleanup.
> 
> How about this?
> ----8<----
> From 0040b32462587171ad22395a56699cc036ad483f Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Mon, 5 Aug 2019 12:49:40 +0200
> Subject: [PATCH] mm, reclaim: cleanup should_continue_reclaim()
> 
> After commit "mm, reclaim: make should_continue_reclaim perform dryrun
> detection", closer look at the function shows, that nr_reclaimed == 0 means
> the function will always return false. And since non-zero nr_reclaimed implies
> non_zero nr_scanned, testing nr_scanned serves no purpose, and so does the
> testing for __GFP_RETRY_MAYFAIL.
> 
> This patch thus cleans up the function to test only !nr_reclaimed upfront, and
> remove the __GFP_RETRY_MAYFAIL test and nr_scanned parameter completely.
> Comment is also updated, explaining that approximating "full LRU list has been
> scanned" with nr_scanned == 0 didn't really work.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>

Would you like me to add this to the series, or do you want to send later?
Vlastimil Babka Aug. 5, 2019, 6:34 p.m. UTC | #5
On 8/5/19 6:58 PM, Mike Kravetz wrote:
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Would you like me to add this to the series, or do you want to send later?

Please add, thanks!

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 47aa2158cfac..a386c5351592 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2738,18 +2738,6 @@  static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 			return false;
 	}
 
-	/*
-	 * If we have not reclaimed enough pages for compaction and the
-	 * inactive lists are large enough, continue reclaiming
-	 */
-	pages_for_compaction = compact_gap(sc->order);
-	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
-	if (get_nr_swap_pages() > 0)
-		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
-	if (sc->nr_reclaimed < pages_for_compaction &&
-			inactive_lru_pages > pages_for_compaction)
-		return true;
-
 	/* If compaction would go ahead or the allocation would succeed, stop */
 	for (z = 0; z <= sc->reclaim_idx; z++) {
 		struct zone *zone = &pgdat->node_zones[z];
@@ -2765,7 +2753,21 @@  static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 			;
 		}
 	}
-	return true;
+
+	/*
+	 * If we have not reclaimed enough pages for compaction and the
+	 * inactive lists are large enough, continue reclaiming
+	 */
+	pages_for_compaction = compact_gap(sc->order);
+	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
+	if (get_nr_swap_pages() > 0)
+		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+	return inactive_lru_pages > pages_for_compaction &&
+		/*
+		 * avoid dryrun with plenty of inactive pages
+		 */
+		nr_scanned && nr_reclaimed;
 }
 
 static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)