
[RFC,2/3] mm, compaction: use MIN_COMPACT_COSTLY_PRIORITY everywhere for costly orders

Message ID: 20190724175014.9935-3-mike.kravetz@oracle.com (mailing list archive)
State: New, archived
Series: fix hugetlb page allocation stalls

Commit Message

Mike Kravetz July 24, 2019, 5:50 p.m. UTC
For PAGE_ALLOC_COSTLY_ORDER allocations, MIN_COMPACT_COSTLY_PRIORITY is
the minimum (highest priority).  Other places in the compaction code key off
of MIN_COMPACT_PRIORITY.  Costly order allocations will never get to
MIN_COMPACT_PRIORITY.  Therefore, some conditions will never be met for
costly order allocations.

This was observed when hugetlb allocations could stall for minutes or
hours when should_compact_retry() would return true more often than it
should.  Specifically, this was in the case where compact_result was
COMPACT_DEFERRED or COMPACT_PARTIAL_SKIPPED and no progress was being
made.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/compaction.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)
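
For reference, the compaction priority levels involved are defined in
include/linux/compaction.h (v5.2-era; a lower numeric value means a higher,
more expensive priority):

	enum compact_priority {
		COMPACT_PRIO_SYNC_FULL,
		MIN_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_FULL,
		COMPACT_PRIO_SYNC_LIGHT,
		MIN_COMPACT_COSTLY_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
		DEF_COMPACT_PRIORITY = COMPACT_PRIO_SYNC_LIGHT,
		COMPACT_PRIO_ASYNC,
		INIT_COMPACT_PRIORITY = COMPACT_PRIO_ASYNC
	};

Costly-order allocations stop at MIN_COMPACT_COSTLY_PRIORITY
(COMPACT_PRIO_SYNC_LIGHT), so a check written as
(prio == MIN_COMPACT_PRIORITY) can never be true for them.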

Comments

Mel Gorman July 25, 2019, 8:06 a.m. UTC | #1
On Wed, Jul 24, 2019 at 10:50:13AM -0700, Mike Kravetz wrote:
> For PAGE_ALLOC_COSTLY_ORDER allocations, MIN_COMPACT_COSTLY_PRIORITY is
> the minimum (highest priority).  Other places in the compaction code key off
> of MIN_COMPACT_PRIORITY.  Costly order allocations will never get to
> MIN_COMPACT_PRIORITY.  Therefore, some conditions will never be met for
> costly order allocations.
> 
> This was observed when hugetlb allocations could stall for minutes or
> hours when should_compact_retry() would return true more often than it
> should.  Specifically, this was in the case where compact_result was
> COMPACT_DEFERRED or COMPACT_PARTIAL_SKIPPED and no progress was being
> made.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Acked-by: Mel Gorman <mgorman@suse.de>
Vlastimil Babka July 31, 2019, 12:06 p.m. UTC | #2
On 7/24/19 7:50 PM, Mike Kravetz wrote:
> For PAGE_ALLOC_COSTLY_ORDER allocations, MIN_COMPACT_COSTLY_PRIORITY is
> the minimum (highest priority).  Other places in the compaction code key off
> of MIN_COMPACT_PRIORITY.  Costly order allocations will never get to
> MIN_COMPACT_PRIORITY.  Therefore, some conditions will never be met for
> costly order allocations.
> 
> This was observed when hugetlb allocations could stall for minutes or
> hours when should_compact_retry() would return true more often than it
> should.  Specifically, this was in the case where compact_result was
> COMPACT_DEFERRED or COMPACT_PARTIAL_SKIPPED and no progress was being
> made.

Hmm, the point of MIN_COMPACT_COSTLY_PRIORITY was that costly
allocations will not reach the priority where compaction becomes too
expensive. With your patch, they still don't reach that priority value,
but are allowed to be thorough anyway, even sooner. That just seems like
a wrong way to fix the problem. If should_compact_retry() returns
misleading results for costly allocations, then that should be fixed
instead?

Alternatively, you might want to say that hugetlb allocations are not
like other random costly allocations, because the admin setting
nr_hugepages is prepared to take the cost (I thought that was indicated
by the __GFP_RETRY_MAYFAIL flag, but seeing all the other users of it,
I'm not sure anymore). In that case should_compact_retry() could take
__GFP_RETRY_MAYFAIL into account and allow MIN_COMPACT_PRIORITY even for
costly allocations.
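
(As an illustration only: a hypothetical sketch of that alternative, not
actual kernel code.  should_compact_retry() does not currently receive
gfp_mask, so it would have to be passed through, but the priority floor
could then be relaxed for callers that opted in:

	/* hypothetical: __GFP_RETRY_MAYFAIL callers accept the full cost */
	min_priority = (order > PAGE_ALLOC_COSTLY_ORDER &&
			!(gfp_mask & __GFP_RETRY_MAYFAIL)) ?
			MIN_COMPACT_COSTLY_PRIORITY : MIN_COMPACT_PRIORITY;

Only flagged costly allocations would then escalate all the way to
MIN_COMPACT_PRIORITY.)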

> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/compaction.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 952dc2fb24e5..325b746068d1 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2294,9 +2294,15 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
>  		.alloc_flags = alloc_flags,
>  		.classzone_idx = classzone_idx,
>  		.direct_compaction = true,
> -		.whole_zone = (prio == MIN_COMPACT_PRIORITY),
> -		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
> -		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
> +		.whole_zone = ((order > PAGE_ALLOC_COSTLY_ORDER) ?
> +				(prio == MIN_COMPACT_COSTLY_PRIORITY) :
> +				(prio == MIN_COMPACT_PRIORITY)),
> +		.ignore_skip_hint = ((order > PAGE_ALLOC_COSTLY_ORDER) ?
> +				(prio == MIN_COMPACT_COSTLY_PRIORITY) :
> +				(prio == MIN_COMPACT_PRIORITY)),
> +		.ignore_block_suitable = ((order > PAGE_ALLOC_COSTLY_ORDER) ?
> +				(prio == MIN_COMPACT_COSTLY_PRIORITY) :
> +				(prio == MIN_COMPACT_PRIORITY))
>  	};
>  	struct capture_control capc = {
>  		.cc = &cc,
> @@ -2338,6 +2344,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  	int may_perform_io = gfp_mask & __GFP_IO;
>  	struct zoneref *z;
>  	struct zone *zone;
> +	int min_priority;
>  	enum compact_result rc = COMPACT_SKIPPED;
>  
>  	/*
> @@ -2350,12 +2357,13 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
>  	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
>  
>  	/* Compact each zone in the list */
> +	min_priority = (order > PAGE_ALLOC_COSTLY_ORDER) ?
> +			MIN_COMPACT_COSTLY_PRIORITY : MIN_COMPACT_PRIORITY;
>  	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
>  								ac->nodemask) {
>  		enum compact_result status;
>  
> -		if (prio > MIN_COMPACT_PRIORITY
> -					&& compaction_deferred(zone, order)) {
> +		if (prio > min_priority && compaction_deferred(zone, order)) {
>  			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
>  			continue;
>  		}
>
Mike Kravetz July 31, 2019, 8:30 p.m. UTC | #3
On 7/31/19 5:06 AM, Vlastimil Babka wrote:
> On 7/24/19 7:50 PM, Mike Kravetz wrote:
>> For PAGE_ALLOC_COSTLY_ORDER allocations, MIN_COMPACT_COSTLY_PRIORITY is
>> the minimum (highest priority).  Other places in the compaction code key off
>> of MIN_COMPACT_PRIORITY.  Costly order allocations will never get to
>> MIN_COMPACT_PRIORITY.  Therefore, some conditions will never be met for
>> costly order allocations.
>>
>> This was observed when hugetlb allocations could stall for minutes or
>> hours when should_compact_retry() would return true more often than it
>> should.  Specifically, this was in the case where compact_result was
>> COMPACT_DEFERRED or COMPACT_PARTIAL_SKIPPED and no progress was being
>> made.
> 
> Hmm, the point of MIN_COMPACT_COSTLY_PRIORITY was that costly
> allocations will not reach the priority where compaction becomes too
> expensive. With your patch, they still don't reach that priority value,
> but are allowed to be thorough anyway, even sooner. That just seems like
> a wrong way to fix the problem.

Thanks Vlastimil, here is why I took the approach I did.

I instrumented some of the long stalls.  Here is one common example:
should_compact_retry returned true 5000000 consecutive times.  However,
the variable compaction_retries is zero.  We never get to the code that
increments the compaction_retries count because compaction_made_progress
is false and compaction_withdrawn is true.  As suggested earlier, I noted
why compaction_withdrawn is true.  Of the 5000000 calls,
4921875 were COMPACT_DEFERRED
78125 were COMPACT_PARTIAL_SKIPPED
Note that 5000000 / 64 (where 64 == 1 << COMPACT_MAX_DEFER_SHIFT) == 78125
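
(The 1-in-64 ratio matches the deferral bookkeeping.  A simplified sketch
of compaction_deferred() from mm/compaction.c of that era, with the
tracepoint dropped:

	bool compaction_deferred(struct zone *zone, int order)
	{
		unsigned long defer_limit = 1UL << zone->compact_defer_shift;

		if (order < zone->compact_order_failed)
			return false;

		/* Avoid possible races */
		if (++zone->compact_considered > defer_limit)
			zone->compact_considered = defer_limit;

		/* enough calls deferred: let one attempt through */
		if (zone->compact_considered >= defer_limit)
			return false;

		return true;
	}

With compact_defer_shift at its cap (COMPACT_MAX_DEFER_SHIFT == 6),
defer_limit is 64.  Each allowed attempt that fails re-arms the counter via
defer_compaction(), so roughly one call in 64 proceeds (here ending as
COMPACT_PARTIAL_SKIPPED) and the other 63 are reported COMPACT_DEFERRED,
matching the counts above.)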

I then started looking into why COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED
were being set/returned so often.
COMPACT_DEFERRED is set/returned in try_to_compact_pages.  Specifically,
		if (prio > MIN_COMPACT_PRIORITY
					&& compaction_deferred(zone, order)) {
			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
			continue;
		}
COMPACT_PARTIAL_SKIPPED is set/returned in __compact_finished. Specifically,
	if (compact_scanners_met(cc)) {
		/* Let the next compaction start anew. */
		reset_cached_positions(cc->zone);

		/* ... */

		if (cc->direct_compaction)
			cc->zone->compact_blockskip_flush = true;

		if (cc->whole_zone)
			return COMPACT_COMPLETE;
		else
			return COMPACT_PARTIAL_SKIPPED;
	}

In both cases, compact_priority being capped at MIN_COMPACT_COSTLY_PRIORITY,
and thus never reaching MIN_COMPACT_PRIORITY, caused the
'compaction_withdrawn' result to be set/returned.

I do not know the subtleties of the compaction code, but it seems like
retrying in this manner does not make sense.

>                                 If should_compact_retry() returns
> misleading results for costly allocations, then that should be fixed
> instead?
> 
> Alternatively, you might want to say that hugetlb allocations are not
> like other random costly allocations, because the admin setting
> nr_hugepages is prepared to take the cost (I thought that was indicated
> by the __GFP_RETRY_MAYFAIL flag, but seeing all the other users of it,
> I'm not sure anymore).

The example above resulted in a stall of a little over 5 minutes.  However,
I have seen them last for hours.  Sure, the caller (admin for hugetlbfs)
knows there may be high costs.  But, I think minutes/hours to try and allocate
a single huge page is too much.  We should fail sooner than that.

>                        In that case should_compact_retry() could take
> __GFP_RETRY_MAYFAIL into account and allow MIN_COMPACT_PRIORITY even for
> costly allocations.

I'll put something like this together to test.
Vlastimil Babka Aug. 1, 2019, 1:01 p.m. UTC | #4
On 7/31/19 10:30 PM, Mike Kravetz wrote:
> On 7/31/19 5:06 AM, Vlastimil Babka wrote:
>> On 7/24/19 7:50 PM, Mike Kravetz wrote:
>>> For PAGE_ALLOC_COSTLY_ORDER allocations,
>>> MIN_COMPACT_COSTLY_PRIORITY is the minimum (highest priority).  Other
>>> places in the compaction code key off of MIN_COMPACT_PRIORITY.
>>> Costly order allocations will never get to MIN_COMPACT_PRIORITY.
>>> Therefore, some conditions will never be met for costly order
>>> allocations.
>>> 
>>> This was observed when hugetlb allocations could stall for
>>> minutes or hours when should_compact_retry() would return true
>>> more often than it should.  Specifically, this was in the case
>>> where compact_result was COMPACT_DEFERRED or
>>> COMPACT_PARTIAL_SKIPPED and no progress was being made.
>> 
>> Hmm, the point of MIN_COMPACT_COSTLY_PRIORITY was that costly 
>> allocations will not reach the priority where compaction becomes
>> too expensive. With your patch, they still don't reach that
>> priority value, but are allowed to be thorough anyway, even sooner.
>> That just seems like a wrong way to fix the problem.
> 
> Thanks Vlastimil, here is why I took the approach I did.

Thanks for the explanation.

> I instrumented some of the long stalls.  Here is one common example: 
> should_compact_retry returned true 5000000 consecutive times.
> However, the variable compaction_retries is zero.  We never get to
> the code that increments the compaction_retries count because
> compaction_made_progress is false and compaction_withdrawn is true.
> As suggested earlier, I noted why compaction_withdrawn is true.  Of
> the 5000000 calls,
> 4921875 were COMPACT_DEFERRED
> 78125 were COMPACT_PARTIAL_SKIPPED
> Note that 5000000 / 64 (where 64 == 1 << COMPACT_MAX_DEFER_SHIFT) == 78125
> 
> I then started looking into why COMPACT_DEFERRED and
> COMPACT_PARTIAL_SKIPPED were being set/returned so often. 
> COMPACT_DEFERRED is set/returned in try_to_compact_pages.
> Specifically,
> 		if (prio > MIN_COMPACT_PRIORITY
> 					&& compaction_deferred(zone, order)) {

Ah, so I see it now, this is indeed why you get so many
COMPACT_DEFERRED, as prio is always above MIN_COMPACT_PRIORITY.

> 			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
> 			continue;
> 		}
> COMPACT_PARTIAL_SKIPPED is set/returned in __compact_finished. Specifically,
> 	if (compact_scanners_met(cc)) {
> 		/* Let the next compaction start anew. */
> 		reset_cached_positions(cc->zone);
> 
> 		/* ... */
> 
> 		if (cc->direct_compaction)
> 			cc->zone->compact_blockskip_flush = true;
> 
> 		if (cc->whole_zone)
> 			return COMPACT_COMPLETE;
> 		else
> 			return COMPACT_PARTIAL_SKIPPED;
> 	}
> 
> In both cases, compact_priority being MIN_COMPACT_COSTLY_PRIORITY and
> not being able to go to MIN_COMPACT_PRIORITY caused the
> 'compaction_withdrawn' result to be set/returned.

Hmm, looks like compaction_withdrawn() is just too blunt a test. It
mixes up results where the reaction should be more reclaim
(COMPACT_SKIPPED) with results that depend on compaction priority
(the rest), for which we should either increase the priority or fail.

> I do not know the subtleties of the compaction code, but it seems
> like retrying in this manner does not make sense.

I agree it doesn't, if we can't go for MIN_COMPACT_PRIORITY.

>> If should_compact_retry() returns misleading results for costly
>> allocations, then that should be fixed instead?
>> 
>> Alternatively, you might want to say that hugetlb allocations are
>> not like other random costly allocations, because the admin
>> setting nr_hugepages is prepared to take the cost (I thought that
>> was indicated by the __GFP_RETRY_MAYFAIL flag, but seeing all the
>> other users of it, I'm not sure anymore).
> 
> The example above resulted in a stall of a little over 5 minutes.
> However, I have seen them last for hours.  Sure, the caller (admin
> for hugetlbfs) knows there may be high costs.  But, I think
> minutes/hours to try and allocate a single huge page is too much.  We
> should fail sooner than that.

Sure. We should eliminate the pointless retries in any case; the
question is whether we allow MIN_COMPACT_PRIORITY over
MIN_COMPACT_COSTLY_PRIORITY.
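
(The escalation in question lives under the check_priority label of
should_compact_retry(); v5.2-era mm/page_alloc.c, lightly trimmed:

	check_priority:
		min_priority = (order > PAGE_ALLOC_COSTLY_ORDER) ?
				MIN_COMPACT_COSTLY_PRIORITY : MIN_COMPACT_PRIORITY;

		if (*compact_priority > min_priority) {
			(*compact_priority)--;
			*compaction_retries = 0;
			ret = true;
		}

For a costly order the floor is MIN_COMPACT_COSTLY_PRIORITY, so once an
allocation is already at that priority the bump is refused and
should_compact_retry() returns false.)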

>> In that case should_compact_retry() could take __GFP_RETRY_MAYFAIL
>> into account and allow MIN_COMPACT_PRIORITY even for costly
>> allocations.
> 
> I'll put something like this together to test.

Could you try testing the patch below instead? It should hopefully
eliminate the stalls. If it makes hugepage allocation give up too early,
we'll know we have to involve __GFP_RETRY_MAYFAIL in allowing the
MIN_COMPACT_PRIORITY priority. Thanks!

----8<----
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 9569e7c786d3..b8bfe8d5d2e9 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -129,11 +129,7 @@ static inline bool compaction_failed(enum compact_result result)
 	return false;
 }
 
-/*
- * Compaction  has backed off for some reason. It might be throttling or
- * lock contention. Retrying is still worthwhile.
- */
-static inline bool compaction_withdrawn(enum compact_result result)
+static inline bool compaction_needs_reclaim(enum compact_result result)
 {
 	/*
 	 * Compaction backed off due to watermark checks for order-0
@@ -142,6 +138,15 @@ static inline bool compaction_withdrawn(enum compact_result result)
 	if (result == COMPACT_SKIPPED)
 		return true;
 
+	return false;
+}
+
+/*
+ * Compaction  has backed off for some reason. It might be throttling or
+ * lock contention. Retrying is still worthwhile.
+ */
+static inline bool compaction_withdrawn(enum compact_result result)
+{
 	/*
 	 * If compaction is deferred for high-order allocations, it is
 	 * because sync compaction recently failed. If this is the case
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 272c6de1bf4e..3dfce1f79112 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3965,6 +3965,11 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (compaction_failed(compact_result))
 		goto check_priority;
 
+	if (compaction_needs_reclaim(compact_result)) {
+		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
+		goto out;
+	}
+
 	/*
 	 * make sure the compaction wasn't deferred or didn't bail out early
 	 * due to locks contention before we declare that we should give up.
@@ -3972,8 +3977,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	 * compaction.
 	 */
 	if (compaction_withdrawn(compact_result)) {
-		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
-		goto out;
+		goto check_priority;
 	}
 
 	/*
Mike Kravetz Aug. 1, 2019, 8:33 p.m. UTC | #5
On 8/1/19 6:01 AM, Vlastimil Babka wrote:
> Could you try testing the patch below instead? It should hopefully
> eliminate the stalls. If it makes hugepage allocation give up too early,
> we'll know we have to involve __GFP_RETRY_MAYFAIL in allowing the
> MIN_COMPACT_PRIORITY priority. Thanks!

Thanks.  This patch does eliminate the stalls I was seeing.

In my testing, there is little difference in how many hugetlb pages are
allocated.  It does not appear to be giving up/failing too early.  But,
this is only with __GFP_RETRY_MAYFAIL.  The real concern would be with THP
requests.  Any suggestions on how to test that?
Vlastimil Babka Aug. 2, 2019, 10:20 a.m. UTC | #6
On 8/1/19 10:33 PM, Mike Kravetz wrote:
> On 8/1/19 6:01 AM, Vlastimil Babka wrote:
>> Could you try testing the patch below instead? It should hopefully
>> eliminate the stalls. If it makes hugepage allocation give up too early,
>> we'll know we have to involve __GFP_RETRY_MAYFAIL in allowing the
>> MIN_COMPACT_PRIORITY priority. Thanks!
> 
> Thanks.  This patch does eliminate the stalls I was seeing.

Great, thanks! I'll send a proper patch then.

> In my testing, there is little difference in how many hugetlb pages are
> allocated.  It does not appear to be giving up/failing too early.  But,
> this is only with __GFP_RETRY_MAYFAIL.  The real concern would be with THP
> requests.  Any suggestions on how to test that?

AFAICS the default THP defrag mode is unaffected, as GFP_TRANSHUGE_LIGHT doesn't
include __GFP_DIRECT_RECLAIM, so it never reaches this code. Madvised THP
allocations will be affected, which should best be tested the same way as Andrea
and Mel did in the __GFP_THISNODE debate.
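
(For reference, the v5.2-era definitions in include/linux/gfp.h behind that
observation:

	#define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
				 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
	#define GFP_TRANSHUGE	(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)

Without __GFP_DIRECT_RECLAIM such an allocation never enters the slow-path
retry loop, so should_compact_retry() is never consulted for it.)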
Vlastimil Babka Aug. 2, 2019, 12:05 p.m. UTC | #7
On 8/1/19 10:33 PM, Mike Kravetz wrote:
> On 8/1/19 6:01 AM, Vlastimil Babka wrote:
>> Could you try testing the patch below instead? It should hopefully
>> eliminate the stalls. If it makes hugepage allocation give up too early,
>> we'll know we have to involve __GFP_RETRY_MAYFAIL in allowing the
>> MIN_COMPACT_PRIORITY priority. Thanks!
> 
> Thanks.  This patch does eliminate the stalls I was seeing.
> 
> In my testing, there is little difference in how many hugetlb pages are
> allocated.  It does not appear to be giving up/failing too early.  But,
> this is only with __GFP_RETRY_MAYFAIL.  The real concern would be with THP
> requests.  Any suggestions on how to test that?

Here's the full patch, can you include it in your series?
As madvised THP allocations might be affected (hopefully just by stalling less,
not by failing too much), adding relevant people to CC - testing the scenarios
you care about is appreciated. Thanks.

----8<----
From 67d9db457f023434e8912a3ea571e545fb772a1b Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Fri, 2 Aug 2019 13:13:35 +0200
Subject: [PATCH] mm, compaction: raise compaction priority after it withdrawns

Mike Kravetz reports that "hugetlb allocations could stall for minutes or hours
when should_compact_retry() would return true more often than it should.
Specifically, this was in the case where compact_result was COMPACT_DEFERRED
or COMPACT_PARTIAL_SKIPPED and no progress was being made."

The problem is that the compaction_withdrawn() test in should_compact_retry()
includes compaction outcomes that are only possible on low compaction priority,
and results in a retry without increasing the priority. This may result in
further reclaim, and more incomplete compaction attempts.

With this patch, compaction priority is raised when possible, or
should_compact_retry() returns false.

The COMPACT_SKIPPED result doesn't really fit together with the other outcomes
in compaction_withdrawn(), as that's a result caused by insufficient order-0
pages, not due to low compaction priority. With this patch, it is moved to
a new compaction_needs_reclaim() function, and for that outcome we keep the
current logic of retrying if it looks like reclaim will be able to help.

Reported-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/compaction.h | 22 +++++++++++++++++-----
 mm/page_alloc.c            | 16 ++++++++++++----
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 9569e7c786d3..4b898cdbdf05 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -129,11 +129,8 @@ static inline bool compaction_failed(enum compact_result result)
 	return false;
 }
 
-/*
- * Compaction  has backed off for some reason. It might be throttling or
- * lock contention. Retrying is still worthwhile.
- */
-static inline bool compaction_withdrawn(enum compact_result result)
+/* Compaction needs reclaim to be performed first, so it can continue. */
+static inline bool compaction_needs_reclaim(enum compact_result result)
 {
 	/*
 	 * Compaction backed off due to watermark checks for order-0
@@ -142,6 +139,16 @@ static inline bool compaction_withdrawn(enum compact_result result)
 	if (result == COMPACT_SKIPPED)
 		return true;
 
+	return false;
+}
+
+/*
+ * Compaction has backed off for some reason after doing some work or none
+ * at all. It might be throttling or lock contention. Retrying might be still
+ * worthwhile, but with a higher priority if allowed.
+ */
+static inline bool compaction_withdrawn(enum compact_result result)
+{
 	/*
 	 * If compaction is deferred for high-order allocations, it is
 	 * because sync compaction recently failed. If this is the case
@@ -207,6 +214,11 @@ static inline bool compaction_failed(enum compact_result result)
 	return false;
 }
 
+static inline bool compaction_needs_reclaim(enum compact_result result)
+{
+	return false;
+}
+
 static inline bool compaction_withdrawn(enum compact_result result)
 {
 	return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 272c6de1bf4e..bd0f00f8cfa3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3965,15 +3965,23 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (compaction_failed(compact_result))
 		goto check_priority;
 
+	/*
+	 * compaction was skipped because there are not enough order-0 pages
+	 * to work with, so we retry only if it looks like reclaim can help.
+	 */
+	if (compaction_needs_reclaim(compact_result)) {
+		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
+		goto out;
+	}
+
 	/*
 	 * make sure the compaction wasn't deferred or didn't bail out early
 	 * due to locks contention before we declare that we should give up.
-	 * But do not retry if the given zonelist is not suitable for
-	 * compaction.
+	 * But the next retry should use a higher priority if allowed, so
+	 * we don't just keep bailing out endlessly.
 	 */
 	if (compaction_withdrawn(compact_result)) {
-		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
-		goto out;
+		goto check_priority;
 	}
 
 	/*
Mike Kravetz Aug. 2, 2019, 5:44 p.m. UTC | #8
On 8/2/19 5:05 AM, Vlastimil Babka wrote:
> 
> On 8/1/19 10:33 PM, Mike Kravetz wrote:
>> On 8/1/19 6:01 AM, Vlastimil Babka wrote:
>>> Could you try testing the patch below instead? It should hopefully
>>> eliminate the stalls. If it makes hugepage allocation give up too early,
>>> we'll know we have to involve __GFP_RETRY_MAYFAIL in allowing the
>>> MIN_COMPACT_PRIORITY priority. Thanks!
>>
>> Thanks.  This patch does eliminate the stalls I was seeing.
>>
>> In my testing, there is little difference in how many hugetlb pages are
>> allocated.  It does not appear to be giving up/failing too early.  But,
>> this is only with __GFP_RETRY_MAYFAIL.  The real concern would be with THP
>> requests.  Any suggestions on how to test that?
> 
> Here's the full patch, can you include it in your series?

Yes.  Thank you!

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index 952dc2fb24e5..325b746068d1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2294,9 +2294,15 @@  static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.alloc_flags = alloc_flags,
 		.classzone_idx = classzone_idx,
 		.direct_compaction = true,
-		.whole_zone = (prio == MIN_COMPACT_PRIORITY),
-		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
-		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
+		.whole_zone = ((order > PAGE_ALLOC_COSTLY_ORDER) ?
+				(prio == MIN_COMPACT_COSTLY_PRIORITY) :
+				(prio == MIN_COMPACT_PRIORITY)),
+		.ignore_skip_hint = ((order > PAGE_ALLOC_COSTLY_ORDER) ?
+				(prio == MIN_COMPACT_COSTLY_PRIORITY) :
+				(prio == MIN_COMPACT_PRIORITY)),
+		.ignore_block_suitable = ((order > PAGE_ALLOC_COSTLY_ORDER) ?
+				(prio == MIN_COMPACT_COSTLY_PRIORITY) :
+				(prio == MIN_COMPACT_PRIORITY))
 	};
 	struct capture_control capc = {
 		.cc = &cc,
@@ -2338,6 +2344,7 @@  enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 	int may_perform_io = gfp_mask & __GFP_IO;
 	struct zoneref *z;
 	struct zone *zone;
+	int min_priority;
 	enum compact_result rc = COMPACT_SKIPPED;
 
 	/*
@@ -2350,12 +2357,13 @@  enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
 
 	/* Compact each zone in the list */
+	min_priority = (order > PAGE_ALLOC_COSTLY_ORDER) ?
+			MIN_COMPACT_COSTLY_PRIORITY : MIN_COMPACT_PRIORITY;
 	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->high_zoneidx,
 								ac->nodemask) {
 		enum compact_result status;
 
-		if (prio > MIN_COMPACT_PRIORITY
-					&& compaction_deferred(zone, order)) {
+		if (prio > min_priority && compaction_deferred(zone, order)) {
 			rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
 			continue;
 		}