[11/25] mm, compaction: Use free lists to quickly locate a migration source

Message ID 20190104125011.16071-12-mgorman@techsingularity.net
State New
Series
  • Increase success rates and reduce latency of compaction v2

Commit Message

Mel Gorman Jan. 4, 2019, 12:49 p.m. UTC
The migration scanner is a linear scan of a zone with a potentially large
search space.  Furthermore, many pageblocks are unusable, such as those
filled with reserved pages or partially filled with pages that cannot
migrate. These still get scanned in the common case of allocating a THP
and the cost accumulates.

The patch uses a partial search of the free lists to locate a migration
source candidate that is marked as MOVABLE when allocating a THP. It
prefers picking a block with more free pages already, on the basis that
fewer pages need to be migrated to free the entire block. The lowest
PFN found during searches is tracked and used as the starting point for
the linear search after the first free-list search fails. After each
search, the free list is shuffled so that the next search does not
encounter the same page. If the search fails, subsequent searches will
be shorter and the linear scanner is used.
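
(As a point of scale: the per-attempt limit used by the patch is
(COMPACT_CLUSTER_MAX >> fast_search_fail) + 1, so assuming the usual
COMPACT_CLUSTER_MAX of 32, the walk shrinks as 33, 17, 9, 5, 3, 2, 1
entries across successive failures.)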

If this search fails, or if the request is for a small or
unmovable/reclaimable allocation, then the linear scanner is still used.
The list search is of little use in those cases: small free pages would
have to be used for the search and there is no guarantee that contiguous
movable pages exist within such a block.
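
(For context: PAGE_ALLOC_COSTLY_ORDER is 3, while a THP on x86-64 is an
order-9 allocation, so the fast search is confined to the THP-sized
requests the changelog targets.)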

                                        4.20.0                 4.20.0
                                failfast-v2r15          findmig-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3833.72 (   0.00%)     3505.69 (   8.56%)
Amean     fault-both-5      4967.15 (   0.00%)     5794.13 * -16.65%*
Amean     fault-both-7      7139.19 (   0.00%)     7663.09 (  -7.34%)
Amean     fault-both-12    11326.30 (   0.00%)    10983.36 (   3.03%)
Amean     fault-both-18    16270.70 (   0.00%)    13602.71 *  16.40%*
Amean     fault-both-24    19839.65 (   0.00%)    16145.77 *  18.62%*
Amean     fault-both-30    21707.05 (   0.00%)    19753.82 (   9.00%)
Amean     fault-both-32    21968.16 (   0.00%)    20616.16 (   6.15%)

                                   4.20.0                 4.20.0
                           failfast-v2r15          findmig-v2r15
Percentage huge-1         0.00 (   0.00%)        0.00 (   0.00%)
Percentage huge-3        84.62 (   0.00%)       90.58 (   7.05%)
Percentage huge-5        88.43 (   0.00%)       91.34 (   3.29%)
Percentage huge-7        88.33 (   0.00%)       92.21 (   4.39%)
Percentage huge-12       88.74 (   0.00%)       92.48 (   4.21%)
Percentage huge-18       86.52 (   0.00%)       91.65 (   5.93%)
Percentage huge-24       86.42 (   0.00%)       90.23 (   4.41%)
Percentage huge-30       86.67 (   0.00%)       90.17 (   4.04%)
Percentage huge-32       86.00 (   0.00%)       89.72 (   4.32%)

This shows an improvement in allocation latencies and a slight increase
in allocation success rates. While not presented, there was a 13% reduction
in migration scanning and a 10% reduction in system CPU usage. A 2-socket
machine showed similar benefits.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/compaction.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 mm/internal.h   |   2 +
 2 files changed, 179 insertions(+), 2 deletions(-)

Comments

Vlastimil Babka Jan. 16, 2019, 1:15 p.m. UTC | #1
On 1/4/19 1:49 PM, Mel Gorman wrote:
> <SNIP>
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 8f0ce44dba41..137e32e8a2f5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1050,6 +1050,12 @@ static bool suitable_migration_target(struct compact_control *cc,
>  	return false;
>  }
>  
> +static inline unsigned int
> +freelist_scan_limit(struct compact_control *cc)
> +{
> +	return (COMPACT_CLUSTER_MAX >> cc->fast_search_fail) + 1;
> +}
> +
>  /*
>   * Test whether the free scanner has reached the same or lower pageblock than
>   * the migration scanner, and compaction should thus terminate.
> @@ -1060,6 +1066,19 @@ static inline bool compact_scanners_met(struct compact_control *cc)
>  		<= (cc->migrate_pfn >> pageblock_order);
>  }
>  
> +/* Reorder the free list to reduce repeated future searches */
> +static void
> +move_freelist_tail(struct list_head *freelist, struct page *freepage)
> +{
> +	LIST_HEAD(sublist);
> +
> +	if (!list_is_last(freelist, &freepage->lru)) {
> +		list_cut_position(&sublist, freelist, &freepage->lru);
> +		if (!list_empty(&sublist))
> +			list_splice_tail(&sublist, freelist);
> +	}
> +}
> +
>  /*
>   * Based on information in the current compact_control, find blocks
>   * suitable for isolating free pages from and then isolate them.
> @@ -1217,6 +1236,160 @@ typedef enum {
>   */
>  int sysctl_compact_unevictable_allowed __read_mostly = 1;
>  
> +static inline void
> +update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
> +{
> +	if (cc->fast_start_pfn == ULONG_MAX)
> +		return;
> +
> +	if (!cc->fast_start_pfn)
> +		cc->fast_start_pfn = pfn;
> +
> +	cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
> +}
> +
> +static inline void
> +reinit_migrate_pfn(struct compact_control *cc)
> +{
> +	if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
> +		return;
> +
> +	cc->migrate_pfn = cc->fast_start_pfn;
> +	cc->fast_start_pfn = ULONG_MAX;
> +}
> +
> +/*
> + * Briefly search the free lists for a migration source that already has
> + * some free pages to reduce the number of pages that need migration
> + * before a pageblock is free.
> + */
> +static unsigned long fast_find_migrateblock(struct compact_control *cc)
> +{
> +	unsigned int limit = freelist_scan_limit(cc);
> +	unsigned int nr_scanned = 0;
> +	unsigned long distance;
> +	unsigned long pfn = cc->migrate_pfn;
> +	unsigned long high_pfn;
> +	int order;
> +
> +	/* Skip hints are relied on to avoid repeats on the fast search */
> +	if (cc->ignore_skip_hint)
> +		return pfn;
> +
> +	/*
> +	 * If the migrate_pfn is not at the start of a zone or the start
> +	 * of a pageblock then assume this is a continuation of a previous
> +	 * scan restarted due to COMPACT_CLUSTER_MAX.
> +	 */
> +	if (pfn != cc->zone->zone_start_pfn && pfn != pageblock_start_pfn(pfn))
> +		return pfn;
> +
> +	/*
> +	 * For smaller orders, just linearly scan as the number of pages
> +	 * to migrate should be relatively small and does not necessarily
> +	 * justify freeing up a large block for a small allocation.
> +	 */
> +	if (cc->order <= PAGE_ALLOC_COSTLY_ORDER)
> +		return pfn;
> +
> +	/*
> +	 * Only allow kcompactd and direct requests for movable pages to
> +	 * quickly clear out a MOVABLE pageblock for allocation. This
> +	 * reduces the risk that a large movable pageblock is freed for
> +	 * an unmovable/reclaimable small allocation.
> +	 */
> +	if (cc->direct_compaction && cc->migratetype != MIGRATE_MOVABLE)
> +		return pfn;
> +
> +	/*
> +	 * When starting the migration scanner, pick any pageblock within the
> +	 * first half of the search space. Otherwise try and pick a pageblock
> +	 * within the first eighth to reduce the chances that a migration
> +	 * target later becomes a source.
> +	 */
> +	distance = (cc->free_pfn - cc->migrate_pfn) >> 1;
> +	if (cc->migrate_pfn != cc->zone->zone_start_pfn)
> +		distance >>= 2;
> +	high_pfn = pageblock_start_pfn(cc->migrate_pfn + distance);
> +
> +	for (order = cc->order - 1;
> +	     order >= PAGE_ALLOC_COSTLY_ORDER && pfn == cc->migrate_pfn && nr_scanned < limit;
> +	     order--) {
> +		struct free_area *area = &cc->zone->free_area[order];
> +		struct list_head *freelist;
> +		unsigned long nr_skipped = 0;
> +		unsigned long flags;
> +		struct page *freepage;
> +
> +		if (!area->nr_free)
> +			continue;
> +
> +		spin_lock_irqsave(&cc->zone->lock, flags);
> +		freelist = &area->free_list[MIGRATE_MOVABLE];
> +		list_for_each_entry(freepage, freelist, lru) {
> +			unsigned long free_pfn;
> +
> +			nr_scanned++;
> +			free_pfn = page_to_pfn(freepage);
> +			if (free_pfn < high_pfn) {
> +				update_fast_start_pfn(cc, free_pfn);
> +
> +				/*
> +				 * Avoid if skipped recently. Move to the tail
> +				 * of the list so it will not be found again
> +				 * soon
> +				 */
> +				if (get_pageblock_skip(freepage)) {
> +
> +					if (list_is_last(freelist, &freepage->lru))
> +						break;
> +
> +					nr_skipped++;
> +					list_del(&freepage->lru);
> +					list_add_tail(&freepage->lru, freelist);

Use list_move_tail() instead of del+add ? Also is this even safe inside
list_for_each_entry() and not list_for_each_entry_safe()? I guess
without the extra safe iterator, we moved freepage, which is our
iterator, to the tail, so the for cycle will immediately end?
Also is this moving of one page needed when you also have
move_freelist_tail() to move everything we scanned at once?
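
For readers following this point: the hazard can be reproduced outside
the kernel. Below is a minimal userspace sketch (toy list helpers and
hypothetical names, not the kernel's <linux/list.h>) showing that moving
the current node to the tail ends a plain walk immediately, while caching
the successor first, as list_for_each_entry_safe() does, keeps the walk
going:

#include <stdio.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_move_tail(struct list_head *node, struct list_head *head)
{
	node->prev->next = node->next;		/* unlink */
	node->next->prev = node->prev;
	list_add_tail(node, head);		/* relink at the tail */
}

struct item { int id; struct list_head lru; };

int main(void)
{
	struct list_head head, *pos, *tmp;
	struct item items[4];
	int i;

	init_list(&head);
	for (i = 0; i < 4; i++) {
		items[i].id = i;
		list_add_tail(&items[i].lru, &head);
	}

	/*
	 * Plain walk, as in the patch: after list_move_tail() the current
	 * node's ->next is the list head, so only item 0 is visited.
	 */
	for (pos = head.next; pos != &head; pos = pos->next) {
		printf("plain: %d\n", container_of(pos, struct item, lru)->id);
		if (container_of(pos, struct item, lru)->id == 0)
			list_move_tail(pos, &head);
	}

	init_list(&head);
	for (i = 0; i < 4; i++)
		list_add_tail(&items[i].lru, &head);

	/*
	 * Safe-style walk: the successor is cached before the body runs,
	 * so items 1-3 are still visited. The moved node is then seen a
	 * second time once the walk reaches the tail, which is one reason
	 * the patch also bounds the walk with a scan limit.
	 */
	for (pos = head.next, tmp = pos->next; pos != &head;
	     pos = tmp, tmp = pos->next) {
		printf("safe: %d\n", container_of(pos, struct item, lru)->id);
		if (container_of(pos, struct item, lru)->id == 0)
			list_move_tail(pos, &head);
	}
	return 0;
}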


> +					if (nr_skipped > 2)
> +						break;

Counting number of skips per order seems weird. What's the intention, is
it not to encounter again a page that we already moved to tail? That
could be solved differently, e.g. using only move_freelist_tail()?

> +					continue;
> +				}
> +
> +				/* Reorder so that a future search skips recent pages */
> +				move_freelist_tail(freelist, freepage);
> +
> +				pfn = pageblock_start_pfn(free_pfn);
> +				cc->fast_search_fail = 0;
> +				set_pageblock_skip(freepage);

Hmm with pageblock skip bit set, we return to isolate_migratepages(),
and there's isolation_suitable() check which tests the skip bit, so
AFAICS in the end we skip the pageblock we found here?

> +				break;
> +			}
> +
> +			/*
> +			 * If low PFNs are being found and discarded then
> +			 * limit the scan as fast searching is finding
> +			 * poor candidates.
> +			 */

I wonder about the "low PFNs are being found and discarded" part. Maybe
I'm missing it, but I don't see them being discarded above, this seems
to be the first check against cc->migrate_pfn. With the min() part in
update_fast_start_pfn(), does it mean we can actually go back and rescan
(or skip thanks to skip bits, anyway) again pageblocks that we already
scanned?
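
As an aside while following this question: the two helpers quoted earlier
form a small state machine around two sentinel values (0 meaning "no
candidate yet", ULONG_MAX meaning "already consumed"). A standalone
sketch (logic copied from the patch with min() open-coded, PFNs made up)
shows the min() pulling the cached start, and hence cc->migrate_pfn,
backwards:

#include <stdio.h>
#include <limits.h>

/* Minimal stand-in for the real struct compact_control. */
struct compact_control { unsigned long fast_start_pfn, migrate_pfn; };

static void update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
{
	if (cc->fast_start_pfn == ULONG_MAX)	/* already consumed */
		return;
	if (!cc->fast_start_pfn)		/* first candidate */
		cc->fast_start_pfn = pfn;
	if (pfn < cc->fast_start_pfn)		/* open-coded min() */
		cc->fast_start_pfn = pfn;
}

static void reinit_migrate_pfn(struct compact_control *cc)
{
	if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
		return;
	cc->migrate_pfn = cc->fast_start_pfn;
	cc->fast_start_pfn = ULONG_MAX;		/* block further updates */
}

int main(void)
{
	struct compact_control cc = { .fast_start_pfn = 0, .migrate_pfn = 4096 };

	update_fast_start_pfn(&cc, 5000);	/* 0 -> 5000 */
	update_fast_start_pfn(&cc, 3000);	/* 3000, below migrate_pfn */
	update_fast_start_pfn(&cc, 8000);	/* stays 3000 */
	reinit_migrate_pfn(&cc);		/* migrate_pfn: 4096 -> 3000 */

	printf("migrate_pfn=%lu fast_start_pfn=%lu\n",
	       cc.migrate_pfn, cc.fast_start_pfn);
	return 0;
}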

> +			if (free_pfn < cc->migrate_pfn)
> +				limit >>= 1;
> +
> +			if (nr_scanned >= limit) {
> +				cc->fast_search_fail++;
> +				move_freelist_tail(freelist, freepage);
> +				break;
> +			}
> +		}
> +		spin_unlock_irqrestore(&cc->zone->lock, flags);
> +	}
> +
> +	cc->total_migrate_scanned += nr_scanned;
> +
> +	/*
> +	 * If fast scanning failed then use a cached entry for a page block
> +	 * that had free pages as the basis for starting a linear scan.
> +	 */
> +	if (pfn == cc->migrate_pfn)
> +		reinit_migrate_pfn(cc);
> +
> +	return pfn;
> +}
> +
>  /*
>   * Isolate all pages that can be migrated from the first suitable block,
>   * starting at the block pointed to by the migrate scanner pfn within
> @@ -1235,9 +1408,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  
>  	/*
>  	 * Start at where we last stopped, or beginning of the zone as
> -	 * initialized by compact_zone()
> +	 * initialized by compact_zone(). The first failure will use
> +	 * the lowest PFN as the starting point for linear scanning.
>  	 */
> -	low_pfn = cc->migrate_pfn;
> +	low_pfn = fast_find_migrateblock(cc);
>  	block_start_pfn = pageblock_start_pfn(low_pfn);
>  	if (block_start_pfn < zone->zone_start_pfn)
>  		block_start_pfn = zone->zone_start_pfn;
> @@ -1560,6 +1734,7 @@ static enum compact_result compact_zone(struct compact_control *cc)
>  	 * want to compact the whole zone), but check that it is initialised
>  	 * by ensuring the values are within zone boundaries.
>  	 */
> +	cc->fast_start_pfn = 0;
>  	if (cc->whole_zone) {
>  		cc->migrate_pfn = start_pfn;
>  		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
> diff --git a/mm/internal.h b/mm/internal.h
> index edb4029f64c8..b25b33c5dd80 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -187,9 +187,11 @@ struct compact_control {
>  	unsigned int nr_migratepages;	/* Number of pages to migrate */
>  	unsigned long free_pfn;		/* isolate_freepages search base */
>  	unsigned long migrate_pfn;	/* isolate_migratepages search base */
> +	unsigned long fast_start_pfn;	/* a pfn to start linear scan from */
>  	struct zone *zone;
>  	unsigned long total_migrate_scanned;
>  	unsigned long total_free_scanned;
> +	unsigned int fast_search_fail;	/* failures to use free list searches */
>  	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
>  	int order;			/* order a direct compactor needs */
>  	int migratetype;		/* migratetype of direct compactor */
>
Mel Gorman Jan. 16, 2019, 2:33 p.m. UTC | #2
On Wed, Jan 16, 2019 at 02:15:10PM +0100, Vlastimil Babka wrote:
> > <SNIP>
> > +			if (free_pfn < high_pfn) {
> > +				update_fast_start_pfn(cc, free_pfn);
> > +
> > +				/*
> > +				 * Avoid if skipped recently. Move to the tail
> > +				 * of the list so it will not be found again
> > +				 * soon
> > +				 */
> > +				if (get_pageblock_skip(freepage)) {
> > +
> > +					if (list_is_last(freelist, &freepage->lru))
> > +						break;
> > +
> > +					nr_skipped++;
> > +					list_del(&freepage->lru);
> > +					list_add_tail(&freepage->lru, freelist);
> 
> Use list_move_tail() instead of del+add ?

Yep, that will work fine.

> Also is this even safe inside
> list_for_each_entry() and not list_for_each_entry_safe()? I guess
> without the extra safe iterator, we moved freepage, which is our
> iterator, to the tail, so the for cycle will immediately end?

This is an oversight. In an earlier iteration, it always terminated
after a list adjustment. This was abandoned and I should have used the
safe variant after that to avoid an early termination. Will fix.

> Also is this moving of one page needed when you also have
> move_freelist_tail() to move everything we scanned at once?
> 
> 
> > +					if (nr_skipped > 2)
> > +						break;
> 
> Counting number of skips per order seems weird. What's the intention, is
> it not to encounter again a page that we already moved to tail? That
> could be solved differently, e.g. using only move_freelist_tail()?
> 

The intention was to avoid searching a full list of pages that are
marked for skipping and to instead bail early. However, the benefit may
be marginal when the full series is taken into account so, rather than
defend it, I'll remove it and re-add it in isolation iff I have better
supporting data for it.

I'm not going to rely only on move_freelist_tail() because that is used
after a candidate is found and the walk terminates, whereas I want to
continue searching the freelist if the block has been marked for skip.
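
For reference in this exchange: the net effect of move_freelist_tail() as
posted is to rotate the already-searched prefix of the list to the tail.
A userspace sketch (toy re-implementations of list_cut_position() and
list_splice_tail() covering only the non-empty, non-last case the patch
hits; hypothetical names):

#include <stdio.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Move head..entry (inclusive) onto 'list'. */
static void list_cut_position(struct list_head *list, struct list_head *head,
			      struct list_head *entry)
{
	struct list_head *new_first = entry->next;

	list->next = head->next;
	list->next->prev = list;
	list->prev = entry;
	entry->next = list;
	head->next = new_first;
	new_first->prev = head;
}

/* Append all of 'list' at the tail of 'head'. */
static void list_splice_tail(struct list_head *list, struct list_head *head)
{
	struct list_head *first = list->next, *last = list->prev;

	first->prev = head->prev;
	head->prev->next = first;
	last->next = head;
	head->prev = last;
}

struct block { char name; struct list_head lru; };

int main(void)
{
	struct list_head freelist, sublist, *pos;
	struct block b[5] = { {'A'}, {'B'}, {'C'}, {'D'}, {'E'} };
	int i;

	init_list(&freelist);
	init_list(&sublist);
	for (i = 0; i < 5; i++)
		list_add_tail(&b[i].lru, &freelist);

	/*
	 * Suppose a search stopped at C: rotate A,B,C behind D,E so the
	 * next walk starts with blocks that were not just examined.
	 */
	list_cut_position(&sublist, &freelist, &b[2].lru);
	list_splice_tail(&sublist, &freelist);

	for (pos = freelist.next; pos != &freelist; pos = pos->next)
		printf("%c ", container_of(pos, struct block, lru)->name);
	printf("\n");		/* prints: D E A B C */
	return 0;
}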

> > +					continue;
> > +				}
> > +
> > +				/* Reorder so that a future search skips recent pages */
> > +				move_freelist_tail(freelist, freepage);
> > +
> > +				pfn = pageblock_start_pfn(free_pfn);
> > +				cc->fast_search_fail = 0;
> > +				set_pageblock_skip(freepage);
> 
> Hmm with pageblock skip bit set, we return to isolate_migratepages(),
> and there's isolation_suitable() check which tests the skip bit, so
> AFAICS in the end we skip the pageblock we found here?
> 

Hmm, yes. This partially "worked" because the linear scan would continue
afterwards; it starts in the wrong place, but the search cost was still
reduced.

> > +				break;
> > +			}
> > +
> > +			/*
> > +			 * If low PFNs are being found and discarded then
> > +			 * limit the scan as fast searching is finding
> > +			 * poor candidates.
> > +			 */
> 
> I wonder about the "low PFNs are being found and discarded" part. Maybe
> I'm missing it, but I don't see them being discarded above, this seems
> to be the first check against cc->migrate_pfn. With the min() part in
> update_fast_start_pfn(), does it mean we can actually go back and rescan
> (or skip thanks to skip bits, anyway) again pageblocks that we already
> scanned?
> 

Extremely poor phrasing. My mind was thinking in terms of discarding
unsuitable candidates because they were below the migration scanner, and
it did not translate properly.

Based on your feedback, how does the following untested diff look?

diff --git a/mm/compaction.c b/mm/compaction.c
index 6f649a3bd256..ccc38af323ab 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1572,16 +1572,15 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 	     order--) {
 		struct free_area *area = &cc->zone->free_area[order];
 		struct list_head *freelist;
-		unsigned long nr_skipped = 0;
 		unsigned long flags;
-		struct page *freepage;
+		struct page *freepage, *tmp;
 
 		if (!area->nr_free)
 			continue;
 
 		spin_lock_irqsave(&cc->zone->lock, flags);
 		freelist = &area->free_list[MIGRATE_MOVABLE];
-		list_for_each_entry(freepage, freelist, lru) {
+		list_for_each_entry_safe(freepage, tmp, freelist, lru) {
 			unsigned long free_pfn;
 
 			nr_scanned++;
@@ -1593,15 +1592,10 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 				 * soon
 				 */
 				if (get_pageblock_skip(freepage)) {
-
 					if (list_is_last(freelist, &freepage->lru))
 						break;
 
-					nr_skipped++;
-					list_del(&freepage->lru);
-					list_add_tail(&freepage->lru, freelist);
-					if (nr_skipped > 2)
-						break;
+					list_move_tail(&freepage->lru, freelist);
 					continue;
 				}
 
@@ -1616,9 +1610,11 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc)
 			}
 
 			/*
-			 * If low PFNs are being found and discarded then
-			 * limit the scan as fast searching is finding
-			 * poor candidates.
+			 * PFNs below the migrate scanner are not suitable as
+			 * it may result in pageblocks being rescanned that
+			 * are not necessarily marked for skipping. Limit the
+			 * search if unsuitable candidates are being found
+			 * on the freelist.
 			 */
 			if (free_pfn < cc->migrate_pfn)
 				limit >>= 1;
@@ -1659,6 +1655,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	const isolate_mode_t isolate_mode =
 		(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
 		(cc->mode != MIGRATE_SYNC ? ISOLATE_ASYNC_MIGRATE : 0);
+	bool fast_find_block;
 
 	/*
 	 * Start at where we last stopped, or beginning of the zone as
@@ -1670,6 +1667,13 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	if (block_start_pfn < zone->zone_start_pfn)
 		block_start_pfn = zone->zone_start_pfn;
 
+	/*
+	 * fast_find_migrateblock marks a pageblock skipped so to avoid
+	 * the isolation_suitable check below, check whether the fast
+	 * search was successful.
+	 */
+	fast_find_block = low_pfn != cc->migrate_pfn && !cc->fast_search_fail;
+
 	/* Only scan within a pageblock boundary */
 	block_end_pfn = pageblock_end_pfn(low_pfn);
 
@@ -1678,6 +1682,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	 * Do not cross the free scanner.
 	 */
 	for (; block_end_pfn <= cc->free_pfn;
+			fast_find_block = false,
 			low_pfn = block_end_pfn,
 			block_start_pfn = block_end_pfn,
 			block_end_pfn += pageblock_nr_pages) {
@@ -1703,7 +1708,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		 * not scan the same block.
 		 */
 		if (IS_ALIGNED(low_pfn, pageblock_nr_pages) &&
-		    !isolation_suitable(cc, page))
+		    !fast_find_block && !isolation_suitable(cc, page))
 			continue;
 
 		/*
Vlastimil Babka Jan. 16, 2019, 3 p.m. UTC | #3
On 1/16/19 3:33 PM, Mel Gorman wrote:
>>> +				break;
>>> +			}
>>> +
>>> +			/*
>>> +			 * If low PFNs are being found and discarded then
>>> +			 * limit the scan as fast searching is finding
>>> +			 * poor candidates.
>>> +			 */
>>
>> I wonder about the "low PFNs are being found and discarded" part. Maybe
>> I'm missing it, but I don't see them being discarded above, this seems
>> to be the first check against cc->migrate_pfn. With the min() part in
>> update_fast_start_pfn(), does it mean we can actually go back and rescan
>> (or skip thanks to skip bits, anyway) again pageblocks that we already
>> scanned?
>>
> 
> Extremely poor phrasing. My mind was thinking in terms of discarding
> unsuitable candidates as they were below the migration scanner and it
> did not translate properly.
> 
> Based on your feedback, how does the following untested diff look?

IMHO better. Meanwhile I noticed that the next patch removes the
set_pageblock_skip() so maybe it's needless churn to introduce the
fast_find_block, but I'll check more closely.

The new comment about pfns below cc->migrate_pfn is better but I still
wonder if it would be better to really skip over those candidates (they
are still called unsuitable) and not go backwards with cc->migrate_pfn.
But if you think the pageblock skip bits and halving of the limit minimize
pointless rescanning sufficiently, then fine.
Mel Gorman Jan. 16, 2019, 3:43 p.m. UTC | #4
On Wed, Jan 16, 2019 at 04:00:22PM +0100, Vlastimil Babka wrote:
> On 1/16/19 3:33 PM, Mel Gorman wrote:
> >>> +				break;
> >>> +			}
> >>> +
> >>> +			/*
> >>> +			 * If low PFNs are being found and discarded then
> >>> +			 * limit the scan as fast searching is finding
> >>> +			 * poor candidates.
> >>> +			 */
> >>
> >> I wonder about the "low PFNs are being found and discarded" part. Maybe
> >> I'm missing it, but I don't see them being discarded above, this seems
> >> to be the first check against cc->migrate_pfn. With the min() part in
> >> update_fast_start_pfn(), does it mean we can actually go back and rescan
> >> (or skip thanks to skip bits, anyway) again pageblocks that we already
> >> scanned?
> >>
> > 
> > Extremely poor phrasing. My mind was thinking in terms of discarding
> > unsuitable candidates as they were below the migration scanner and it
> > did not translate properly.
> > 
> > Based on your feedback, how does the following untested diff look?
> 
> IMHO better. Meanwhile I noticed that the next patch removes the
> set_pageblock_skip() so maybe it's needless churn to introduce the
> fast_find_block, but I'll check more closely.
> 

Indeed, but the patches should stand alone and preserve bisection as best
as possible, so while it's weird looking, I'll add the logic and just take
it back out again in the next patch. Merging the patches together would
lead to a tricky review!

> The new comment about pfns below cc->migrate_pfn is better but I still
> wonder if it would be better to really skip over those candidates (they
> are still called unsuitable) and not go backwards with cc->migrate_pfn.
> But if you think the pageblock skip bits and halving of limit minimizes
> pointless rescan sufficiently, then fine.

I'll check if it works out better to ensure they are really skipped.

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index 8f0ce44dba41..137e32e8a2f5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1050,6 +1050,12 @@  static bool suitable_migration_target(struct compact_control *cc,
 	return false;
 }
 
+static inline unsigned int
+freelist_scan_limit(struct compact_control *cc)
+{
+	return (COMPACT_CLUSTER_MAX >> cc->fast_search_fail) + 1;
+}
+
 /*
  * Test whether the free scanner has reached the same or lower pageblock than
  * the migration scanner, and compaction should thus terminate.
@@ -1060,6 +1066,19 @@  static inline bool compact_scanners_met(struct compact_control *cc)
 		<= (cc->migrate_pfn >> pageblock_order);
 }
 
+/* Reorder the free list to reduce repeated future searches */
+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+	LIST_HEAD(sublist);
+
+	if (!list_is_last(freelist, &freepage->lru)) {
+		list_cut_position(&sublist, freelist, &freepage->lru);
+		if (!list_empty(&sublist))
+			list_splice_tail(&sublist, freelist);
+	}
+}
+
 /*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
@@ -1217,6 +1236,160 @@  typedef enum {
  */
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
 
+static inline void
+update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
+{
+	if (cc->fast_start_pfn == ULONG_MAX)
+		return;
+
+	if (!cc->fast_start_pfn)
+		cc->fast_start_pfn = pfn;
+
+	cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
+}
+
+static inline void
+reinit_migrate_pfn(struct compact_control *cc)
+{
+	if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
+		return;
+
+	cc->migrate_pfn = cc->fast_start_pfn;
+	cc->fast_start_pfn = ULONG_MAX;
+}
+
+/*
+ * Briefly search the free lists for a migration source that already has
+ * some free pages to reduce the number of pages that need migration
+ * before a pageblock is free.
+ */
+static unsigned long fast_find_migrateblock(struct compact_control *cc)
+{
+	unsigned int limit = freelist_scan_limit(cc);
+	unsigned int nr_scanned = 0;
+	unsigned long distance;
+	unsigned long pfn = cc->migrate_pfn;
+	unsigned long high_pfn;
+	int order;
+
+	/* Skip hints are relied on to avoid repeats on the fast search */
+	if (cc->ignore_skip_hint)
+		return pfn;
+
+	/*
+	 * If the migrate_pfn is not at the start of a zone or the start
+	 * of a pageblock then assume this is a continuation of a previous
+	 * scan restarted due to COMPACT_CLUSTER_MAX.
+	 */
+	if (pfn != cc->zone->zone_start_pfn && pfn != pageblock_start_pfn(pfn))
+		return pfn;
+
+	/*
+	 * For smaller orders, just linearly scan as the number of pages
+	 * to migrate should be relatively small and does not necessarily
+	 * justify freeing up a large block for a small allocation.
+	 */
+	if (cc->order <= PAGE_ALLOC_COSTLY_ORDER)
+		return pfn;
+
+	/*
+	 * Only allow kcompactd and direct requests for movable pages to
+	 * quickly clear out a MOVABLE pageblock for allocation. This
+	 * reduces the risk that a large movable pageblock is freed for
+	 * an unmovable/reclaimable small allocation.
+	 */
+	if (cc->direct_compaction && cc->migratetype != MIGRATE_MOVABLE)
+		return pfn;
+
+	/*
+	 * When starting the migration scanner, pick any pageblock within the
+	 * first half of the search space. Otherwise try and pick a pageblock
+	 * within the first eighth to reduce the chances that a migration
+	 * target later becomes a source.
+	 */
+	distance = (cc->free_pfn - cc->migrate_pfn) >> 1;
+	if (cc->migrate_pfn != cc->zone->zone_start_pfn)
+		distance >>= 2;
+	high_pfn = pageblock_start_pfn(cc->migrate_pfn + distance);
+
+	for (order = cc->order - 1;
+	     order >= PAGE_ALLOC_COSTLY_ORDER && pfn == cc->migrate_pfn && nr_scanned < limit;
+	     order--) {
+		struct free_area *area = &cc->zone->free_area[order];
+		struct list_head *freelist;
+		unsigned long nr_skipped = 0;
+		unsigned long flags;
+		struct page *freepage;
+
+		if (!area->nr_free)
+			continue;
+
+		spin_lock_irqsave(&cc->zone->lock, flags);
+		freelist = &area->free_list[MIGRATE_MOVABLE];
+		list_for_each_entry(freepage, freelist, lru) {
+			unsigned long free_pfn;
+
+			nr_scanned++;
+			free_pfn = page_to_pfn(freepage);
+			if (free_pfn < high_pfn) {
+				update_fast_start_pfn(cc, free_pfn);
+
+				/*
+				 * Avoid if skipped recently. Move to the tail
+				 * of the list so it will not be found again
+				 * soon
+				 */
+				if (get_pageblock_skip(freepage)) {
+
+					if (list_is_last(freelist, &freepage->lru))
+						break;
+
+					nr_skipped++;
+					list_del(&freepage->lru);
+					list_add_tail(&freepage->lru, freelist);
+					if (nr_skipped > 2)
+						break;
+					continue;
+				}
+
+				/* Reorder so that a future search skips recent pages */
+				move_freelist_tail(freelist, freepage);
+
+				pfn = pageblock_start_pfn(free_pfn);
+				cc->fast_search_fail = 0;
+				set_pageblock_skip(freepage);
+				break;
+			}
+
+			/*
+			 * If low PFNs are being found and discarded then
+			 * limit the scan as fast searching is finding
+			 * poor candidates.
+			 */
+			if (free_pfn < cc->migrate_pfn)
+				limit >>= 1;
+
+			if (nr_scanned >= limit) {
+				cc->fast_search_fail++;
+				move_freelist_tail(freelist, freepage);
+				break;
+			}
+		}
+		spin_unlock_irqrestore(&cc->zone->lock, flags);
+	}
+
+	cc->total_migrate_scanned += nr_scanned;
+
+	/*
+	 * If fast scanning failed then use a cached entry for a page block
+	 * that had free pages as the basis for starting a linear scan.
+	 */
+	if (pfn == cc->migrate_pfn)
+		reinit_migrate_pfn(cc);
+
+	return pfn;
+}
+
 /*
  * Isolate all pages that can be migrated from the first suitable block,
  * starting at the block pointed to by the migrate scanner pfn within
@@ -1235,9 +1408,10 @@  static isolate_migrate_t isolate_migratepages(struct zone *zone,
 
 	/*
 	 * Start at where we last stopped, or beginning of the zone as
-	 * initialized by compact_zone()
+	 * initialized by compact_zone(). The first failure will use
+	 * the lowest PFN as the starting point for linear scanning.
 	 */
-	low_pfn = cc->migrate_pfn;
+	low_pfn = fast_find_migrateblock(cc);
 	block_start_pfn = pageblock_start_pfn(low_pfn);
 	if (block_start_pfn < zone->zone_start_pfn)
 		block_start_pfn = zone->zone_start_pfn;
@@ -1560,6 +1734,7 @@  static enum compact_result compact_zone(struct compact_control *cc)
 	 * want to compact the whole zone), but check that it is initialised
 	 * by ensuring the values are within zone boundaries.
 	 */
+	cc->fast_start_pfn = 0;
 	if (cc->whole_zone) {
 		cc->migrate_pfn = start_pfn;
 		cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
diff --git a/mm/internal.h b/mm/internal.h
index edb4029f64c8..b25b33c5dd80 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -187,9 +187,11 @@  struct compact_control {
 	unsigned int nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
+	unsigned long fast_start_pfn;	/* a pfn to start linear scan from */
 	struct zone *zone;
 	unsigned long total_migrate_scanned;
 	unsigned long total_free_scanned;
+	unsigned int fast_search_fail;	/* failures to use free list searches */
 	const gfp_t gfp_mask;		/* gfp mask of a direct compactor */
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* migratetype of direct compactor */