Patchwork [4/5] mm: vmscan: only write dirty pages that the scanner has seen twice

login
register
mail settings
Submitter Johannes Weiner
Date Jan. 23, 2017, 6:16 p.m.
Message ID <20170123181641.23938-5-hannes@cmpxchg.org>
Download mbox | patch
Permalink /patch/9533313/
State New
Headers show

Comments

Johannes Weiner - Jan. 23, 2017, 6:16 p.m.
Dirty pages can easily reach the end of the LRU while there are still
clean pages to reclaim around. Don't let kswapd write them back just
because there are a lot of them. It costs more CPU to find the clean
pages, but that's almost certainly better than to disrupt writeback
from the flushers with LRU-order single-page writes from reclaim. And
the flushers have been woken up by that point, so we spend IO capacity
on flushing and CPU capacity on finding the clean cache.

Only start writing dirty pages if they have cycled around the LRU
twice now and STILL haven't been queued on the IO device. It's
possible that the dirty pages are so sparsely distributed across
different bdis, inodes, memory cgroups, that the flushers take forever
to get to the ones we want reclaimed. Once we see them twice on the
LRU, we know that's the quicker way to find them, so do LRU writeback.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/vmscan.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)
Minchan Kim - Jan. 26, 2017, 1:42 a.m.
On Mon, Jan 23, 2017 at 01:16:40PM -0500, Johannes Weiner wrote:
> Dirty pages can easily reach the end of the LRU while there are still
> clean pages to reclaim around. Don't let kswapd write them back just
> because there are a lot of them. It costs more CPU to find the clean
> pages, but that's almost certainly better than to disrupt writeback
> from the flushers with LRU-order single-page writes from reclaim. And
> the flushers have been woken up by that point, so we spend IO capacity
> on flushing and CPU capacity on finding the clean cache.
> 
> Only start writing dirty pages if they have cycled around the LRU
> twice now and STILL haven't been queued on the IO device. It's
> possible that the dirty pages are so sparsely distributed across
> different bdis, inodes, memory cgroups, that the flushers take forever
> to get to the ones we want reclaimed. Once we see them twice on the
> LRU, we know that's the quicker way to find them, so do LRU writeback.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Mel Gorman - Jan. 26, 2017, 10:08 a.m.
On Mon, Jan 23, 2017 at 01:16:40PM -0500, Johannes Weiner wrote:
> Dirty pages can easily reach the end of the LRU while there are still
> clean pages to reclaim around. Don't let kswapd write them back just
> because there are a lot of them. It costs more CPU to find the clean
> pages, but that's almost certainly better than to disrupt writeback
> from the flushers with LRU-order single-page writes from reclaim. And
> the flushers have been woken up by that point, so we spend IO capacity
> on flushing and CPU capacity on finding the clean cache.
> 
> Only start writing dirty pages if they have cycled around the LRU
> twice now and STILL haven't been queued on the IO device. It's
> possible that the dirty pages are so sparsely distributed across
> different bdis, inodes, memory cgroups, that the flushers take forever
> to get to the ones we want reclaimed. Once we see them twice on the
> LRU, we know that's the quicker way to find them, so do LRU writeback.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Mel Gorman <mgorman@suse.de>
Michal Hocko - Jan. 26, 2017, 1:29 p.m.
On Mon 23-01-17 13:16:40, Johannes Weiner wrote:
> Dirty pages can easily reach the end of the LRU while there are still
> clean pages to reclaim around. Don't let kswapd write them back just
> because there are a lot of them. It costs more CPU to find the clean
> pages, but that's almost certainly better than to disrupt writeback
> from the flushers with LRU-order single-page writes from reclaim. And
> the flushers have been woken up by that point, so we spend IO capacity
> on flushing and CPU capacity on finding the clean cache.
> 
> Only start writing dirty pages if they have cycled around the LRU
> twice now and STILL haven't been queued on the IO device. It's
> possible that the dirty pages are so sparsely distributed across
> different bdis, inodes, memory cgroups, that the flushers take forever
> to get to the ones we want reclaimed. Once we see them twice on the
> LRU, we know that's the quicker way to find them, so do LRU writeback.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmscan.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 915fc658de41..df0fe0cc438e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1153,13 +1153,18 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  
>  		if (PageDirty(page)) {
>  			/*
> -			 * Only kswapd can writeback filesystem pages to
> -			 * avoid risk of stack overflow but only writeback
> -			 * if many dirty pages have been encountered.
> +			 * Only kswapd can writeback filesystem pages
> +			 * to avoid risk of stack overflow. But avoid
> +			 * injecting inefficient single-page IO into
> +			 * flusher writeback as much as possible: only
> +			 * write pages when we've encountered many
> +			 * dirty pages, and when we've already scanned
> +			 * the rest of the LRU for clean pages and see
> +			 * the same dirty pages again (PageReclaim).
>  			 */
>  			if (page_is_file_cache(page) &&
> -					(!current_is_kswapd() ||
> -					 !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
> +			    (!current_is_kswapd() || !PageReclaim(page) ||
> +			     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
>  				/*
>  				 * Immediately reclaim when written back.
>  				 * Similar in principal to deactivate_page()
> -- 
> 2.11.0
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 915fc658de41..df0fe0cc438e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1153,13 +1153,18 @@  static unsigned long shrink_page_list(struct list_head *page_list,
 
 		if (PageDirty(page)) {
 			/*
-			 * Only kswapd can writeback filesystem pages to
-			 * avoid risk of stack overflow but only writeback
-			 * if many dirty pages have been encountered.
+			 * Only kswapd can writeback filesystem pages
+			 * to avoid risk of stack overflow. But avoid
+			 * injecting inefficient single-page IO into
+			 * flusher writeback as much as possible: only
+			 * write pages when we've encountered many
+			 * dirty pages, and when we've already scanned
+			 * the rest of the LRU for clean pages and see
+			 * the same dirty pages again (PageReclaim).
 			 */
 			if (page_is_file_cache(page) &&
-					(!current_is_kswapd() ||
-					 !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
+			    (!current_is_kswapd() || !PageReclaim(page) ||
+			     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
 				/*
 				 * Immediately reclaim when written back.
 				 * Similar in principal to deactivate_page()