mm: Optimize TLB flushes during page reclaim

Message ID

CALf+9Yc3WjbG89uRWiDC_qYshJ5z_9WCrbEJe42Vbv+gJjs26g@mail.gmail.com (mailing list archive)

State

New

Headers

MIME-Version: 1.0
From: Vinay Banakar <vny@google.com>
Date: Mon, 20 Jan 2025 16:47:29 -0600
Message-ID: 
 <CALf+9Yc3WjbG89uRWiDC_qYshJ5z_9WCrbEJe42Vbv+gJjs26g@mail.gmail.com>
Subject: [PATCH] mm: Optimize TLB flushes during page reclaim
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, mgorman@suse.de,
	Wei Xu <weixugc@google.com>, Greg Thelen <gthelen@google.com>
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

mm: Optimize TLB flushes during page reclaim

Commit Message

Vinay Banakar Jan. 20, 2025, 10:47 p.m. UTC

The current implementation in shrink_folio_list() performs full TLB
flushes and issues IPIs for each individual page being reclaimed. This
causes unnecessary overhead during memory reclaim, whether triggered
by madvise(MADV_PAGEOUT) or kswapd, especially in scenarios where
applications are actively moving cold pages to swap while maintaining
high performance requirements for hot pages.

The current code:
1. Clears PTE and unmaps each page individually
2. Performs a full TLB flush (via CR3 write) on every core using the VMA, or
issues individual TLB shootdowns (invlpg+invlpcid) for single-core usage
3. Submits each page individually to BIO

This approach results in:
- Excessive full TLB flushes across all cores
- Unnecessary IPI storms when processing multiple pages
- Suboptimal I/O submission patterns

I initially tried using selective TLB shootdowns (invlpg) instead of
a full TLB flush for each page to avoid interference with other
threads. However, this approach still required sending IPIs to all
cores for each page, so it did not significantly improve application
throughput.

This patch instead optimizes the process by batching operations,
issuing one IPI per PMD instead of per page. This reduces interrupts
by a factor of 512 (the number of 4 KiB pages one PMD maps on x86-64)
and enables batching page submissions to BIO. The new approach:
1. Collect dirty pages that need to be written back
2. Issue a single TLB flush for all dirty pages in the batch
3. Process the collected pages for writebacks (submit to BIO)
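
The three steps above can be sketched as a small user-space model that
only counts flushes. It is not kernel code: the names model_stats,
model_reclaim_per_page() and model_reclaim_batched() are hypothetical,
and the batch size of 512 assumes x86-64 with 4 KiB base pages and
2 MiB PMDs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: x86-64, 4 KiB base pages, one PMD entry spans 2 MiB. */
#define MODEL_PAGE_SIZE    4096UL
#define MODEL_PMD_SIZE     (2UL * 1024 * 1024)
#define MODEL_PTES_PER_PMD (MODEL_PMD_SIZE / MODEL_PAGE_SIZE)

_Static_assert(MODEL_PTES_PER_PMD == 512, "one PMD covers 512 PTEs");

/* Hypothetical counters; a flush stands in for one round of IPIs. */
struct model_stats {
	unsigned long flushes;	/* TLB flushes issued */
	unsigned long written;	/* pages submitted for writeback */
};

/* Old flow: flush before the writeback of every single dirty page. */
struct model_stats model_reclaim_per_page(size_t nr_dirty)
{
	struct model_stats st = { 0, 0 };

	for (size_t i = 0; i < nr_dirty; i++) {
		st.flushes++;	/* one flush per page */
		st.written++;	/* then submit that page */
	}
	return st;
}

/* New flow: collect up to one PMD worth of pages, flush once, submit all. */
struct model_stats model_reclaim_batched(size_t nr_dirty)
{
	struct model_stats st = { 0, 0 };
	size_t done = 0;

	while (done < nr_dirty) {
		/* 1. Collect the dirty pages of this batch. */
		size_t batch = nr_dirty - done;

		if (batch > MODEL_PTES_PER_PMD)
			batch = MODEL_PTES_PER_PMD;

		/* 2. Issue a single flush for the whole batch. */
		st.flushes++;

		/* 3. Submit every collected page for writeback. */
		st.written += batch;
		done += batch;
	}

	/* Every dirty page must have been submitted exactly once. */
	assert(st.written == nr_dirty);
	return st;
}
```

In this model, 1024 dirty pages cost 1024 flushes in the per-page flow
and 2 in the batched flow. The full factor of 512 is only reached when
every batch is full; 513 dirty pages still need 2 flushes.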

Testing shows a significant reduction in the impact on application
throughput during page-out operations. Applications maintain better
performance during memory reclaim triggered by explicit
madvise(MADV_PAGEOUT) calls.
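
For readers who want to exercise this path from user space, here is a
minimal sketch of an explicit page-out request. It is not the test
program behind the results above. The helper name pageout_roundtrip()
is hypothetical, and the fallback value 21 for MADV_PAGEOUT is an
assumption taken from the generic Linux UAPI headers for builds against
older libc headers.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* assumption: generic Linux UAPI value */
#endif

/*
 * Dirty 'len' bytes of anonymous memory, hint the kernel to page them
 * out, then read everything back.
 *
 * Returns 0 if the data is intact, -1 if mmap() failed or the data
 * changed. If 'madvise_errno' is not NULL it receives 0 when the hint
 * was accepted, or the errno of the failed madvise() call (for example
 * EINVAL on kernels without MADV_PAGEOUT).
 */
int pageout_roundtrip(size_t len, int *madvise_errno)
{
	unsigned char *buf;
	int err = 0;
	int intact = 1;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Writing to the mapping makes the pages dirty. */
	memset(buf, 0xA5, len);

	/* The hint is advisory; failure is recorded, not fatal. */
	if (madvise(buf, len, MADV_PAGEOUT) != 0)
		err = errno;
	if (madvise_errno)
		*madvise_errno = err;

	/* Touching the pages again faults in anything that was swapped out. */
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0xA5) {
			intact = 0;
			break;
		}
	}

	munmap(buf, len);
	return intact ? 0 : -1;
}
```

A caller could run pageout_roundtrip(64 * 4096, &err) in a loop next to
a latency-sensitive thread and compare that thread's throughput with
and without the patch. Whether pages actually reach swap depends on the
system configuration, so the helper only checks that the data survives.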

I'd appreciate your feedback on this approach, especially on the
correctness of batched BIO submissions. Looking forward to your
comments.

Signed-off-by: Vinay Banakar <vny@google.com>
---
 mm/vmscan.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------
 1 file changed, 74 insertions(+), 33 deletions(-)

  bool do_demote_pass;
@@ -1351,39 +1352,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
  if (!sc->may_writepage)
  goto keep_locked;

- /*
- * Folio is dirty. Flush the TLB if a writable entry
- * potentially exists to avoid CPU writes after I/O
- * starts and then write it out here.
- */
- try_to_unmap_flush_dirty();
- switch (pageout(folio, mapping, &plug)) {
- case PAGE_KEEP:
- goto keep_locked;
- case PAGE_ACTIVATE:
- goto activate_locked;
- case PAGE_SUCCESS:
- stat->nr_pageout += nr_pages;
-
- if (folio_test_writeback(folio))
- goto keep;
- if (folio_test_dirty(folio))
- goto keep;
-
- /*
- * A synchronous write - probably a ramdisk.  Go
- * ahead and try to reclaim the folio.
- */
- if (!folio_trylock(folio))
- goto keep;
- if (folio_test_dirty(folio) ||
-    folio_test_writeback(folio))
- goto keep_locked;
- mapping = folio_mapping(folio);
- fallthrough;
- case PAGE_CLEAN:
- ; /* try to free the folio below */
- }
+ /* Add to pageout list for deferred bio submissions */
+ list_add(&folio->lru, &pageout_list);
+ continue;
  }

  /*
@@ -1494,6 +1465,76 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
  }
  /* 'folio_list' is always empty here */

+ if (!list_empty(&pageout_list)) {
+ /*
+ * Batch TLB flushes by flushing once before processing all dirty pages.
+ * Since we operate on one PMD at a time, this batches TLB flushes at
+ * PMD granularity rather than per-page, reducing IPIs.
+ */
+ struct address_space *mapping;
+ try_to_unmap_flush_dirty();
+
+ while (!list_empty(&pageout_list)) {
+ struct folio *folio = lru_to_folio(&pageout_list);
+ list_del(&folio->lru);
+
+ /* Recheck if page got reactivated */
+ if (folio_test_active(folio) ||
+    (folio_mapped(folio) && folio_test_young(folio)))
+ goto skip_pageout_locked;
+
+ mapping = folio_mapping(folio);
+ pageout_t pageout_res = pageout(folio, mapping, &plug);
+ switch (pageout_res) {
+ case PAGE_KEEP:
+ goto skip_pageout_locked;
+ case PAGE_ACTIVATE:
+ goto skip_pageout_locked;
+ case PAGE_SUCCESS:
+ stat->nr_pageout += folio_nr_pages(folio);
+
+ if (folio_test_writeback(folio) ||
+    folio_test_dirty(folio))
+ goto skip_pageout;
+
+ /*
+ * A synchronous write - probably a ramdisk.  Go
+ * ahead and try to reclaim the folio.
+ */
+ if (!folio_trylock(folio))
+ goto skip_pageout;
+ if (folio_test_dirty(folio) ||
+    folio_test_writeback(folio))
+ goto skip_pageout_locked;
+
+ // Try to free the page
+ if (!mapping ||
+    !__remove_mapping(mapping, folio, true,
+      sc->target_mem_cgroup))
+ goto skip_pageout_locked;
+
+ nr_reclaimed += folio_nr_pages(folio);
+ folio_unlock(folio);
+ continue;
+
+ case PAGE_CLEAN:
+ if (!mapping ||
+    !__remove_mapping(mapping, folio, true,
+      sc->target_mem_cgroup))
+ goto skip_pageout_locked;
+
+ nr_reclaimed += folio_nr_pages(folio);
+ folio_unlock(folio);
+ continue;
+ }
+
+skip_pageout_locked:
+ folio_unlock(folio);
+skip_pageout:
+ list_add(&folio->lru, &ret_folios);
+ }
+ }
+
  /* Migrate folios selected for demotion */
  nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
  /* Folios that could not be demoted are still in @demote_folios */

Comments

Vinay Banakar Jan. 21, 2025, 12:05 a.m. UTC | #1

Sorry, the previous patch was unreadable due to damaged whitespace.
Here is the same patch with fixed indentation.

Signed-off-by: Vinay Banakar <vny@google.com>
---
 mm/vmscan.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------
 1 file changed, 74 insertions(+), 33 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bd489c1af..1bd510622 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1035,6 +1035,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
        struct folio_batch free_folios;
        LIST_HEAD(ret_folios);
        LIST_HEAD(demote_folios);
+       LIST_HEAD(pageout_list);
        unsigned int nr_reclaimed = 0;
        unsigned int pgactivate = 0;
        bool do_demote_pass;
@@ -1351,39 +1352,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
                        if (!sc->may_writepage)
                                goto keep_locked;

-                       /*
-                        * Folio is dirty. Flush the TLB if a writable entry
-                        * potentially exists to avoid CPU writes after I/O
-                        * starts and then write it out here.
-                        */
-                       try_to_unmap_flush_dirty();
-                       switch (pageout(folio, mapping, &plug)) {
-                       case PAGE_KEEP:
-                               goto keep_locked;
-                       case PAGE_ACTIVATE:
-                               goto activate_locked;
-                       case PAGE_SUCCESS:
-                               stat->nr_pageout += nr_pages;
-
-                               if (folio_test_writeback(folio))
-                                       goto keep;
-                               if (folio_test_dirty(folio))
-                                       goto keep;
-
-                               /*
-                                * A synchronous write - probably a ramdisk.  Go
-                                * ahead and try to reclaim the folio.
-                                */
-                               if (!folio_trylock(folio))
-                                       goto keep;
-                               if (folio_test_dirty(folio) ||
-                                   folio_test_writeback(folio))
-                                       goto keep_locked;
-                               mapping = folio_mapping(folio);
-                               fallthrough;
-                       case PAGE_CLEAN:
-                               ; /* try to free the folio below */
-                       }
+                       /* Add to pageout list for deferred bio submissions */
+                       list_add(&folio->lru, &pageout_list);
+                       continue;
        }

        /*
@@ -1494,6 +1465,76 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
        }
        /* 'folio_list' is always empty here */

+       if (!list_empty(&pageout_list)) {
+               /*
+               * Batch TLB flushes by flushing once before processing all dirty pages.
+               * Since we operate on one PMD at a time, this batches TLB flushes at
+               * PMD granularity rather than per-page, reducing IPIs.
+               */
+               struct address_space *mapping;
+               try_to_unmap_flush_dirty();
+
+               while (!list_empty(&pageout_list)) {
+                       struct folio *folio = lru_to_folio(&pageout_list);
+                       list_del(&folio->lru);
+
+                       /* Recheck if page got reactivated */
+                       if (folio_test_active(folio) ||
+                           (folio_mapped(folio) && folio_test_young(folio)))
+                               goto skip_pageout_locked;
+
+                       mapping = folio_mapping(folio);
+                       pageout_t pageout_res = pageout(folio, mapping, &plug);
+                       switch (pageout_res) {
+                       case PAGE_KEEP:
+                               goto skip_pageout_locked;
+                       case PAGE_ACTIVATE:
+                               goto skip_pageout_locked;
+                       case PAGE_SUCCESS:
+                               stat->nr_pageout += folio_nr_pages(folio);
+
+                               if (folio_test_writeback(folio) ||
+                                   folio_test_dirty(folio))
+                                       goto skip_pageout;
+
+                               /*
+                                * A synchronous write - probably a ramdisk.  Go
+                                * ahead and try to reclaim the folio.
+                                */
+                               if (!folio_trylock(folio))
+                                       goto skip_pageout;
+                               if (folio_test_dirty(folio) ||
+                                   folio_test_writeback(folio))
+                                       goto skip_pageout_locked;
+
+                               // Try to free the page
+                               if (!mapping ||
+                                   !__remove_mapping(mapping, folio, true,
+                                                   sc->target_mem_cgroup))
+                                       goto skip_pageout_locked;
+
+                               nr_reclaimed += folio_nr_pages(folio);
+                               folio_unlock(folio);
+                               continue;
+
+                       case PAGE_CLEAN:
+                               if (!mapping ||
+                                   !__remove_mapping(mapping, folio, true,
+                                                   sc->target_mem_cgroup))
+                                       goto skip_pageout_locked;
+
+                               nr_reclaimed += folio_nr_pages(folio);
+                               folio_unlock(folio);
+                               continue;
+                       }
+
+skip_pageout_locked:
+                       folio_unlock(folio);
+skip_pageout:
+                       list_add(&folio->lru, &ret_folios);
+               }
+       }
+
        /* Migrate folios selected for demotion */
        nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
        /* Folios that could not be demoted are still in @demote_folios */

Byungchul Park Jan. 21, 2025, 1:43 a.m. UTC | #2

On Mon, Jan 20, 2025 at 04:47:29PM -0600, Vinay Banakar wrote:
> The current implementation in shrink_folio_list() performs full TLB
> flushes and issues IPIs for each individual page being reclaimed. This
> causes unnecessary overhead during memory reclaim, whether triggered
> by madvise(MADV_PAGEOUT) or kswapd, especially in scenarios where
> applications are actively moving cold pages to swap while maintaining
> high performance requirements for hot pages.
> 
> The current code:
> 1. Clears PTE and unmaps each page individually
> 2. Performs a full TLB flush (via CR3 write) on every core using the VMA, or
> issues individual TLB shootdowns (invlpg+invlpcid) for single-core usage
> 3. Submits each page individually to BIO
> 
> This approach results in:
> - Excessive full TLB flushes across all cores
> - Unnecessary IPI storms when processing multiple pages
> - Suboptimal I/O submission patterns
> 
> I initially tried using selective TLB shootdowns (invlpg) instead of
> a full TLB flush for each page to avoid interference with other
> threads. However, this approach still required sending IPIs to all
> cores for each page, so it did not significantly improve application
> throughput.
> 
> This patch instead optimizes the process by batching operations,
> issuing one IPI per PMD instead of per page. This reduces interrupts
> by a factor of 512 (the number of 4 KiB pages one PMD maps on x86-64)
> and enables batching page submissions to BIO. The new approach:
> 1. Collect dirty pages that need to be written back
> 2. Issue a single TLB flush for all dirty pages in the batch
> 3. Process the collected pages for writebacks (submit to BIO)

The *interesting* IPIs will be reduced to 1/512 at most.  Can we see the
improvement numbers?

	Byungchul

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bd489c1af..1bd510622 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1035,6 +1035,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
  struct folio_batch free_folios;
  LIST_HEAD(ret_folios);
  LIST_HEAD(demote_folios);
+ LIST_HEAD(pageout_list);
  unsigned int nr_reclaimed = 0;
  unsigned int pgactivate = 0;
