Message ID | 20250115093135.3288234-6-kirill.shutemov@linux.intel.com (mailing list archive)
---|---
State | New
Series | Get rid of PG_reclaim and rename PG_dropbehind
On Wed, Jan 15, 2025 at 2:32 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
>
> The new flag allows replacing the whole deactivate_file_folio() machinery
> with a simple folio_set_dropbehind().
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Acked-by: Yu Zhao <yuzhao@google.com>
On Wed, Jan 15, 2025 at 11:31:29AM +0200, Kirill A. Shutemov wrote:
> -static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
> -{
> -	bool active = folio_test_active(folio) || lru_gen_enabled();
> -	long nr_pages = folio_nr_pages(folio);
> -
> -	if (folio_test_unevictable(folio))
> -		return;
> -
> -	/* Some processes are using the folio */
> -	if (folio_mapped(folio))
> -		return;
> -
> -	lruvec_del_folio(lruvec, folio);
> -	folio_clear_active(folio);
> -	folio_clear_referenced(folio);
> -
> -	if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> -		/*
> -		 * Setting the reclaim flag could race with
> -		 * folio_end_writeback() and confuse readahead. But the
> -		 * race window is _really_ small and it's not a critical
> -		 * problem.
> -		 */
> -		lruvec_add_folio(lruvec, folio);
> -		folio_set_reclaim(folio);
> -	} else {
> -		/*
> -		 * The folio's writeback ended while it was in the batch.
> -		 * We move that folio to the tail of the inactive list.
> -		 */
> -		lruvec_add_folio_tail(lruvec, folio);
> -		__count_vm_events(PGROTATED, nr_pages);
> -	}
> -
> -	if (active) {
> -		__count_vm_events(PGDEACTIVATE, nr_pages);
> -		__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE,
> -				     nr_pages);
> -	}
> -}

> +++ b/mm/truncate.c
> @@ -486,7 +486,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
>  		 * of interest and try to speed up its reclaim.
>  		 */
>  		if (!ret) {
> -			deactivate_file_folio(folio);
> +			folio_set_dropbehind(folio);

brr.

This is a fairly substantial change in semantics, and maybe it's fine.

At a high level, we're trying to remove pages from an inode that aren't
in use. But we might find that some of them are in use (eg they're
mapped or under writeback). If they are mapped, we don't currently
try to accelerate their reclaim, but now we're going to mark them
as dropbehind. I think that's wrong.

If they're dirty or under writeback, then yes, mark them as dropbehind, but
I think we need to be a little more surgical here. Maybe preserve the
unevictable check too.
On Wed, Jan 15, 2025 at 2:35 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jan 15, 2025 at 11:31:29AM +0200, Kirill A. Shutemov wrote:
> > -static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
> > -{
> > -	bool active = folio_test_active(folio) || lru_gen_enabled();
> > -	long nr_pages = folio_nr_pages(folio);
> > -
> > -	if (folio_test_unevictable(folio))
> > -		return;
> > -
> > -	/* Some processes are using the folio */
> > -	if (folio_mapped(folio))
> > -		return;
> > -
> > -	lruvec_del_folio(lruvec, folio);
> > -	folio_clear_active(folio);
> > -	folio_clear_referenced(folio);
> > -
> > -	if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> > -		/*
> > -		 * Setting the reclaim flag could race with
> > -		 * folio_end_writeback() and confuse readahead. But the
> > -		 * race window is _really_ small and it's not a critical
> > -		 * problem.
> > -		 */
> > -		lruvec_add_folio(lruvec, folio);
> > -		folio_set_reclaim(folio);
> > -	} else {
> > -		/*
> > -		 * The folio's writeback ended while it was in the batch.
> > -		 * We move that folio to the tail of the inactive list.
> > -		 */
> > -		lruvec_add_folio_tail(lruvec, folio);
> > -		__count_vm_events(PGROTATED, nr_pages);
> > -	}
> > -
> > -	if (active) {
> > -		__count_vm_events(PGDEACTIVATE, nr_pages);
> > -		__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE,
> > -				     nr_pages);
> > -	}
> > -}
>
> > +++ b/mm/truncate.c
> > @@ -486,7 +486,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
> >  		 * of interest and try to speed up its reclaim.
> >  		 */
> >  		if (!ret) {
> > -			deactivate_file_folio(folio);
> > +			folio_set_dropbehind(folio);
>
> brr.
>
> This is a fairly substantial change in semantics, and maybe it's fine.
>
> At a high level, we're trying to remove pages from an inode that aren't
> in use. But we might find that some of them are in use (eg they're
> mapped or under writeback). If they are mapped, we don't currently
> try to accelerate their reclaim, but now we're going to mark them
> as dropbehind. I think that's wrong.
>
> If they're dirty or under writeback, then yes, mark them as dropbehind, but
> I think we need to be a little more surgical here. Maybe preserve the
> unevictable check too.

Right -- deactivate_file_folio() does make sure the folio is not
unevictable or mapped. So would something like the below keep the change
in semantics close enough?

  if (!folio_test_unevictable(folio) && !folio_mapped(folio))
          folio_set_dropbehind(folio);
On Wed, Jan 15, 2025 at 02:46:44PM -0700, Yu Zhao wrote:
> On Wed, Jan 15, 2025 at 2:35 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Jan 15, 2025 at 11:31:29AM +0200, Kirill A. Shutemov wrote:
> > > -static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
> > > -{
> > > -	bool active = folio_test_active(folio) || lru_gen_enabled();
> > > -	long nr_pages = folio_nr_pages(folio);
> > > -
> > > -	if (folio_test_unevictable(folio))
> > > -		return;
> > > -
> > > -	/* Some processes are using the folio */
> > > -	if (folio_mapped(folio))
> > > -		return;
> > > -
> > > -	lruvec_del_folio(lruvec, folio);
> > > -	folio_clear_active(folio);
> > > -	folio_clear_referenced(folio);
> > > -
> > > -	if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> > > -		/*
> > > -		 * Setting the reclaim flag could race with
> > > -		 * folio_end_writeback() and confuse readahead. But the
> > > -		 * race window is _really_ small and it's not a critical
> > > -		 * problem.
> > > -		 */
> > > -		lruvec_add_folio(lruvec, folio);
> > > -		folio_set_reclaim(folio);
> > > -	} else {
> > > -		/*
> > > -		 * The folio's writeback ended while it was in the batch.
> > > -		 * We move that folio to the tail of the inactive list.
> > > -		 */
> > > -		lruvec_add_folio_tail(lruvec, folio);
> > > -		__count_vm_events(PGROTATED, nr_pages);
> > > -	}
> > > -
> > > -	if (active) {
> > > -		__count_vm_events(PGDEACTIVATE, nr_pages);
> > > -		__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE,
> > > -				     nr_pages);
> > > -	}
> > > -}
> >
> > > +++ b/mm/truncate.c
> > > @@ -486,7 +486,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
> > >  		 * of interest and try to speed up its reclaim.
> > >  		 */
> > >  		if (!ret) {
> > > -			deactivate_file_folio(folio);
> > > +			folio_set_dropbehind(folio);
> >
> > brr.
> >
> > This is a fairly substantial change in semantics, and maybe it's fine.
> >
> > At a high level, we're trying to remove pages from an inode that aren't
> > in use. But we might find that some of them are in use (eg they're
> > mapped or under writeback). If they are mapped, we don't currently
> > try to accelerate their reclaim, but now we're going to mark them
> > as dropbehind. I think that's wrong.
> >
> > If they're dirty or under writeback, then yes, mark them as dropbehind, but
> > I think we need to be a little more surgical here. Maybe preserve the
> > unevictable check too.
>
> Right -- deactivate_file_folio() does make sure the folio is not
> unevictable or mapped. So would something like the below keep the change
> in semantics close enough?
>
>   if (!folio_test_unevictable(folio) && !folio_mapped(folio))
>           folio_set_dropbehind(folio);

Okay, makes sense to me.
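For reference, a minimal sketch of how the mapping_try_invalidate() hunk
could look with the checks suggested above folded in. This is only an
illustration of the suggestion, not a posted follow-up patch; the
surrounding context lines are taken from the hunk quoted earlier.

		if (!ret) {
			/*
			 * Only hint dropbehind for folios that reclaim could
			 * act on: skip unevictable and mapped folios, as the
			 * old lru_deactivate_file() did.
			 */
			if (!folio_test_unevictable(folio) &&
			    !folio_mapped(folio))
				folio_set_dropbehind(folio);
			/* Likely in the lru cache of a remote CPU */
			if (nr_failed)
				(*nr_failed)++;
		}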
diff --git a/mm/internal.h b/mm/internal.h
index 109ef30fee11..93e6dac2077a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -379,7 +379,6 @@ static inline vm_fault_t vmf_anon_prepare(struct vm_fault *vmf)
 vm_fault_t do_swap_page(struct vm_fault *vmf);
 void folio_rotate_reclaimable(struct folio *folio);
 bool __folio_end_writeback(struct folio *folio);
-void deactivate_file_folio(struct folio *folio);
 void folio_activate(struct folio *folio);
 
 void free_pgtables(struct mmu_gather *tlb, struct ma_state *mas,
diff --git a/mm/swap.c b/mm/swap.c
index fc8281ef4241..7a0dffd5973a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -54,7 +54,6 @@ struct cpu_fbatches {
 	 */
 	local_lock_t lock;
 	struct folio_batch lru_add;
-	struct folio_batch lru_deactivate_file;
 	struct folio_batch lru_deactivate;
 	struct folio_batch lru_lazyfree;
 #ifdef CONFIG_SMP
@@ -524,68 +523,6 @@ void folio_add_lru_vma(struct folio *folio, struct vm_area_struct *vma)
 	folio_add_lru(folio);
 }
 
-/*
- * If the folio cannot be invalidated, it is moved to the
- * inactive list to speed up its reclaim. It is moved to the
- * head of the list, rather than the tail, to give the flusher
- * threads some time to write it out, as this is much more
- * effective than the single-page writeout from reclaim.
- *
- * If the folio isn't mapped and dirty/writeback, the folio
- * could be reclaimed asap using the reclaim flag.
- *
- * 1. active, mapped folio            -> none
- * 2. active, dirty/writeback folio   -> inactive, head, reclaim
- * 3. inactive, mapped folio          -> none
- * 4. inactive, dirty/writeback folio -> inactive, head, reclaim
- * 5. inactive, clean                 -> inactive, tail
- * 6. Others                          -> none
- *
- * In 4, it moves to the head of the inactive list so the folio is
- * written out by flusher threads as this is much more efficient
- * than the single-page writeout from reclaim.
- */
-static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
-{
-	bool active = folio_test_active(folio) || lru_gen_enabled();
-	long nr_pages = folio_nr_pages(folio);
-
-	if (folio_test_unevictable(folio))
-		return;
-
-	/* Some processes are using the folio */
-	if (folio_mapped(folio))
-		return;
-
-	lruvec_del_folio(lruvec, folio);
-	folio_clear_active(folio);
-	folio_clear_referenced(folio);
-
-	if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
-		/*
-		 * Setting the reclaim flag could race with
-		 * folio_end_writeback() and confuse readahead. But the
-		 * race window is _really_ small and it's not a critical
-		 * problem.
-		 */
-		lruvec_add_folio(lruvec, folio);
-		folio_set_reclaim(folio);
-	} else {
-		/*
-		 * The folio's writeback ended while it was in the batch.
-		 * We move that folio to the tail of the inactive list.
-		 */
-		lruvec_add_folio_tail(lruvec, folio);
-		__count_vm_events(PGROTATED, nr_pages);
-	}
-
-	if (active) {
-		__count_vm_events(PGDEACTIVATE, nr_pages);
-		__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE,
-				     nr_pages);
-	}
-}
-
 static void lru_deactivate(struct lruvec *lruvec, struct folio *folio)
 {
 	long nr_pages = folio_nr_pages(folio);
@@ -652,10 +589,6 @@ void lru_add_drain_cpu(int cpu)
 		local_unlock_irqrestore(&cpu_fbatches.lock_irq, flags);
 	}
 
-	fbatch = &fbatches->lru_deactivate_file;
-	if (folio_batch_count(fbatch))
-		folio_batch_move_lru(fbatch, lru_deactivate_file);
-
 	fbatch = &fbatches->lru_deactivate;
 	if (folio_batch_count(fbatch))
 		folio_batch_move_lru(fbatch, lru_deactivate);
@@ -667,28 +600,6 @@ void lru_add_drain_cpu(int cpu)
 	folio_activate_drain(cpu);
 }
 
-/**
- * deactivate_file_folio() - Deactivate a file folio.
- * @folio: Folio to deactivate.
- *
- * This function hints to the VM that @folio is a good reclaim candidate,
- * for example if its invalidation fails due to the folio being dirty
- * or under writeback.
- *
- * Context: Caller holds a reference on the folio.
- */
-void deactivate_file_folio(struct folio *folio)
-{
-	/* Deactivating an unevictable folio will not accelerate reclaim */
-	if (folio_test_unevictable(folio))
-		return;
-
-	if (lru_gen_enabled() && lru_gen_clear_refs(folio))
-		return;
-
-	folio_batch_add_and_move(folio, lru_deactivate_file, true);
-}
-
 /*
  * folio_deactivate - deactivate a folio
  * @folio: folio to deactivate
@@ -772,7 +683,6 @@ static bool cpu_needs_drain(unsigned int cpu)
 	/* Check these in order of likelihood that they're not zero */
 	return folio_batch_count(&fbatches->lru_add) ||
 		folio_batch_count(&fbatches->lru_move_tail) ||
-		folio_batch_count(&fbatches->lru_deactivate_file) ||
 		folio_batch_count(&fbatches->lru_deactivate) ||
 		folio_batch_count(&fbatches->lru_lazyfree) ||
 		folio_batch_count(&fbatches->lru_activate) ||
diff --git a/mm/truncate.c b/mm/truncate.c
index e2e115adfbc5..864aaadc1e91 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -486,7 +486,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
 		 * of interest and try to speed up its reclaim.
 		 */
 		if (!ret) {
-			deactivate_file_folio(folio);
+			folio_set_dropbehind(folio);
 			/* Likely in the lru cache of a remote CPU */
 			if (nr_failed)
 				(*nr_failed)++;
The recently introduced PG_dropbehind allows for freeing folios
immediately after writeback. Unlike PG_reclaim, it does not need vmscan
to be involved to get the folio freed.

The new flag allows replacing the whole deactivate_file_folio() machinery
with a simple folio_set_dropbehind().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/internal.h |  1 -
 mm/swap.c     | 90 ---------------------------------------------------
 mm/truncate.c |  2 +-
 3 files changed, 1 insertion(+), 92 deletions(-)
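To illustrate the difference described above, a simplified, assumed sketch
of a writeback-completion path follows. It is not the actual code in mm/
(the function name end_writeback_sketch and the elided freeing step are
illustrative only): PG_reclaim merely rotates the folio on the LRU and
still relies on vmscan, while PG_dropbehind lets the completion path drop
the folio directly.

	/* Illustrative sketch only -- simplified, not taken from mm/. */
	static void end_writeback_sketch(struct folio *folio)
	{
		if (folio_test_dropbehind(folio)) {
			/*
			 * PG_dropbehind: the folio can be dropped from the
			 * page cache right here, without vmscan running.
			 * (The actual freeing helper is elided.)
			 */
		} else if (folio_test_reclaim(folio)) {
			/*
			 * PG_reclaim: only rotate the folio to the tail of
			 * the inactive LRU; vmscan still has to find and
			 * free it later.
			 */
			folio_rotate_reclaimable(folio);
		}
	}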