Message ID: 20241220010931.3603111-1-chenridong@huaweicloud.com
State: New
Series: [-next,v5] mm: vmscan: retry folios written back while isolated for traditional LRU
On Fri, Dec 20, 2024 at 2:19 PM Chen Ridong <chenridong@huaweicloud.com> wrote: > > From: Chen Ridong <chenridong@huawei.com> > > The page reclaim isolates a batch of folios from the tail of one of the > LRU lists and works on those folios one by one. For a suitable > swap-backed folio, if the swap device is async, it queues that folio for > writeback. After the page reclaim finishes an entire batch, it puts back > the folios it queued for writeback to the head of the original LRU list. > > In the meantime, the page writeback flushes the queued folios also by > batches. Its batching logic is independent from that of the page reclaim. > For each of the folios it writes back, the page writeback calls > folio_rotate_reclaimable() which tries to rotate a folio to the tail. > > folio_rotate_reclaimable() only works for a folio after the page reclaim > has put it back. If an async swap device is fast enough, the page > writeback can finish with that folio while the page reclaim is still > working on the rest of the batch containing it. In this case, that folio > will remain at the head and the page reclaim will not retry it before > reaching there. > > The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back > while isolated") only fixed the issue for mglru. However, this issue > also exists in the traditional active/inactive LRU. This issue will be > worse if THP is split, which makes the list longer and needs longer time > to finish a batch of folios reclaim. > > This issue should be fixed in the same way for the traditional LRU. > Therefore, the common logic was extracted to the 'find_folios_written_back' > function firstly, which is then reused in the 'shrink_inactive_list' > function. Finally, retry reclaiming those folios that may have missed the > rotation for traditional LRU. > > Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/ > Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/ > Signed-off-by: Chen Ridong <chenridong@huawei.com> > --- > mm/vmscan.c | 108 ++++++++++++++++++++++++++++++++++------------------ > 1 file changed, 70 insertions(+), 38 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 39886f435ec5..e67e446540ba 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -283,6 +283,39 @@ static void set_task_reclaim_state(struct task_struct *task, > task->reclaim_state = rs; > } > > +/** > + * find_folios_written_back - Find and move the written back folios to a new list. > + * @list: filios list > + * @clean: the written back folios list > + * @is_retried: whether the list has already been retried. 
> + */ > +static inline void find_folios_written_back(struct list_head *list, > + struct list_head *clean, bool is_retried) > +{ > + struct folio *folio; > + struct folio *next; > + > + list_for_each_entry_safe_reverse(folio, next, list, lru) { > + if (!folio_evictable(folio)) { > + list_del(&folio->lru); > + folio_putback_lru(folio); > + continue; > + } > + > + /* retry folios that may have missed folio_rotate_reclaimable() */ > + if (!is_retried && !folio_test_active(folio) && !folio_mapped(folio) && > + !folio_test_dirty(folio) && !folio_test_writeback(folio)) { > + list_move(&folio->lru, clean); > + continue; > + } > + > + /* don't add rejected folios to the oldest generation */ > + if (lru_gen_enabled() && !lru_gen_distance(folio, false)) > + set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); > + } > + > +} > + > /* > * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to > * scan_control->nr_reclaimed. > @@ -1959,14 +1992,18 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, > enum lru_list lru) > { > LIST_HEAD(folio_list); > + LIST_HEAD(clean_list); > unsigned long nr_scanned; > - unsigned int nr_reclaimed = 0; > + unsigned int nr_reclaimed, total_reclaimed = 0; > + unsigned int nr_pageout = 0; > + unsigned int nr_unqueued_dirty = 0; > unsigned long nr_taken; > struct reclaim_stat stat; > bool file = is_file_lru(lru); > enum vm_event_item item; > struct pglist_data *pgdat = lruvec_pgdat(lruvec); > bool stalled = false; > + bool is_retried = false; > > while (unlikely(too_many_isolated(pgdat, file, sc))) { > if (stalled) > @@ -2000,22 +2037,47 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, > if (nr_taken == 0) > return 0; > > +retry: > nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false); > > + sc->nr.dirty += stat.nr_dirty; > + sc->nr.congested += stat.nr_congested; > + sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; > + sc->nr.writeback += stat.nr_writeback; > + sc->nr.immediate += stat.nr_immediate; > + total_reclaimed += nr_reclaimed; > + nr_pageout += stat.nr_pageout; > + nr_unqueued_dirty += stat.nr_unqueued_dirty; > + > + trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, > + nr_scanned, nr_reclaimed, &stat, sc->priority, file); This is a bit odd, as nr_scanned during a retry still uses the previous nr_scanned value. However, I find that mglru shows no difference. retry: reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false); sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; sc->nr_reclaimed += reclaimed; trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, scanned, reclaimed, &stat, sc->priority, type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON); Currently, the active/inactive state aligns with mglru in this trace. It seems that the userspace BPF should recognize that the nr_scanned during a retry doesn't indicate we are isolating new nr_scanned folios. Ideally, the is_retry flag should be passed to the trace, allowing userspace to identify that it's a retry and disregard the nr_scanned value. It might be worth addressing this in a separate patch. Add Bixuan to clarify how userspace depends on this trace and if "retry" will break his userspace BPF for both MGLRU and active/inactive cases. Otherwise, the patch looks good to me. 
> + > + find_folios_written_back(&folio_list, &clean_list, is_retried); > + > spin_lock_irq(&lruvec->lru_lock); > move_folios_to_lru(lruvec, &folio_list); > > __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(), > stat.nr_demoted); > - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); > item = PGSTEAL_KSWAPD + reclaimer_offset(); > if (!cgroup_reclaim(sc)) > __count_vm_events(item, nr_reclaimed); > __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); > __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); > + > + if (!list_empty(&clean_list)) { > + list_splice_init(&clean_list, &folio_list); > + is_retried = true; > + spin_unlock_irq(&lruvec->lru_lock); > + goto retry; > + } > + __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); > spin_unlock_irq(&lruvec->lru_lock); > + sc->nr.taken += nr_taken; > + if (file) > + sc->nr.file_taken += nr_taken; > > - lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed); > + lru_note_cost(lruvec, file, nr_pageout, nr_scanned - total_reclaimed); > > /* > * If dirty folios are scanned that are not queued for IO, it > @@ -2028,7 +2090,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, > * the flushers simply cannot keep up with the allocation > * rate. Nudge the flusher threads in case they are asleep. > */ > - if (stat.nr_unqueued_dirty == nr_taken) { > + if (nr_unqueued_dirty == nr_taken) { > wakeup_flusher_threads(WB_REASON_VMSCAN); > /* > * For cgroupv1 dirty throttling is achieved by waking up > @@ -2043,18 +2105,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, > reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK); > } > > - sc->nr.dirty += stat.nr_dirty; > - sc->nr.congested += stat.nr_congested; > - sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; > - sc->nr.writeback += stat.nr_writeback; > - sc->nr.immediate += stat.nr_immediate; > - sc->nr.taken += nr_taken; > - if (file) > - sc->nr.file_taken += nr_taken; > - > - trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, > - nr_scanned, nr_reclaimed, &stat, sc->priority, file); > - return nr_reclaimed; > + return total_reclaimed; > } > > /* > @@ -4585,12 +4636,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap > int reclaimed; > LIST_HEAD(list); > LIST_HEAD(clean); > - struct folio *folio; > - struct folio *next; > enum vm_event_item item; > struct reclaim_stat stat; > struct lru_gen_mm_walk *walk; > - bool skip_retry = false; > + bool is_retried = false; > struct lru_gen_folio *lrugen = &lruvec->lrugen; > struct mem_cgroup *memcg = lruvec_memcg(lruvec); > struct pglist_data *pgdat = lruvec_pgdat(lruvec); > @@ -4616,24 +4665,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap > scanned, reclaimed, &stat, sc->priority, > type ? 
LRU_INACTIVE_FILE : LRU_INACTIVE_ANON); > > - list_for_each_entry_safe_reverse(folio, next, &list, lru) { > - if (!folio_evictable(folio)) { > - list_del(&folio->lru); > - folio_putback_lru(folio); > - continue; > - } > - > - /* retry folios that may have missed folio_rotate_reclaimable() */ > - if (!skip_retry && !folio_test_active(folio) && !folio_mapped(folio) && > - !folio_test_dirty(folio) && !folio_test_writeback(folio)) { > - list_move(&folio->lru, &clean); > - continue; > - } > - > - /* don't add rejected folios to the oldest generation */ > - if (!lru_gen_distance(folio, false)) > - set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); > - } > + find_folios_written_back(&list, &clean, is_retried); > > spin_lock_irq(&lruvec->lru_lock); > > @@ -4656,7 +4688,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap > list_splice_init(&clean, &list); > > if (!list_empty(&list)) { > - skip_retry = true; > + is_retried = true; > goto retry; > } > > -- > 2.34.1 > Thanks barry
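A rough sketch of the separate patch Barry suggests above, in which the retry flag is passed through to the tracepoint so a trace consumer can tell a retry pass from a fresh isolation. The event definition below is abbreviated and hypothetical: the real mm_vmscan_lru_shrink_inactive TRACE_EVENT in include/trace/events/vmscan.h also takes and records the reclaim_stat counters, which are omitted here, and is_retry is the assumed new field.

TRACE_EVENT(mm_vmscan_lru_shrink_inactive,

	/* abbreviated sketch: the real event also records *stat */
	TP_PROTO(int nid, unsigned long nr_scanned, unsigned long nr_reclaimed,
		 int priority, int file, bool is_retry),

	TP_ARGS(nid, nr_scanned, nr_reclaimed, priority, file, is_retry),

	TP_STRUCT__entry(
		__field(int, nid)
		__field(unsigned long, nr_scanned)
		__field(unsigned long, nr_reclaimed)
		__field(int, priority)
		__field(int, file)
		__field(bool, is_retry)		/* assumed new field */
	),

	TP_fast_assign(
		__entry->nid		= nid;
		__entry->nr_scanned	= nr_scanned;
		__entry->nr_reclaimed	= nr_reclaimed;
		__entry->priority	= priority;
		__entry->file		= file;
		__entry->is_retry	= is_retry;
	),

	TP_printk("nid=%d nr_scanned=%lu nr_reclaimed=%lu priority=%d file=%d is_retry=%d",
		__entry->nid, __entry->nr_scanned, __entry->nr_reclaimed,
		__entry->priority, __entry->file, __entry->is_retry)
);

The call sites in shrink_inactive_list() and evict_folios() would then pass their existing retry flag, and a BPF or trace consumer could simply disregard nr_scanned whenever is_retry is set.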
On Fri, Dec 20, 2024 at 3:30 PM Barry Song <21cnbao@gmail.com> wrote: > > On Fri, Dec 20, 2024 at 2:19 PM Chen Ridong <chenridong@huaweicloud.com> wrote: > > > > From: Chen Ridong <chenridong@huawei.com> > > > > The page reclaim isolates a batch of folios from the tail of one of the > > LRU lists and works on those folios one by one. For a suitable > > swap-backed folio, if the swap device is async, it queues that folio for > > writeback. After the page reclaim finishes an entire batch, it puts back > > the folios it queued for writeback to the head of the original LRU list. > > > > In the meantime, the page writeback flushes the queued folios also by > > batches. Its batching logic is independent from that of the page reclaim. > > For each of the folios it writes back, the page writeback calls > > folio_rotate_reclaimable() which tries to rotate a folio to the tail. > > > > folio_rotate_reclaimable() only works for a folio after the page reclaim > > has put it back. If an async swap device is fast enough, the page > > writeback can finish with that folio while the page reclaim is still > > working on the rest of the batch containing it. In this case, that folio > > will remain at the head and the page reclaim will not retry it before > > reaching there. > > > > The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back > > while isolated") only fixed the issue for mglru. However, this issue > > also exists in the traditional active/inactive LRU. This issue will be > > worse if THP is split, which makes the list longer and needs longer time > > to finish a batch of folios reclaim. > > > > This issue should be fixed in the same way for the traditional LRU. > > Therefore, the common logic was extracted to the 'find_folios_written_back' > > function firstly, which is then reused in the 'shrink_inactive_list' > > function. Finally, retry reclaiming those folios that may have missed the > > rotation for traditional LRU. > > > > Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/ > > Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/ > > Signed-off-by: Chen Ridong <chenridong@huawei.com> > > --- > > mm/vmscan.c | 108 ++++++++++++++++++++++++++++++++++------------------ > > 1 file changed, 70 insertions(+), 38 deletions(-) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 39886f435ec5..e67e446540ba 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -283,6 +283,39 @@ static void set_task_reclaim_state(struct task_struct *task, > > task->reclaim_state = rs; > > } > > > > +/** > > + * find_folios_written_back - Find and move the written back folios to a new list. > > + * @list: filios list > > + * @clean: the written back folios list > > + * @is_retried: whether the list has already been retried. 
> > + */ > > +static inline void find_folios_written_back(struct list_head *list, > > + struct list_head *clean, bool is_retried) > > +{ > > + struct folio *folio; > > + struct folio *next; > > + > > + list_for_each_entry_safe_reverse(folio, next, list, lru) { > > + if (!folio_evictable(folio)) { > > + list_del(&folio->lru); > > + folio_putback_lru(folio); > > + continue; > > + } > > + > > + /* retry folios that may have missed folio_rotate_reclaimable() */ > > + if (!is_retried && !folio_test_active(folio) && !folio_mapped(folio) && > > + !folio_test_dirty(folio) && !folio_test_writeback(folio)) { > > + list_move(&folio->lru, clean); > > + continue; > > + } > > + > > + /* don't add rejected folios to the oldest generation */ > > + if (lru_gen_enabled() && !lru_gen_distance(folio, false)) > > + set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); > > + } > > + > > +} > > + > > /* > > * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to > > * scan_control->nr_reclaimed. > > @@ -1959,14 +1992,18 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, > > enum lru_list lru) > > { > > LIST_HEAD(folio_list); > > + LIST_HEAD(clean_list); > > unsigned long nr_scanned; > > - unsigned int nr_reclaimed = 0; > > + unsigned int nr_reclaimed, total_reclaimed = 0; > > + unsigned int nr_pageout = 0; > > + unsigned int nr_unqueued_dirty = 0; > > unsigned long nr_taken; > > struct reclaim_stat stat; > > bool file = is_file_lru(lru); > > enum vm_event_item item; > > struct pglist_data *pgdat = lruvec_pgdat(lruvec); > > bool stalled = false; > > + bool is_retried = false; The name is_retried is a bit confusing. It should be is_retry or is_retrying since we are currently retrying, not that we have already retried. > > > > while (unlikely(too_many_isolated(pgdat, file, sc))) { > > if (stalled) > > @@ -2000,22 +2037,47 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, > > if (nr_taken == 0) > > return 0; > > > > +retry: > > nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false); > > > > + sc->nr.dirty += stat.nr_dirty; > > + sc->nr.congested += stat.nr_congested; > > + sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; > > + sc->nr.writeback += stat.nr_writeback; > > + sc->nr.immediate += stat.nr_immediate; > > + total_reclaimed += nr_reclaimed; > > + nr_pageout += stat.nr_pageout; > > + nr_unqueued_dirty += stat.nr_unqueued_dirty; > > + > > + trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, > > + nr_scanned, nr_reclaimed, &stat, sc->priority, file); > > This is a bit odd, as nr_scanned during a retry still uses the > previous nr_scanned > value. However, I find that mglru shows no difference. > > retry: > reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false); > sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; > sc->nr_reclaimed += reclaimed; > trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, > scanned, reclaimed, &stat, sc->priority, > type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON); > > Currently, the active/inactive state aligns with mglru in this trace. > It seems that > the userspace BPF should recognize that the nr_scanned during a retry doesn't > indicate we are isolating new nr_scanned folios. Ideally, the is_retry > flag should > be passed to the trace, allowing userspace to identify that it's a retry and > disregard the nr_scanned value. > > It might be worth addressing this in a separate patch. 
Add Bixuan to clarify > how userspace depends on this trace and if "retry" will break his userspace > BPF for both MGLRU and active/inactive cases. > > Otherwise, the patch looks good to me. > By the way, it's completely clear that the trace was added after mglru's retry: https://lore.kernel.org/linux-mm/20240105013607.2868-3-cuibixuan@vivo.com/ Therefore, I don't believe the potential confusion about nr_scanned in the trace should prevent Ridong's fix for the missed rotation of written-back folios from proceeding. If there is an issue with that, we should open a separate thread to address the trace. Please feel free to add the below in the future version after you fix "is_retried". Reviewed-by: Barry Song <baohua@kernel.org> > > + > > + find_folios_written_back(&folio_list, &clean_list, is_retried); > > + > > spin_lock_irq(&lruvec->lru_lock); > > move_folios_to_lru(lruvec, &folio_list); > > > > __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(), > > stat.nr_demoted); > > - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); > > item = PGSTEAL_KSWAPD + reclaimer_offset(); > > if (!cgroup_reclaim(sc)) > > __count_vm_events(item, nr_reclaimed); > > __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); > > __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); > > + > > + if (!list_empty(&clean_list)) { > > + list_splice_init(&clean_list, &folio_list); > > + is_retried = true; > > + spin_unlock_irq(&lruvec->lru_lock); > > + goto retry; > > + } > > + __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); > > spin_unlock_irq(&lruvec->lru_lock); > > + sc->nr.taken += nr_taken; > > + if (file) > > + sc->nr.file_taken += nr_taken; > > > > - lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed); > > + lru_note_cost(lruvec, file, nr_pageout, nr_scanned - total_reclaimed); > > > > /* > > * If dirty folios are scanned that are not queued for IO, it > > @@ -2028,7 +2090,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, > > * the flushers simply cannot keep up with the allocation > > * rate. Nudge the flusher threads in case they are asleep. 
> > */ > > - if (stat.nr_unqueued_dirty == nr_taken) { > > + if (nr_unqueued_dirty == nr_taken) { > > wakeup_flusher_threads(WB_REASON_VMSCAN); > > /* > > * For cgroupv1 dirty throttling is achieved by waking up > > @@ -2043,18 +2105,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, > > reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK); > > } > > > > - sc->nr.dirty += stat.nr_dirty; > > - sc->nr.congested += stat.nr_congested; > > - sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; > > - sc->nr.writeback += stat.nr_writeback; > > - sc->nr.immediate += stat.nr_immediate; > > - sc->nr.taken += nr_taken; > > - if (file) > > - sc->nr.file_taken += nr_taken; > > - > > - trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, > > - nr_scanned, nr_reclaimed, &stat, sc->priority, file); > > - return nr_reclaimed; > > + return total_reclaimed; > > } > > > > /* > > @@ -4585,12 +4636,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap > > int reclaimed; > > LIST_HEAD(list); > > LIST_HEAD(clean); > > - struct folio *folio; > > - struct folio *next; > > enum vm_event_item item; > > struct reclaim_stat stat; > > struct lru_gen_mm_walk *walk; > > - bool skip_retry = false; > > + bool is_retried = false; > > struct lru_gen_folio *lrugen = &lruvec->lrugen; > > struct mem_cgroup *memcg = lruvec_memcg(lruvec); > > struct pglist_data *pgdat = lruvec_pgdat(lruvec); > > @@ -4616,24 +4665,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap > > scanned, reclaimed, &stat, sc->priority, > > type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON); > > > > - list_for_each_entry_safe_reverse(folio, next, &list, lru) { > > - if (!folio_evictable(folio)) { > > - list_del(&folio->lru); > > - folio_putback_lru(folio); > > - continue; > > - } > > - > > - /* retry folios that may have missed folio_rotate_reclaimable() */ > > - if (!skip_retry && !folio_test_active(folio) && !folio_mapped(folio) && > > - !folio_test_dirty(folio) && !folio_test_writeback(folio)) { > > - list_move(&folio->lru, &clean); > > - continue; > > - } > > - > > - /* don't add rejected folios to the oldest generation */ > > - if (!lru_gen_distance(folio, false)) > > - set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); > > - } > > + find_folios_written_back(&list, &clean, is_retried); > > > > spin_lock_irq(&lruvec->lru_lock); > > > > @@ -4656,7 +4688,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap > > list_splice_init(&clean, &list); > > > > if (!list_empty(&list)) { > > - skip_retry = true; > > + is_retried = true; > > goto retry; > > } > > > > -- > > 2.34.1 > > > Thanks barry
On 2024/12/20 11:09, Barry Song wrote: > On Fri, Dec 20, 2024 at 3:30 PM Barry Song <21cnbao@gmail.com> wrote: >> >> On Fri, Dec 20, 2024 at 2:19 PM Chen Ridong <chenridong@huaweicloud.com> wrote: >>> >>> From: Chen Ridong <chenridong@huawei.com> >>> >>> The page reclaim isolates a batch of folios from the tail of one of the >>> LRU lists and works on those folios one by one. For a suitable >>> swap-backed folio, if the swap device is async, it queues that folio for >>> writeback. After the page reclaim finishes an entire batch, it puts back >>> the folios it queued for writeback to the head of the original LRU list. >>> >>> In the meantime, the page writeback flushes the queued folios also by >>> batches. Its batching logic is independent from that of the page reclaim. >>> For each of the folios it writes back, the page writeback calls >>> folio_rotate_reclaimable() which tries to rotate a folio to the tail. >>> >>> folio_rotate_reclaimable() only works for a folio after the page reclaim >>> has put it back. If an async swap device is fast enough, the page >>> writeback can finish with that folio while the page reclaim is still >>> working on the rest of the batch containing it. In this case, that folio >>> will remain at the head and the page reclaim will not retry it before >>> reaching there. >>> >>> The commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back >>> while isolated") only fixed the issue for mglru. However, this issue >>> also exists in the traditional active/inactive LRU. This issue will be >>> worse if THP is split, which makes the list longer and needs longer time >>> to finish a batch of folios reclaim. >>> >>> This issue should be fixed in the same way for the traditional LRU. >>> Therefore, the common logic was extracted to the 'find_folios_written_back' >>> function firstly, which is then reused in the 'shrink_inactive_list' >>> function. Finally, retry reclaiming those folios that may have missed the >>> rotation for traditional LRU. >>> >>> Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/ >>> Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/ >>> Signed-off-by: Chen Ridong <chenridong@huawei.com> >>> --- >>> mm/vmscan.c | 108 ++++++++++++++++++++++++++++++++++------------------ >>> 1 file changed, 70 insertions(+), 38 deletions(-) >>> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index 39886f435ec5..e67e446540ba 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -283,6 +283,39 @@ static void set_task_reclaim_state(struct task_struct *task, >>> task->reclaim_state = rs; >>> } >>> >>> +/** >>> + * find_folios_written_back - Find and move the written back folios to a new list. >>> + * @list: filios list >>> + * @clean: the written back folios list >>> + * @is_retried: whether the list has already been retried. 
>>> + */ >>> +static inline void find_folios_written_back(struct list_head *list, >>> + struct list_head *clean, bool is_retried) >>> +{ >>> + struct folio *folio; >>> + struct folio *next; >>> + >>> + list_for_each_entry_safe_reverse(folio, next, list, lru) { >>> + if (!folio_evictable(folio)) { >>> + list_del(&folio->lru); >>> + folio_putback_lru(folio); >>> + continue; >>> + } >>> + >>> + /* retry folios that may have missed folio_rotate_reclaimable() */ >>> + if (!is_retried && !folio_test_active(folio) && !folio_mapped(folio) && >>> + !folio_test_dirty(folio) && !folio_test_writeback(folio)) { >>> + list_move(&folio->lru, clean); >>> + continue; >>> + } >>> + >>> + /* don't add rejected folios to the oldest generation */ >>> + if (lru_gen_enabled() && !lru_gen_distance(folio, false)) >>> + set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); >>> + } >>> + >>> +} >>> + >>> /* >>> * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to >>> * scan_control->nr_reclaimed. >>> @@ -1959,14 +1992,18 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, >>> enum lru_list lru) >>> { >>> LIST_HEAD(folio_list); >>> + LIST_HEAD(clean_list); >>> unsigned long nr_scanned; >>> - unsigned int nr_reclaimed = 0; >>> + unsigned int nr_reclaimed, total_reclaimed = 0; >>> + unsigned int nr_pageout = 0; >>> + unsigned int nr_unqueued_dirty = 0; >>> unsigned long nr_taken; >>> struct reclaim_stat stat; >>> bool file = is_file_lru(lru); >>> enum vm_event_item item; >>> struct pglist_data *pgdat = lruvec_pgdat(lruvec); >>> bool stalled = false; >>> + bool is_retried = false; > > The name is_retried is a bit confusing. It should be is_retry or > is_retrying since > we are currently retrying, not that we have already retried. > >>> >>> while (unlikely(too_many_isolated(pgdat, file, sc))) { >>> if (stalled) >>> @@ -2000,22 +2037,47 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, >>> if (nr_taken == 0) >>> return 0; >>> >>> +retry: >>> nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false); >>> >>> + sc->nr.dirty += stat.nr_dirty; >>> + sc->nr.congested += stat.nr_congested; >>> + sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; >>> + sc->nr.writeback += stat.nr_writeback; >>> + sc->nr.immediate += stat.nr_immediate; >>> + total_reclaimed += nr_reclaimed; >>> + nr_pageout += stat.nr_pageout; >>> + nr_unqueued_dirty += stat.nr_unqueued_dirty; >>> + >>> + trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, >>> + nr_scanned, nr_reclaimed, &stat, sc->priority, file); >> >> This is a bit odd, as nr_scanned during a retry still uses the >> previous nr_scanned >> value. However, I find that mglru shows no difference. >> >> retry: >> reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false); >> sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; >> sc->nr_reclaimed += reclaimed; >> trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, >> scanned, reclaimed, &stat, sc->priority, >> type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON); >> >> Currently, the active/inactive state aligns with mglru in this trace. >> It seems that >> the userspace BPF should recognize that the nr_scanned during a retry doesn't >> indicate we are isolating new nr_scanned folios. Ideally, the is_retry >> flag should >> be passed to the trace, allowing userspace to identify that it's a retry and >> disregard the nr_scanned value. >> >> It might be worth addressing this in a separate patch. 
Add Bixuan to clarify >> how userspace depends on this trace and if "retry" will break his userspace >> BPF for both MGLRU and active/inactive cases. >> >> Otherwise, the patch looks good to me. >> > > By the way, it's completely clear that the trace was added after mglru's retry: > https://lore.kernel.org/linux-mm/20240105013607.2868-3-cuibixuan@vivo.com/ > > Therefore, I don't believe the potential confusion about nr_scanned in the trace > should prevent Ridong's fix for the missed rotation of written-back folios from > proceeding. > > If there is an issue with that, we should open a separate thread to address the > trace. > > Please feel free to add the below in the future version after you fix > "is_retried". > > Reviewed-by: Barry Song <baohua@kernel.org> > Thank you very much. I will update. Best regards Ridong >>> + >>> + find_folios_written_back(&folio_list, &clean_list, is_retried); >>> + >>> spin_lock_irq(&lruvec->lru_lock); >>> move_folios_to_lru(lruvec, &folio_list); >>> >>> __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(), >>> stat.nr_demoted); >>> - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); >>> item = PGSTEAL_KSWAPD + reclaimer_offset(); >>> if (!cgroup_reclaim(sc)) >>> __count_vm_events(item, nr_reclaimed); >>> __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); >>> __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); >>> + >>> + if (!list_empty(&clean_list)) { >>> + list_splice_init(&clean_list, &folio_list); >>> + is_retried = true; >>> + spin_unlock_irq(&lruvec->lru_lock); >>> + goto retry; >>> + } >>> + __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); >>> spin_unlock_irq(&lruvec->lru_lock); >>> + sc->nr.taken += nr_taken; >>> + if (file) >>> + sc->nr.file_taken += nr_taken; >>> >>> - lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed); >>> + lru_note_cost(lruvec, file, nr_pageout, nr_scanned - total_reclaimed); >>> >>> /* >>> * If dirty folios are scanned that are not queued for IO, it >>> @@ -2028,7 +2090,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, >>> * the flushers simply cannot keep up with the allocation >>> * rate. Nudge the flusher threads in case they are asleep. 
>>> */ >>> - if (stat.nr_unqueued_dirty == nr_taken) { >>> + if (nr_unqueued_dirty == nr_taken) { >>> wakeup_flusher_threads(WB_REASON_VMSCAN); >>> /* >>> * For cgroupv1 dirty throttling is achieved by waking up >>> @@ -2043,18 +2105,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, >>> reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK); >>> } >>> >>> - sc->nr.dirty += stat.nr_dirty; >>> - sc->nr.congested += stat.nr_congested; >>> - sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; >>> - sc->nr.writeback += stat.nr_writeback; >>> - sc->nr.immediate += stat.nr_immediate; >>> - sc->nr.taken += nr_taken; >>> - if (file) >>> - sc->nr.file_taken += nr_taken; >>> - >>> - trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, >>> - nr_scanned, nr_reclaimed, &stat, sc->priority, file); >>> - return nr_reclaimed; >>> + return total_reclaimed; >>> } >>> >>> /* >>> @@ -4585,12 +4636,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap >>> int reclaimed; >>> LIST_HEAD(list); >>> LIST_HEAD(clean); >>> - struct folio *folio; >>> - struct folio *next; >>> enum vm_event_item item; >>> struct reclaim_stat stat; >>> struct lru_gen_mm_walk *walk; >>> - bool skip_retry = false; >>> + bool is_retried = false; >>> struct lru_gen_folio *lrugen = &lruvec->lrugen; >>> struct mem_cgroup *memcg = lruvec_memcg(lruvec); >>> struct pglist_data *pgdat = lruvec_pgdat(lruvec); >>> @@ -4616,24 +4665,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap >>> scanned, reclaimed, &stat, sc->priority, >>> type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON); >>> >>> - list_for_each_entry_safe_reverse(folio, next, &list, lru) { >>> - if (!folio_evictable(folio)) { >>> - list_del(&folio->lru); >>> - folio_putback_lru(folio); >>> - continue; >>> - } >>> - >>> - /* retry folios that may have missed folio_rotate_reclaimable() */ >>> - if (!skip_retry && !folio_test_active(folio) && !folio_mapped(folio) && >>> - !folio_test_dirty(folio) && !folio_test_writeback(folio)) { >>> - list_move(&folio->lru, &clean); >>> - continue; >>> - } >>> - >>> - /* don't add rejected folios to the oldest generation */ >>> - if (!lru_gen_distance(folio, false)) >>> - set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); >>> - } >>> + find_folios_written_back(&list, &clean, is_retried); >>> >>> spin_lock_irq(&lruvec->lru_lock); >>> >>> @@ -4656,7 +4688,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap >>> list_splice_init(&clean, &list); >>> >>> if (!list_empty(&list)) { >>> - skip_retry = true; >>> + is_retried = true; >>> goto retry; >>> } >>> >>> -- >>> 2.34.1 >>> >> > > Thanks > barry
Hi Chen, kernel test robot noticed the following build errors: [auto build test ERROR on akpm-mm/mm-everything] url: https://github.com/intel-lab-lkp/linux/commits/Chen-Ridong/mm-vmscan-retry-folios-written-back-while-isolated-for-traditional-LRU/20241220-092147 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20241220010931.3603111-1-chenridong%40huaweicloud.com patch subject: [PATCH -next v5] mm: vmscan: retry folios written back while isolated for traditional LRU config: i386-buildonly-randconfig-004-20241220 (https://download.01.org/0day-ci/archive/20241221/202412210101.Og2hX0Rs-lkp@intel.com/config) compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241221/202412210101.Og2hX0Rs-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202412210101.Og2hX0Rs-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from mm/vmscan.c:30: include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 47 | __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages); | ~~~~~~~~~~~ ^ ~~~ include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 49 | NR_ZONE_LRU_BASE + lru, nr_pages); | ~~~~~~~~~~~~~~~~ ^ ~~~ In file included from mm/vmscan.c:42: In file included from include/linux/migrate.h:8: include/linux/hugetlb.h:1063:5: warning: no previous prototype for function 'replace_free_hugepage_folios' [-Wmissing-prototypes] 1063 | int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) | ^ include/linux/hugetlb.h:1063:1: note: declare 'static' if the function is not intended to be used outside of this translation unit 1063 | int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) | ^ | static >> mm/vmscan.c:313:29: error: call to undeclared function 'lru_gen_distance'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 313 | if (lru_gen_enabled() && !lru_gen_distance(folio, false)) | ^ mm/vmscan.c:442:51: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 442 | size += zone_page_state(zone, NR_ZONE_LRU_BASE + lru); | ~~~~~~~~~~~~~~~~ ^ ~~~ mm/vmscan.c:1806:4: warning: arithmetic between different enumeration types ('enum vm_event_item' and 'enum zone_type') [-Wenum-enum-conversion] 1806 | __count_zid_vm_events(PGSCAN_SKIP, zid, nr_skipped[zid]); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:139:34: note: expanded from macro '__count_zid_vm_events' 139 | __count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta) | ~~~~~~~~~~~~~ ^ ~~~~~~~~~~~ mm/vmscan.c:2330:51: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 2330 | inactive = lruvec_page_state(lruvec, NR_LRU_BASE + inactive_lru); | ~~~~~~~~~~~ ^ ~~~~~~~~~~~~ mm/vmscan.c:2331:49: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum 
lru_list') [-Wenum-enum-conversion] 2331 | active = lruvec_page_state(lruvec, NR_LRU_BASE + active_lru); | ~~~~~~~~~~~ ^ ~~~~~~~~~~ mm/vmscan.c:6271:3: warning: arithmetic between different enumeration types ('enum vm_event_item' and 'enum zone_type') [-Wenum-enum-conversion] 6271 | __count_zid_vm_events(ALLOCSTALL, sc->reclaim_idx, 1); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:139:34: note: expanded from macro '__count_zid_vm_events' 139 | __count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta) | ~~~~~~~~~~~~~ ^ ~~~~~~~~~~~ 8 warnings and 1 error generated. vim +/lru_gen_distance +313 mm/vmscan.c 285 286 /** 287 * find_folios_written_back - Find and move the written back folios to a new list. 288 * @list: filios list 289 * @clean: the written back folios list 290 * @is_retried: whether the list has already been retried. 291 */ 292 static inline void find_folios_written_back(struct list_head *list, 293 struct list_head *clean, bool is_retried) 294 { 295 struct folio *folio; 296 struct folio *next; 297 298 list_for_each_entry_safe_reverse(folio, next, list, lru) { 299 if (!folio_evictable(folio)) { 300 list_del(&folio->lru); 301 folio_putback_lru(folio); 302 continue; 303 } 304 305 /* retry folios that may have missed folio_rotate_reclaimable() */ 306 if (!is_retried && !folio_test_active(folio) && !folio_mapped(folio) && 307 !folio_test_dirty(folio) && !folio_test_writeback(folio)) { 308 list_move(&folio->lru, clean); 309 continue; 310 } 311 312 /* don't add rejected folios to the oldest generation */ > 313 if (lru_gen_enabled() && !lru_gen_distance(folio, false)) 314 set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); 315 } 316
Hi Chen, kernel test robot noticed the following build errors: [auto build test ERROR on akpm-mm/mm-everything] url: https://github.com/intel-lab-lkp/linux/commits/Chen-Ridong/mm-vmscan-retry-folios-written-back-while-isolated-for-traditional-LRU/20241220-092147 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20241220010931.3603111-1-chenridong%40huaweicloud.com patch subject: [PATCH -next v5] mm: vmscan: retry folios written back while isolated for traditional LRU config: i386-buildonly-randconfig-003-20241220 (https://download.01.org/0day-ci/archive/20241221/202412211023.UIBZqNFK-lkp@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241221/202412211023.UIBZqNFK-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202412211023.UIBZqNFK-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from include/linux/migrate.h:8, from mm/vmscan.c:42: include/linux/hugetlb.h:1063:5: warning: no previous prototype for 'replace_free_hugepage_folios' [-Wmissing-prototypes] 1063 | int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/vmscan.c: In function 'find_folios_written_back': >> mm/vmscan.c:313:43: error: implicit declaration of function 'lru_gen_distance'; did you mean 'node_distance'? [-Werror=implicit-function-declaration] 313 | if (lru_gen_enabled() && !lru_gen_distance(folio, false)) | ^~~~~~~~~~~~~~~~ | node_distance cc1: some warnings being treated as errors vim +313 mm/vmscan.c 285 286 /** 287 * find_folios_written_back - Find and move the written back folios to a new list. 288 * @list: filios list 289 * @clean: the written back folios list 290 * @is_retried: whether the list has already been retried. 291 */ 292 static inline void find_folios_written_back(struct list_head *list, 293 struct list_head *clean, bool is_retried) 294 { 295 struct folio *folio; 296 struct folio *next; 297 298 list_for_each_entry_safe_reverse(folio, next, list, lru) { 299 if (!folio_evictable(folio)) { 300 list_del(&folio->lru); 301 folio_putback_lru(folio); 302 continue; 303 } 304 305 /* retry folios that may have missed folio_rotate_reclaimable() */ 306 if (!is_retried && !folio_test_active(folio) && !folio_mapped(folio) && 307 !folio_test_dirty(folio) && !folio_test_writeback(folio)) { 308 list_move(&folio->lru, clean); 309 continue; 310 } 311 312 /* don't add rejected folios to the oldest generation */ > 313 if (lru_gen_enabled() && !lru_gen_distance(folio, false)) 314 set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); 315 } 316
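The error in both robot reports comes from find_folios_written_back() calling lru_gen_distance() unconditionally: lru_gen_enabled() short-circuits the branch at run time, but the compiler still needs a declaration when CONFIG_LRU_GEN is disabled. A minimal sketch of one way to satisfy it, assuming lru_gen_distance() is currently declared only under CONFIG_LRU_GEN and that its exact prototype and placement may differ, is a no-op stub for the !CONFIG_LRU_GEN case:

/*
 * Hypothetical stub, assuming lru_gen_distance() is declared only under
 * CONFIG_LRU_GEN (placement and return type are assumptions, not taken
 * from the tree). With CONFIG_LRU_GEN disabled, lru_gen_enabled() is
 * compile-time false, so this stub is never reached at run time; it only
 * exists so the common helper compiles.
 */
#ifndef CONFIG_LRU_GEN
static inline int lru_gen_distance(struct folio *folio, bool reclaiming)
{
	return 0;
}
#endif /* !CONFIG_LRU_GEN */

Wrapping the lru_gen_enabled() && !lru_gen_distance() check in #ifdef CONFIG_LRU_GEN inside find_folios_written_back() would be another option; either way this needs to be folded into the next version of the patch.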
diff --git a/mm/vmscan.c b/mm/vmscan.c index 39886f435ec5..e67e446540ba 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -283,6 +283,39 @@ static void set_task_reclaim_state(struct task_struct *task, task->reclaim_state = rs; } +/** + * find_folios_written_back - Find and move the written back folios to a new list. + * @list: filios list + * @clean: the written back folios list + * @is_retried: whether the list has already been retried. + */ +static inline void find_folios_written_back(struct list_head *list, + struct list_head *clean, bool is_retried) +{ + struct folio *folio; + struct folio *next; + + list_for_each_entry_safe_reverse(folio, next, list, lru) { + if (!folio_evictable(folio)) { + list_del(&folio->lru); + folio_putback_lru(folio); + continue; + } + + /* retry folios that may have missed folio_rotate_reclaimable() */ + if (!is_retried && !folio_test_active(folio) && !folio_mapped(folio) && + !folio_test_dirty(folio) && !folio_test_writeback(folio)) { + list_move(&folio->lru, clean); + continue; + } + + /* don't add rejected folios to the oldest generation */ + if (lru_gen_enabled() && !lru_gen_distance(folio, false)) + set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); + } + +} + /* * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to * scan_control->nr_reclaimed. @@ -1959,14 +1992,18 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, enum lru_list lru) { LIST_HEAD(folio_list); + LIST_HEAD(clean_list); unsigned long nr_scanned; - unsigned int nr_reclaimed = 0; + unsigned int nr_reclaimed, total_reclaimed = 0; + unsigned int nr_pageout = 0; + unsigned int nr_unqueued_dirty = 0; unsigned long nr_taken; struct reclaim_stat stat; bool file = is_file_lru(lru); enum vm_event_item item; struct pglist_data *pgdat = lruvec_pgdat(lruvec); bool stalled = false; + bool is_retried = false; while (unlikely(too_many_isolated(pgdat, file, sc))) { if (stalled) @@ -2000,22 +2037,47 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, if (nr_taken == 0) return 0; +retry: nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false); + sc->nr.dirty += stat.nr_dirty; + sc->nr.congested += stat.nr_congested; + sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; + sc->nr.writeback += stat.nr_writeback; + sc->nr.immediate += stat.nr_immediate; + total_reclaimed += nr_reclaimed; + nr_pageout += stat.nr_pageout; + nr_unqueued_dirty += stat.nr_unqueued_dirty; + + trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, + nr_scanned, nr_reclaimed, &stat, sc->priority, file); + + find_folios_written_back(&folio_list, &clean_list, is_retried); + spin_lock_irq(&lruvec->lru_lock); move_folios_to_lru(lruvec, &folio_list); __mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(), stat.nr_demoted); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); item = PGSTEAL_KSWAPD + reclaimer_offset(); if (!cgroup_reclaim(sc)) __count_vm_events(item, nr_reclaimed); __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); + + if (!list_empty(&clean_list)) { + list_splice_init(&clean_list, &folio_list); + is_retried = true; + spin_unlock_irq(&lruvec->lru_lock); + goto retry; + } + __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); spin_unlock_irq(&lruvec->lru_lock); + sc->nr.taken += nr_taken; + if (file) + sc->nr.file_taken += nr_taken; - lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed); + lru_note_cost(lruvec, file, nr_pageout, 
nr_scanned - total_reclaimed); /* * If dirty folios are scanned that are not queued for IO, it @@ -2028,7 +2090,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, * the flushers simply cannot keep up with the allocation * rate. Nudge the flusher threads in case they are asleep. */ - if (stat.nr_unqueued_dirty == nr_taken) { + if (nr_unqueued_dirty == nr_taken) { wakeup_flusher_threads(WB_REASON_VMSCAN); /* * For cgroupv1 dirty throttling is achieved by waking up @@ -2043,18 +2105,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan, reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK); } - sc->nr.dirty += stat.nr_dirty; - sc->nr.congested += stat.nr_congested; - sc->nr.unqueued_dirty += stat.nr_unqueued_dirty; - sc->nr.writeback += stat.nr_writeback; - sc->nr.immediate += stat.nr_immediate; - sc->nr.taken += nr_taken; - if (file) - sc->nr.file_taken += nr_taken; - - trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, - nr_scanned, nr_reclaimed, &stat, sc->priority, file); - return nr_reclaimed; + return total_reclaimed; } /* @@ -4585,12 +4636,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap int reclaimed; LIST_HEAD(list); LIST_HEAD(clean); - struct folio *folio; - struct folio *next; enum vm_event_item item; struct reclaim_stat stat; struct lru_gen_mm_walk *walk; - bool skip_retry = false; + bool is_retried = false; struct lru_gen_folio *lrugen = &lruvec->lrugen; struct mem_cgroup *memcg = lruvec_memcg(lruvec); struct pglist_data *pgdat = lruvec_pgdat(lruvec); @@ -4616,24 +4665,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap scanned, reclaimed, &stat, sc->priority, type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON); - list_for_each_entry_safe_reverse(folio, next, &list, lru) { - if (!folio_evictable(folio)) { - list_del(&folio->lru); - folio_putback_lru(folio); - continue; - } - - /* retry folios that may have missed folio_rotate_reclaimable() */ - if (!skip_retry && !folio_test_active(folio) && !folio_mapped(folio) && - !folio_test_dirty(folio) && !folio_test_writeback(folio)) { - list_move(&folio->lru, &clean); - continue; - } - - /* don't add rejected folios to the oldest generation */ - if (!lru_gen_distance(folio, false)) - set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active)); - } + find_folios_written_back(&list, &clean, is_retried); spin_lock_irq(&lruvec->lru_lock); @@ -4656,7 +4688,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap list_splice_init(&clean, &list); if (!list_empty(&list)) { - skip_retry = true; + is_retried = true; goto retry; }