Message ID | 20231018082104.3918770-3-link@vivo.com (mailing list archive) |
---|---|
State | New |
Series | some fix of multi-gen lru |
On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>
> For multi-gen LRU reclaim, evict_folios(), like shrink_inactive_list(),
> gathers the folios isolated for reclaim and invokes shrink_folio_list().
>
> However, when the shrink completes, it does not gather the reclaim
> stats into sc, so we cannot get information such as nr_dirty or
> nr_congested during reclaim and use it to control writeback and the
> number of dirty folios, mark the lruvec as LRUVEC_CONGESTED, or simply
> trace the shrink with BPF and get correct sc stats.
>
> This patch fixes this by simply copying the code that
> shrink_inactive_list() runs at the end of shrinking the list.

MGLRU doesn't try to write back dirty file pages in the reclaim path --
it filters them out in sort_folio() and leaves them to the page
writeback. (The page writeback is a dedicated component for this
purpose.) So there is nothing to fix.
Hi Yu Zhao,

Thanks for your reply.

On 2023/10/19 0:21, Yu Zhao wrote:
> On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>> For multi-gen LRU reclaim, evict_folios(), like shrink_inactive_list(),
>> gathers the folios isolated for reclaim and invokes shrink_folio_list().
>>
>> However, when the shrink completes, it does not gather the reclaim
>> stats into sc, so we cannot get information such as nr_dirty or
>> nr_congested during reclaim and use it to control writeback and the
>> number of dirty folios, mark the lruvec as LRUVEC_CONGESTED, or simply
>> trace the shrink with BPF and get correct sc stats.
>>
>> This patch fixes this by simply copying the code that
>> shrink_inactive_list() runs at the end of shrinking the list.
> MGLRU doesn't try to write back dirty file pages in the reclaim path --
> it filters them out in sort_folio() and leaves them to the page

Nice to know this; sort_folio() does indeed filter out some folios.
But I want to know: if we touch some folios in shrink_folio_list(), can
some of them become dirty or enter writeback even after sort_folio() has
done its filtering?

> writeback. (The page writeback is a dedicated component for this
> purpose.) So there is nothing to fix.
On Wed, Oct 18, 2023 at 8:17 PM Huan Yang <link@vivo.com> wrote:
>
> Hi Yu Zhao,
>
> Thanks for your reply.
>
> On 2023/10/19 0:21, Yu Zhao wrote:
> > On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
> >> For multi-gen LRU reclaim, evict_folios(), like shrink_inactive_list(),
> >> gathers the folios isolated for reclaim and invokes shrink_folio_list().
> >>
> >> However, when the shrink completes, it does not gather the reclaim
> >> stats into sc, so we cannot get information such as nr_dirty or
> >> nr_congested during reclaim and use it to control writeback and the
> >> number of dirty folios, mark the lruvec as LRUVEC_CONGESTED, or simply
> >> trace the shrink with BPF and get correct sc stats.
> >>
> >> This patch fixes this by simply copying the code that
> >> shrink_inactive_list() runs at the end of shrinking the list.
> > MGLRU doesn't try to write back dirty file pages in the reclaim path --
> > it filters them out in sort_folio() and leaves them to the page
> Nice to know this; sort_folio() does indeed filter out some folios.
> But I want to know: if we touch some folios in shrink_folio_list(), can
> some of them become dirty or enter writeback even after sort_folio() has
> done its filtering?

Good question: in that case MGLRU still doesn't try to write those
folios back because isolate_folio() cleared PG_reclaim and
shrink_folio_list() checks PG_reclaim:

    if (folio_test_dirty(folio)) {
            /*
             * Only kswapd can writeback filesystem folios
             * to avoid risk of stack overflow. But avoid
             * injecting inefficient single-folio I/O into
             * flusher writeback as much as possible: only
             * write folios when we've encountered many
             * dirty folios, and when we've already scanned
             * the rest of the LRU for clean folios and see
             * the same dirty folios again (with the reclaim
             * flag set).
             */
            if (folio_is_file_lru(folio) &&
                (!current_is_kswapd() ||
                 !folio_test_reclaim(folio) ||
                 !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
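
To make the check quoted above concrete, here is a minimal user-space sketch of the condition alone. It is an illustration under stated assumptions, not kernel code: struct folio_model, struct reclaim_ctx_model and quoted_condition_holds() are hypothetical names, each boolean merely stands in for the kernel helper named in its comment, and the body of the branch (which the quoted snippet truncates) is not modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the folio state the quoted condition reads. */
struct folio_model {
	bool dirty;    /* stands in for folio_test_dirty() */
	bool file_lru; /* stands in for folio_is_file_lru() */
	bool reclaim;  /* stands in for folio_test_reclaim(), i.e. PG_reclaim */
};

/* Hypothetical stand-in for the reclaim context the condition reads. */
struct reclaim_ctx_model {
	bool is_kswapd;   /* stands in for current_is_kswapd() */
	bool pgdat_dirty; /* stands in for test_bit(PGDAT_DIRTY, &pgdat->flags) */
};

/*
 * Returns true when both the outer and the inner condition of the quoted
 * snippet hold for this folio, i.e. when the quoted branch is entered.
 */
static bool quoted_condition_holds(const struct folio_model *f,
				   const struct reclaim_ctx_model *c)
{
	if (!f->dirty)
		return false; /* the outer test only passes for dirty folios */

	return f->file_lru &&
	       (!c->is_kswapd || !f->reclaim || !c->pgdat_dirty);
}
```

With reclaim set to false, which models a folio whose PG_reclaim has been cleared, the function returns true for any dirty file folio regardless of the other two inputs. That is the property the reply relies on.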
On 2023/10/19 10:39, Yu Zhao wrote:
> On Wed, Oct 18, 2023 at 8:17 PM Huan Yang <link@vivo.com> wrote:
>> Hi Yu Zhao,
>>
>> Thanks for your reply.
>>
>> On 2023/10/19 0:21, Yu Zhao wrote:
>>> On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>>>> For multi-gen LRU reclaim, evict_folios(), like shrink_inactive_list(),
>>>> gathers the folios isolated for reclaim and invokes shrink_folio_list().
>>>>
>>>> However, when the shrink completes, it does not gather the reclaim
>>>> stats into sc, so we cannot get information such as nr_dirty or
>>>> nr_congested during reclaim and use it to control writeback and the
>>>> number of dirty folios, mark the lruvec as LRUVEC_CONGESTED, or simply
>>>> trace the shrink with BPF and get correct sc stats.
>>>>
>>>> This patch fixes this by simply copying the code that
>>>> shrink_inactive_list() runs at the end of shrinking the list.
>>> MGLRU doesn't try to write back dirty file pages in the reclaim path --
>>> it filters them out in sort_folio() and leaves them to the page
>> Nice to know this; sort_folio() does indeed filter out some folios.
>> But I want to know: if we touch some folios in shrink_folio_list(), can
>> some of them become dirty or enter writeback even after sort_folio() has
>> done its filtering?
> Good question: in that case MGLRU still doesn't try to write those
> folios back because isolate_folio() cleared PG_reclaim and
> shrink_folio_list() checks PG_reclaim:

Thank you very much.

So MGLRU differs in many ways from the typical LRU reclaim. Why not give
MGLRU its own shrink path, to avoid so many per-folio checks?

Thinking further, would it be nice to assign an anon/file reclaim hook to
anon_vma/address_space? (Each folio would then have its own shrink path
and would skip the checks it does not need.)

>
>     if (folio_test_dirty(folio)) {
>             /*
>              * Only kswapd can writeback filesystem folios
>              * to avoid risk of stack overflow. But avoid
>              * injecting inefficient single-folio I/O into
>              * flusher writeback as much as possible: only
>              * write folios when we've encountered many
>              * dirty folios, and when we've already scanned
>              * the rest of the LRU for clean folios and see
>              * the same dirty folios again (with the reclaim
>              * flag set).
>              */
>             if (folio_is_file_lru(folio) &&
>                 (!current_is_kswapd() ||
>                  !folio_test_reclaim(folio) ||
>                  !test_bit(PGDAT_DIRTY, &pgdat->flags))) {

Thanks
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 21099b9f21e0..88d1d586aea5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4593,6 +4593,41 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	 */
 	nr_taken = sc->nr_scanned - nr_taken;
 
+	/*
+	 * If dirty folios are scanned that are not queued for IO, it
+	 * implies that flushers are not doing their job. This can
+	 * happen when memory pressure pushes dirty folios to the end of
+	 * the LRU before the dirty limits are breached and the dirty
+	 * data has expired. It can also happen when the proportion of
+	 * dirty folios grows not through writes but through memory
+	 * pressure reclaiming all the clean cache. And in some cases,
+	 * the flushers simply cannot keep up with the allocation
+	 * rate. Nudge the flusher threads in case they are asleep.
+	 */
+	if (unlikely(stat.nr_unqueued_dirty == nr_taken)) {
+		wakeup_flusher_threads(WB_REASON_VMSCAN);
+		/*
+		 * For cgroupv1 dirty throttling is achieved by waking up
+		 * the kernel flusher here and later waiting on folios
+		 * which are in writeback to finish (see shrink_folio_list()).
+		 *
+		 * Flusher may not be able to issue writeback quickly
+		 * enough for cgroupv1 writeback throttling to work
+		 * on a large system.
+		 */
+		if (!writeback_throttling_sane(sc))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+	}
+
+	sc->nr.dirty += stat.nr_dirty;
+	sc->nr.congested += stat.nr_congested;
+	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
+	sc->nr.writeback += stat.nr_writeback;
+	sc->nr.immediate += stat.nr_immediate;
+	sc->nr.taken += nr_taken;
+	if (type)
+		sc->nr.file_taken += nr_taken;
+
 	sc->nr_reclaimed += total_reclaimed;
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, nr_taken,
 					    total_reclaimed, &stat,
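
The bookkeeping added by the hunk above can be modeled outside the kernel roughly as follows. This is a sketch under assumptions, not the patch itself: struct stat_model, struct sc_nr_model and account_batch() are hypothetical names that mirror only the fields the hunk touches, and the is_file parameter assumes that a nonzero type denotes the file LRU, as the field name file_taken suggests. The flusher wakeup and throttling calls are reduced to a boolean return value.

```c
#include <stdbool.h>

/* Hypothetical mirror of the per-batch stats the hunk reads. */
struct stat_model {
	unsigned int nr_dirty;
	unsigned int nr_congested;
	unsigned int nr_unqueued_dirty;
	unsigned int nr_writeback;
	unsigned int nr_immediate;
};

/* Hypothetical mirror of the running totals kept in sc->nr. */
struct sc_nr_model {
	unsigned int dirty;
	unsigned int congested;
	unsigned int unqueued_dirty;
	unsigned int writeback;
	unsigned int immediate;
	unsigned int taken;
	unsigned int file_taken;
};

/*
 * Folds one batch's stats into the running totals, as the added lines do
 * for sc->nr. Returns true when every taken folio was dirty and not queued
 * for I/O, which is the case where the hunk wakes the flusher threads.
 */
static bool account_batch(struct sc_nr_model *nr, const struct stat_model *stat,
			  unsigned int nr_taken, bool is_file)
{
	nr->dirty += stat->nr_dirty;
	nr->congested += stat->nr_congested;
	nr->unqueued_dirty += stat->nr_unqueued_dirty;
	nr->writeback += stat->nr_writeback;
	nr->immediate += stat->nr_immediate;
	nr->taken += nr_taken;
	if (is_file)
		nr->file_taken += nr_taken;

	return stat->nr_unqueued_dirty == nr_taken;
}
```

A caller would invoke account_batch() once per shrink_folio_list()-style batch and act on a true return by nudging writeback. The totals accumulate across batches, which is what lets later reclaim decisions see them.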
For multi-gen LRU reclaim, evict_folios(), like shrink_inactive_list(),
gathers the folios isolated for reclaim and invokes shrink_folio_list().

However, when the shrink completes, it does not gather the reclaim
stats into sc, so we cannot get information such as nr_dirty or
nr_congested during reclaim and use it to control writeback and the
number of dirty folios, mark the lruvec as LRUVEC_CONGESTED, or simply
trace the shrink with BPF and get correct sc stats.

This patch fixes this by simply copying the code that
shrink_inactive_list() runs at the end of shrinking the list.

Signed-off-by: Huan Yang <link@vivo.com>
---
 mm/vmscan.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)