diff mbox series

[2/2] mm: multi-gen lru: fix stat count

Message ID 20231018082104.3918770-3-link@vivo.com (mailing list archive)
State New
Series some fix of multi-gen lru

Commit Message

Huan Yang Oct. 18, 2023, 8:21 a.m. UTC
For multi-gen LRU reclaim, evict_folios(), like shrink_inactive_list(),
gathers the isolated folios and invokes shrink_folio_list() on them.

However, once the shrink completes, it does not accumulate the reclaim
stats into sc. As a result, information such as nr_dirty and
nr_congested is unavailable during reclaim, so we cannot control
writeback and the dirty count, mark the lruvec as LRUVEC_CONGESTED, or
get correct sc stats when tracing the shrink with BPF.

This patch fixes this by simply copying the code that
shrink_inactive_list() runs at the end of shrinking the list.

Signed-off-by: Huan Yang <link@vivo.com>
---
 mm/vmscan.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Comments

Yu Zhao Oct. 18, 2023, 4:21 p.m. UTC | #1
On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>
> For multi-gen lru reclaim in evict_folios, like shrink_inactive_list,
> gather folios which isolate to reclaim, and invoke shirnk_folio_list.
>
> But, when complete shrink, it not gather shrink reclaim stat into sc,
> we can't get info like nr_dirty\congested in reclaim, and then
> control writeback, dirty number and mark as LRUVEC_CONGESTED, or
> just bpf trace shrink and get correct sc stat.
>
> This patch fix this by simple copy code from shrink_inactive_list when
> end of shrink list.

MGLRU doesn't try to write back dirty file pages in the reclaim path --
it filters them out in sort_folio() and leaves them to the page
writeback. (The page writeback is a dedicated component for this
purpose). So there is nothing to fix.
Huan Yang Oct. 19, 2023, 2:17 a.m. UTC | #2
Hi Yu Zhao,

Thanks for your reply.

On 2023/10/19 0:21, Yu Zhao wrote:
> On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>> For multi-gen lru reclaim in evict_folios, like shrink_inactive_list,
>> gather folios which isolate to reclaim, and invoke shirnk_folio_list.
>>
>> But, when complete shrink, it not gather shrink reclaim stat into sc,
>> we can't get info like nr_dirty\congested in reclaim, and then
>> control writeback, dirty number and mark as LRUVEC_CONGESTED, or
>> just bpf trace shrink and get correct sc stat.
>>
>> This patch fix this by simple copy code from shrink_inactive_list when
>> end of shrink list.
> MGLRU doesn't try to write back dirt file pages in the reclaim path --
> it filters them out in sort_folio() and leaves them to the page
Good to know. sort_folio() does indeed filter out some folios.
But I would like to know: when we handle folios in shrink_folio_list(),
can some of them be dirty or under writeback even though sort_folio()
filtered them earlier?
> writeback. (The page writeback is a dedicated component for this
> purpose). So there is nothing to fix.
Yu Zhao Oct. 19, 2023, 2:39 a.m. UTC | #3
On Wed, Oct 18, 2023 at 8:17 PM Huan Yang <link@vivo.com> wrote:
>
> Hi Yu Zhao,
>
> Thanks for your reply.
>
> On 2023/10/19 0:21, Yu Zhao wrote:
> > On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
> >> For multi-gen lru reclaim in evict_folios, like shrink_inactive_list,
> >> gather folios which isolate to reclaim, and invoke shirnk_folio_list.
> >>
> >> But, when complete shrink, it not gather shrink reclaim stat into sc,
> >> we can't get info like nr_dirty\congested in reclaim, and then
> >> control writeback, dirty number and mark as LRUVEC_CONGESTED, or
> >> just bpf trace shrink and get correct sc stat.
> >>
> >> This patch fix this by simple copy code from shrink_inactive_list when
> >> end of shrink list.
> > MGLRU doesn't try to write back dirt file pages in the reclaim path --
> > it filters them out in sort_folio() and leaves them to the page
> Nice to know this,  sort_folio() filters some folio indeed.
> But, I want to know, if we touch some folio in shrink_folio_list(), may some
> folio become dirty or writeback even if sort_folio() filter then?

Good question: in that case MGLRU still doesn't try to write those
folios back because isolate_folio() cleared PG_reclaim and
shrink_folio_list() checks PG_reclaim:

	if (folio_test_dirty(folio)) {
		/*
		 * Only kswapd can writeback filesystem folios
		 * to avoid risk of stack overflow. But avoid
		 * injecting inefficient single-folio I/O into
		 * flusher writeback as much as possible: only
		 * write folios when we've encountered many
		 * dirty folios, and when we've already scanned
		 * the rest of the LRU for clean folios and see
		 * the same dirty folios again (with the reclaim
		 * flag set).
		 */
		if (folio_is_file_lru(folio) &&
		    (!current_is_kswapd() ||
		     !folio_test_reclaim(folio) ||
		     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
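
As a rough illustration of the check quoted above, here is a user-space model of the condition. It is a sketch only, not kernel code: struct folio_model, skips_writeback() and demo() are hypothetical names invented for this example, and the booleans merely stand in for folio_is_file_lru(), folio_test_reclaim(), current_is_kswapd() and the PGDAT_DIRTY bit. It shows that once PG_reclaim is cleared, the condition holds for a dirty file folio no matter what the other two inputs are.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel state the check reads. */
struct folio_model {
	bool file_lru;     /* models folio_is_file_lru() */
	bool reclaim_flag; /* models folio_test_reclaim(), i.e. PG_reclaim */
};

/*
 * Models the condition in the excerpt: returns true when the quoted
 * "if" condition holds, i.e. the branch that the excerpt guards is taken
 * instead of falling through to writeback.
 */
static bool skips_writeback(const struct folio_model *f, bool is_kswapd,
			    bool pgdat_dirty)
{
	return f->file_lru &&
	       (!is_kswapd || !f->reclaim_flag || !pgdat_dirty);
}

static void demo(void)
{
	struct folio_model cleared = { .file_lru = true, .reclaim_flag = false };
	struct folio_model flagged = { .file_lru = true, .reclaim_flag = true };

	/* PG_reclaim cleared: condition holds even for kswapd on a dirty node. */
	assert(skips_writeback(&cleared, true, true));
	/* kswapd, PG_reclaim set, PGDAT_DIRTY set: condition does not hold. */
	assert(!skips_writeback(&flagged, true, true));
}
```

Calling demo() exercises both cases; the only combination in which the modeled condition is false for a file folio is kswapd running with both PG_reclaim and PGDAT_DIRTY set.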
Huan Yang Oct. 19, 2023, 6:29 a.m. UTC | #4
On 2023/10/19 10:39, Yu Zhao wrote:
> On Wed, Oct 18, 2023 at 8:17 PM Huan Yang <link@vivo.com> wrote:
>> Hi Yu Zhao,
>>
>> Thanks for your reply.
>>
>> On 2023/10/19 0:21, Yu Zhao wrote:
>>> On Wed, Oct 18, 2023 at 2:22 AM Huan Yang <link@vivo.com> wrote:
>>>> For multi-gen lru reclaim in evict_folios, like shrink_inactive_list,
>>>> gather folios which isolate to reclaim, and invoke shirnk_folio_list.
>>>>
>>>> But, when complete shrink, it not gather shrink reclaim stat into sc,
>>>> we can't get info like nr_dirty\congested in reclaim, and then
>>>> control writeback, dirty number and mark as LRUVEC_CONGESTED, or
>>>> just bpf trace shrink and get correct sc stat.
>>>>
>>>> This patch fix this by simple copy code from shrink_inactive_list when
>>>> end of shrink list.
>>> MGLRU doesn't try to write back dirt file pages in the reclaim path --
>>> it filters them out in sort_folio() and leaves them to the page
>> Nice to know this,  sort_folio() filters some folio indeed.
>> But, I want to know, if we touch some folio in shrink_folio_list(), may some
>> folio become dirty or writeback even if sort_folio() filter then?
> Good question: in that case MGLRU still doesn't try to write those
> folios back because isolate_folio() cleared PG_reclaim and
> shrink_folio_list() checks PG_reclaim:

Thank you very much. So MGLRU differs from the classic LRU reclaim in
many ways. Why not give MGLRU its own shrink path, then, to avoid so
many per-folio checks?
Going further, would it be worthwhile to attach an anon/file reclaim
hook to anon_vma/address_space?
(Each folio would then have its own shrink path and would skip the
checks it does not need.)

>
> if (folio_test_dirty(folio)) {
> /*
> * Only kswapd can writeback filesystem folios
> * to avoid risk of stack overflow. But avoid
> * injecting inefficient single-folio I/O into
> * flusher writeback as much as possible: only
> * write folios when we've encountered many
> * dirty folios, and when we've already scanned
> * the rest of the LRU for clean folios and see
> * the same dirty folios again (with the reclaim
> * flag set).
> */
> if (folio_is_file_lru(folio) &&
>      (!current_is_kswapd() ||
>       !folio_test_reclaim(folio) ||
>       !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
Thanks

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 21099b9f21e0..88d1d586aea5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4593,6 +4593,41 @@  static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	 */
 	nr_taken = sc->nr_scanned - nr_taken;
 
+	/*
+	 * If dirty folios are scanned that are not queued for IO, it
+	 * implies that flushers are not doing their job. This can
+	 * happen when memory pressure pushes dirty folios to the end of
+	 * the LRU before the dirty limits are breached and the dirty
+	 * data has expired. It can also happen when the proportion of
+	 * dirty folios grows not through writes but through memory
+	 * pressure reclaiming all the clean cache. And in some cases,
+	 * the flushers simply cannot keep up with the allocation
+	 * rate. Nudge the flusher threads in case they are asleep.
+	 */
+	if (unlikely(stat.nr_unqueued_dirty == nr_taken)) {
+		wakeup_flusher_threads(WB_REASON_VMSCAN);
+		/*
+		 * For cgroupv1 dirty throttling is achieved by waking up
+		 * the kernel flusher here and later waiting on folios
+		 * which are in writeback to finish (see shrink_folio_list()).
+		 *
+		 * Flusher may not be able to issue writeback quickly
+		 * enough for cgroupv1 writeback throttling to work
+		 * on a large system.
+		 */
+		if (!writeback_throttling_sane(sc))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+	}
+
+	sc->nr.dirty += stat.nr_dirty;
+	sc->nr.congested += stat.nr_congested;
+	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
+	sc->nr.writeback += stat.nr_writeback;
+	sc->nr.immediate += stat.nr_immediate;
+	sc->nr.taken += nr_taken;
+	if (type)
+		sc->nr.file_taken += nr_taken;
+
 	sc->nr_reclaimed += total_reclaimed;
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, nr_taken,
 					     total_reclaimed, &stat,