Message ID | 20250113093453.1932083-5-kirill.shutemov@linux.intel.com |
---|---|
State | New |
Series | mm: Remove PG_reclaim |
On 13.01.25 10:34, Kirill A. Shutemov wrote:
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
>
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> lru_deactivate_file().
[...]

Acked-by: David Hildenbrand <david@redhat.com>
On Mon, Jan 13, 2025 at 1:35 AM Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
[...]
>  	if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> -		/*
> -		 * Setting the reclaim flag could race with
> -		 * folio_end_writeback() and confuse readahead. But the
> -		 * race window is _really_ small and it's not a critical
> -		 * problem.
> -		 */
>  		lruvec_add_folio(lruvec, folio);
> -		folio_set_reclaim(folio);
> +		folio_set_dropbehind(folio);
>  	} else {
>  		/*
>  		 * The folio's writeback ended while it was in the batch.

Now there's a difference in behavior here depending on whether or not
the folio is under writeback (or will be written back soon). If it is,
we set PG_dropbehind to get it freed right after, but if writeback has
already ended, we put it on the tail of the LRU to be freed later.

It's a bit counterintuitive to me that folios with pending writeback
get freed faster than folios that completed their writeback already.
Am I missing something?
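The asymmetry Yosry describes can be made concrete with a minimal user-space model of lru_deactivate_file() as patched. This is a sketch under stated assumptions: `struct toy_folio` and the `toy_*` functions are hypothetical stand-ins for the kernel's folio flags and LRU helpers, and the real drop at the end of writeback is best-effort, whereas this toy always succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a page-cache folio;
 * NOT the kernel's struct folio. */
struct toy_folio {
	bool dirty;       /* models PG_dirty */
	bool writeback;   /* models PG_writeback */
	bool dropbehind;  /* models PG_dropbehind */
	bool lru_tail;    /* true: placed at the tail of the inactive LRU */
	bool dropped;     /* removed from the page cache */
};

/* Models lru_deactivate_file() with the patch applied. */
static void toy_deactivate_file(struct toy_folio *f)
{
	if (f->writeback || f->dirty) {
		f->lru_tail = false;   /* lruvec_add_folio(): head of the list */
		f->dropbehind = true;  /* folio_set_dropbehind() */
	} else {
		f->lru_tail = true;    /* lruvec_add_folio_tail(): wait for vmscan */
	}
}

/* Models writeback starting on a dirty folio. */
static void toy_start_writeback(struct toy_folio *f)
{
	assert(f->dirty && !f->writeback);
	f->dirty = false;
	f->writeback = true;
}

/* Models the end of writeback: a dropbehind folio is dropped at once,
 * without vmscan. The real kernel path may skip the drop; this toy
 * never does. */
static void toy_end_writeback(struct toy_folio *f)
{
	assert(f->writeback);
	f->writeback = false;
	if (f->dropbehind)
		f->dropped = true;
}
```

Deactivating one dirty folio and one clean folio, then finishing the dirty folio's writeback, leaves the formerly dirty folio dropped while the clean one is still sitting at the LRU tail waiting for reclaim.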
On Mon, Jan 13, 2025 at 08:17:20AM -0800, Yosry Ahmed wrote:
[...]
> Now there's a difference in behavior here depending on whether or not
> the folio is under writeback (or will be written back soon). If it is,
> we set PG_dropbehind to get it freed right after, but if writeback has
> already ended we put it on the tail of the LRU to be freed later.
>
> It's a bit counterintuitive to me that folios with pending writeback
> get freed faster than folios that completed their writeback already.
> Am I missing something?

Yeah, it is strange.

I think we can drop the writeback/dirty check. Set PG_dropbehind and put
the page on the tail of LRU unconditionally. The check was required to
avoid confusion with PG_readahead.

Comment above the function is not valid anymore.

But the folio that is still dirty under writeback will be freed faster as
we get rid of the folio just after writeback is done while clean page can
dangle on LRU for a while.

I don't think we have any convenient place to free clean dropbehind page
other than shrink_folio_list(). Or do we?

Looking at shrink_folio_list(), I think we need to bypass page demotion
for PG_dropbehind pages.
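Kirill's last point, bypassing page demotion for PG_dropbehind folios in shrink_folio_list(), can be illustrated with a toy decision function. This is only a sketch: `toy_evict()` and `toy_count_demoted()` are hypothetical, the real shrink_folio_list() weighs many more conditions, and the rationale in the comment is an assumption rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for a folio that reclaim has chosen to evict. */
enum toy_action { TOY_DEMOTE, TOY_FREE };

struct toy_folio {
	bool dropbehind;  /* models PG_dropbehind */
};

/* Suggested bypass (assumed rationale): a dropbehind folio was already
 * judged not worth caching, so moving it to a slower memory tier would
 * only delay freeing it. */
static enum toy_action toy_evict(const struct toy_folio *f, bool can_demote)
{
	if (f->dropbehind)
		return TOY_FREE;
	return can_demote ? TOY_DEMOTE : TOY_FREE;
}

/* Applies the decision to a batch and counts how many folios are demoted. */
static size_t toy_count_demoted(const struct toy_folio *batch, size_t n,
				bool can_demote)
{
	size_t demoted = 0;

	assert(batch != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (toy_evict(&batch[i], can_demote) == TOY_DEMOTE)
			demoted++;
	return demoted;
}
```

With a demotion target available, only the folio without the dropbehind mark is demoted; without one, every folio is simply freed.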
On Tue, Jan 14, 2025 at 12:12 AM Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> On Mon, Jan 13, 2025 at 08:17:20AM -0800, Yosry Ahmed wrote:
[...]
> Yeah, it is strange.
>
> I think we can drop the writeback/dirty check. Set PG_dropbehind and put
> the page on the tail of LRU unconditionally. The check was required to
> avoid confusion with PG_readahead.
>
> Comment above the function is not valid anymore.

My read is that we don't put dirty/writeback folios at the tail of the
LRU because they cannot be freed immediately and we want to give them
time to be written back before reclaim reaches them. So I don't think
we want to change that and always put the pages at the tail.

> But the folio that is still dirty under writeback will be freed faster as
> we get rid of the folio just after writeback is done while clean page can
> dangle on LRU for a while.

Yeah, if we reuse PG_dropbehind then we cannot avoid
folio_end_writeback() freeing the folio faster than clean ones.

> I don't think we have any convenient place to free clean dropbehind page
> other than shrink_folio_list(). Or do we?

Not sure tbh. FWIW, I am not saying it's necessarily a bad thing to
free dirty/writeback folios before clean ones when deactivated; it's
just strange and a behavioral change from today that I wanted to point
out. Perhaps that's the best we can do for now.

> Looking at shrink_folio_list(), I think we need to bypass page demotion
> for PG_dropbehind pages.
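Yosry's reading of the LRU placement can be sketched with a toy inactive list in which reclaim scans from the tail. Assumptions: `struct toy_lru` and its helpers are hypothetical, only the classic inactive list is modeled (MGLRU is ignored), and folios are reduced to integer ids.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LRU_MAX 8

/* Hypothetical inactive list: ids[0] is the head, ids[len - 1] the tail.
 * Reclaim scans from the tail, so tail entries are considered first. */
struct toy_lru {
	int ids[TOY_LRU_MAX];
	size_t len;
};

/* Like lruvec_add_folio(): insert at the head, i.e. reclaimed last. */
static void toy_add_head(struct toy_lru *l, int id)
{
	assert(l->len < TOY_LRU_MAX);
	for (size_t i = l->len; i > 0; i--)
		l->ids[i] = l->ids[i - 1];
	l->ids[0] = id;
	l->len++;
}

/* Like lruvec_add_folio_tail(): insert at the tail, i.e. reclaimed first. */
static void toy_add_tail(struct toy_lru *l, int id)
{
	assert(l->len < TOY_LRU_MAX);
	l->ids[l->len++] = id;
}

/* Returns the id reclaim would examine n-th (0-based) scanning from the tail. */
static int toy_scan_order(const struct toy_lru *l, size_t n)
{
	assert(n < l->len);
	return l->ids[l->len - 1 - n];
}
```

Deactivating a clean folio with toy_add_tail() and a dirty one with toy_add_head() makes the clean folio the first candidate for reclaim, while the dirty folio is examined last, which gives its writeback time to finish.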
On Tue, Jan 14, 2025 at 11:03 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> On Tue, Jan 14, 2025 at 12:12 AM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
[...]
> My read is that we don't put dirty/writeback folios at the tail of the
> LRU because they cannot be freed immediately and we want to give them
> time to be written back before reclaim reaches them. So I don't think
> we want to change that and always put the pages at the tail.
[...]

I agree with Yosry. I don't think lru_deactivate_file() is still needed.
It was needed only so that, when truncation fails to free a dirty/writeback
folio, page reclaim can do that quickly. For the other conditions under
which mapping_evict_folio() returns 0, there isn't much page reclaim can
do, and handling those conditions is not what deactivate_file_folio() and
lru_deactivate_file() were intended for. So the following should be enough,
and it's a lot cleaner:

diff --git a/mm/truncate.c b/mm/truncate.c
index e2e115adfbc5..12d2aa608517 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -486,7 +486,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
 			 * of interest and try to speed up its reclaim.
 			 */
 			if (!ret) {
-				deactivate_file_folio(folio);
+				folio_set_dropbehind(folio);
 				/* Likely in the lru cache of a remote CPU */
 				if (nr_failed)
 					(*nr_failed)++;

Then we can drop deactivate_file_folio() and lru_deactivate_file().
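The effect of the proposed mapping_try_invalidate() change can be modeled in user space. This is a minimal sketch, assuming hypothetical `toy_*` names: `toy_evict_folio()` stands in for mapping_evict_folio() but only models the dirty/writeback failure case (the real function also fails, for example, when the folio holds extra references).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified folio; NOT the kernel's struct folio. */
struct toy_folio {
	bool dirty;
	bool writeback;
	bool dropbehind;  /* models PG_dropbehind */
	bool cached;      /* still in the page cache */
};

/* Stand-in for mapping_evict_folio(): returns 1 if the folio was
 * evicted, 0 otherwise. Only the dirty/writeback case is modeled. */
static int toy_evict_folio(struct toy_folio *f)
{
	if (f->dirty || f->writeback)
		return 0;
	f->cached = false;
	return 1;
}

/* Sketch of mapping_try_invalidate() with the proposed change: a folio
 * that cannot be evicted now is marked dropbehind, so it can be dropped
 * when its writeback ends, instead of being deactivated. */
static unsigned long toy_try_invalidate(struct toy_folio *folios, size_t n,
					unsigned long *nr_failed)
{
	unsigned long count = 0;

	assert(folios != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret = toy_evict_folio(&folios[i]);

		if (!ret) {
			folios[i].dropbehind = true;  /* was: deactivate_file_folio() */
			if (nr_failed)
				(*nr_failed)++;
		}
		count += ret;
	}
	return count;
}
```

Invalidating a batch of one clean, one dirty and one under-writeback folio evicts only the clean folio, marks the other two as dropbehind, and reports two failures.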
On Tue, Jan 14, 2025 at 9:28 PM Yu Zhao <yuzhao@google.com> wrote:
[...]
> Then we can drop deactivate_file_folio() and lru_deactivate_file().

And with the above and list_move_tail() removed, we can also remove
lruvec_add_folio_tail().
The recently introduced PG_dropbehind allows for freeing folios
immediately after writeback. Unlike PG_reclaim, it does not need vmscan
to be involved to get the folio freed.

Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
lru_deactivate_file().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/swap.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index fc8281ef4241..4eb33b4804a8 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -562,14 +562,8 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
 	folio_clear_referenced(folio);
 
 	if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
-		/*
-		 * Setting the reclaim flag could race with
-		 * folio_end_writeback() and confuse readahead. But the
-		 * race window is _really_ small and it's not a critical
-		 * problem.
-		 */
 		lruvec_add_folio(lruvec, folio);
-		folio_set_reclaim(folio);
+		folio_set_dropbehind(folio);
 	} else {
 		/*
 		 * The folio's writeback ended while it was in the batch.
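The commit message's contrast between PG_reclaim and PG_dropbehind at the end of writeback can be sketched as follows. Assumptions: `struct toy_folio` and `toy_end_writeback()` are hypothetical, and the real dropbehind path is best-effort, whereas this toy always drops the folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy folio; NOT the kernel's struct folio. */
struct toy_folio {
	bool writeback;
	bool reclaim;     /* models PG_reclaim */
	bool dropbehind;  /* models PG_dropbehind */
	bool lru_tail;    /* rotated to the tail of the inactive LRU */
	bool dropped;     /* removed from the page cache */
};

/* Models what happens to each flag when writeback completes.
 * PG_reclaim: the folio is only rotated to the LRU tail (as
 *   folio_rotate_reclaimable() does); vmscan must still free it later.
 * PG_dropbehind: the folio is dropped right away, without vmscan. */
static void toy_end_writeback(struct toy_folio *f)
{
	assert(f->writeback);
	f->writeback = false;
	if (f->reclaim) {
		f->reclaim = false;
		f->lru_tail = true;
	}
	if (f->dropbehind)
		f->dropped = true;
}
```

A folio marked with PG_reclaim is still cached after writeback and merely waits at the LRU tail, while a PG_dropbehind folio is gone without any reclaim pass.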