Message ID | 20240209115950.3885183-2-chengming.zhou@linux.dev (mailing list archive)
---|---
State | New
Series | mm/zswap: fix LRU reclaim for zswap writeback folios
On Fri, Feb 9, 2024 at 4:00 AM <chengming.zhou@linux.dev> wrote:
>
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> All LRU move interfaces have a problem: they have no effect if the
> folio is isolated from the LRU (in a cpu batch or isolated by a
> shrinker). Since a folio's LRU status can't be moved/changed while
> it's isolated, they mostly just clear the folio flag and do nothing
> in this case.
>
> In our case, a written-back and reclaimable folio won't be rotated to
> the tail of the inactive list, since it's still in a cpu lru_add
> batch. This may delay the reclaim of this folio and evict other
> folios instead.
>
> This patch changes folio_rotate_reclaimable() to queue the reclaimable
> folio to the cpu rotate batch even when !folio_test_lru(), hoping it
> will likely be handled after the lru_add batch, which will put the
> folio on the LRU list first, so it will be rotated to the tail
> successfully when the rotate batch is handled.

It seems to me that it is totally up to chance whether the lru_add
batch is handled first, especially since there may be problems if it
isn't.

>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/swap.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index cd8f0150ba3a..d304731e47cf 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -236,7 +236,8 @@ static void folio_batch_add_and_move(struct folio_batch *fbatch,
>
>  static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>  {
> -        if (!folio_test_unevictable(folio)) {
> +        if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
> +            !folio_test_unevictable(folio) && !folio_test_active(folio)) {

What are these conditions based on? I assume we want to check whether
the folio is locked because we no longer check that it is on the LRUs,
so we want to check that no one else is operating on it, but I am not
sure that's enough.

>                  lruvec_del_folio(lruvec, folio);
>                  folio_clear_active(folio);
>                  lruvec_add_folio_tail(lruvec, folio);
> @@ -254,7 +255,7 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>  void folio_rotate_reclaimable(struct folio *folio)
>  {
>          if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
> -            !folio_test_unevictable(folio) && folio_test_lru(folio)) {
> +            !folio_test_unevictable(folio) && !folio_test_active(folio)) {

I am not sure it is safe to continue with a folio that is not on the
LRUs. It could be isolated for other purposes, and we would end up
adding it to an LRU nonetheless. Also, folio_batch_move_lru() will do
folio_test_clear_lru() and skip such folios anyway. There may also be
messed-up accounting; for example, lru_move_tail_fn() calls
lruvec_del_folio(), which does some bookkeeping, at least for the
MGLRU case.

>                  struct folio_batch *fbatch;
>                  unsigned long flags;
>
> --
> 2.40.1
>
On Fri, Feb 9, 2024 at 6:00 AM <chengming.zhou@linux.dev> wrote:
>
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> All LRU move interfaces have a problem: they have no effect if the
> folio is isolated from the LRU (in a cpu batch or isolated by a
> shrinker). Since a folio's LRU status can't be moved/changed while
> it's isolated, they mostly just clear the folio flag and do nothing
> in this case.
>
> In our case, a written-back and reclaimable folio won't be rotated to
> the tail of the inactive list, since it's still in a cpu lru_add
> batch. This may delay the reclaim of this folio and evict other
> folios instead.
>
> This patch changes folio_rotate_reclaimable() to queue the reclaimable
> folio to the cpu rotate batch even when !folio_test_lru(), hoping it
> will likely be handled after the lru_add batch, which will put the
> folio on the LRU list first, so it will be rotated to the tail
> successfully when the rotate batch is handled.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

I don't think the analysis is correct. IIRC, writeback from non-reclaim
paths doesn't require isolation, and the reclaim path doesn't use the
struct folio_batch lru_add.

Did you see any performance improvements with this patch? In general,
this kind of patch should come with performance numbers to show that it
really helps (not just in theory).

My guess is that you are hitting this problem [1].

[1] https://lore.kernel.org/linux-mm/20221116013808.3995280-1-yuzhao@google.com/
On 2024/2/14 15:13, Yu Zhao wrote:
> On Fri, Feb 9, 2024 at 6:00 AM <chengming.zhou@linux.dev> wrote:
>>
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> All LRU move interfaces have a problem: they have no effect if the
>> folio is isolated from the LRU (in a cpu batch or isolated by a
>> shrinker). Since a folio's LRU status can't be moved/changed while
>> it's isolated, they mostly just clear the folio flag and do nothing
>> in this case.
>>
>> In our case, a written-back and reclaimable folio won't be rotated to
>> the tail of the inactive list, since it's still in a cpu lru_add
>> batch. This may delay the reclaim of this folio and evict other
>> folios instead.
>>
>> This patch changes folio_rotate_reclaimable() to queue the
>> reclaimable folio to the cpu rotate batch even when
>> !folio_test_lru(), hoping it will likely be handled after the lru_add
>> batch, which will put the folio on the LRU list first, so it will be
>> rotated to the tail successfully when the rotate batch is handled.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>
> I don't think the analysis is correct. IIRC, writeback from non-reclaim
> paths doesn't require isolation, and the reclaim path doesn't use the
> struct folio_batch lru_add.

Ah, my bad, I forgot to mention the important context in the message:
this is not from the normal reclaim context, it's from the zswap
writeback reclaim context, which first sets the PG_reclaim flag, then
submits the async writeback io.

If the writeback io completes fast enough, folio_rotate_reclaimable()
will be called before that folio is put on the LRU list (it's still in
the local lru_add batch, so it's somewhat like being isolated too).

>
> Did you see any performance improvements with this patch? In general,
> this kind of patch should come with performance numbers to show that
> it really helps (not just in theory).

Right, there are some improvements; the numbers are in the cover
letter. But this solution is not good enough, just an RFC for
discussion. :)

                          mm-unstable-hot   zswap-lru-reclaim
real                                63.34               62.72
user                              1063.20             1060.30
sys                                272.04              256.14
workingset_refault_anon        2103297.00          1788155.80
workingset_refault_file          28638.20            39249.40
workingset_activate_anon        746134.00           695435.40
workingset_activate_file          4344.60             4255.80
workingset_restore_anon         653163.80           605315.60
workingset_restore_file           1079.00              883.00
workingset_nodereclaim               0.00                0.00
pgscan                        12971305.60         12730331.20
pgscan_kswapd                        0.00                0.00
pgscan_direct                 12971305.60         12730331.20
pgscan_khugepaged                    0.00                0.00

>
> My guess is that you are hitting this problem [1].
>
> [1] https://lore.kernel.org/linux-mm/20221116013808.3995280-1-yuzhao@google.com/

Right, I just saw it; it's the same problem. The only difference is
that in your case the folio is isolated by a shrinker, while in my
case the folio is in a cpu lru_add batch. Either way, the result is
the same: the folio can't be rotated successfully when writeback
completes.

Thanks.
On 2024/2/13 16:49, Yosry Ahmed wrote:
> On Fri, Feb 9, 2024 at 4:00 AM <chengming.zhou@linux.dev> wrote:
>>
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> All LRU move interfaces have a problem: they have no effect if the
>> folio is isolated from the LRU (in a cpu batch or isolated by a
>> shrinker). Since a folio's LRU status can't be moved/changed while
>> it's isolated, they mostly just clear the folio flag and do nothing
>> in this case.
>>
>> In our case, a written-back and reclaimable folio won't be rotated to
>> the tail of the inactive list, since it's still in a cpu lru_add
>> batch. This may delay the reclaim of this folio and evict other
>> folios instead.
>>
>> This patch changes folio_rotate_reclaimable() to queue the
>> reclaimable folio to the cpu rotate batch even when
>> !folio_test_lru(), hoping it will likely be handled after the lru_add
>> batch, which will put the folio on the LRU list first, so it will be
>> rotated to the tail successfully when the rotate batch is handled.
>
> It seems to me that it is totally up to chance whether the lru_add
> batch is handled first, especially since there may be problems if it
> isn't.

You're right, I just don't know a better solution. :)

>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  mm/swap.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/swap.c b/mm/swap.c
>> index cd8f0150ba3a..d304731e47cf 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -236,7 +236,8 @@ static void folio_batch_add_and_move(struct folio_batch *fbatch,
>>
>>  static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>>  {
>> -        if (!folio_test_unevictable(folio)) {
>> +        if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
>> +            !folio_test_unevictable(folio) && !folio_test_active(folio)) {
>
> What are these conditions based on? I assume we want to check whether
> the folio is locked because we no longer check that it is on the LRUs,
> so we want to check that no one else is operating on it, but I am not
> sure that's enough.

These conditions are used to check whether the folio should be
reclaimed/rotated at this point. For example, we shouldn't reclaim it
if it has been dirtied or activated.

lru_move_tail_fn() will only be called after we isolate this folio
successfully in folio_batch_move_lru(), so if another path has isolated
this folio (cpu batch or reclaim shrinker), this function will not be
called.

>
>>                  lruvec_del_folio(lruvec, folio);
>>                  folio_clear_active(folio);
>>                  lruvec_add_folio_tail(lruvec, folio);
>> @@ -254,7 +255,7 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>>  void folio_rotate_reclaimable(struct folio *folio)
>>  {
>>          if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
>> -            !folio_test_unevictable(folio) && folio_test_lru(folio)) {
>> +            !folio_test_unevictable(folio) && !folio_test_active(folio)) {
>
> I am not sure it is safe to continue with a folio that is not on the
> LRUs. It could be isolated for other purposes, and we end up adding it
> to an LRU nonetheless. Also, folio_batch_move_lru() will do

This shouldn't happen, since lru_move_tail_fn() will only be called if
folio_test_clear_lru() succeeds in folio_batch_move_lru().

Thanks.
On Wed, Feb 14, 2024 at 05:54:56PM +0800, Chengming Zhou wrote:
> On 2024/2/13 16:49, Yosry Ahmed wrote:
> > On Fri, Feb 9, 2024 at 4:00 AM <chengming.zhou@linux.dev> wrote:
> >>
> >> From: Chengming Zhou <zhouchengming@bytedance.com>
> >>
> >> All LRU move interfaces have a problem: they have no effect if the
> >> folio is isolated from the LRU (in a cpu batch or isolated by a
> >> shrinker). Since a folio's LRU status can't be moved/changed while
> >> it's isolated, they mostly just clear the folio flag and do nothing
> >> in this case.
> >>
> >> In our case, a written-back and reclaimable folio won't be rotated
> >> to the tail of the inactive list, since it's still in a cpu lru_add
> >> batch. This may delay the reclaim of this folio and evict other
> >> folios instead.
> >>
> >> This patch changes folio_rotate_reclaimable() to queue the
> >> reclaimable folio to the cpu rotate batch even when
> >> !folio_test_lru(), hoping it will likely be handled after the
> >> lru_add batch, which will put the folio on the LRU list first, so
> >> it will be rotated to the tail successfully when the rotate batch
> >> is handled.
> >
> > It seems to me that it is totally up to chance whether the lru_add
> > batch is handled first, especially since there may be problems if it
> > isn't.
>
> You're right, I just don't know a better solution. :)
>
> >
> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> >> ---
> >>  mm/swap.c | 5 +++--
> >>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/swap.c b/mm/swap.c
> >> index cd8f0150ba3a..d304731e47cf 100644
> >> --- a/mm/swap.c
> >> +++ b/mm/swap.c
> >> @@ -236,7 +236,8 @@ static void folio_batch_add_and_move(struct folio_batch *fbatch,
> >>
> >>  static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
> >>  {
> >> -        if (!folio_test_unevictable(folio)) {
> >> +        if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
> >> +            !folio_test_unevictable(folio) && !folio_test_active(folio)) {
> >
> > What are these conditions based on? I assume we want to check whether
> > the folio is locked because we no longer check that it is on the
> > LRUs, so we want to check that no one else is operating on it, but I
> > am not sure that's enough.
>
> These conditions are used to check whether the folio should be
> reclaimed/rotated at this point. For example, we shouldn't reclaim it
> if it has been dirtied or activated.

This should be explained somewhere, in a comment or in the commit
message.

> lru_move_tail_fn() will only be called after we isolate this folio
> successfully in folio_batch_move_lru(), so if another path has
> isolated this folio (cpu batch or reclaim shrinker), this function
> will not be called.

Interesting, why are we checking whether the folio is locked here, then?

>
> >
> >>                  lruvec_del_folio(lruvec, folio);
> >>                  folio_clear_active(folio);
> >>                  lruvec_add_folio_tail(lruvec, folio);
> >> @@ -254,7 +255,7 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
> >>  void folio_rotate_reclaimable(struct folio *folio)
> >>  {
> >>          if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
> >> -            !folio_test_unevictable(folio) && folio_test_lru(folio)) {
> >> +            !folio_test_unevictable(folio) && !folio_test_active(folio)) {
> >
> > I am not sure it is safe to continue with a folio that is not on the
> > LRUs. It could be isolated for other purposes, and we end up adding
> > it to an LRU nonetheless. Also, folio_batch_move_lru() will do
>
> This shouldn't happen, since lru_move_tail_fn() will only be called if
> folio_test_clear_lru() succeeds in folio_batch_move_lru().

I see, so this is where we hope the lru_add batch gets handled first. I
need to think about this some more; let's also see what others like Yu
say. Thanks!
On Wed, Feb 14, 2024 at 4:18 AM Chengming Zhou <chengming.zhou@linux.dev> wrote:
>
> On 2024/2/14 15:13, Yu Zhao wrote:
> > On Fri, Feb 9, 2024 at 6:00 AM <chengming.zhou@linux.dev> wrote:
> >>
> >> From: Chengming Zhou <zhouchengming@bytedance.com>
> >>
> >> All LRU move interfaces have a problem: they have no effect if the
> >> folio is isolated from the LRU (in a cpu batch or isolated by a
> >> shrinker). Since a folio's LRU status can't be moved/changed while
> >> it's isolated, they mostly just clear the folio flag and do nothing
> >> in this case.
> >>
> >> In our case, a written-back and reclaimable folio won't be rotated
> >> to the tail of the inactive list, since it's still in a cpu lru_add
> >> batch. This may delay the reclaim of this folio and evict other
> >> folios instead.
> >>
> >> This patch changes folio_rotate_reclaimable() to queue the
> >> reclaimable folio to the cpu rotate batch even when
> >> !folio_test_lru(), hoping it will likely be handled after the
> >> lru_add batch, which will put the folio on the LRU list first, so
> >> it will be rotated to the tail successfully when the rotate batch
> >> is handled.
> >>
> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> >
> > I don't think the analysis is correct. IIRC, writeback from
> > non-reclaim paths doesn't require isolation, and the reclaim path
> > doesn't use the struct folio_batch lru_add.
>
> Ah, my bad, I forgot to mention the important context in the message:
> this is not from the normal reclaim context, it's from the zswap
> writeback reclaim context, which first sets the PG_reclaim flag, then
> submits the async writeback io.
>
> If the writeback io completes fast enough, folio_rotate_reclaimable()
> will be called before that folio is put on the LRU list (it's still
> in the local lru_add batch, so it's somewhat like being isolated too).
>
> >
> > Did you see any performance improvements with this patch? In
> > general, this kind of patch should come with performance numbers to
> > show that it really helps (not just in theory).
>
> Right, there are some improvements; the numbers are in the cover
> letter. But this solution is not good enough, just an RFC for
> discussion. :)
>
>                           mm-unstable-hot   zswap-lru-reclaim
> real                                63.34               62.72
> user                              1063.20             1060.30
> sys                                272.04              256.14
> workingset_refault_anon        2103297.00          1788155.80
> workingset_refault_file          28638.20            39249.40
> workingset_activate_anon        746134.00           695435.40
> workingset_activate_file          4344.60             4255.80
> workingset_restore_anon         653163.80           605315.60
> workingset_restore_file           1079.00              883.00
> workingset_nodereclaim               0.00                0.00
> pgscan                        12971305.60         12730331.20
> pgscan_kswapd                        0.00                0.00
> pgscan_direct                 12971305.60         12730331.20
> pgscan_khugepaged                    0.00                0.00
>
> >
> > My guess is that you are hitting this problem [1].
> >
> > [1] https://lore.kernel.org/linux-mm/20221116013808.3995280-1-yuzhao@google.com/
>
> Right, I just saw it; it's the same problem. The only difference is
> that in your case the folio is isolated by a shrinker, while in my
> case the folio is in a cpu lru_add batch. Either way, the result is
> the same: the folio can't be rotated successfully when writeback
> completes.

In that case, a better solution would be to make lru_add add
(_reclaim() && !_dirty() && !_writeback()) folios at the tail.
(_rotate() needs to leave _reclaim() set if it fails to rotate.)
On 2024/2/15 02:59, Yosry Ahmed wrote:
> On Wed, Feb 14, 2024 at 05:54:56PM +0800, Chengming Zhou wrote:
>> On 2024/2/13 16:49, Yosry Ahmed wrote:
>>> On Fri, Feb 9, 2024 at 4:00 AM <chengming.zhou@linux.dev> wrote:
>>>>
>>>> [...]
>>>
>>> It seems to me that it is totally up to chance whether the lru_add
>>> batch is handled first, especially since there may be problems if it
>>> isn't.
>>
>> You're right, I just don't know a better solution. :)
>>
>>>> diff --git a/mm/swap.c b/mm/swap.c
>>>> index cd8f0150ba3a..d304731e47cf 100644
>>>> --- a/mm/swap.c
>>>> +++ b/mm/swap.c
>>>> @@ -236,7 +236,8 @@ static void folio_batch_add_and_move(struct folio_batch *fbatch,
>>>>
>>>>  static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>>>>  {
>>>> -        if (!folio_test_unevictable(folio)) {
>>>> +        if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
>>>> +            !folio_test_unevictable(folio) && !folio_test_active(folio)) {
>>>
>>> What are these conditions based on? I assume we want to check
>>> whether the folio is locked because we no longer check that it is on
>>> the LRUs, so we want to check that no one else is operating on it,
>>> but I am not sure that's enough.
>>
>> These conditions are used to check whether the folio should be
>> reclaimed/rotated at this point. For example, we shouldn't reclaim it
>> if it has been dirtied or activated.
>
> This should be explained somewhere, in a comment or in the commit
> message.
>
>> lru_move_tail_fn() will only be called after we isolate this folio
>> successfully in folio_batch_move_lru(), so if another path has
>> isolated this folio (cpu batch or reclaim shrinker), this function
>> will not be called.
>
> Interesting, why are we checking whether the folio is locked here,
> then?

I think it means the folio is in use by others, and reclaim needs to
lock the folio. Not very sure.

>
>>>>                  lruvec_del_folio(lruvec, folio);
>>>>                  folio_clear_active(folio);
>>>>                  lruvec_add_folio_tail(lruvec, folio);
>>>> @@ -254,7 +255,7 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>>>>  void folio_rotate_reclaimable(struct folio *folio)
>>>>  {
>>>>          if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
>>>> -            !folio_test_unevictable(folio) && folio_test_lru(folio)) {
>>>> +            !folio_test_unevictable(folio) && !folio_test_active(folio)) {
>>>
>>> I am not sure it is safe to continue with a folio that is not on the
>>> LRUs. It could be isolated for other purposes, and we end up adding
>>> it to an LRU nonetheless. Also, folio_batch_move_lru() will do
>>
>> This shouldn't happen, since lru_move_tail_fn() will only be called
>> if folio_test_clear_lru() succeeds in folio_batch_move_lru().
>
> I see, so this is where we hope the lru_add batch gets handled first.
> I need to think about this some more; let's also see what others like
> Yu say.

Right.
On 2024/2/15 15:06, Yu Zhao wrote:
> On Wed, Feb 14, 2024 at 4:18 AM Chengming Zhou <chengming.zhou@linux.dev> wrote:
>>
>> On 2024/2/14 15:13, Yu Zhao wrote:
>>> On Fri, Feb 9, 2024 at 6:00 AM <chengming.zhou@linux.dev> wrote:
>>>>
>>>> [...]
>>>
>>> I don't think the analysis is correct. IIRC, writeback from
>>> non-reclaim paths doesn't require isolation, and the reclaim path
>>> doesn't use the struct folio_batch lru_add.
>>
>> Ah, my bad, I forgot to mention the important context in the message:
>> this is not from the normal reclaim context, it's from the zswap
>> writeback reclaim context, which first sets the PG_reclaim flag, then
>> submits the async writeback io.
>>
>> If the writeback io completes fast enough, folio_rotate_reclaimable()
>> will be called before that folio is put on the LRU list (it's still
>> in the local lru_add batch, so it's somewhat like being isolated
>> too).
>>
>>>
>>> Did you see any performance improvements with this patch? In
>>> general, this kind of patch should come with performance numbers to
>>> show that it really helps (not just in theory).
>>
>> Right, there are some improvements; the numbers are in the cover
>> letter. But this solution is not good enough, just an RFC for
>> discussion. :)
>>
>> [...]
>>
>>>
>>> My guess is that you are hitting this problem [1].
>>>
>>> [1] https://lore.kernel.org/linux-mm/20221116013808.3995280-1-yuzhao@google.com/
>>
>> Right, I just saw it; it's the same problem. The only difference is
>> that in your case the folio is isolated by a shrinker, while in my
>> case the folio is in a cpu lru_add batch. Either way, the result is
>> the same: the folio can't be rotated successfully when writeback
>> completes.
>
> In that case, a better solution would be to make lru_add add
> (_reclaim() && !_dirty() && !_writeback()) folios at the tail.
> (_rotate() needs to leave _reclaim() set if it fails to rotate.)

Right, this is a solution. But PG_readahead is an alias of PG_reclaim;
I'm afraid this would rotate readahead folios to the inactive tail.
On Sat, Feb 17, 2024 at 9:52 PM Chengming Zhou <chengming.zhou@linux.dev> wrote:
>
> On 2024/2/15 15:06, Yu Zhao wrote:
> > On Wed, Feb 14, 2024 at 4:18 AM Chengming Zhou <chengming.zhou@linux.dev> wrote:
> >>
> >> On 2024/2/14 15:13, Yu Zhao wrote:
> >>> On Fri, Feb 9, 2024 at 6:00 AM <chengming.zhou@linux.dev> wrote:
> >>>>
> >>>> [...]
> >>>
> >>> I don't think the analysis is correct. IIRC, writeback from
> >>> non-reclaim paths doesn't require isolation, and the reclaim path
> >>> doesn't use the struct folio_batch lru_add.
> >>
> >> Ah, my bad, I forgot to mention the important context in the
> >> message: this is not from the normal reclaim context, it's from the
> >> zswap writeback reclaim context, which first sets the PG_reclaim
> >> flag, then submits the async writeback io.
> >>
> >> If the writeback io completes fast enough,
> >> folio_rotate_reclaimable() will be called before that folio is put
> >> on the LRU list (it's still in the local lru_add batch, so it's
> >> somewhat like being isolated too).
> >>
> >> [...]
> >>
> >>> My guess is that you are hitting this problem [1].
> >>>
> >>> [1] https://lore.kernel.org/linux-mm/20221116013808.3995280-1-yuzhao@google.com/
> >>
> >> Right, I just saw it; it's the same problem. The only difference is
> >> that in your case the folio is isolated by a shrinker, while in my
> >> case the folio is in a cpu lru_add batch. Either way, the result is
> >> the same: the folio can't be rotated successfully when writeback
> >> completes.
> >
> > In that case, a better solution would be to make lru_add add
> > (_reclaim() && !_dirty() && !_writeback()) folios at the tail.
> > (_rotate() needs to leave _reclaim() set if it fails to rotate.)
>
> Right, this is a solution. But PG_readahead is an alias of PG_reclaim;
> I'm afraid this would rotate readahead folios to the inactive tail.

Then drain before setting readahead, since readahead isn't set on every
folio.
diff --git a/mm/swap.c b/mm/swap.c
index cd8f0150ba3a..d304731e47cf 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -236,7 +236,8 @@ static void folio_batch_add_and_move(struct folio_batch *fbatch,
 
 static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
 {
-	if (!folio_test_unevictable(folio)) {
+	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
+	    !folio_test_unevictable(folio) && !folio_test_active(folio)) {
 		lruvec_del_folio(lruvec, folio);
 		folio_clear_active(folio);
 		lruvec_add_folio_tail(lruvec, folio);
@@ -254,7 +255,7 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
 void folio_rotate_reclaimable(struct folio *folio)
 {
 	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
-	    !folio_test_unevictable(folio) && folio_test_lru(folio)) {
+	    !folio_test_unevictable(folio) && !folio_test_active(folio)) {
 		struct folio_batch *fbatch;
 		unsigned long flags;