diff mbox series

[v2,1/3] mm/zswap: don't return LRU_SKIP if we have dropped lru lock

Message ID 20240126-zswap-writeback-race-v2-1-b10479847099@bytedance.com (mailing list archive)
State New
Headers show
Series mm/zswap: fix race between lru writeback and swapoff | expand

Commit Message

Chengming Zhou Jan. 28, 2024, 1:28 p.m. UTC
LRU_SKIP can only be returned if we don't ever dropped lru lock, or
we need to return LRU_RETRY to restart from the head of lru list.

Otherwise, the iteration might continue from a cursor position that
was freed while the locks were dropped.

Actually we may need to introduce another LRU_STOP to really terminate
the ongoing shrinking scan process, when we encounter a warm page
already in the swap cache. The current list_lru implementation
doesn't have this function to early break from __list_lru_walk_one.

Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/zswap.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Yosry Ahmed Jan. 30, 2024, 12:09 a.m. UTC | #1
On Sun, Jan 28, 2024 at 01:28:49PM +0000, Chengming Zhou wrote:
> LRU_SKIP can only be returned if we don't ever dropped lru lock, or
> we need to return LRU_RETRY to restart from the head of lru list.
> 
> Otherwise, the iteration might continue from a cursor position that
> was freed while the locks were dropped.

Does this warrant a stable backport?

> 
> Actually we may need to introduce another LRU_STOP to really terminate
> the ongoing shrinking scan process, when we encounter a warm page
> already in the swap cache. The current list_lru implementation
> doesn't have this function to early break from __list_lru_walk_one.
> 
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Nhat Pham <nphamcs@gmail.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Acked-by: Yosry Ahmed <yosryahmed@google.com>
Nhat Pham Jan. 30, 2024, 12:12 a.m. UTC | #2
On Mon, Jan 29, 2024 at 4:09 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Sun, Jan 28, 2024 at 01:28:49PM +0000, Chengming Zhou wrote:
> > LRU_SKIP can only be returned if we don't ever dropped lru lock, or
> > we need to return LRU_RETRY to restart from the head of lru list.
> >
> > Otherwise, the iteration might continue from a cursor position that
> > was freed while the locks were dropped.
>
> Does this warrant a stable backport?

IUC, the zswap shrinker was merged in 6.8, and we're still in the RC's
for 6.8, right? If this patch goes into 6.8 then no need?
Otherwise, yeah it should go to 6.8 stable IMHO.

>
> >
> > Actually we may need to introduce another LRU_STOP to really terminate
> > the ongoing shrinking scan process, when we encounter a warm page
> > already in the swap cache. The current list_lru implementation
> > doesn't have this function to early break from __list_lru_walk_one.
> >
> > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> > Reviewed-by: Nhat Pham <nphamcs@gmail.com>
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>
> Acked-by: Yosry Ahmed <yosryahmed@google.com>
Yosry Ahmed Jan. 30, 2024, 12:58 a.m. UTC | #3
On Mon, Jan 29, 2024 at 04:12:54PM -0800, Nhat Pham wrote:
> On Mon, Jan 29, 2024 at 4:09 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Sun, Jan 28, 2024 at 01:28:49PM +0000, Chengming Zhou wrote:
> > > LRU_SKIP can only be returned if we don't ever dropped lru lock, or
> > > we need to return LRU_RETRY to restart from the head of lru list.
> > >
> > > Otherwise, the iteration might continue from a cursor position that
> > > was freed while the locks were dropped.
> >
> > Does this warrant a stable backport?
> 
> IUC, the zswap shrinker was merged in 6.8, and we're still in the RC's
> for 6.8, right? If this patch goes into 6.8 then no need?
> Otherwise, yeah it should go to 6.8 stable IMHO.

For some reason I thought the shrinker went into v6.7, my bad. Then I
guess it should only go into v6.8.
diff mbox series

Patch

diff --git a/mm/zswap.c b/mm/zswap.c
index 00e90b9b5417..81cb3790e0dd 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -901,10 +901,8 @@  static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 		 * into the warmer region. We should terminate shrinking (if we're in the dynamic
 		 * shrinker context).
 		 */
-		if (writeback_result == -EEXIST && encountered_page_in_swapcache) {
-			ret = LRU_SKIP;
+		if (writeback_result == -EEXIST && encountered_page_in_swapcache)
 			*encountered_page_in_swapcache = true;
-		}
 
 		goto put_unlock;
 	}