Message ID | 20240201-b4-zswap-invalidate-entry-v1-2-56ed496b6e55@bytedance.com
---|---
State | New
Series | mm/zswap: optimize zswap lru list
On Thu, Feb 01, 2024 at 03:49:02PM +0000, Chengming Zhou wrote:
> During testing I found that zswap_writeback_entry() sometimes returns
> -ENOMEM, which is not what we expected:
>
> bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
> @[-12]: 1563
> @[0]: 277221
>
> The reason is that __read_swap_cache_async() returns NULL because
> swapcache_prepare() failed: we don't invalidate the zswap entry when
> the swap entry is freed to the per-cpu pool, so these zswap entries
> are still on the zswap tree and LRU list.
>
> This patch moves the invalidation ahead, to the point where the swap
> entry is freed to the per-cpu pool, since there is no benefit to
> leaving stale zswap entries on the tree and LRU list.
>
> With this patch:
> bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
> @[0]: 259744
>
> Note: a large folio can't have a zswap entry for now, so don't bother
> to add zswap entry invalidation in the large folio swap free path.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Great catch.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Thu, Feb 1, 2024 at 7:50 AM Chengming Zhou <zhouchengming@bytedance.com> wrote:
>
> During testing I found that zswap_writeback_entry() sometimes returns
> -ENOMEM, which is not what we expected:
>
> bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
> @[-12]: 1563
> @[0]: 277221
>
> The reason is that __read_swap_cache_async() returns NULL because
> swapcache_prepare() failed: we don't invalidate the zswap entry when
> the swap entry is freed to the per-cpu pool, so these zswap entries
> are still on the zswap tree and LRU list.
>
> This patch moves the invalidation ahead, to the point where the swap
> entry is freed to the per-cpu pool, since there is no benefit to
> leaving stale zswap entries on the tree and LRU list.
>
> With this patch:
> bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
> @[0]: 259744

Nice!

Reviewed-by: Nhat Pham <nphamcs@gmail.com>
On Thu, Feb 01, 2024 at 03:49:02PM +0000, Chengming Zhou wrote:
> During testing I found that zswap_writeback_entry() sometimes returns
> -ENOMEM, which is not what we expected:
>
> bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
> @[-12]: 1563
> @[0]: 277221
>
> The reason is that __read_swap_cache_async() returns NULL because
> swapcache_prepare() failed: we don't invalidate the zswap entry when
> the swap entry is freed to the per-cpu pool, so these zswap entries
> are still on the zswap tree and LRU list.
>
> This patch moves the invalidation ahead, to the point where the swap
> entry is freed to the per-cpu pool, since there is no benefit to
> leaving stale zswap entries on the tree and LRU list.
>
> With this patch:
> bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
> @[0]: 259744
>
> Note: a large folio can't have a zswap entry for now, so don't bother
> to add zswap entry invalidation in the large folio swap free path.

This makes me slightly nervous. Should we add a comment somewhere just
in case this is missed if someone adds large folio support?

Otherwise the patch itself LGTM.
On 2024/2/2 08:11, Yosry Ahmed wrote:
> On Thu, Feb 01, 2024 at 03:49:02PM +0000, Chengming Zhou wrote:
>> During testing I found that zswap_writeback_entry() sometimes returns
>> -ENOMEM, which is not what we expected:
>>
>> bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
>> @[-12]: 1563
>> @[0]: 277221
>>
>> The reason is that __read_swap_cache_async() returns NULL because
>> swapcache_prepare() failed: we don't invalidate the zswap entry when
>> the swap entry is freed to the per-cpu pool, so these zswap entries
>> are still on the zswap tree and LRU list.
>>
>> This patch moves the invalidation ahead, to the point where the swap
>> entry is freed to the per-cpu pool, since there is no benefit to
>> leaving stale zswap entries on the tree and LRU list.
>>
>> With this patch:
>> bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
>> @[0]: 259744
>>
>> Note: a large folio can't have a zswap entry for now, so don't bother
>> to add zswap entry invalidation in the large folio swap free path.
>
> This makes me slightly nervous. Should we add a comment somewhere just
> in case this is missed if someone adds large folio support?

Ok, will add this comment:

+	/* Large folio swap slot is not covered. */
 	zswap_invalidate(entry);

> Otherwise the patch itself LGTM.

Thanks!
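For reference, this is roughly how free_swap_slot() would read with the
patch below plus the comment above folded in; everything after the added
call is unchanged and elided here:

void free_swap_slot(swp_entry_t entry)
{
	struct swap_slots_cache *cache;

	/* Large folio swap slot is not covered. */
	zswap_invalidate(entry);

	cache = raw_cpu_ptr(&swp_slots);
	/* ... return the slot to the per-cpu cache, as before ... */
}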
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 91895ce1fdbc..341aea490070 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -29,7 +29,7 @@ struct zswap_lruvec_state {
 
 bool zswap_store(struct folio *folio);
 bool zswap_load(struct folio *folio);
-void zswap_invalidate(int type, pgoff_t offset);
+void zswap_invalidate(swp_entry_t swp);
 int zswap_swapon(int type, unsigned long nr_pages);
 void zswap_swapoff(int type);
 void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
@@ -50,7 +50,7 @@ static inline bool zswap_load(struct folio *folio)
 	return false;
 }
 
-static inline void zswap_invalidate(int type, pgoff_t offset) {}
+static inline void zswap_invalidate(swp_entry_t swp) {}
 static inline int zswap_swapon(int type, unsigned long nr_pages)
 {
 	return 0;
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 0bec1f705f8e..d24cdea26daa 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -273,6 +273,8 @@ void free_swap_slot(swp_entry_t entry)
 {
 	struct swap_slots_cache *cache;
 
+	zswap_invalidate(entry);
+
 	cache = raw_cpu_ptr(&swp_slots);
 	if (likely(use_swap_slot_cache && cache->slots_ret)) {
 		spin_lock_irq(&cache->free_lock);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0580bb3e34d7..65b49db89b36 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -744,7 +744,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 	swap_slot_free_notify = NULL;
 	while (offset <= end) {
 		arch_swap_invalidate_page(si->type, offset);
-		zswap_invalidate(si->type, offset);
 		if (swap_slot_free_notify)
 			swap_slot_free_notify(si->bdev, offset);
 		offset++;
diff --git a/mm/zswap.c b/mm/zswap.c
index 735f1a6ef336..d8bb0e06e2b0 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1738,9 +1738,10 @@ bool zswap_load(struct folio *folio)
 	return true;
 }
 
-void zswap_invalidate(int type, pgoff_t offset)
+void zswap_invalidate(swp_entry_t swp)
 {
-	struct zswap_tree *tree = swap_zswap_tree(swp_entry(type, offset));
+	pgoff_t offset = swp_offset(swp);
+	struct zswap_tree *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry;
 
 	spin_lock(&tree->lock);
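The last hunk is truncated above. For reference, the fully converted
zswap_invalidate() would look roughly like this (a sketch: the
zswap_rb_search() and zswap_invalidate_entry() helpers are assumed from
the rbtree-based mm/zswap.c of this series' era):

void zswap_invalidate(swp_entry_t swp)
{
	pgoff_t offset = swp_offset(swp);
	struct zswap_tree *tree = swap_zswap_tree(swp);
	struct zswap_entry *entry;

	spin_lock(&tree->lock);
	/* Drop the entry from the tree (and thus the LRU) if still present. */
	entry = zswap_rb_search(&tree->rbroot, offset);
	if (entry)
		zswap_invalidate_entry(tree, entry);
	spin_unlock(&tree->lock);
}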
During testing I found that zswap_writeback_entry() sometimes returns
-ENOMEM, which is not what we expected:

bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
@[-12]: 1563
@[0]: 277221

The reason is that __read_swap_cache_async() returns NULL because
swapcache_prepare() failed: we don't invalidate the zswap entry when
the swap entry is freed to the per-cpu pool, so these zswap entries
are still on the zswap tree and LRU list.

This patch moves the invalidation ahead, to the point where the swap
entry is freed to the per-cpu pool, since there is no benefit to
leaving stale zswap entries on the tree and LRU list.

With this patch:
bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
@[0]: 259744

Note: a large folio can't have a zswap entry for now, so don't bother
to add zswap entry invalidation in the large folio swap free path.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/zswap.h | 4 ++--
 mm/swap_slots.c       | 2 ++
 mm/swapfile.c         | 1 -
 mm/zswap.c            | 5 +++--
 4 files changed, 7 insertions(+), 5 deletions(-)
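To make the failure chain concrete: writeback turns a NULL return from
__read_swap_cache_async() into -ENOMEM, and that NULL comes from
swapcache_prepare() failing on a slot whose swap_map count has already
dropped to zero. A condensed sketch of the decisive check (adapted from
__swap_duplicate(); locking and the swap_map lookup are elided):

/*
 * swapcache_prepare(entry) -> __swap_duplicate(entry, SWAP_HAS_CACHE).
 * count/has_cache are decoded from swap_map[offset].
 */
if (usage == SWAP_HAS_CACHE) {
	if (!has_cache && count)	/* entry still in use: take the cache ref */
		has_cache = SWAP_HAS_CACHE;
	else if (has_cache)		/* someone else is adding to the cache */
		err = -EEXIST;		/* __read_swap_cache_async() retries */
	else				/* no users left: slot already freed */
		err = -ENOENT;		/* -> NULL -> -ENOMEM, the @[-12] above */
}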