Message ID | 20240205232442.3240571-1-nphamcs@gmail.com (mailing list archive) |
---|---|
State | New |
Series | mm/swap_state: update zswap LRU's protection range with the folio locked |
On 2024/2/6 07:24, Nhat Pham wrote:
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated. This is the case where
> (z)swapin could happen, which is a signal that the zswap shrinker should
> be more conservative with its reclaiming action.
>
> It also prevents a race, in which folio migration can clear the
> memcg_data of the now unlocked folio, resulting in a warning in the
> inlined folio_lruvec() call.
>
> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>

LGTM, thanks!

Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>

> ---
>  mm/swap_state.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index e671266ad772..7255c01a1e4e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  	/* The page was likely read above, so no need for plugging here */
>  	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
>  					&page_allocated, false);
> -	if (unlikely(page_allocated))
> +	if (unlikely(page_allocated)) {
> +		zswap_folio_swapin(folio);
>  		swap_read_folio(folio, false, NULL);
> -	zswap_folio_swapin(folio);
> +	}
>  	return folio;
>  }
>
> @@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>  	/* The folio was likely read above, so no need for plugging here */
>  	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
>  					&page_allocated, false);
> -	if (unlikely(page_allocated))
> +	if (unlikely(page_allocated)) {
> +		zswap_folio_swapin(folio);
>  		swap_read_folio(folio, false, NULL);
> -	zswap_folio_swapin(folio);
> +	}
>  	return folio;
>  }
>
>
> base-commit: 91f3daa1765ee4e0c89987dc25f72c40f07af34d
On Mon, Feb 05, 2024 at 03:24:42PM -0800, Nhat Pham wrote:
> Move the zswap LRU protection range update above the swap_read_folio()
> call, and only when a new page is allocated. This is the case where
> (z)swapin could happen, which is a signal that the zswap shrinker should
> be more conservative with its reclaiming action.
>
> It also prevents a race, in which folio migration can clear the
> memcg_data of the now unlocked folio, resulting in a warning in the
> inlined folio_lruvec() call.

The warning is the most probable outcome, and it will cause the update
to go against the root cgroup which is safe at least.

But AFAICS there is no ordering guarantee to rule out a UAF if the
lookup succeeds but the memcg and lruvec get freed before the update.

I think that part should be more prominent in the changelog. It's more
important than the first paragraph. Consider somebody scrolling
through the git log and trying to decide whether to backport or not;
it's helpful to describe the bug and its impact first thing, then put
the explanation of the fix after.

> Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>

Would it make sense to add

	VM_WARN_ON_ONCE(!folio_test_locked(folio));

to zswap_folio_swapin() as well?
On Tue, Feb 6, 2024 at 7:15 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Mon, Feb 05, 2024 at 03:24:42PM -0800, Nhat Pham wrote:
> > Move the zswap LRU protection range update above the swap_read_folio()
> > call, and only when a new page is allocated. This is the case where
> > (z)swapin could happen, which is a signal that the zswap shrinker should
> > be more conservative with its reclaiming action.
> >
> > It also prevents a race, in which folio migration can clear the
> > memcg_data of the now unlocked folio, resulting in a warning in the
> > inlined folio_lruvec() call.
>
> The warning is the most probable outcome, and it will cause the update
> to go against the root cgroup which is safe at least.
>
> But AFAICS there is no ordering guarantee to rule out a UAF if the
> lookup succeeds but the memcg and lruvec get freed before the update.

Ah nice. I didn't consider that. IIUC, having the folio locked should
prevent this too. Based on the documentation:

 * For a non-kmem folio any of the following ensures folio and memcg binding
 * stability:
 *
 * - the folio lock

I'll rework the commit log to include this, and make this more prominent :)

>
> I think that part should be more prominent in the changelog. It's more
> important than the first paragraph. Consider somebody scrolling
> through the git log and trying to decide whether to backport or not;
> it's helpful to describe the bug and its impact first thing, then put
> the explanation of the fix after.
>
> > Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
> > Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
> > Signed-off-by: Nhat Pham <nphamcs@gmail.com>
>
> Would it make sense to add
>
>         VM_WARN_ON_ONCE(!folio_test_locked(folio));
>
> to zswap_folio_swapin() as well?
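The assertion proposed in the thread can be illustrated with a small userspace model. This is a sketch only, not kernel code and not the change that was eventually applied: `struct model_folio`, `MODEL_WARN_ON_ONCE()`, `model_folio_test_locked()`, `model_zswap_folio_swapin()` and `warn_count` are hypothetical stand-ins invented here, and the macro only imitates the "warn the first time the condition is true" behaviour that the name VM_WARN_ON_ONCE suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct folio. */
struct model_folio {
	bool locked;
};

/* Number of warnings printed so far, kept so the behaviour is observable. */
int warn_count;

/*
 * Simplified imitation of a warn-once check: the condition is evaluated
 * on every call, but the message is printed only the first time it is
 * true at this call site.
 */
#define MODEL_WARN_ON_ONCE(cond)					\
	do {								\
		static bool warned;					\
		if ((cond) && !warned) {				\
			warned = true;					\
			warn_count++;					\
			fprintf(stderr, "WARNING: %s\n", #cond);	\
		}							\
	} while (0)

/* Model of folio_test_locked(): reports whether the lock bit is set. */
bool model_folio_test_locked(const struct model_folio *folio)
{
	return folio->locked;
}

/*
 * Model of the swapin hook with the suggested assertion in front.
 * The real function would go on to update the LRU protection range;
 * that part is deliberately left out of this sketch.
 */
void model_zswap_folio_swapin(struct model_folio *folio)
{
	assert(folio != NULL);	/* precondition of this model only */
	MODEL_WARN_ON_ONCE(!model_folio_test_locked(folio));
}
```

In this model, calling `model_zswap_folio_swapin()` twice on an unlocked folio prints a single warning, and calling it on a locked folio prints none. That is the property the assertion is meant to give: a caller that forgets to hold the folio lock is reported once instead of silently racing.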
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e671266ad772..7255c01a1e4e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	/* The page was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated))
+	if (unlikely(page_allocated)) {
+		zswap_folio_swapin(folio);
 		swap_read_folio(folio, false, NULL);
-	zswap_folio_swapin(folio);
+	}
 	return folio;
 }

@@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
 	/* The folio was likely read above, so no need for plugging here */
 	folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
 					&page_allocated, false);
-	if (unlikely(page_allocated))
+	if (unlikely(page_allocated)) {
+		zswap_folio_swapin(folio);
 		swap_read_folio(folio, false, NULL);
-	zswap_folio_swapin(folio);
+	}
 	return folio;
 }
Move the zswap LRU protection range update above the swap_read_folio()
call, and only when a new page is allocated. This is the case where
(z)swapin could happen, which is a signal that the zswap shrinker should
be more conservative with its reclaiming action.

It also prevents a race, in which folio migration can clear the
memcg_data of the now unlocked folio, resulting in a warning in the
inlined folio_lruvec() call.

Reported-by: syzbot+17a611d10af7d18a7092@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000ae47f90610803260@google.com/
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/swap_state.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

base-commit: 91f3daa1765ee4e0c89987dc25f72c40f07af34d
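The ordering argument in the changelog can be walked through with a toy userspace model. It is a sketch under stated assumptions, not kernel code: every `model_*` name is hypothetical, the "race" is simulated sequentially by calling the migration step at the point where it could interleave, and the model assumes (as the changelog and the quoted documentation imply) that the read leaves the folio unlocked and that migration cannot rebind the folio while it is locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup with one counter. */
struct model_memcg {
	unsigned long nr_swapins;
};

/* Hypothetical stand-in for a folio: a lock bit and a memcg binding. */
struct model_folio {
	bool locked;
	struct model_memcg *memcg;	/* plays the role of memcg_data */
};

/* Migration only proceeds on an unlocked folio, then clears the binding. */
bool model_migrate(struct model_folio *folio)
{
	if (folio->locked)
		return false;
	folio->memcg = NULL;
	return true;
}

/* Stand-in for the swap read: on completion the folio is unlocked. */
void model_swap_read(struct model_folio *folio)
{
	folio->locked = false;
}

/*
 * Stand-in for the protection range update. Returns false where the
 * kernel would hit the warning because the memcg binding is gone.
 */
bool model_swapin_update(struct model_folio *folio)
{
	assert(folio != NULL);
	if (!folio->memcg)
		return false;
	folio->memcg->nr_swapins++;
	return true;
}

/* Old order: read (unlocks), migration slips in, then the update. */
bool model_old_order(struct model_folio *folio)
{
	model_swap_read(folio);
	model_migrate(folio);		/* succeeds: folio is unlocked */
	return model_swapin_update(folio);
}

/* New order: update while still locked, then read. */
bool model_new_order(struct model_folio *folio)
{
	bool ok = model_swapin_update(folio);

	model_migrate(folio);		/* refused: folio is still locked */
	model_swap_read(folio);
	return ok;
}
```

Starting from a locked folio bound to a memcg, `model_old_order()` returns false because the binding was cleared before the update ran, while `model_new_order()` returns true and increments the counter. The model does not capture the use-after-free concern raised in review, where the lookup succeeds but the memcg is freed before the update; it only shows why doing the update under the folio lock closes the window.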