Message ID | 20240207033857.3820921-1-chengming.zhou@linux.dev (mailing list archive)
---|---
State | New
Series | [v3] mm/zswap: invalidate old entry when store fail or !zswap_enabled
On Tue, Feb 6, 2024 at 7:39 PM <chengming.zhou@linux.dev> wrote:
>
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> We may encounter a duplicate entry in zswap_store():
>
> 1. A swap slot freed to the per-cpu swap cache doesn't invalidate
>    the zswap entry, then gets reused. This has been fixed.
>
> 2. In !exclusive load mode, a swapped-in folio leaves its zswap entry
>    on the tree, then gets swapped out again. This mode has been removed.
>
> 3. A folio can be dirtied again after zswap_store(), so it needs to be
>    zswap_store()d again. This should be handled correctly.

Thanks, I have been wondering about the cause of that for a while.

>
> So we must invalidate the old duplicate entry before inserting the
> new one, which actually doesn't have to be done at the beginning
> of zswap_store(). And since this is a normal situation, we shouldn't
> WARN_ON(1) in this case, so delete it. (The WARN_ON(1) seems to be
> meant to detect a swap entry UAF problem? But it's not really
> necessary here.)
>
> The good point is that we don't need to lock the tree twice in the
> store success path.
>
> Note we still need to invalidate the old duplicate entry in the
> store failure path, otherwise the new data in the swapfile could be
> overwritten by the old data in the zswap pool during lru writeback.
>
> We have to do this even when !zswap_enabled, since zswap can be
> disabled at any time. If the folio was stored successfully before,
> then gets dirtied again while zswap is disabled, we won't invalidate
> the old duplicate entry in zswap_store(). So later lru writeback
> may overwrite the new data in the swapfile.
>
> Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> Cc: <stable@vger.kernel.org>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Yosry Ahmed <yosryahmed@google.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
> v3:
> - Fix a few grammatical problems in comments, per Yosry.
>
> v2:
> - Change the duplicate entry invalidation loop to if: since we hold
>   the lock, we won't find it again once we invalidate it, per Yosry.
> - Add Fixes tag.
> ---
>  mm/zswap.c | 33 ++++++++++++++++-----------------
>  1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index cd67f7f6b302..d9d8947d6761 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1518,18 +1518,8 @@ bool zswap_store(struct folio *folio)
>  		return false;
>
>  	if (!zswap_enabled)
> -		return false;
> +		goto check_old;
>
> -	/*
> -	 * If this is a duplicate, it must be removed before attempting to store
> -	 * it, otherwise, if the store fails the old page won't be removed from
> -	 * the tree, and it might be written back overriding the new data.
> -	 */
> -	spin_lock(&tree->lock);
> -	entry = zswap_rb_search(&tree->rbroot, offset);
> -	if (entry)
> -		zswap_invalidate_entry(tree, entry);
> -	spin_unlock(&tree->lock);
>  	objcg = get_obj_cgroup_from_folio(folio);
>  	if (objcg && !obj_cgroup_may_zswap(objcg)) {
>  		memcg = get_mem_cgroup_from_objcg(objcg);
> @@ -1608,14 +1598,12 @@ bool zswap_store(struct folio *folio)
>  	/* map */
>  	spin_lock(&tree->lock);
>  	/*
> -	 * A duplicate entry should have been removed at the beginning of this
> -	 * function. Since the swap entry should be pinned, if a duplicate is
> -	 * found again here it means that something went wrong in the swap
> -	 * cache.
> +	 * The folio may have been dirtied again, invalidate the
> +	 * possibly stale entry before inserting the new entry.
>  	 */
> -	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> -		WARN_ON(1);
> +	if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
>  		zswap_invalidate_entry(tree, dupentry);
> +		VM_WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));

It seems there is only one path that calls zswap_rb_insert(), and there
is no loop to repeat the insert any more. Can we have zswap_rb_insert()
install the entry and return the dupentry? We can still just call
zswap_invalidate_entry() on the duplicate. The mapping of the dupentry
will already have been removed by the time zswap_rb_insert() returns,
so that saves a repeat lookup in the duplicate case.

After this change, zswap_rb_insert() will map to the xarray xa_store()
pretty nicely.

Chris

>  	}
>  	if (entry->length) {
>  		INIT_LIST_HEAD(&entry->lru);
> @@ -1638,6 +1626,17 @@ bool zswap_store(struct folio *folio)
>  reject:
>  	if (objcg)
>  		obj_cgroup_put(objcg);
> +check_old:
> +	/*
> +	 * If the zswap store fails or zswap is disabled, we must invalidate the
> +	 * possibly stale entry which was previously stored at this offset.
> +	 * Otherwise, writeback could overwrite the new data in the swapfile.
> +	 */
> +	spin_lock(&tree->lock);
> +	entry = zswap_rb_search(&tree->rbroot, offset);
> +	if (entry)
> +		zswap_invalidate_entry(tree, entry);
> +	spin_unlock(&tree->lock);
>  	return false;
>
>  shrink:
> --
> 2.40.1
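To make the suggestion above concrete, a helper along the following lines could install the new entry and hand back any displaced duplicate in a single pass. This is only a sketch of the idea, not actual zswap code: the helper name and the entry_offset() accessor are stand-ins (the real code derives the offset from the entry's swp_entry_t), and the caller would then drop the duplicate's reference directly, since the node is already unlinked when the helper returns.

	/*
	 * Hypothetical variant of zswap_rb_insert(): install @entry at its
	 * offset, and if a duplicate already occupies that slot, replace it
	 * in place and return it so the caller can invalidate it without a
	 * second tree lookup. entry_offset() is an assumed accessor for the
	 * entry's swap offset.
	 */
	static struct zswap_entry *zswap_rb_insert_or_replace(struct rb_root *root,
						struct zswap_entry *entry)
	{
		struct rb_node **link = &root->rb_node, *parent = NULL;
		struct zswap_entry *dup;
		pgoff_t offset = entry_offset(entry);

		while (*link) {
			parent = *link;
			dup = rb_entry(parent, struct zswap_entry, rbnode);
			if (offset < entry_offset(dup)) {
				link = &parent->rb_left;
			} else if (offset > entry_offset(dup)) {
				link = &parent->rb_right;
			} else {
				/* Same key: take over the duplicate's position. */
				rb_replace_node(&dup->rbnode, &entry->rbnode, root);
				/* Mark it unlinked so a later erase is a no-op. */
				RB_CLEAR_NODE(&dup->rbnode);
				return dup;
			}
		}
		rb_link_node(&entry->rbnode, parent, link);
		rb_insert_color(&entry->rbnode, root);
		return NULL;
	}

The key property is the one Chris points out: the duplicate is already out of the tree when the function returns, so only its reference needs to be dropped.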
> > @@ -1608,14 +1598,12 @@ bool zswap_store(struct folio *folio)
> >  	/* map */
> >  	spin_lock(&tree->lock);
> >  	/*
> > -	 * A duplicate entry should have been removed at the beginning of this
> > -	 * function. Since the swap entry should be pinned, if a duplicate is
> > -	 * found again here it means that something went wrong in the swap
> > -	 * cache.
> > +	 * The folio may have been dirtied again, invalidate the
> > +	 * possibly stale entry before inserting the new entry.
> >  	 */
> > -	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> > -		WARN_ON(1);
> > +	if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> >  		zswap_invalidate_entry(tree, dupentry);
> > +		VM_WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));
>
> It seems there is only one path that calls zswap_rb_insert(), and there
> is no loop to repeat the insert any more. Can we have zswap_rb_insert()
> install the entry and return the dupentry? We can still just call
> zswap_invalidate_entry() on the duplicate. The mapping of the dupentry
> will already have been removed by the time zswap_rb_insert() returns,
> so that saves a repeat lookup in the duplicate case.
> After this change, zswap_rb_insert() will map to the xarray xa_store()
> pretty nicely.

I brought this up in v1 [1]. We agreed to leave it as-is for now since
we expect the xarray conversion soon-ish. No need to update
zswap_rb_insert() only to replace it with xa_store() later anyway.

[1] https://lore.kernel.org/lkml/ZcFne336KJdbrvvS@google.com/
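For comparison, here is roughly what that same step could look like after the xarray conversion Yosry mentions. xa_store() really does return the previous entry at the index, so the duplicate falls out of the insert itself; the rest is assumption (the tree->xa field, the put_pool label, and the zswap_entry_free() helper are hypothetical names, since the conversion had not landed at the time of this thread).

	struct zswap_entry *old;

	/*
	 * Sketch only: the xarray's internal lock would replace the tree
	 * spinlock here, and GFP_KERNEL is only safe because no spinlock
	 * is held across the store.
	 */
	old = xa_store(&tree->xa, offset, entry, GFP_KERNEL);
	if (xa_is_err(old))
		goto put_pool;	/* allocation failure inside the xarray */
	if (old)
		zswap_entry_free(old);	/* already unlinked by xa_store() */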
On Tue, Feb 6, 2024 at 9:46 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> > > @@ -1608,14 +1598,12 @@ bool zswap_store(struct folio *folio)
> > >  	/* map */
> > >  	spin_lock(&tree->lock);
[...]
> > > -	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> > > -		WARN_ON(1);
> > > +	if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> > >  		zswap_invalidate_entry(tree, dupentry);
> > > +		VM_WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));
> >
> > It seems there is only one path that calls zswap_rb_insert(), and there
> > is no loop to repeat the insert any more. Can we have zswap_rb_insert()
> > install the entry and return the dupentry? We can still just call
> > zswap_invalidate_entry() on the duplicate. The mapping of the dupentry
> > will already have been removed by the time zswap_rb_insert() returns,
> > so that saves a repeat lookup in the duplicate case.
> > After this change, zswap_rb_insert() will map to the xarray xa_store()
> > pretty nicely.
>
> I brought this up in v1 [1]. We agreed to leave it as-is for now since
> we expect the xarray conversion soon-ish. No need to update
> zswap_rb_insert() only to replace it with xa_store() later anyway.
>
> [1] https://lore.kernel.org/lkml/ZcFne336KJdbrvvS@google.com/

Ah, thanks for the pointer. I missed your earlier reply.

Acked-by: Chris Li <chrisl@kernel.org>

Chris
On 2024/2/7 11:38, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> We may encounter a duplicate entry in zswap_store():
[...]
> @@ -1608,14 +1598,12 @@ bool zswap_store(struct folio *folio)
>  	/* map */
>  	spin_lock(&tree->lock);
>  	/*
> -	 * A duplicate entry should have been removed at the beginning of this
> -	 * function. Since the swap entry should be pinned, if a duplicate is
> -	 * found again here it means that something went wrong in the swap
> -	 * cache.
> +	 * The folio may have been dirtied again, invalidate the
> +	 * possibly stale entry before inserting the new entry.
>  	 */
> -	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> -		WARN_ON(1);
> +	if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
>  		zswap_invalidate_entry(tree, dupentry);
> +		VM_WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));

Oh, I just realized this is empty if !CONFIG_DEBUG_VM, will post v4. Thanks.

>  	}
[...]
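The pitfall is worth spelling out: when CONFIG_DEBUG_VM is off, VM_WARN_ON(cond) compiles to a no-op that never evaluates its condition, so the re-insert hidden inside it would silently disappear. A minimal sketch of the kind of fix a v4 could carry, relying on the fact that plain WARN_ON() always evaluates its argument:

	if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
		zswap_invalidate_entry(tree, dupentry);
		/*
		 * WARN_ON() evaluates its condition even when debug
		 * options are off, so the second insert always runs.
		 */
		WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));
	}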
diff --git a/mm/zswap.c b/mm/zswap.c
index cd67f7f6b302..d9d8947d6761 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1518,18 +1518,8 @@ bool zswap_store(struct folio *folio)
 		return false;
 
 	if (!zswap_enabled)
-		return false;
+		goto check_old;
 
-	/*
-	 * If this is a duplicate, it must be removed before attempting to store
-	 * it, otherwise, if the store fails the old page won't be removed from
-	 * the tree, and it might be written back overriding the new data.
-	 */
-	spin_lock(&tree->lock);
-	entry = zswap_rb_search(&tree->rbroot, offset);
-	if (entry)
-		zswap_invalidate_entry(tree, entry);
-	spin_unlock(&tree->lock);
 	objcg = get_obj_cgroup_from_folio(folio);
 	if (objcg && !obj_cgroup_may_zswap(objcg)) {
 		memcg = get_mem_cgroup_from_objcg(objcg);
@@ -1608,14 +1598,12 @@ bool zswap_store(struct folio *folio)
 	/* map */
 	spin_lock(&tree->lock);
 	/*
-	 * A duplicate entry should have been removed at the beginning of this
-	 * function. Since the swap entry should be pinned, if a duplicate is
-	 * found again here it means that something went wrong in the swap
-	 * cache.
+	 * The folio may have been dirtied again, invalidate the
+	 * possibly stale entry before inserting the new entry.
 	 */
-	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
-		WARN_ON(1);
+	if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
 		zswap_invalidate_entry(tree, dupentry);
+		VM_WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));
 	}
 	if (entry->length) {
 		INIT_LIST_HEAD(&entry->lru);
@@ -1638,6 +1626,17 @@ bool zswap_store(struct folio *folio)
 reject:
 	if (objcg)
 		obj_cgroup_put(objcg);
+check_old:
+	/*
+	 * If the zswap store fails or zswap is disabled, we must invalidate the
+	 * possibly stale entry which was previously stored at this offset.
+	 * Otherwise, writeback could overwrite the new data in the swapfile.
+	 */
+	spin_lock(&tree->lock);
+	entry = zswap_rb_search(&tree->rbroot, offset);
+	if (entry)
+		zswap_invalidate_entry(tree, entry);
+	spin_unlock(&tree->lock);
 	return false;
 
 shrink: