Message ID | 20240624175313.47329-2-ryncsn@gmail.com (mailing list archive)
---|---
State | New |
Series | Split list_lru lock into per-cgroup scope
> On Jun 25, 2024, at 01:53, Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Currently, the (shadow) nodes of the swap cache are not accounted to
> their corresponding memory cgroup, instead, they are all accounted
> to the root cgroup. This leads to inaccurate accounting and
> ineffective reclaiming.
>
> This issue is similar to commit 7b785645e8f1 ("mm: fix page cache
> convergence regression"), where page cache shadow nodes were incorrectly
> accounted. That was due to the accidental dropping of the accounting
> flag during the XArray conversion in commit a28334862993
> ("page cache: Finish XArray conversion").
>
> However, this fix has a different cause. Swap cache shadow nodes were
> never accounted even before the XArray conversion, since they did not
> exist until commit 3852f6768ede ("mm/swapcache: support to handle the
> shadow entries"), which was years after the XArray conversion. Without
> shadow nodes, swap cache nodes can only use a very small amount of memory
> and so reclaiming is not very important.
>
> But now with shadow nodes, if a cgroup swaps out a large amount of
> memory, it could take up a lot of memory.
>
> This can be easily fixed by adding proper flags and LRU setters.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>

I haven't looked at the details of this patch yet, but I think it is not
related to this series; it could be sent as an individual patch (if it is
a real problem, I think it will be quickly and easily merged).

Thanks.
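For a sense of scale (a back-of-envelope estimate, assuming 4 KiB pages and
the usual 64-slot, roughly 576-byte xa_node on 64-bit kernels): every page
swapped out leaves a pointer-sized shadow entry in the swap cache XArray,
and 64 entries share one node, so swapping out 1 GiB (262,144 pages) leaves
on the order of 4,096 nodes, or roughly 2.3 MiB of node memory per GiB
swapped. Without the accounting fix, all of that is charged to the root
cgroup and is invisible to per-cgroup reclaim.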
On Wed, Jul 17, 2024 at 11:25 AM Muchun Song <muchun.song@linux.dev> wrote:
> > On Jun 25, 2024, at 01:53, Kairui Song <ryncsn@gmail.com> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > Currently, the (shadow) nodes of the swap cache are not accounted to
> > their corresponding memory cgroup, instead, they are all accounted
> > to the root cgroup. This leads to inaccurate accounting and
> > ineffective reclaiming.
> >
> > This issue is similar to commit 7b785645e8f1 ("mm: fix page cache
> > convergence regression"), where page cache shadow nodes were incorrectly
> > accounted. That was due to the accidental dropping of the accounting
> > flag during the XArray conversion in commit a28334862993
> > ("page cache: Finish XArray conversion").
> >
> > However, this fix has a different cause. Swap cache shadow nodes were
> > never accounted even before the XArray conversion, since they did not
> > exist until commit 3852f6768ede ("mm/swapcache: support to handle the
> > shadow entries"), which was years after the XArray conversion. Without
> > shadow nodes, swap cache nodes can only use a very small amount of memory
> > and so reclaiming is not very important.
> >
> > But now with shadow nodes, if a cgroup swaps out a large amount of
> > memory, it could take up a lot of memory.
> >
> > This can be easily fixed by adding proper flags and LRU setters.
> >
> > Signed-off-by: Kairui Song <kasong@tencent.com>
>
> I haven't looked at the details of this patch yet, but I think it is not
> related to this series; it could be sent as an individual patch (if it is
> a real problem, I think it will be quickly and easily merged).
>
> Thanks.
>

The gain of this patch is not as big on its own as it is as part of the
series, but yes, it can be sent independently.
On Tue, Jun 25, 2024 at 01:53:07AM GMT, Kairui Song wrote:
> From: Kairui Song <kasong@tencent.com>
>
> Currently, the (shadow) nodes of the swap cache are not accounted to
> their corresponding memory cgroup, instead, they are all accounted
> to the root cgroup. This leads to inaccurate accounting and
> ineffective reclaiming.
>
> This issue is similar to commit 7b785645e8f1 ("mm: fix page cache
> convergence regression"), where page cache shadow nodes were incorrectly
> accounted. That was due to the accidental dropping of the accounting
> flag during the XArray conversion in commit a28334862993
> ("page cache: Finish XArray conversion").
>
> However, this fix has a different cause. Swap cache shadow nodes were
> never accounted even before the XArray conversion, since they did not
> exist until commit 3852f6768ede ("mm/swapcache: support to handle the
> shadow entries"), which was years after the XArray conversion. Without
> shadow nodes, swap cache nodes can only use a very small amount of memory
> and so reclaiming is not very important.
>
> But now with shadow nodes, if a cgroup swaps out a large amount of
> memory, it could take up a lot of memory.
>
> This can be easily fixed by adding proper flags and LRU setters.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>

As Muchun said, please send this patch separately. However, as I think more
about this patch, I believe it is incomplete, and the full solution may be
much more involved and might not be worth it.

One of the differences between the file page cache and the swap page cache
is the context in which the underlying xarray node can be allocated. For
file pages, such allocations happen in the context of the process/cgroup
owning the file pages, and thus the memcg of the current task is used for
charging. However, for the swap cache, xarray node allocations happen when
a page is added to the swap cache, which often happens during reclaim, and
reclaim can run in any context, e.g. a kernel thread or an unrelated
process/cgroup. So we may charge an unrelated memcg for these nodes.

Now you may argue that we can use the memcg of the page which is being
swapped out, but then you may have an xarray node containing pointers to
pages (or shadows) of different memcgs. Who should be charged?

BTW, a filesystem shared between multiple cgroups can face this issue as
well, but users have more control over a shared filesystem than over a
shared swap address space.

Shakeel
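As a concrete illustration of the alternative Shakeel refers to (charging
the folio's own memcg rather than whoever happens to be running reclaim), a
hypothetical sketch could look like the following. This is not part of the
posted patch; it assumes the patch below is applied (so the swap cache
XArray carries XA_FLAGS_ACCOUNT), relies on set_active_memcg()/folio_memcg()
behaving as in current mainline, and still leaves open the objection above
that a single xarray node may hold entries belonging to many memcgs.

/*
 * Hypothetical sketch, not part of the posted patch: redirect the
 * __GFP_ACCOUNT charging of swap cache XArray nodes to the memcg of
 * the folio being swapped out instead of the reclaimer's memcg.
 * Needs <linux/memcontrol.h>, <linux/sched/mm.h> and mm/swap.h for
 * the declarations used here.
 */
static int swap_cache_add_charge_owner(struct folio *folio, swp_entry_t entry,
				       gfp_t gfp, void **shadowp)
{
	struct mem_cgroup *old_memcg;
	int err;

	/* Accounted allocations made below are charged to the folio's memcg. */
	old_memcg = set_active_memcg(folio_memcg(folio));
	err = add_to_swap_cache(folio, entry, gfp, shadowp);
	set_active_memcg(old_memcg);

	return err;
}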
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 642c30d8376c..68afaaf1c09b 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -95,6 +95,7 @@ int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
 	void *old;
 
 	xas_set_update(&xas, workingset_update_node);
+	xas_set_lru(&xas, &shadow_nodes);
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
@@ -714,7 +715,7 @@ int init_swap_address_space(unsigned int type, unsigned long nr_pages)
 		return -ENOMEM;
 	for (i = 0; i < nr; i++) {
 		space = spaces + i;
-		xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ);
+		xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ | XA_FLAGS_ACCOUNT);
 		atomic_set(&space->i_mmap_writable, 0);
 		space->a_ops = &swap_aops;
 		/* swap cache doesn't use writeback related tags */
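For context on why the two one-line changes above are sufficient:
XA_FLAGS_ACCOUNT makes the XArray add __GFP_ACCOUNT when it allocates
nodes, so the nodes are charged to a memcg, and xas_set_lru() hands the
shadow_nodes list_lru to that allocation so the per-cgroup shadow shrinker
can later find the node (the node is actually added to the list by
workingset_update_node()). A simplified paraphrase of the node allocation
path in lib/xarray.c follows (approximate; the real function also reuses
preallocated nodes and handles allocation failure, and details vary by
kernel version):

/* Simplified paraphrase of xas_alloc() in lib/xarray.c, for illustration. */
static void *xas_alloc_sketch(struct xa_state *xas)
{
	gfp_t gfp = GFP_NOWAIT | __GFP_NOWARN;

	/* Set via xa_init_flags(..., XA_FLAGS_ACCOUNT): charge the node to a memcg. */
	if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
		gfp |= __GFP_ACCOUNT;

	/*
	 * xas->xa_lru was set by xas_set_lru(&xas, &shadow_nodes); the
	 * allocation ensures the memcg-aware list_lru has its per-cgroup
	 * structures ready, so workingset_update_node() can later add the
	 * node to the charged cgroup's shadow_nodes list.
	 */
	return kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
}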