diff mbox series

[v3,6/9] mm/workingset: handle the page without memcg

Message ID 1584423717-3440-7-git-send-email-iamjoonsoo.kim@lge.com (mailing list archive)
State New, archived
Headers show
Series workingset protection/detection on the anonymous LRU list | expand

Commit Message

Joonsoo Kim March 17, 2020, 5:41 a.m. UTC
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

When implementing workingset detection for anonymous page, I found
some swapcache pages with NULL memcg. From the code reading, I found
two reasons.

One is the case that swap-in readahead happens. The other is the
corner case related to the shmem cache. These two problems should be
fixed, but, it's not straight-forward to fix. For example, when swap-off,
all swapped-out pages are read into swapcache. In this case, who's the
owner of the swapcache page?

Since this problem doesn't look trivial, I decide to leave the issue and
handles this corner case on the place where the error occurs.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/workingset.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Johannes Weiner March 18, 2020, 7:59 p.m. UTC | #1
On Tue, Mar 17, 2020 at 02:41:54PM +0900, js1304@gmail.com wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> When implementing workingset detection for anonymous page, I found
> some swapcache pages with NULL memcg. From the code reading, I found
> two reasons.
> 
> One is the case that swap-in readahead happens. The other is the
> corner case related to the shmem cache. These two problems should be
> fixed, but, it's not straight-forward to fix. For example, when swap-off,
> all swapped-out pages are read into swapcache. In this case, who's the
> owner of the swapcache page?
> 
> Since this problem doesn't look trivial, I decide to leave the issue and
> handles this corner case on the place where the error occurs.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

It wouldn't be hard to find out who owns this page. The code in
mem_cgroup_try_charge() is only a few lines:

			swp_entry_t ent = { .val = page_private(page), };
			unsigned short id = lookup_swap_cgroup_id(ent);

			rcu_read_lock();
			memcg = mem_cgroup_from_id(id);
			if (memcg && !css_tryget_online(&memcg->css))
				memcg = NULL;
			rcu_read_unlock();

THAT BEING SAID, I don't think we actually *want* to know the original
cgroup for readahead pages. Because before they are accessed and
charged to the original owner, the pages are sitting on the root
cgroup LRU list and follow the root group's aging speed and LRU order.

Eviction and refault tracking is about the LRU that hosts the pages.

So IMO your patch is much less of a hack than you might think.

> diff --git a/mm/workingset.c b/mm/workingset.c
> index a9f474a..8d2e83a 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -257,6 +257,10 @@ void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
>  	VM_BUG_ON_PAGE(page_count(page), page);
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>  
> +	/* page_memcg() can be NULL if swap-in readahead happens */
> +	if (!page_memcg(page))
> +		return NULL;
> +
>  	advance_inactive_age(page_memcg(page), pgdat, is_file);
>  
>  	lruvec = mem_cgroup_lruvec(target_memcg, pgdat);

This means a readahead page that hasn't been accessed will actively
not be tracked as an eviction and later as a refault.

I think that's the right thing to do, but I would expand the comment:

/*
 * A page can be without a cgroup here when it was brought in by swap
 * readahead and nobody has touched it since.
 *
 * The idea behind the workingset code is to tell on page fault time
 * whether pages have been previously used or not. Since this page
 * hasn't been used, don't store a shadow entry for it; when it later
 * faults back in, we treat it as the new page that it is.
 */
Joonsoo Kim March 19, 2020, 8:31 a.m. UTC | #2
2020년 3월 19일 (목) 오전 4:59, Johannes Weiner <hannes@cmpxchg.org>님이 작성:
>
> On Tue, Mar 17, 2020 at 02:41:54PM +0900, js1304@gmail.com wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > When implementing workingset detection for anonymous page, I found
> > some swapcache pages with NULL memcg. From the code reading, I found
> > two reasons.
> >
> > One is the case that swap-in readahead happens. The other is the
> > corner case related to the shmem cache. These two problems should be
> > fixed, but, it's not straight-forward to fix. For example, when swap-off,
> > all swapped-out pages are read into swapcache. In this case, who's the
> > owner of the swapcache page?
> >
> > Since this problem doesn't look trivial, I decide to leave the issue and
> > handles this corner case on the place where the error occurs.
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> It wouldn't be hard to find out who owns this page. The code in
> mem_cgroup_try_charge() is only a few lines:

lookup_swap_cgroup_id() uses additional memory and is only
usable if CONFIG_MEMCG_SWAP is on.

>                         swp_entry_t ent = { .val = page_private(page), };
>                         unsigned short id = lookup_swap_cgroup_id(ent);
>
>                         rcu_read_lock();
>                         memcg = mem_cgroup_from_id(id);
>                         if (memcg && !css_tryget_online(&memcg->css))
>                                 memcg = NULL;
>                         rcu_read_unlock();
>
> THAT BEING SAID, I don't think we actually *want* to know the original
> cgroup for readahead pages. Because before they are accessed and
> charged to the original owner, the pages are sitting on the root
> cgroup LRU list and follow the root group's aging speed and LRU order.

Okay. Sound reasonable.

> Eviction and refault tracking is about the LRU that hosts the pages.
>
> So IMO your patch is much less of a hack than you might think.

Good!

> > diff --git a/mm/workingset.c b/mm/workingset.c
> > index a9f474a..8d2e83a 100644
> > --- a/mm/workingset.c
> > +++ b/mm/workingset.c
> > @@ -257,6 +257,10 @@ void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
> >       VM_BUG_ON_PAGE(page_count(page), page);
> >       VM_BUG_ON_PAGE(!PageLocked(page), page);
> >
> > +     /* page_memcg() can be NULL if swap-in readahead happens */
> > +     if (!page_memcg(page))
> > +             return NULL;
> > +
> >       advance_inactive_age(page_memcg(page), pgdat, is_file);
> >
> >       lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
>
> This means a readahead page that hasn't been accessed will actively
> not be tracked as an eviction and later as a refault.
>
> I think that's the right thing to do, but I would expand the comment:

Okay. I will add the following comment.

Thanks.

> /*
>  * A page can be without a cgroup here when it was brought in by swap
>  * readahead and nobody has touched it since.
>  *
>  * The idea behind the workingset code is to tell on page fault time
>  * whether pages have been previously used or not. Since this page
>  * hasn't been used, don't store a shadow entry for it; when it later
>  * faults back in, we treat it as the new page that it is.
>  */
diff mbox series

Patch

diff --git a/mm/workingset.c b/mm/workingset.c
index a9f474a..8d2e83a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -257,6 +257,10 @@  void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg)
 	VM_BUG_ON_PAGE(page_count(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 
+	/* page_memcg() can be NULL if swap-in readahead happens */
+	if (!page_memcg(page))
+		return NULL;
+
 	advance_inactive_age(page_memcg(page), pgdat, is_file);
 
 	lruvec = mem_cgroup_lruvec(target_memcg, pgdat);