
[v4,4/5] mm: memcontrol: use obj_cgroup APIs to charge kmem pages

Message ID 20210318110658.60892-5-songmuchun@bytedance.com (mailing list archive)
State New, archived
Series Use obj_cgroup APIs to charge kmem pages

Commit Message

Muchun Song March 18, 2021, 11:06 a.m. UTC
Since Roman's series "The new cgroup slab memory controller" was applied,
all slab objects are charged via the new obj_cgroup APIs. The new APIs
introduce struct obj_cgroup to charge slab objects, which prevents
long-lived objects from pinning the original memory cgroup in memory.
But there are still some corner-case objects (e.g. allocations larger
than order-1 pages on SLUB) which are not charged via the new APIs.
Those objects (including pages allocated directly from the buddy
allocator) are charged as kmem pages, which still hold a reference to
the memory cgroup.

We want to reuse the obj_cgroup APIs to charge the kmem pages.
If we do that, we should store an object cgroup pointer in
page->memcg_data for the kmem pages.

As a result, page->memcg_data will have 3 different meanings.

  1) For slab pages, page->memcg_data points to an object cgroups
     vector.

  2) For kmem pages (excluding slab pages), page->memcg_data points
     to an object cgroup.

  3) For user pages (e.g. LRU pages), page->memcg_data points to a
     memory cgroup.

We do not change the behavior of page_memcg() and page_memcg_rcu();
they remain suitable for both LRU pages and kmem pages. Why?

Because memory allocations that pin memcgs for a long time exist at a
larger scale and cause recurring problems in the real world: page cache
doesn't get reclaimed for a long time, or is used by the second, third,
fourth, ... instance of the same job that was restarted into a new
cgroup every time. Unreclaimable dying cgroups pile up, waste memory,
and make page reclaim very inefficient.

We can convert LRU pages and most other raw memcg pins to the objcg
direction to fix this problem, and then page->memcg will always point
to an object cgroup. At that point, LRU pages and kmem pages will be
treated the same, and the implementation of page_memcg() can drop the
kmem page check.

This patch aims to charge kmem pages by using the new obj_cgroup APIs.
Finally, the page->memcg_data of a kmem page points to an object
cgroup. We can use __page_objcg() to get the object cgroup associated
with a kmem page, or we can use page_memcg() to get the memory cgroup
associated with a kmem page, but the caller must ensure that the
returned memcg won't be released (e.g. by acquiring the rcu_read_lock
or css_set_lock).

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h | 116 +++++++++++++++++++++++++++++++++++----------
 mm/memcontrol.c            | 101 ++++++++++++++++++++++++---------------
 2 files changed, 156 insertions(+), 61 deletions(-)

Comments

Johannes Weiner March 18, 2021, 3:23 p.m. UTC | #1
On Thu, Mar 18, 2021 at 07:06:57PM +0800, Muchun Song wrote:
> Since Roman series "The new cgroup slab memory controller" applied. All
> slab objects are charged via the new APIs of obj_cgroup. The new APIs
> introduce a struct obj_cgroup to charge slab objects. It prevents
> long-living objects from pinning the original memory cgroup in the memory.
> But there are still some corner objects (e.g. allocations larger than
> order-1 page on SLUB) which are not charged via the new APIs. Those
> objects (include the pages which are allocated from buddy allocator
> directly) are charged as kmem pages which still hold a reference to
> the memory cgroup.
> 
> We want to reuse the obj_cgroup APIs to charge the kmem pages.
> If we do that, we should store an object cgroup pointer to
> page->memcg_data for the kmem pages.
> 
> Finally, page->memcg_data will have 3 different meanings.
> 
>   1) For the slab pages, page->memcg_data points to an object cgroups
>      vector.
> 
>   2) For the kmem pages (exclude the slab pages), page->memcg_data
>      points to an object cgroup.
> 
>   3) For the user pages (e.g. the LRU pages), page->memcg_data points
>      to a memory cgroup.
> 
> We do not change the behavior of page_memcg() and page_memcg_rcu().
> They are also suitable for LRU pages and kmem pages. Why?
> 
> Because memory allocations pinning memcgs for a long time - it exists
> at a larger scale and is causing recurring problems in the real world:
> page cache doesn't get reclaimed for a long time, or is used by the
> second, third, fourth, ... instance of the same job that was restarted
> into a new cgroup every time. Unreclaimable dying cgroups pile up,
> waste memory, and make page reclaim very inefficient.
> 
> We can convert LRU pages and most other raw memcg pins to the objcg
> direction to fix this problem, and then the page->memcg will always
> point to an object cgroup pointer. At that time, LRU pages and kmem
> pages will be treated the same. The implementation of page_memcg()
> will remove the kmem page check.
> 
> This patch aims to charge the kmem pages by using the new APIs of
> obj_cgroup. Finally, the page->memcg_data of the kmem page points to
> an object cgroup. We can use the __page_objcg() to get the object
> cgroup associated with a kmem page. Or we can use page_memcg()
> to get the memory cgroup associated with a kmem page, but caller must
> ensure that the returned memcg won't be released (e.g. acquire the
> rcu_read_lock or css_set_lock).
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Thanks for the updated version, looks good to me!

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Shakeel Butt March 19, 2021, 3:39 a.m. UTC | #2
On Thu, Mar 18, 2021 at 4:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
[...]
>
> +static inline struct mem_cgroup *get_obj_cgroup_memcg(struct obj_cgroup *objcg)

I would prefer get_mem_cgroup_from_objcg().

> +{
> +       struct mem_cgroup *memcg;
> +
> +       rcu_read_lock();
> +retry:
> +       memcg = obj_cgroup_memcg(objcg);
> +       if (unlikely(!css_tryget(&memcg->css)))
> +               goto retry;
> +       rcu_read_unlock();
> +
> +       return memcg;
> +}
> +
>  #ifdef CONFIG_MEMCG_KMEM
>  int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
>                                  gfp_t gfp, bool new_page)
> @@ -3070,15 +3088,8 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
>         struct mem_cgroup *memcg;
>         int ret;
>
> -       rcu_read_lock();
> -retry:
> -       memcg = obj_cgroup_memcg(objcg);
> -       if (unlikely(!css_tryget(&memcg->css)))
> -               goto retry;
> -       rcu_read_unlock();
> -
> +       memcg = get_obj_cgroup_memcg(objcg);
>         ret = __memcg_kmem_charge(memcg, gfp, nr_pages);

Why not manually inline __memcg_kmem_charge() here? This is the only user.

Similarly manually inline __memcg_kmem_uncharge() into
obj_cgroup_uncharge_pages() and call obj_cgroup_uncharge_pages() in
obj_cgroup_release().
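
The suggestion above can be sketched concretely. This is a non-runnable,
kernel-context sketch of what obj_cgroup_charge_pages() might look like
with __memcg_kmem_charge() folded in, based only on the hunks quoted in
this thread; the cgroup1-only kmem counter handling is elided, and the
exact body in the next version may differ:

```c
static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
				   unsigned int nr_pages)
{
	struct mem_cgroup *memcg;
	int ret;

	memcg = get_obj_cgroup_memcg(objcg);

	/* Former body of __memcg_kmem_charge(), now inlined here: */
	ret = try_charge(memcg, gfp, nr_pages);
	if (ret)
		goto out;

	/* ... cgroup1-only accounting of memcg->kmem elided ... */
out:
	css_put(&memcg->css);
	return ret;
}
```

The same manual inlining would apply to __memcg_kmem_uncharge() inside
obj_cgroup_uncharge_pages(), which obj_cgroup_release() could then call
directly.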

> -
>         css_put(&memcg->css);
>
>         return ret;
> @@ -3143,18 +3154,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
>   */
>  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>  {
> -       struct mem_cgroup *memcg;
> +       struct obj_cgroup *objcg;
>         int ret = 0;
>
> -       memcg = get_mem_cgroup_from_current();

This was the only use of get_mem_cgroup_from_current(). Why not remove it?

> -       if (memcg && !mem_cgroup_is_root(memcg)) {
> -               ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
> +       objcg = get_obj_cgroup_from_current();
> +       if (objcg) {
> +               ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
>                 if (!ret) {
> -                       page->memcg_data = (unsigned long)memcg |
> +                       page->memcg_data = (unsigned long)objcg |
>                                 MEMCG_DATA_KMEM;
>                         return 0;
>                 }
> -               css_put(&memcg->css);
> +               obj_cgroup_put(objcg);
>         }
>         return ret;
>  }
[...]
>  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  {
>         unsigned long nr_pages;
> +       struct mem_cgroup *memcg;
> +       struct obj_cgroup *objcg;
>
>         VM_BUG_ON_PAGE(PageLRU(page), page);
>
> -       if (!page_memcg(page))
> -               return;
> -
>         /*
>          * Nobody should be changing or seriously looking at
> -        * page_memcg(page) at this point, we have fully
> +        * page memcg or objcg at this point, we have fully
>          * exclusive access to the page.
>          */
> +       if (PageMemcgKmem(page)) {
> +               objcg = __page_objcg(page);
> +               memcg = get_obj_cgroup_memcg(objcg);

Can you add a comment that this get matches the put at the end of the
function and kmem pages do not hold memcg references anymore.

> +       } else {
> +               memcg = __page_memcg(page);
> +       }
> +
> +       if (!memcg)
> +               return;
>
> -       if (ug->memcg != page_memcg(page)) {
> +       if (ug->memcg != memcg) {
>                 if (ug->memcg) {
>                         uncharge_batch(ug);
>                         uncharge_gather_clear(ug);
>                 }
> -               ug->memcg = page_memcg(page);
> +               ug->memcg = memcg;
>                 ug->dummy_page = page;
>
>                 /* pairs with css_put in uncharge_batch */
> -               css_get(&ug->memcg->css);
> +               css_get(&memcg->css);
>         }
>
>         nr_pages = compound_nr(page);
> -       ug->nr_pages += nr_pages;
>
> -       if (PageMemcgKmem(page))
> +       if (PageMemcgKmem(page)) {
> +               ug->nr_memory += nr_pages;
>                 ug->nr_kmem += nr_pages;
> -       else
> +
> +               page->memcg_data = 0;
> +               obj_cgroup_put(objcg);
> +       } else {
> +               /* LRU pages aren't accounted at the root level */
> +               if (!mem_cgroup_is_root(memcg))
> +                       ug->nr_memory += nr_pages;
>                 ug->pgpgout++;
>
> -       page->memcg_data = 0;
> -       css_put(&ug->memcg->css);
> +               page->memcg_data = 0;
> +       }
> +
> +       css_put(&memcg->css);
>  }
>
>  /**
> --
> 2.11.0
>
Muchun Song March 19, 2021, 4:05 a.m. UTC | #3
On Fri, Mar 19, 2021 at 11:40 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Thu, Mar 18, 2021 at 4:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> [...]
> >
> > +static inline struct mem_cgroup *get_obj_cgroup_memcg(struct obj_cgroup *objcg)
>
> I would prefer get_mem_cgroup_from_objcg().

It was inspired by obj_cgroup_memcg(), which returns the memcg from an
objcg. So I introduced get_obj_cgroup_memcg(), which obtains a
reference to the memcg on the basis of obj_cgroup_memcg(), so that the
names are more consistent. Just my thought.

So should I rename it to get_mem_cgroup_from_objcg()?

>
> > +{
> > +       struct mem_cgroup *memcg;
> > +
> > +       rcu_read_lock();
> > +retry:
> > +       memcg = obj_cgroup_memcg(objcg);
> > +       if (unlikely(!css_tryget(&memcg->css)))
> > +               goto retry;
> > +       rcu_read_unlock();
> > +
> > +       return memcg;
> > +}
> > +
> >  #ifdef CONFIG_MEMCG_KMEM
> >  int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
> >                                  gfp_t gfp, bool new_page)
> > @@ -3070,15 +3088,8 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> >         struct mem_cgroup *memcg;
> >         int ret;
> >
> > -       rcu_read_lock();
> > -retry:
> > -       memcg = obj_cgroup_memcg(objcg);
> > -       if (unlikely(!css_tryget(&memcg->css)))
> > -               goto retry;
> > -       rcu_read_unlock();
> > -
> > +       memcg = get_obj_cgroup_memcg(objcg);
> >         ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
>
> Why not manually inline __memcg_kmem_charge() here? This is the only user.
>
> Similarly manually inline __memcg_kmem_uncharge() into
> obj_cgroup_uncharge_pages() and call obj_cgroup_uncharge_pages() in
> obj_cgroup_release().

Good point. I will do this.

>
> > -
> >         css_put(&memcg->css);
> >
> >         return ret;
> > @@ -3143,18 +3154,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
> >   */
> >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> >  {
> > -       struct mem_cgroup *memcg;
> > +       struct obj_cgroup *objcg;
> >         int ret = 0;
> >
> > -       memcg = get_mem_cgroup_from_current();
>
> This was the only use of get_mem_cgroup_from_current(). Why not remove it?

I saw a potential user:

    [PATCH v10 0/3] Charge loop device i/o to issuing cgroup

So I did not remove it, to avoid having to reintroduce it.

>
> > -       if (memcg && !mem_cgroup_is_root(memcg)) {
> > -               ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
> > +       objcg = get_obj_cgroup_from_current();
> > +       if (objcg) {
> > +               ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
> >                 if (!ret) {
> > -                       page->memcg_data = (unsigned long)memcg |
> > +                       page->memcg_data = (unsigned long)objcg |
> >                                 MEMCG_DATA_KMEM;
> >                         return 0;
> >                 }
> > -               css_put(&memcg->css);
> > +               obj_cgroup_put(objcg);
> >         }
> >         return ret;
> >  }
> [...]
> >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >  {
> >         unsigned long nr_pages;
> > +       struct mem_cgroup *memcg;
> > +       struct obj_cgroup *objcg;
> >
> >         VM_BUG_ON_PAGE(PageLRU(page), page);
> >
> > -       if (!page_memcg(page))
> > -               return;
> > -
> >         /*
> >          * Nobody should be changing or seriously looking at
> > -        * page_memcg(page) at this point, we have fully
> > +        * page memcg or objcg at this point, we have fully
> >          * exclusive access to the page.
> >          */
> > +       if (PageMemcgKmem(page)) {
> > +               objcg = __page_objcg(page);
> > +               memcg = get_obj_cgroup_memcg(objcg);
>
> Can you add a comment that this get matches the put at the end of the
> function and kmem pages do not hold memcg references anymore.

Makes sense. Will do.

Thanks for your suggestions.

>
> > +       } else {
> > +               memcg = __page_memcg(page);
> > +       }
> > +
> > +       if (!memcg)
> > +               return;
> >
> > -       if (ug->memcg != page_memcg(page)) {
> > +       if (ug->memcg != memcg) {
> >                 if (ug->memcg) {
> >                         uncharge_batch(ug);
> >                         uncharge_gather_clear(ug);
> >                 }
> > -               ug->memcg = page_memcg(page);
> > +               ug->memcg = memcg;
> >                 ug->dummy_page = page;
> >
> >                 /* pairs with css_put in uncharge_batch */
> > -               css_get(&ug->memcg->css);
> > +               css_get(&memcg->css);
> >         }
> >
> >         nr_pages = compound_nr(page);
> > -       ug->nr_pages += nr_pages;
> >
> > -       if (PageMemcgKmem(page))
> > +       if (PageMemcgKmem(page)) {
> > +               ug->nr_memory += nr_pages;
> >                 ug->nr_kmem += nr_pages;
> > -       else
> > +
> > +               page->memcg_data = 0;
> > +               obj_cgroup_put(objcg);
> > +       } else {
> > +               /* LRU pages aren't accounted at the root level */
> > +               if (!mem_cgroup_is_root(memcg))
> > +                       ug->nr_memory += nr_pages;
> >                 ug->pgpgout++;
> >
> > -       page->memcg_data = 0;
> > -       css_put(&ug->memcg->css);
> > +               page->memcg_data = 0;
> > +       }
> > +
> > +       css_put(&memcg->css);
> >  }
> >
> >  /**
> > --
> > 2.11.0
> >
Shakeel Butt March 19, 2021, 1:59 p.m. UTC | #4
On Thu, Mar 18, 2021 at 9:05 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Fri, Mar 19, 2021 at 11:40 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Thu, Mar 18, 2021 at 4:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > [...]
> > >
> > > +static inline struct mem_cgroup *get_obj_cgroup_memcg(struct obj_cgroup *objcg)
> >
> > I would prefer get_mem_cgroup_from_objcg().
>
> Inspired by obj_cgroup_memcg() which returns the memcg from objcg.
> So I introduce get_obj_cgroup_memcg() which obtains a reference of
> memcg on the basis of obj_cgroup_memcg().
>
> So that the names are more consistent. Just my thought.
>
> So should I rename it to get_mem_cgroup_from_objcg?
>

If you look at other functions which get a reference on a mem_cgroup,
they follow the format get_mem_cgroup_*. Similarly, the current
function to get a reference on an obj_cgroup is
get_obj_cgroup_from_current().

So, from the name get_obj_cgroup_memcg(), it seems like we are getting
a reference on an obj_cgroup, but the function is actually getting a
reference on a mem_cgroup.

> >
> > > +{
> > > +       struct mem_cgroup *memcg;
> > > +
> > > +       rcu_read_lock();
> > > +retry:
> > > +       memcg = obj_cgroup_memcg(objcg);
> > > +       if (unlikely(!css_tryget(&memcg->css)))
> > > +               goto retry;
> > > +       rcu_read_unlock();
> > > +
> > > +       return memcg;
> > > +}
> > > +
> > >  #ifdef CONFIG_MEMCG_KMEM
> > >  int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
> > >                                  gfp_t gfp, bool new_page)
> > > @@ -3070,15 +3088,8 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> > >         struct mem_cgroup *memcg;
> > >         int ret;
> > >
> > > -       rcu_read_lock();
> > > -retry:
> > > -       memcg = obj_cgroup_memcg(objcg);
> > > -       if (unlikely(!css_tryget(&memcg->css)))
> > > -               goto retry;
> > > -       rcu_read_unlock();
> > > -
> > > +       memcg = get_obj_cgroup_memcg(objcg);
> > >         ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
> >
> > Why not manually inline __memcg_kmem_charge() here? This is the only user.
> >
> > Similarly manually inline __memcg_kmem_uncharge() into
> > obj_cgroup_uncharge_pages() and call obj_cgroup_uncharge_pages() in
> > obj_cgroup_release().
>
> Good point. I will do this.
>
> >
> > > -
> > >         css_put(&memcg->css);
> > >
> > >         return ret;
> > > @@ -3143,18 +3154,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
> > >   */
> > >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> > >  {
> > > -       struct mem_cgroup *memcg;
> > > +       struct obj_cgroup *objcg;
> > >         int ret = 0;
> > >
> > > -       memcg = get_mem_cgroup_from_current();
> >
> > This was the only use of get_mem_cgroup_from_current(). Why not remove it?
>
> I saw a potential user.
>
>     [PATCH v10 0/3] Charge loop device i/o to issuing cgroup
>
> To avoid reintroducing them. So I did not remove it.
>

Don't worry about that. Most probably that user would be changing this
function, so it would be better to reintroduce it from scratch.
Muchun Song March 19, 2021, 3:45 p.m. UTC | #5
On Fri, Mar 19, 2021 at 9:59 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Thu, Mar 18, 2021 at 9:05 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Fri, Mar 19, 2021 at 11:40 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Thu, Mar 18, 2021 at 4:08 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > [...]
> > > >
> > > > +static inline struct mem_cgroup *get_obj_cgroup_memcg(struct obj_cgroup *objcg)
> > >
> > > I would prefer get_mem_cgroup_from_objcg().
> >
> > Inspired by obj_cgroup_memcg() which returns the memcg from objcg.
> > So I introduce get_obj_cgroup_memcg() which obtains a reference of
> > memcg on the basis of obj_cgroup_memcg().
> >
> > So that the names are more consistent. Just my thought.
> >
> > So should I rename it to get_mem_cgroup_from_objcg?
> >
>
> If you look at other functions which get reference on mem_cgroup, they
> have the format of get_mem_cgroup_*. Similarly the current function to
> get a reference on obj_cgroup is get_obj_cgroup_from_current().
>
> So, from the name get_obj_cgroup_memcg(), it seems like we are getting
> reference on obj_cgroup but the function is getting reference on
> mem_cgroup.

Makes sense. I will use get_mem_cgroup_from_objcg(). Thanks.

>
> > >
> > > > +{
> > > > +       struct mem_cgroup *memcg;
> > > > +
> > > > +       rcu_read_lock();
> > > > +retry:
> > > > +       memcg = obj_cgroup_memcg(objcg);
> > > > +       if (unlikely(!css_tryget(&memcg->css)))
> > > > +               goto retry;
> > > > +       rcu_read_unlock();
> > > > +
> > > > +       return memcg;
> > > > +}
> > > > +
> > > >  #ifdef CONFIG_MEMCG_KMEM
> > > >  int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
> > > >                                  gfp_t gfp, bool new_page)
> > > > @@ -3070,15 +3088,8 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> > > >         struct mem_cgroup *memcg;
> > > >         int ret;
> > > >
> > > > -       rcu_read_lock();
> > > > -retry:
> > > > -       memcg = obj_cgroup_memcg(objcg);
> > > > -       if (unlikely(!css_tryget(&memcg->css)))
> > > > -               goto retry;
> > > > -       rcu_read_unlock();
> > > > -
> > > > +       memcg = get_obj_cgroup_memcg(objcg);
> > > >         ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
> > >
> > > Why not manually inline __memcg_kmem_charge() here? This is the only user.
> > >
> > > Similarly manually inline __memcg_kmem_uncharge() into
> > > obj_cgroup_uncharge_pages() and call obj_cgroup_uncharge_pages() in
> > > obj_cgroup_release().
> >
> > Good point. I will do this.
> >
> > >
> > > > -
> > > >         css_put(&memcg->css);
> > > >
> > > >         return ret;
> > > > @@ -3143,18 +3154,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
> > > >   */
> > > >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> > > >  {
> > > > -       struct mem_cgroup *memcg;
> > > > +       struct obj_cgroup *objcg;
> > > >         int ret = 0;
> > > >
> > > > -       memcg = get_mem_cgroup_from_current();
> > >
> > > This was the only use of get_mem_cgroup_from_current(). Why not remove it?
> >
> > I saw a potential user.
> >
> >     [PATCH v10 0/3] Charge loop device i/o to issuing cgroup
> >
> > To avoid reintroducing them. So I did not remove it.
> >
>
> Don't worry about that. Most probably that user would be changing this
> function, so it would be better to reintroduce it from scratch.

OK. I will remove get_mem_cgroup_from_current(). Thanks for
your suggestions.

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..395a113e4a3b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -358,6 +358,62 @@  enum page_memcg_data_flags {
 
 #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
 
+static inline bool PageMemcgKmem(struct page *page);
+
+/*
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
+ *
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
+ */
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+{
+	return READ_ONCE(objcg->memcg);
+}
+
+/*
+ * __page_memcg - get the memory cgroup associated with a non-kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper memory cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * kmem pages.
+ */
+static inline struct mem_cgroup *__page_memcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
+/*
+ * __page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper object cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * LRU pages.
+ */
+static inline struct obj_cgroup *__page_objcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
 /*
  * page_memcg - get the memory cgroup associated with a page
  * @page: a pointer to the page struct
@@ -367,20 +423,23 @@  enum page_memcg_data_flags {
  * proper memory cgroup pointer. It's not safe to call this function
  * against some type of pages, e.g. slab pages or ex-slab pages.
  *
- * Any of the following ensures page and memcg binding stability:
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
+ *
  * - the page lock
  * - LRU isolation
  * - lock_page_memcg()
  * - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
  */
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
-	unsigned long memcg_data = page->memcg_data;
-
-	VM_BUG_ON_PAGE(PageSlab(page), page);
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
-
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	if (PageMemcgKmem(page))
+		return obj_cgroup_memcg(__page_objcg(page));
+	else
+		return __page_memcg(page);
 }
 
 /*
@@ -394,11 +453,19 @@  static inline struct mem_cgroup *page_memcg(struct page *page)
  */
 static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 {
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
-				     ~MEMCG_DATA_FLAGS_MASK);
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -406,15 +473,21 @@  static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
- * or NULL. This function unlike page_memcg() can take any  page
+ * or NULL. This function unlike page_memcg() can take any page
  * as an argument. It has to be used in cases when it's not known if a page
- * has an associated memory cgroup pointer or an object cgroups vector.
+ * has an associated memory cgroup pointer or an object cgroups vector or
+ * an object cgroup.
+ *
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
  *
- * Any of the following ensures page and memcg binding stability:
  * - the page lock
  * - LRU isolation
  * - lock_page_memcg()
  * - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
  */
 static inline struct mem_cgroup *page_memcg_check(struct page *page)
 {
@@ -427,6 +500,13 @@  static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -713,18 +793,6 @@  static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
-{
-	return READ_ONCE(objcg->memcg);
-}
-
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 104bddf21314..1cef20a2f116 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -855,18 +855,22 @@  void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 			     int val)
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
-	struct mem_cgroup *memcg = page_memcg(head);
+	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
+	rcu_read_lock();
+	memcg = page_memcg(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
+		rcu_read_unlock();
 		__mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	__mod_lruvec_state(lruvec, idx, val);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(__mod_lruvec_page_state);
 
@@ -2905,6 +2909,20 @@  static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 	page->memcg_data = (unsigned long)memcg;
 }
 
+static inline struct mem_cgroup *get_obj_cgroup_memcg(struct obj_cgroup *objcg)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+retry:
+	memcg = obj_cgroup_memcg(objcg);
+	if (unlikely(!css_tryget(&memcg->css)))
+		goto retry;
+	rcu_read_unlock();
+
+	return memcg;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 				 gfp_t gfp, bool new_page)
@@ -3070,15 +3088,8 @@  static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 	struct mem_cgroup *memcg;
 	int ret;
 
-	rcu_read_lock();
-retry:
-	memcg = obj_cgroup_memcg(objcg);
-	if (unlikely(!css_tryget(&memcg->css)))
-		goto retry;
-	rcu_read_unlock();
-
+	memcg = get_obj_cgroup_memcg(objcg);
 	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
-
 	css_put(&memcg->css);
 
 	return ret;
@@ -3143,18 +3154,18 @@  static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
  */
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	int ret = 0;
 
-	memcg = get_mem_cgroup_from_current();
-	if (memcg && !mem_cgroup_is_root(memcg)) {
-		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+	objcg = get_obj_cgroup_from_current();
+	if (objcg) {
+		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
 		if (!ret) {
-			page->memcg_data = (unsigned long)memcg |
+			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
 			return 0;
 		}
-		css_put(&memcg->css);
+		obj_cgroup_put(objcg);
 	}
 	return ret;
 }
@@ -3166,16 +3177,16 @@  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg = page_memcg(page);
+	struct obj_cgroup *objcg;
 	unsigned int nr_pages = 1 << order;
 
-	if (!memcg)
+	if (!PageMemcgKmem(page))
 		return;
 
-	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
-	__memcg_kmem_uncharge(memcg, nr_pages);
+	objcg = __page_objcg(page);
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
 	page->memcg_data = 0;
-	css_put(&memcg->css);
+	obj_cgroup_put(objcg);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6790,7 +6801,7 @@  int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 
 struct uncharge_gather {
 	struct mem_cgroup *memcg;
-	unsigned long nr_pages;
+	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
 	struct page *dummy_page;
@@ -6805,10 +6816,10 @@  static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long flags;
 
-	if (!mem_cgroup_is_root(ug->memcg)) {
-		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
+	if (ug->nr_memory) {
+		page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
 		if (do_memsw_account())
-			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
+			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
 			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
 		memcg_oom_recover(ug->memcg);
@@ -6816,7 +6827,7 @@  static void uncharge_batch(const struct uncharge_gather *ug)
 
 	local_irq_save(flags);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
-	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
+	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
@@ -6827,40 +6838,56 @@  static void uncharge_batch(const struct uncharge_gather *ug)
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
 	unsigned long nr_pages;
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (!page_memcg(page))
-		return;
-
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page_memcg(page) at this point, we have fully
+	 * page memcg or objcg at this point, we have fully
 	 * exclusive access to the page.
 	 */
+	if (PageMemcgKmem(page)) {
+		objcg = __page_objcg(page);
+		memcg = get_obj_cgroup_memcg(objcg);
+	} else {
+		memcg = __page_memcg(page);
+	}
+
+	if (!memcg)
+		return;
 
-	if (ug->memcg != page_memcg(page)) {
+	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = page_memcg(page);
+		ug->memcg = memcg;
 		ug->dummy_page = page;
 
 		/* pairs with css_put in uncharge_batch */
-		css_get(&ug->memcg->css);
+		css_get(&memcg->css);
 	}
 
 	nr_pages = compound_nr(page);
-	ug->nr_pages += nr_pages;
 
-	if (PageMemcgKmem(page))
+	if (PageMemcgKmem(page)) {
+		ug->nr_memory += nr_pages;
 		ug->nr_kmem += nr_pages;
-	else
+
+		page->memcg_data = 0;
+		obj_cgroup_put(objcg);
+	} else {
+		/* LRU pages aren't accounted at the root level */
+		if (!mem_cgroup_is_root(memcg))
+			ug->nr_memory += nr_pages;
 		ug->pgpgout++;
 
-	page->memcg_data = 0;
-	css_put(&ug->memcg->css);
+		page->memcg_data = 0;
+	}
+
+	css_put(&memcg->css);
 }
 
 /**