Message ID | 60df0efd-f458-a13c-7c89-749bdab21d1d@virtuozzo.com (mailing list archive) |
---|---|
State | New |
Series | [mm,v3] memcg: enable memory accounting in __alloc_pages_bulk |
On Tue, Oct 12, 2021 at 7:58 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> Enable memory accounting for bulk page allocator.
>
> Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
> v3: added comments,
>     removed call of post charge hook for nr_pages = 0
> v2: modified according to Shakeel Butt's remarks
> ---
>  include/linux/memcontrol.h | 11 +++++++++
>  mm/memcontrol.c            | 48 +++++++++++++++++++++++++++++++++++++-
>  mm/page_alloc.c            | 21 ++++++++++++++++-
>  3 files changed, 78 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 3096c9a0ee01..990acd70c846 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -810,6 +810,12 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
>  	percpu_ref_put(&objcg->refcnt);
>  }
>
> +static inline void obj_cgroup_put_many(struct obj_cgroup *objcg,
> +				       unsigned long nr)
> +{
> +	percpu_ref_put_many(&objcg->refcnt, nr);
> +}
> +
>  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
>  {
>  	if (memcg)
> @@ -1746,4 +1752,9 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
>
>  #endif /* CONFIG_MEMCG_KMEM */
>
> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +				unsigned int nr_pages);
> +void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page);
> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +				unsigned int nr_pages);
>  #endif /* _LINUX_MEMCONTROL_H */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 87e41c3cac10..16fe3384c12c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3239,7 +3239,53 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
>  	refill_obj_stock(objcg, size, true);
>  }
>
> -#endif /* CONFIG_MEMCG_KMEM */
> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +				unsigned int nr_pages)
> +{
> +	struct obj_cgroup *objcg = NULL;

No need to explicitly set it to NULL.

> +
> +	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
> +		return true;
> +
> +	objcg = get_obj_cgroup_from_current();
> +
> +	if (objcg && obj_cgroup_charge_pages(objcg, gfp, nr_pages)) {
> +		obj_cgroup_put(objcg);
> +		return false;
> +	}

get_obj_cgroup_from_current() can return NULL, so you would need to
return true early for that condition.

> +	obj_cgroup_get_many(objcg, nr_pages - 1);
> +	*objcgp = objcg;
> +	return true;
> +}
> +
> +void memcg_bulk_charge_hook(struct obj_cgroup *objcg, struct page *page)
> +{
> +	page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
> +}
> +
> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +				unsigned int nr_pages)
> +{
> +	obj_cgroup_uncharge_pages(objcg, nr_pages);
> +	obj_cgroup_put_many(objcg, nr_pages);
> +}

You can keep the following '#else' code block in the header file.

> +#else /* !CONFIG_MEMCG_KMEM */
> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +				unsigned int nr_pages)
> +{
> +	return true;
> +}
> +
> +void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page)
> +{
> +}
> +
> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +				unsigned int nr_pages)
> +{
> +}
> +#endif
> +
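For illustration only, a version of the pre-charge hook that folds in both review comments might look like the sketch below. It assumes, as in the posted patch, that the function would live in mm/memcontrol.c next to the static obj_cgroup_charge_pages(); it is not the code that was posted.

bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
				unsigned int nr_pages)
{
	struct obj_cgroup *objcg;

	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
		return true;

	/* NULL means the current task has no memcg to charge against */
	objcg = get_obj_cgroup_from_current();
	if (!objcg)
		return true;

	if (obj_cgroup_charge_pages(objcg, gfp, nr_pages)) {
		obj_cgroup_put(objcg);
		return false;
	}

	/* one objcg reference per page; the lookup above already took one */
	obj_cgroup_get_many(objcg, nr_pages - 1);
	*objcgp = objcg;
	return true;
}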
On Tue, Oct 12, 2021 at 05:58:21PM +0300, Vasily Averin wrote:
> Enable memory accounting for bulk page allocator.
>
> Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>

Acked-by: Mel Gorman <mgorman@techsingularity.net>
On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> Enable memory accounting for bulk page allocator.
ENOCHANGELOG
And I have to say I am not very happy about the solution. It adds a very
tricky code where it splits different charging steps apart.
Would it be just too inefficient to charge page-by-page once all pages
are already taken away from the pcp lists? This bulk should be small so
this shouldn't really cause massive problems. I mean something like
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..8bcd69195ef5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	local_unlock_irqrestore(&pagesets.lock, flags);
 
+	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
+		/* charge pages here */
+	}
+
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
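As a sketch of what the "/* charge pages here */" stub above could become, assuming the page_list variant and a caller that passed in an empty list (the page_array case is what complicates things, as discussed below), using the existing per-page helper __memcg_kmem_charge_page():

	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT) && page_list) {
		struct page *page, *next;

		list_for_each_entry_safe(page, next, page_list, lru) {
			if (__memcg_kmem_charge_page(page, gfp, 0)) {
				/* charge failed: give the page back */
				list_del(&page->lru);
				__free_pages(page, 0);
				nr_populated--;
			}
		}
	}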
On Tue, Oct 12, 2021 at 8:36 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> > Enable memory accounting for bulk page allocator.
>
> ENOCHANGELOG
>
> And I have to say I am not very happy about the solution. It adds a very
> tricky code where it splits different charging steps apart.
>
> Would it be just too inefficient to charge page-by-page once all pages
> are already taken away from the pcp lists? This bulk should be small so
> this shouldn't really cause massive problems. I mean something like
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c274cf..8bcd69195ef5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>
>  	local_unlock_irqrestore(&pagesets.lock, flags);
>
> +	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> +		/* charge pages here */
> +	}

It is not that simple because __alloc_pages_bulk only allocate pages
for empty slots in the page_array provided by the caller.

The failure handling for post charging would be more complicated.
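For context, __alloc_pages_bulk() only fills NULL slots of the caller's page_array; the leading populated slots are skipped up front (this loop is visible in the v3 diff context further down), and already-populated slots may also appear later in the array:

	/*
	 * Skip populated array elements to determine if any pages need
	 * to be allocated before disabling IRQs.
	 */
	while (page_array && nr_populated < nr_pages && page_array[nr_populated])
		nr_populated++;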
On Tue 12-10-21 09:08:38, Shakeel Butt wrote:
> On Tue, Oct 12, 2021 at 8:36 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> > > Enable memory accounting for bulk page allocator.
> >
> > ENOCHANGELOG
> >
> > And I have to say I am not very happy about the solution. It adds a very
> > tricky code where it splits different charging steps apart.
> >
> > Would it be just too inefficient to charge page-by-page once all pages
> > are already taken away from the pcp lists? This bulk should be small so
> > this shouldn't really cause massive problems. I mean something like
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b37435c274cf..8bcd69195ef5 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> >
> >  	local_unlock_irqrestore(&pagesets.lock, flags);
> >
> > +	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> > +		/* charge pages here */
> > +	}
>
> It is not that simple because __alloc_pages_bulk only allocate pages
> for empty slots in the page_array provided by the caller.
>
> The failure handling for post charging would be more complicated.

If this is really that complicated (I haven't tried) then it would be
much more simple to completely skip the bulk allocator for __GFP_ACCOUNT
rather than add a tricky code. The bulk allocator is meant to be used
for ultra hot paths and memcg charging along with the reclaim doesn't
really fit into that model anyway. Or are there any actual users who
really need bulk allocator optimization and also need memcg accounting?
On 12.10.2021 18:36, Michal Hocko wrote:
> On Tue 12-10-21 17:58:21, Vasily Averin wrote:
>> Enable memory accounting for bulk page allocator.
>
> ENOCHANGELOG
>
> And I have to say I am not very happy about the solution. It adds a very
> tricky code where it splits different charging steps apart.
>
> Would it be just too inefficient to charge page-by-page once all pages
> are already taken away from the pcp lists? This bulk should be small so
> this shouldn't really cause massive problems. I mean something like
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c274cf..8bcd69195ef5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>
>  	local_unlock_irqrestore(&pagesets.lock, flags);
>
> +	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> +		/* charge pages here */
> +	}
> +
>  	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
>  	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
>

In general it looks like we can do it.

We can traverse the filled page_array or page_list. For the page_array
case we need to check whether a page is already accounted, since the
incoming array can already contain some pages, both at the beginning
and in the middle.

For each taken page we can try to charge it. If it is charged
successfully, we process the next page in the list/array. When a charge
fails, we need to remove the rest of the pages from the list/array and
somehow release them. At present I do not understand how to do that
correctly -- perhaps just call free_page()?

Finally, we'll need to adjust nr_account and nr_populated properly.

I'll try to implement this tomorrow.

Thank you,
	Vasily Averin
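A rough, illustrative sketch of the loop described above (not the posted patch; the function name is made up, and treating a non-zero page->memcg_data as "already accounted" is only a crude stand-in for reliably telling caller-provided pages apart, which is exactly the complication under discussion):

static unsigned long bulk_charge_array(gfp_t gfp, unsigned long nr_pages,
				       struct page **page_array)
{
	unsigned long i, nr_charged = 0;
	bool failed = false;

	for (i = 0; i < nr_pages; i++) {
		struct page *page = page_array[i];

		if (!page)
			continue;
		if (page->memcg_data)	/* crude "already accounted" check */
			continue;
		if (!failed && !__memcg_kmem_charge_page(page, gfp, 0)) {
			nr_charged++;
			continue;
		}
		/* first failure: release this and every later new page */
		failed = true;
		page_array[i] = NULL;
		__free_pages(page, 0);
	}
	/* the caller would then re-derive nr_populated and nr_account */
	return nr_charged;
}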
On Tue, Oct 12, 2021 at 11:24 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 12-10-21 09:08:38, Shakeel Butt wrote:
> > On Tue, Oct 12, 2021 at 8:36 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> > > > Enable memory accounting for bulk page allocator.
> > >
> > > ENOCHANGELOG
> > >
> > > And I have to say I am not very happy about the solution. It adds a very
> > > tricky code where it splits different charging steps apart.
> > >
> > > Would it be just too inefficient to charge page-by-page once all pages
> > > are already taken away from the pcp lists? This bulk should be small so
> > > this shouldn't really cause massive problems. I mean something like
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index b37435c274cf..8bcd69195ef5 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > >
> > >  	local_unlock_irqrestore(&pagesets.lock, flags);
> > >
> > > +	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> > > +		/* charge pages here */
> > > +	}
> >
> > It is not that simple because __alloc_pages_bulk only allocate pages
> > for empty slots in the page_array provided by the caller.
> >
> > The failure handling for post charging would be more complicated.
>
> If this is really that complicated (I haven't tried) then it would be
> much more simple to completely skip the bulk allocator for __GFP_ACCOUNT
> rather than add a tricky code. The bulk allocator is meant to be used
> for ultra hot paths and memcg charging along with the reclaim doesn't
> really fit into that model anyway. Or are there any actual users who
> really need bulk allocator optimization and also need memcg accounting?

Bulk allocator is being used for vmalloc and we have several
kvmalloc() with __GFP_ACCOUNT allocations.

It seems like Vasily has some ideas, so let's wait for his next version.
On Wed 13-10-21 09:41:15, Shakeel Butt wrote:
> On Tue, Oct 12, 2021 at 11:24 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 12-10-21 09:08:38, Shakeel Butt wrote:
> > > On Tue, Oct 12, 2021 at 8:36 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> > > > > Enable memory accounting for bulk page allocator.
> > > >
> > > > ENOCHANGELOG
> > > >
> > > > And I have to say I am not very happy about the solution. It adds a very
> > > > tricky code where it splits different charging steps apart.
> > > >
> > > > Would it be just too inefficient to charge page-by-page once all pages
> > > > are already taken away from the pcp lists? This bulk should be small so
> > > > this shouldn't really cause massive problems. I mean something like
> > > >
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index b37435c274cf..8bcd69195ef5 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > > >
> > > >  	local_unlock_irqrestore(&pagesets.lock, flags);
> > > >
> > > > +	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> > > > +		/* charge pages here */
> > > > +	}
> > >
> > > It is not that simple because __alloc_pages_bulk only allocate pages
> > > for empty slots in the page_array provided by the caller.
> > >
> > > The failure handling for post charging would be more complicated.
> >
> > If this is really that complicated (I haven't tried) then it would be
> > much more simple to completely skip the bulk allocator for __GFP_ACCOUNT
> > rather than add a tricky code. The bulk allocator is meant to be used
> > for ultra hot paths and memcg charging along with the reclaim doesn't
> > really fit into that model anyway. Or are there any actual users who
> > really need bulk allocator optimization and also need memcg accounting?
>
> Bulk allocator is being used for vmalloc and we have several
> kvmalloc() with __GFP_ACCOUNT allocations.

Do we really need to use bulk allocator for these allocations?
Bulk allocator is an bypass of the page allocator for performance reason
and I can see why that can be useful but considering that the charging
path can imply some heavy lifting is all the code churn to make bulk
allocator memcg aware really worth it? Why cannot we simply skip over
bulk allocator for __GFP_ACCOUNT. That would be a trivial fix.
On Wed, Oct 13, 2021 at 10:16 AM Michal Hocko <mhocko@suse.com> wrote:
>
[...]
> > > If this is really that complicated (I haven't tried) then it would be
> > > much more simple to completely skip the bulk allocator for __GFP_ACCOUNT
> > > rather than add a tricky code. The bulk allocator is meant to be used
> > > for ultra hot paths and memcg charging along with the reclaim doesn't
> > > really fit into that model anyway. Or are there any actual users who
> > > really need bulk allocator optimization and also need memcg accounting?
> >
> > Bulk allocator is being used for vmalloc and we have several
> > kvmalloc() with __GFP_ACCOUNT allocations.
>
> Do we really need to use bulk allocator for these allocations?
> Bulk allocator is an bypass of the page allocator for performance reason
> and I can see why that can be useful but considering that the charging
> path can imply some heavy lifting is all the code churn to make bulk
> allocator memcg aware really worth it? Why cannot we simply skip over
> bulk allocator for __GFP_ACCOUNT. That would be a trivial fix.
> --

Actually that might be the simplest solution and I agree to skip bulk
allocator for __GFP_ACCOUNT allocations.
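For reference, the "trivial fix" agreed on here would amount to something like the following early in __alloc_pages_bulk(), before the pcp lock is taken (a sketch of the idea, not a merged change). The existing failed: label, visible in the v3 diff below, falls back to __alloc_pages(), which already charges __GFP_ACCOUNT pages for the single page it returns:

	/*
	 * The bulk fast path cannot charge a memcg; punt accounted
	 * allocations to the regular single-page fallback.
	 */
	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT))
		goto failed;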
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3096c9a0ee01..990acd70c846 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -810,6 +810,12 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
+static inline void obj_cgroup_put_many(struct obj_cgroup *objcg,
+				       unsigned long nr)
+{
+	percpu_ref_put_many(&objcg->refcnt, nr);
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
@@ -1746,4 +1752,9 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
 
 #endif /* CONFIG_MEMCG_KMEM */
 
+bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+				unsigned int nr_pages);
+void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page);
+void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+				unsigned int nr_pages);
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 87e41c3cac10..16fe3384c12c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3239,7 +3239,53 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
 	refill_obj_stock(objcg, size, true);
 }
 
-#endif /* CONFIG_MEMCG_KMEM */
+bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+				unsigned int nr_pages)
+{
+	struct obj_cgroup *objcg = NULL;
+
+	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
+		return true;
+
+	objcg = get_obj_cgroup_from_current();
+
+	if (objcg && obj_cgroup_charge_pages(objcg, gfp, nr_pages)) {
+		obj_cgroup_put(objcg);
+		return false;
+	}
+	obj_cgroup_get_many(objcg, nr_pages - 1);
+	*objcgp = objcg;
+	return true;
+}
+
+void memcg_bulk_charge_hook(struct obj_cgroup *objcg, struct page *page)
+{
+	page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
+}
+
+void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+				unsigned int nr_pages)
+{
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
+	obj_cgroup_put_many(objcg, nr_pages);
+}
+#else /* !CONFIG_MEMCG_KMEM */
+bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+				unsigned int nr_pages)
+{
+	return true;
+}
+
+void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page)
+{
+}
+
+void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+				unsigned int nr_pages)
+{
+}
+#endif
+
 /*
  * Because page_memcg(head) is not set on tails, set it now.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..23a8b55df508 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5207,6 +5207,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	gfp_t alloc_gfp;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
 	int nr_populated = 0, nr_account = 0;
+	unsigned int nr_pre_charge = 0;
+	struct obj_cgroup *objcg = NULL;
 
 	/*
 	 * Skip populated array elements to determine if any pages need
@@ -5275,6 +5277,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	if (unlikely(!zone))
 		goto failed;
 
+	nr_pre_charge = nr_pages - nr_populated;
+	/* pre-charge memory and take refcounts for each allocated page */
+	if (!memcg_bulk_pre_charge_hook(&objcg, gfp, nr_pre_charge))
+		goto failed;
+
 	/* Attempt the batch allocation */
 	local_lock_irqsave(&pagesets.lock, flags);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
@@ -5299,6 +5306,9 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		nr_account++;
 
 		prep_new_page(page, 0, gfp, 0);
+		if (objcg)
+			memcg_bulk_charge_hook(objcg, page);
+
 		if (page_list)
 			list_add(&page->lru, page_list);
 		else
@@ -5310,13 +5320,22 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
 
+	/*
+	 * Typically nr_pre_charge == nr_account,
+	 * otherwise a few extra pages was pre-charged,
+	 * and must be uncharged now.
+	 */
+	if (objcg && nr_pre_charge - nr_account)
+		memcg_bulk_post_charge_hook(objcg, nr_pre_charge - nr_account);
 out:
 	return nr_populated;
 
 failed_irq:
 	local_unlock_irqrestore(&pagesets.lock, flags);
-
+	/* uncharge memory and decrement refcounts for all pre-charged pages */
+	if (objcg)
+		memcg_bulk_post_charge_hook(objcg, nr_pre_charge);
 failed:
 	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
 	if (page) {
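As a usage illustration (not part of the patch), this is the kind of caller the change covers: alloc_pages_bulk_array() is the array front end to __alloc_pages_bulk(), and vmalloc uses it for kvmalloc(..., GFP_KERNEL_ACCOUNT) requests as noted earlier in the thread:

	struct page *pages[16] = { NULL };
	unsigned long nr;

	/*
	 * Accounted bulk allocation: with the patch, each returned page is
	 * charged to the current memcg and tagged MEMCG_DATA_KMEM by
	 * memcg_bulk_charge_hook().
	 */
	nr = alloc_pages_bulk_array(GFP_KERNEL_ACCOUNT, ARRAY_SIZE(pages), pages);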
Enable memory accounting for bulk page allocator.

Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
v3: added comments,
    removed call of post charge hook for nr_pages = 0
v2: modified according to Shakeel Butt's remarks
---
 include/linux/memcontrol.h | 11 +++++++++
 mm/memcontrol.c            | 48 +++++++++++++++++++++++++++++++++++++-
 mm/page_alloc.c            | 21 ++++++++++++++++-
 3 files changed, 78 insertions(+), 2 deletions(-)