Message ID | 20211013194338.1804247-1-shakeelb@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT | expand |
On Wed, Oct 13, 2021 at 12:43:38PM -0700, Shakeel Butt wrote: > The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in > __vmalloc_area_node()") switched to bulk page allocator for order 0 > allocation backing vmalloc. However bulk page allocator does not support > __GFP_ACCOUNT allocations and there are several users of > kvmalloc(__GFP_ACCOUNT). > > For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In > future if there is workload that can be significantly improved with the > bulk page allocator with __GFP_ACCCOUNT support, we can revisit the > decision. > > Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()") > Signed-off-by: Shakeel Butt <shakeelb@google.com> > --- > mm/page_alloc.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 668edb16446a..b3acad4615d3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, > unsigned int alloc_flags = ALLOC_WMARK_LOW; > int nr_populated = 0, nr_account = 0; > > + /* Bulk allocator does not support memcg accounting. */ > + if (unlikely(gfp & __GFP_ACCOUNT)) > + goto out; > + Isn't it a bit too aggressive? How about if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT)) gfp &= ~__GFP_ACCOUNT; And maybe with some explanatory message? Thanks!
On Wed, Oct 13, 2021 at 3:03 PM Roman Gushchin <guro@fb.com> wrote: > > On Wed, Oct 13, 2021 at 12:43:38PM -0700, Shakeel Butt wrote: > > The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in > > __vmalloc_area_node()") switched to bulk page allocator for order 0 > > allocation backing vmalloc. However bulk page allocator does not support > > __GFP_ACCOUNT allocations and there are several users of > > kvmalloc(__GFP_ACCOUNT). > > > > For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In > > future if there is workload that can be significantly improved with the > > bulk page allocator with __GFP_ACCCOUNT support, we can revisit the > > decision. > > > > Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()") > > Signed-off-by: Shakeel Butt <shakeelb@google.com> > > --- > > mm/page_alloc.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 668edb16446a..b3acad4615d3 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, > > unsigned int alloc_flags = ALLOC_WMARK_LOW; > > int nr_populated = 0, nr_account = 0; > > > > + /* Bulk allocator does not support memcg accounting. */ > > + if (unlikely(gfp & __GFP_ACCOUNT)) > > + goto out; > > + > > Isn't it a bit too aggressive? > > How about > if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT)) We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can trigger bulk page allocator through vmalloc, so I don't think the warning would be any helpful. > gfp &= ~__GFP_ACCOUNT; Bulk allocator is best effort, so callers have adequate fallbacks. Transparently disabling accounting would be unexpected.
On Wed, Oct 13, 2021 at 03:26:11PM -0700, Shakeel Butt wrote: > On Wed, Oct 13, 2021 at 3:03 PM Roman Gushchin <guro@fb.com> wrote: > > > > On Wed, Oct 13, 2021 at 12:43:38PM -0700, Shakeel Butt wrote: > > > The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in > > > __vmalloc_area_node()") switched to bulk page allocator for order 0 > > > allocation backing vmalloc. However bulk page allocator does not support > > > __GFP_ACCOUNT allocations and there are several users of > > > kvmalloc(__GFP_ACCOUNT). > > > > > > For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In > > > future if there is workload that can be significantly improved with the > > > bulk page allocator with __GFP_ACCCOUNT support, we can revisit the > > > decision. > > > > > > Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()") > > > Signed-off-by: Shakeel Butt <shakeelb@google.com> > > > --- > > > mm/page_alloc.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index 668edb16446a..b3acad4615d3 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, > > > unsigned int alloc_flags = ALLOC_WMARK_LOW; > > > int nr_populated = 0, nr_account = 0; > > > > > > + /* Bulk allocator does not support memcg accounting. */ > > > + if (unlikely(gfp & __GFP_ACCOUNT)) > > > + goto out; > > > + > > > > Isn't it a bit too aggressive? > > > > How about > > if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT)) > > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can > trigger bulk page allocator through vmalloc, so I don't think the > warning would be any helpful. > > > gfp &= ~__GFP_ACCOUNT; > > Bulk allocator is best effort, so callers have adequate fallbacks. > Transparently disabling accounting would be unexpected. I see... Shouldn't we then move this check to an upper level? E.g.: if (!(gfp & __GFP_ACCOUNT)) call_into_bulk_allocator(); else call_into_per_page_allocator(); Not a big deal, I'm just worried about potential silent memory allocation failures. Thanks!
On Wed, Oct 13, 2021 at 4:15 PM Roman Gushchin <guro@fb.com> wrote: > [...] > > > > > > Isn't it a bit too aggressive? > > > > > > How about > > > if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT)) > > > > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can > > trigger bulk page allocator through vmalloc, so I don't think the > > warning would be any helpful. > > > > > gfp &= ~__GFP_ACCOUNT; > > > > Bulk allocator is best effort, so callers have adequate fallbacks. > > Transparently disabling accounting would be unexpected. > > I see... > > Shouldn't we then move this check to an upper level? > > E.g.: > > if (!(gfp & __GFP_ACCOUNT)) > call_into_bulk_allocator(); > else > call_into_per_page_allocator(); > If we add this check in the upper level (e.g. in vm_area_alloc_pages() ) then I think we would need WARN_ON_ONCE(gfp & __GFP_ACCOUNT) in the bulk allocator to detect future users. At the moment I am more inclined towards this patch's approach. Let's say in future we find there is a __GFP_ACCOUNT allocation which can benefit from bulk allocator and we decide to add such support in bulk allocator then we would not need to change the bulk allocator callers at that time just the bulk allocator.
On Wed, Oct 13, 2021 at 04:45:35PM -0700, Shakeel Butt wrote: > On Wed, Oct 13, 2021 at 4:15 PM Roman Gushchin <guro@fb.com> wrote: > > > [...] > > > > > > > > Isn't it a bit too aggressive? > > > > > > > > How about > > > > if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT)) > > > > > > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can > > > trigger bulk page allocator through vmalloc, so I don't think the > > > warning would be any helpful. > > > > > > > gfp &= ~__GFP_ACCOUNT; > > > > > > Bulk allocator is best effort, so callers have adequate fallbacks. > > > Transparently disabling accounting would be unexpected. > > > > I see... > > > > Shouldn't we then move this check to an upper level? > > > > E.g.: > > > > if (!(gfp & __GFP_ACCOUNT)) > > call_into_bulk_allocator(); > > else > > call_into_per_page_allocator(); > > > > If we add this check in the upper level (e.g. in vm_area_alloc_pages() > ) then I think we would need WARN_ON_ONCE(gfp & __GFP_ACCOUNT) in the > bulk allocator to detect future users. > > At the moment I am more inclined towards this patch's approach. Let's > say in future we find there is a __GFP_ACCOUNT allocation which can > benefit from bulk allocator and we decide to add such support in bulk > allocator then we would not need to change the bulk allocator callers > at that time just the bulk allocator. Ok, no objections from me. Thanks!
On Wed, Oct 13, 2021 at 04:45:35PM -0700, Shakeel Butt wrote: > On Wed, Oct 13, 2021 at 4:15 PM Roman Gushchin <guro@fb.com> wrote: > > > [...] > > > > > > > > Isn't it a bit too aggressive? > > > > > > > > How about > > > > if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT)) > > > > > > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can > > > trigger bulk page allocator through vmalloc, so I don't think the > > > warning would be any helpful. > > > > > > > gfp &= ~__GFP_ACCOUNT; > > > > > > Bulk allocator is best effort, so callers have adequate fallbacks. > > > Transparently disabling accounting would be unexpected. > > > > I see... > > > > Shouldn't we then move this check to an upper level? > > > > E.g.: > > > > if (!(gfp & __GFP_ACCOUNT)) > > call_into_bulk_allocator(); > > else > > call_into_per_page_allocator(); > > > > If we add this check in the upper level (e.g. in vm_area_alloc_pages() > ) then I think we would need WARN_ON_ONCE(gfp & __GFP_ACCOUNT) in the > bulk allocator to detect future users. > > At the moment I am more inclined towards this patch's approach. Let's > say in future we find there is a __GFP_ACCOUNT allocation which can > benefit from bulk allocator and we decide to add such support in bulk > allocator then we would not need to change the bulk allocator callers > at that time just the bulk allocator. I agree with you. Let's apply the patch as-is.
On 13.10.2021 22:43, Shakeel Butt wrote: > The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in > __vmalloc_area_node()") switched to bulk page allocator for order 0 > allocation backing vmalloc. However bulk page allocator does not support > __GFP_ACCOUNT allocations and there are several users of > kvmalloc(__GFP_ACCOUNT). > > For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In > future if there is workload that can be significantly improved with the > bulk page allocator with __GFP_ACCCOUNT support, we can revisit the > decision. > > Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()") > Signed-off-by: Shakeel Butt <shakeelb@google.com> > --- > mm/page_alloc.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 668edb16446a..b3acad4615d3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, > unsigned int alloc_flags = ALLOC_WMARK_LOW; > int nr_populated = 0, nr_account = 0; > > + /* Bulk allocator does not support memcg accounting. */ > + if (unlikely(gfp & __GFP_ACCOUNT)) > + goto out; > + May be (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) check is better here? > /* > * Skip populated array elements to determine if any pages need > * to be allocated before disabling IRQs. >
On Wed 13-10-21 12:43:38, Shakeel Butt wrote: [...] > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 668edb16446a..b3acad4615d3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, > unsigned int alloc_flags = ALLOC_WMARK_LOW; > int nr_populated = 0, nr_account = 0; > > + /* Bulk allocator does not support memcg accounting. */ > + if (unlikely(gfp & __GFP_ACCOUNT)) > + goto out; Did you mean goto failed here? This would break some which do not have any fallback. E.g. xfs_buf_alloc_pages but likely more. Sorry I could have been more specific when talking about bypassing the bulk allocator. It is quite confusing because the bulk allocator interface consists of the bulk allocator and the fallback to the normal page allocator. > + > /* > * Skip populated array elements to determine if any pages need > * to be allocated before disabling IRQs. > -- > 2.33.0.882.g93a45727a2-goog
On Thu 14-10-21 10:05:21, Vasily Averin wrote: [...] > > + /* Bulk allocator does not support memcg accounting. */ > > + if (unlikely(gfp & __GFP_ACCOUNT)) > > + goto out; > > + > > May be (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) check is better here? Yes please
On Thu, Oct 14, 2021 at 12:16 AM Michal Hocko <mhocko@suse.com> wrote: > > On Wed 13-10-21 12:43:38, Shakeel Butt wrote: > [...] > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 668edb16446a..b3acad4615d3 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, > > unsigned int alloc_flags = ALLOC_WMARK_LOW; > > int nr_populated = 0, nr_account = 0; > > > > + /* Bulk allocator does not support memcg accounting. */ > > + if (unlikely(gfp & __GFP_ACCOUNT)) > > + goto out; > > Did you mean goto failed here? This would break some which do not > have any fallback. E.g. xfs_buf_alloc_pages but likely more. > > Sorry I could have been more specific when talking about bypassing the > bulk allocator. It is quite confusing because the bulk allocator > interface consists of the bulk allocator and the fallback to the normal > page allocator. > I did consider 'goto failed' here but for that I have to move __GFP_ACCOUNT check after the "Already populated array" check in the function. Basically what's the point of doing other operations (incrementing nr_populated) if we are gonna skip bulk anyways. Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT and vmalloc() is the only __GFP_ACCOUNT user at this point. So, not an issue for now but I suppose it is better to be future-proof and do the 'goto failed'. Let me know what you think.
On Thu 14-10-21 08:01:16, Shakeel Butt wrote: > On Thu, Oct 14, 2021 at 12:16 AM Michal Hocko <mhocko@suse.com> wrote: > > > > On Wed 13-10-21 12:43:38, Shakeel Butt wrote: > > [...] > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index 668edb16446a..b3acad4615d3 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, > > > unsigned int alloc_flags = ALLOC_WMARK_LOW; > > > int nr_populated = 0, nr_account = 0; > > > > > > + /* Bulk allocator does not support memcg accounting. */ > > > + if (unlikely(gfp & __GFP_ACCOUNT)) > > > + goto out; > > > > Did you mean goto failed here? This would break some which do not > > have any fallback. E.g. xfs_buf_alloc_pages but likely more. > > > > Sorry I could have been more specific when talking about bypassing the > > bulk allocator. It is quite confusing because the bulk allocator > > interface consists of the bulk allocator and the fallback to the normal > > page allocator. > > > > I did consider 'goto failed' here but for that I have to move > __GFP_ACCOUNT check after the "Already populated array" check in the > function. Basically what's the point of doing other operations > (incrementing nr_populated) if we are gonna skip bulk anyways. I have to say I do not follow why that is a problem. > Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT and > vmalloc() is the only __GFP_ACCOUNT user at this point. So, not an > issue for now but I suppose it is better to be future-proof and do the > 'goto failed'. Why do we want to have that silent trap?
On Thu, Oct 14, 2021 at 08:01:16AM -0700, Shakeel Butt wrote:
> Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT
It probably should, actually. Sorry, somewhat off-topic, but we've
seen this consume quite large amounts of memory. __GFP_ACCOUNT and
vmstat tracking seems overdue for this one.
On 14.10.2021 18:23, Johannes Weiner wrote: > On Thu, Oct 14, 2021 at 08:01:16AM -0700, Shakeel Butt wrote: >> Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT > > It probably should, actually. Sorry, somewhat off-topic, but we've > seen this consume quite large amounts of memory. __GFP_ACCOUNT and > vmstat tracking seems overdue for this one. If this will be required, you can use [PATCH mm v5] memcg: enable memory accounting in __alloc_pages_bulk https://lkml.org/lkml/2021/10/14/197 As far as I understand it will not be used right now, however I decided to submit it anyway, perhaps it may be needed later. Thank you, Vasily Averin
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 668edb16446a..b3acad4615d3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid, unsigned int alloc_flags = ALLOC_WMARK_LOW; int nr_populated = 0, nr_account = 0; + /* Bulk allocator does not support memcg accounting. */ + if (unlikely(gfp & __GFP_ACCOUNT)) + goto out; + /* * Skip populated array elements to determine if any pages need * to be allocated before disabling IRQs.
The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()") switched to bulk page allocator for order 0 allocation backing vmalloc. However bulk page allocator does not support __GFP_ACCOUNT allocations and there are several users of kvmalloc(__GFP_ACCOUNT). For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In future if there is workload that can be significantly improved with the bulk page allocator with __GFP_ACCCOUNT support, we can revisit the decision. Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()") Signed-off-by: Shakeel Butt <shakeelb@google.com> --- mm/page_alloc.c | 4 ++++ 1 file changed, 4 insertions(+)