diff mbox series

memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT

Message ID 20211013194338.1804247-1-shakeelb@google.com (mailing list archive)
State New
Headers show
Series memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT | expand

Commit Message

Shakeel Butt Oct. 13, 2021, 7:43 p.m. UTC
The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
__vmalloc_area_node()") switched to bulk page allocator for order 0
allocation backing vmalloc. However bulk page allocator does not support
__GFP_ACCOUNT allocations and there are several users of
kvmalloc(__GFP_ACCOUNT).

For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
future if there is workload that can be significantly improved with the
bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
decision.

Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 mm/page_alloc.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Roman Gushchin Oct. 13, 2021, 10:03 p.m. UTC | #1
On Wed, Oct 13, 2021 at 12:43:38PM -0700, Shakeel Butt wrote:
> The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
> __vmalloc_area_node()") switched to bulk page allocator for order 0
> allocation backing vmalloc. However bulk page allocator does not support
> __GFP_ACCOUNT allocations and there are several users of
> kvmalloc(__GFP_ACCOUNT).
> 
> For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
> future if there is workload that can be significantly improved with the
> bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
> decision.
> 
> Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  mm/page_alloc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 668edb16446a..b3acad4615d3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  	unsigned int alloc_flags = ALLOC_WMARK_LOW;
>  	int nr_populated = 0, nr_account = 0;
>  
> +	/* Bulk allocator does not support memcg accounting. */
> +	if (unlikely(gfp & __GFP_ACCOUNT))
> +		goto out;
> +

Isn't it a bit too aggressive?

How about
    if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
       gfp &= ~__GFP_ACCOUNT;

And maybe with some explanatory message?

Thanks!
Shakeel Butt Oct. 13, 2021, 10:26 p.m. UTC | #2
On Wed, Oct 13, 2021 at 3:03 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Wed, Oct 13, 2021 at 12:43:38PM -0700, Shakeel Butt wrote:
> > The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
> > __vmalloc_area_node()") switched to bulk page allocator for order 0
> > allocation backing vmalloc. However bulk page allocator does not support
> > __GFP_ACCOUNT allocations and there are several users of
> > kvmalloc(__GFP_ACCOUNT).
> >
> > For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
> > future if there is workload that can be significantly improved with the
> > bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
> > decision.
> >
> > Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
> > Signed-off-by: Shakeel Butt <shakeelb@google.com>
> > ---
> >  mm/page_alloc.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 668edb16446a..b3acad4615d3 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> >       unsigned int alloc_flags = ALLOC_WMARK_LOW;
> >       int nr_populated = 0, nr_account = 0;
> >
> > +     /* Bulk allocator does not support memcg accounting. */
> > +     if (unlikely(gfp & __GFP_ACCOUNT))
> > +             goto out;
> > +
>
> Isn't it a bit too aggressive?
>
> How about
>     if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))

We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
trigger bulk page allocator through vmalloc, so I don't think the
warning would be any helpful.

>        gfp &= ~__GFP_ACCOUNT;

Bulk allocator is best effort, so callers have adequate fallbacks.
Transparently disabling accounting would be unexpected.
Roman Gushchin Oct. 13, 2021, 11:15 p.m. UTC | #3
On Wed, Oct 13, 2021 at 03:26:11PM -0700, Shakeel Butt wrote:
> On Wed, Oct 13, 2021 at 3:03 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Wed, Oct 13, 2021 at 12:43:38PM -0700, Shakeel Butt wrote:
> > > The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
> > > __vmalloc_area_node()") switched to bulk page allocator for order 0
> > > allocation backing vmalloc. However bulk page allocator does not support
> > > __GFP_ACCOUNT allocations and there are several users of
> > > kvmalloc(__GFP_ACCOUNT).
> > >
> > > For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
> > > future if there is workload that can be significantly improved with the
> > > bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
> > > decision.
> > >
> > > Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
> > > Signed-off-by: Shakeel Butt <shakeelb@google.com>
> > > ---
> > >  mm/page_alloc.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 668edb16446a..b3acad4615d3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > >       unsigned int alloc_flags = ALLOC_WMARK_LOW;
> > >       int nr_populated = 0, nr_account = 0;
> > >
> > > +     /* Bulk allocator does not support memcg accounting. */
> > > +     if (unlikely(gfp & __GFP_ACCOUNT))
> > > +             goto out;
> > > +
> >
> > Isn't it a bit too aggressive?
> >
> > How about
> >     if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
> 
> We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
> trigger bulk page allocator through vmalloc, so I don't think the
> warning would be any helpful.
> 
> >        gfp &= ~__GFP_ACCOUNT;
> 
> Bulk allocator is best effort, so callers have adequate fallbacks.
> Transparently disabling accounting would be unexpected.

I see...

Shouldn't we then move this check to an upper level?

E.g.:

if (!(gfp & __GFP_ACCOUNT))
   call_into_bulk_allocator();
else
   call_into_per_page_allocator();

Not a big deal, I'm just worried about potential silent memory allocation
failures.

Thanks!
Shakeel Butt Oct. 13, 2021, 11:45 p.m. UTC | #4
On Wed, Oct 13, 2021 at 4:15 PM Roman Gushchin <guro@fb.com> wrote:
>
[...]
> > >
> > > Isn't it a bit too aggressive?
> > >
> > > How about
> > >     if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
> >
> > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
> > trigger bulk page allocator through vmalloc, so I don't think the
> > warning would be any helpful.
> >
> > >        gfp &= ~__GFP_ACCOUNT;
> >
> > Bulk allocator is best effort, so callers have adequate fallbacks.
> > Transparently disabling accounting would be unexpected.
>
> I see...
>
> Shouldn't we then move this check to an upper level?
>
> E.g.:
>
> if (!(gfp & __GFP_ACCOUNT))
>    call_into_bulk_allocator();
> else
>    call_into_per_page_allocator();
>

If we add this check in the upper level (e.g. in vm_area_alloc_pages()
) then I think we would need WARN_ON_ONCE(gfp & __GFP_ACCOUNT) in the
bulk allocator to detect future users.

At the moment I am more inclined towards this patch's approach. Let's
say in future we find there is a __GFP_ACCOUNT allocation which can
benefit from bulk allocator and we decide to add such support in bulk
allocator then we would not need to change the bulk allocator callers
at that time just the bulk allocator.
Roman Gushchin Oct. 14, 2021, 12:06 a.m. UTC | #5
On Wed, Oct 13, 2021 at 04:45:35PM -0700, Shakeel Butt wrote:
> On Wed, Oct 13, 2021 at 4:15 PM Roman Gushchin <guro@fb.com> wrote:
> >
> [...]
> > > >
> > > > Isn't it a bit too aggressive?
> > > >
> > > > How about
> > > >     if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
> > >
> > > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
> > > trigger bulk page allocator through vmalloc, so I don't think the
> > > warning would be any helpful.
> > >
> > > >        gfp &= ~__GFP_ACCOUNT;
> > >
> > > Bulk allocator is best effort, so callers have adequate fallbacks.
> > > Transparently disabling accounting would be unexpected.
> >
> > I see...
> >
> > Shouldn't we then move this check to an upper level?
> >
> > E.g.:
> >
> > if (!(gfp & __GFP_ACCOUNT))
> >    call_into_bulk_allocator();
> > else
> >    call_into_per_page_allocator();
> >
> 
> If we add this check in the upper level (e.g. in vm_area_alloc_pages()
> ) then I think we would need WARN_ON_ONCE(gfp & __GFP_ACCOUNT) in the
> bulk allocator to detect future users.
> 
> At the moment I am more inclined towards this patch's approach. Let's
> say in future we find there is a __GFP_ACCOUNT allocation which can
> benefit from bulk allocator and we decide to add such support in bulk
> allocator then we would not need to change the bulk allocator callers
> at that time just the bulk allocator.

Ok, no objections from me.

Thanks!
Matthew Wilcox Oct. 14, 2021, 12:08 a.m. UTC | #6
On Wed, Oct 13, 2021 at 04:45:35PM -0700, Shakeel Butt wrote:
> On Wed, Oct 13, 2021 at 4:15 PM Roman Gushchin <guro@fb.com> wrote:
> >
> [...]
> > > >
> > > > Isn't it a bit too aggressive?
> > > >
> > > > How about
> > > >     if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
> > >
> > > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
> > > trigger bulk page allocator through vmalloc, so I don't think the
> > > warning would be any helpful.
> > >
> > > >        gfp &= ~__GFP_ACCOUNT;
> > >
> > > Bulk allocator is best effort, so callers have adequate fallbacks.
> > > Transparently disabling accounting would be unexpected.
> >
> > I see...
> >
> > Shouldn't we then move this check to an upper level?
> >
> > E.g.:
> >
> > if (!(gfp & __GFP_ACCOUNT))
> >    call_into_bulk_allocator();
> > else
> >    call_into_per_page_allocator();
> >
> 
> If we add this check in the upper level (e.g. in vm_area_alloc_pages()
> ) then I think we would need WARN_ON_ONCE(gfp & __GFP_ACCOUNT) in the
> bulk allocator to detect future users.
> 
> At the moment I am more inclined towards this patch's approach. Let's
> say in future we find there is a __GFP_ACCOUNT allocation which can
> benefit from bulk allocator and we decide to add such support in bulk
> allocator then we would not need to change the bulk allocator callers
> at that time just the bulk allocator.

I agree with you.  Let's apply the patch as-is.
Vasily Averin Oct. 14, 2021, 7:05 a.m. UTC | #7
On 13.10.2021 22:43, Shakeel Butt wrote:
> The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
> __vmalloc_area_node()") switched to bulk page allocator for order 0
> allocation backing vmalloc. However bulk page allocator does not support
> __GFP_ACCOUNT allocations and there are several users of
> kvmalloc(__GFP_ACCOUNT).
> 
> For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
> future if there is workload that can be significantly improved with the
> bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
> decision.
> 
> Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  mm/page_alloc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 668edb16446a..b3acad4615d3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  	unsigned int alloc_flags = ALLOC_WMARK_LOW;
>  	int nr_populated = 0, nr_account = 0;
>  
> +	/* Bulk allocator does not support memcg accounting. */
> +	if (unlikely(gfp & __GFP_ACCOUNT))
> +		goto out;
> +

May be (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) check is better here?

>  	/*
>  	 * Skip populated array elements to determine if any pages need
>  	 * to be allocated before disabling IRQs.
>
Michal Hocko Oct. 14, 2021, 7:16 a.m. UTC | #8
On Wed 13-10-21 12:43:38, Shakeel Butt wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 668edb16446a..b3acad4615d3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  	unsigned int alloc_flags = ALLOC_WMARK_LOW;
>  	int nr_populated = 0, nr_account = 0;
>  
> +	/* Bulk allocator does not support memcg accounting. */
> +	if (unlikely(gfp & __GFP_ACCOUNT))
> +		goto out;

Did you mean goto failed here? This would break some which do not
have any fallback. E.g. xfs_buf_alloc_pages but likely more.

Sorry I could have been more specific when talking about bypassing the
bulk allocator. It is quite confusing because the bulk allocator
interface consists of the bulk allocator and the fallback to the normal
page allocator.

> +
>  	/*
>  	 * Skip populated array elements to determine if any pages need
>  	 * to be allocated before disabling IRQs.
> -- 
> 2.33.0.882.g93a45727a2-goog
Michal Hocko Oct. 14, 2021, 7:18 a.m. UTC | #9
On Thu 14-10-21 10:05:21, Vasily Averin wrote:
[...]
> > +	/* Bulk allocator does not support memcg accounting. */
> > +	if (unlikely(gfp & __GFP_ACCOUNT))
> > +		goto out;
> > +
> 
> May be (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) check is better here?

Yes please
Shakeel Butt Oct. 14, 2021, 3:01 p.m. UTC | #10
On Thu, Oct 14, 2021 at 12:16 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 13-10-21 12:43:38, Shakeel Butt wrote:
> [...]
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 668edb16446a..b3acad4615d3 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> >       unsigned int alloc_flags = ALLOC_WMARK_LOW;
> >       int nr_populated = 0, nr_account = 0;
> >
> > +     /* Bulk allocator does not support memcg accounting. */
> > +     if (unlikely(gfp & __GFP_ACCOUNT))
> > +             goto out;
>
> Did you mean goto failed here? This would break some which do not
> have any fallback. E.g. xfs_buf_alloc_pages but likely more.
>
> Sorry I could have been more specific when talking about bypassing the
> bulk allocator. It is quite confusing because the bulk allocator
> interface consists of the bulk allocator and the fallback to the normal
> page allocator.
>

I did consider 'goto failed' here but for that I have to move
__GFP_ACCOUNT check after the "Already populated array" check in the
function. Basically what's the point of doing other operations
(incrementing nr_populated) if we are gonna skip bulk anyways.

Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT and
vmalloc() is the only __GFP_ACCOUNT user at this point. So, not an
issue for now but I suppose it is better to be future-proof and do the
'goto failed'.

Let me know what you think.
Michal Hocko Oct. 14, 2021, 3:13 p.m. UTC | #11
On Thu 14-10-21 08:01:16, Shakeel Butt wrote:
> On Thu, Oct 14, 2021 at 12:16 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 13-10-21 12:43:38, Shakeel Butt wrote:
> > [...]
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 668edb16446a..b3acad4615d3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > >       unsigned int alloc_flags = ALLOC_WMARK_LOW;
> > >       int nr_populated = 0, nr_account = 0;
> > >
> > > +     /* Bulk allocator does not support memcg accounting. */
> > > +     if (unlikely(gfp & __GFP_ACCOUNT))
> > > +             goto out;
> >
> > Did you mean goto failed here? This would break some which do not
> > have any fallback. E.g. xfs_buf_alloc_pages but likely more.
> >
> > Sorry I could have been more specific when talking about bypassing the
> > bulk allocator. It is quite confusing because the bulk allocator
> > interface consists of the bulk allocator and the fallback to the normal
> > page allocator.
> >
> 
> I did consider 'goto failed' here but for that I have to move
> __GFP_ACCOUNT check after the "Already populated array" check in the
> function. Basically what's the point of doing other operations
> (incrementing nr_populated) if we are gonna skip bulk anyways.

I have to say I do not follow why that is a problem.

> Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT and
> vmalloc() is the only __GFP_ACCOUNT user at this point. So, not an
> issue for now but I suppose it is better to be future-proof and do the
> 'goto failed'.

Why do we want to have that silent trap?
Johannes Weiner Oct. 14, 2021, 3:23 p.m. UTC | #12
On Thu, Oct 14, 2021 at 08:01:16AM -0700, Shakeel Butt wrote:
> Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT

It probably should, actually. Sorry, somewhat off-topic, but we've
seen this consume quite large amounts of memory. __GFP_ACCOUNT and
vmstat tracking seems overdue for this one.
Vasily Averin Oct. 14, 2021, 5:43 p.m. UTC | #13
On 14.10.2021 18:23, Johannes Weiner wrote:
> On Thu, Oct 14, 2021 at 08:01:16AM -0700, Shakeel Butt wrote:
>> Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT
> 
> It probably should, actually. Sorry, somewhat off-topic, but we've
> seen this consume quite large amounts of memory. __GFP_ACCOUNT and
> vmstat tracking seems overdue for this one.

If this will be required, you can use
[PATCH mm v5] memcg: enable memory accounting in __alloc_pages_bulk
https://lkml.org/lkml/2021/10/14/197

As far as I understand it will not be used right now,
however I decided to submit it anyway, perhaps it may be needed later.

Thank you,
	Vasily Averin
diff mbox series

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 668edb16446a..b3acad4615d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5215,6 +5215,10 @@  unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
 	int nr_populated = 0, nr_account = 0;
 
+	/* Bulk allocator does not support memcg accounting. */
+	if (unlikely(gfp & __GFP_ACCOUNT))
+		goto out;
+
 	/*
 	 * Skip populated array elements to determine if any pages need
 	 * to be allocated before disabling IRQs.