diff mbox series

[2] mm/kvmalloc: do not call kmalloc for size > KMALLOC_MAX_SIZE

Message ID 154106695670.898059.5301435081426064314.stgit@buzz (mailing list archive)
State New, archived

Commit Message

Konstantin Khlebnikov Nov. 1, 2018, 10:09 a.m. UTC
Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 mm/util.c |    4 ++++
 1 file changed, 4 insertions(+)

Comments

Michal Hocko Nov. 1, 2018, 10:24 a.m. UTC | #1
On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.

I would go on and say that allocations with sizes too large can actually
trigger a warning (once you have posted in the previous version outside
of the changelog area) because that might be interesting to people -
there are deployments to panic on warning and then a warning is much
more important.

> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/util.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 8bf08b5b5760..f5f04fa22814 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
>  
> +	if (size > KMALLOC_MAX_SIZE)
> +		goto fallback;
> +
>  	/*
>  	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>  	 * so the given set of flags has to be compatible.
> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
>  
> +fallback:
>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));
>  }
>
Konstantin Khlebnikov Nov. 1, 2018, 10:48 a.m. UTC | #2
On 01.11.2018 13:24, Michal Hocko wrote:
> On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
>> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> 
> I would go on and say that allocations with sizes too large can actually
> trigger a warning (once you have posted in the previous version outside
> of the changelog area) because that might be interesting to people -
> there are deployments to panic on warning and then a warning is much
> more important.

It seems that warning isn't completely valid.


__alloc_pages_slowpath() handles this more gracefully:

	/*
	 * In the slowpath, we sanity check order to avoid ever trying to
	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
	 * be using allocators in order of preference for an area that is
	 * too large.
	 */
	if (order >= MAX_ORDER) {
		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
		return NULL;
	}


Fast path is ready for order >= MAX_ORDER


Problem is in node_reclaim() which is called earlier than __alloc_pages_slowpath()
from surprising place - get_page_from_freelist()


Probably node_reclaim() simply needs something like this:

	if (order >= MAX_ORDER)
		return NODE_RECLAIM_NOSCAN;
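
A rough userspace sketch of the arithmetic behind this suggestion, not kernel code. All `MODEL_*` names and `model_*` functions are hypothetical stand-ins, and the sketch assumes 4 KiB pages, MAX_ORDER of 11, and a SLUB-style KMALLOC_MAX_SIZE of `1 << (MAX_ORDER + PAGE_SHIFT - 1)`. Under those assumptions any size above KMALLOC_MAX_SIZE maps to an order of at least MAX_ORDER, which is the case the proposed guard would short-circuit:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model constants; real values depend on arch and config. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_MAX_ORDER 11
#define MODEL_KMALLOC_MAX_SIZE \
	((size_t)1 << (MODEL_MAX_ORDER + MODEL_PAGE_SHIFT - 1))

enum model_reclaim { MODEL_RECLAIM_NOSCAN, MODEL_RECLAIM_SCANNED };

/* Smallest order such that (page size << order) >= size. */
unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;
	size_t pages;

	assert(size > 0);
	pages = (size - 1) >> MODEL_PAGE_SHIFT;
	while (pages) {
		order++;
		pages >>= 1;
	}
	return order;
}

/* Model of the suggested guard: refuse to scan for impossible orders. */
enum model_reclaim model_node_reclaim(unsigned int order)
{
	if (order >= MODEL_MAX_ORDER)
		return MODEL_RECLAIM_NOSCAN;
	return MODEL_RECLAIM_SCANNED;	/* stand-in for the real reclaim work */
}
```

With these model values, a request of exactly MODEL_KMALLOC_MAX_SIZE still fits in order MODEL_MAX_ORDER - 1, and one extra byte pushes it to MODEL_MAX_ORDER.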


> 
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
> Thanks!
> 
>> ---
>>   mm/util.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/util.c b/mm/util.c
>> index 8bf08b5b5760..f5f04fa22814 100644
>> --- a/mm/util.c
>> +++ b/mm/util.c
>> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>   	gfp_t kmalloc_flags = flags;
>>   	void *ret;
>>   
>> +	if (size > KMALLOC_MAX_SIZE)
>> +		goto fallback;
>> +
>>   	/*
>>   	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>>   	 * so the given set of flags has to be compatible.
>> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>   	if (ret || size <= PAGE_SIZE)
>>   		return ret;
>>   
>> +fallback:
>>   	return __vmalloc_node_flags_caller(size, node, flags,
>>   			__builtin_return_address(0));
>>   }
>>
>
Michal Hocko Nov. 1, 2018, 12:55 p.m. UTC | #3
On Thu 01-11-18 13:48:17, Konstantin Khlebnikov wrote:
> 
> 
> On 01.11.2018 13:24, Michal Hocko wrote:
> > On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
> > > Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> > 
> > I would go on and say that allocations with sizes too large can actually
> > trigger a warning (once you have posted in the previous version outside
> > of the changelog area) because that might be interesting to people -
> > there are deployments to panic on warning and then a warning is much
> > more important.
> 
> It seems that warning isn't completely valid.
> 
> 
> __alloc_pages_slowpath() handles this more gracefully:
> 
> 	/*
> 	 * In the slowpath, we sanity check order to avoid ever trying to
> 	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> 	 * be using allocators in order of preference for an area that is
> 	 * too large.
> 	 */
> 	if (order >= MAX_ORDER) {
> 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> 		return NULL;
> 	}
> 
> 
> Fast path is ready for order >= MAX_ORDER
> 
> 
> Problem is in node_reclaim() which is called earlier than __alloc_pages_slowpath()
> from surprising place - get_page_from_freelist()
> 
> 
> Probably node_reclaim() simply needs something like this:
> 
> 	if (order >= MAX_ORDER)
> 		return NODE_RECLAIM_NOSCAN;

Maybe but the point is that triggering this warning is possible. Even if
the warning is bogus it doesn't really make much sense to even try
kmalloc if the size is not supported by the allocator.
Konstantin Khlebnikov Nov. 1, 2018, 4:42 p.m. UTC | #4
On 01.11.2018 15:55, Michal Hocko wrote:
> On Thu 01-11-18 13:48:17, Konstantin Khlebnikov wrote:
>>
>>
>> On 01.11.2018 13:24, Michal Hocko wrote:
>>> On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
>>>> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
>>>
>>> I would go on and say that allocations with sizes too large can actually
>>> trigger a warning (once you have posted in the previous version outside
>>> of the changelog area) because that might be interesting to people -
>>> there are deployments to panic on warning and then a warning is much
>>> more important.
>>
>> It seems that warning isn't completely valid.
>>
>>
>> __alloc_pages_slowpath() handles this more gracefully:
>>
>> 	/*
>> 	 * In the slowpath, we sanity check order to avoid ever trying to
>> 	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
>> 	 * be using allocators in order of preference for an area that is
>> 	 * too large.
>> 	 */
>> 	if (order >= MAX_ORDER) {
>> 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>> 		return NULL;
>> 	}
>>
>>
>> Fast path is ready for order >= MAX_ORDER
>>
>>
>> Problem is in node_reclaim() which is called earlier than __alloc_pages_slowpath()
>> from surprising place - get_page_from_freelist()
>>
>>
>> Probably node_reclaim() simply needs something like this:
>>
>> 	if (order >= MAX_ORDER)
>> 		return NODE_RECLAIM_NOSCAN;
> 
> Maybe but the point is that triggering this warning is possible. Even if
> the warning is bogus it doesn't really make much sense to even try
> kmalloc if the size is not supported by the allocator.
> 

But __GFP_NOWARN allocation (like in this case) should just fail silently
without warnings regardless of reason because caller can deal with that.

Without __GFP_NOWARN, the allocator should print the standard warning.

The caller must handle a NULL/ENOMEM result anyway, and that error path
should cover impossible sizes too.
Of course, it could check the size first, just as an optimization.
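
The caller-side pattern described here could look roughly like the following userspace sketch. `model_alloc()`, `model_setup_buffer()` and `MODEL_MAX_ALLOC` are hypothetical names for illustration only; the point is that an impossible size and an ordinary out-of-memory condition take the same error path:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical upper bound standing in for an allocator size limit. */
#define MODEL_MAX_ALLOC ((size_t)4 << 20)

/* Hypothetical allocator: fails silently for sizes it cannot serve. */
void *model_alloc(size_t size)
{
	if (size == 0 || size > MODEL_MAX_ALLOC)
		return NULL;
	return malloc(size);
}

/* The caller treats every NULL the same way, whatever the reason. */
int model_setup_buffer(size_t size, void **out)
{
	void *buf;

	assert(out != NULL);
	buf = model_alloc(size);
	if (!buf)
		return -ENOMEM;
	*out = buf;
	return 0;
}
```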
Michal Hocko Nov. 1, 2018, 4:55 p.m. UTC | #5
On Thu 01-11-18 19:42:48, Konstantin Khlebnikov wrote:
> On 01.11.2018 15:55, Michal Hocko wrote:
> > On Thu 01-11-18 13:48:17, Konstantin Khlebnikov wrote:
> > > 
> > > 
> > > On 01.11.2018 13:24, Michal Hocko wrote:
> > > > On Thu 01-11-18 13:09:16, Konstantin Khlebnikov wrote:
> > > > > Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> > > > 
> > > > I would go on and say that allocations with sizes too large can actually
> > > > trigger a warning (once you have posted in the previous version outside
> > > > of the changelog area) because that might be interesting to people -
> > > > there are deployments to panic on warning and then a warning is much
> > > > more important.
> > > 
> > > It seems that warning isn't completely valid.
> > > 
> > > 
> > > __alloc_pages_slowpath() handles this more gracefully:
> > > 
> > > 	/*
> > > 	 * In the slowpath, we sanity check order to avoid ever trying to
> > > 	 * reclaim >= MAX_ORDER areas which will never succeed. Callers may
> > > 	 * be using allocators in order of preference for an area that is
> > > 	 * too large.
> > > 	 */
> > > 	if (order >= MAX_ORDER) {
> > > 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> > > 		return NULL;
> > > 	}
> > > 
> > > 
> > > Fast path is ready for order >= MAX_ORDER
> > > 
> > > 
> > > Problem is in node_reclaim() which is called earlier than __alloc_pages_slowpath()
> > > from surprising place - get_page_from_freelist()
> > > 
> > > 
> > > Probably node_reclaim() simply needs something like this:
> > > 
> > > 	if (order >= MAX_ORDER)
> > > 		return NODE_RECLAIM_NOSCAN;
> > 
> > Maybe but the point is that triggering this warning is possible. Even if
> > the warning is bogus it doesn't really make much sense to even try
> > kmalloc if the size is not supported by the allocator.
> > 
> 
> But __GFP_NOWARN allocation (like in this case) should just fail silently
> without warnings regardless of reason because caller can deal with that.

__GFP_NOWARN does not mean that no warning can be triggered from the
allocation context. It only suppresses the complaint about the allocation
failure itself. I do not think we want to check the gfp mask in all possible
paths triggered from the allocator/reclaim.

I have just looked at the original warning you hit, and it came from
88d6ac40c1c6 ("mm/vmstat: fix divide error at __fragmentation_index"). I
would argue that the warning is a bit of an over-reaction, regardless of
the gfp_mask.
Vlastimil Babka Nov. 5, 2018, 1:03 p.m. UTC | #6
On 11/1/18 11:09 AM, Konstantin Khlebnikov wrote:
> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Makes sense regardless of warnings stuff.

Acked-by: Vlastimil Babka <vbabka@suse.cz>

But it must be moved below the GFP_KERNEL check!

> ---
>  mm/util.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 8bf08b5b5760..f5f04fa22814 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
>  
> +	if (size > KMALLOC_MAX_SIZE)
> +		goto fallback;
> +
>  	/*
>  	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>  	 * so the given set of flags has to be compatible.
> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
>  
> +fallback:
>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));
>  }
>
Konstantin Khlebnikov Nov. 5, 2018, 4:19 p.m. UTC | #7
On 05.11.2018 16:03, Vlastimil Babka wrote:
> On 11/1/18 11:09 AM, Konstantin Khlebnikov wrote:
>> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> 
> Makes sense regardless of warnings stuff.
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> But it must be moved below the GFP_KERNEL check!

But kmalloc cannot handle it regardless of GFP.

Ok maybe write something like this

if (size > KMALLOC_MAX_SIZE) {
	if (WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL))
		return NULL;
	goto do_vmalloc;
}

or fix that uncertainty right in vmalloc

For now comment in vmalloc declares

  *	Any use of gfp flags outside of GFP_KERNEL should be consulted
  *	with mm people.

=)

> 
>> ---
>>   mm/util.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/util.c b/mm/util.c
>> index 8bf08b5b5760..f5f04fa22814 100644
>> --- a/mm/util.c
>> +++ b/mm/util.c
>> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>   	gfp_t kmalloc_flags = flags;
>>   	void *ret;
>>   
>> +	if (size > KMALLOC_MAX_SIZE)
>> +		goto fallback;
>> +
>>   	/*
>>   	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>>   	 * so the given set of flags has to be compatible.
>> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>   	if (ret || size <= PAGE_SIZE)
>>   		return ret;
>>   
>> +fallback:
>>   	return __vmalloc_node_flags_caller(size, node, flags,
>>   			__builtin_return_address(0));
>>   }
>>
>
Vlastimil Babka Nov. 5, 2018, 4:52 p.m. UTC | #8
On 11/5/18 5:19 PM, Konstantin Khlebnikov wrote:
> 
> 
> On 05.11.2018 16:03, Vlastimil Babka wrote:
>> On 11/1/18 11:09 AM, Konstantin Khlebnikov wrote:
>>> Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
>>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>
>> Makes sense regardless of warnings stuff.
>>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>>
>> But it must be moved below the GFP_KERNEL check!
> 
> But kmalloc cannot handle it regardless of GFP.

Sure, but that's less problematic than skipping to vmalloc() for
!GFP_KERNEL. Especially for large sizes where it's likely that page
tables might get allocated (with GFP_KERNEL).
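
To make the ordering concern concrete, here is a small userspace model, not the kernel function. The `model_*` names and the 4 MiB limit are assumptions for illustration, and `gfp_kernel_compatible` stands in for the `(flags & GFP_KERNEL) == GFP_KERNEL` test. It only reports which path a request would take under each ordering:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit standing in for KMALLOC_MAX_SIZE. */
#define MODEL_KMALLOC_MAX_SIZE ((size_t)4 << 20)

enum model_path {
	MODEL_PATH_KMALLOC_ONLY,	/* flags not GFP_KERNEL compatible */
	MODEL_PATH_VMALLOC,		/* straight to vmalloc */
	MODEL_PATH_KMALLOC_THEN_VMALLOC	/* try kmalloc, then fall back */
};

/* Ordering as posted: the size check runs before the flags check. */
enum model_path model_path_as_posted(size_t size, bool gfp_kernel_compatible)
{
	assert(size > 0);	/* the model only covers non-zero sizes */
	if (size > MODEL_KMALLOC_MAX_SIZE)
		return MODEL_PATH_VMALLOC;	/* reached even with incompatible flags */
	if (!gfp_kernel_compatible)
		return MODEL_PATH_KMALLOC_ONLY;
	return MODEL_PATH_KMALLOC_THEN_VMALLOC;
}

/* Ordering requested in review: the flags check runs first. */
enum model_path model_path_as_reviewed(size_t size, bool gfp_kernel_compatible)
{
	assert(size > 0);
	if (!gfp_kernel_compatible)
		return MODEL_PATH_KMALLOC_ONLY;	/* never reaches vmalloc */
	if (size > MODEL_KMALLOC_MAX_SIZE)
		return MODEL_PATH_VMALLOC;
	return MODEL_PATH_KMALLOC_THEN_VMALLOC;
}
```

The two orderings differ only for oversized requests with flags that are not GFP_KERNEL compatible: the posted ordering sends them to vmalloc, the reviewed ordering leaves them to kmalloc, which is expected to fail.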

> Ok maybe write something like this
> 
> if (size > KMALLOC_MAX_SIZE) {
> 	if (WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL))
> 		return NULL;
> 	goto do_vmalloc;
> }

Probably should check also for __GFP_NOWARN.

> or fix that uncertainty right in vmalloc
> 
> For now comment in vmalloc declares
> 
>   *	Any use of gfp flags outside of GFP_KERNEL should be consulted
>   *	with mm people.

Dunno, what does Michal think?

> =)
> 
>>
>>> ---
>>>   mm/util.c |    4 ++++
>>>   1 file changed, 4 insertions(+)
>>>
>>> diff --git a/mm/util.c b/mm/util.c
>>> index 8bf08b5b5760..f5f04fa22814 100644
>>> --- a/mm/util.c
>>> +++ b/mm/util.c
>>> @@ -392,6 +392,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>>   	gfp_t kmalloc_flags = flags;
>>>   	void *ret;
>>>   
>>> +	if (size > KMALLOC_MAX_SIZE)
>>> +		goto fallback;
>>> +
>>>   	/*
>>>   	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
>>>   	 * so the given set of flags has to be compatible.
>>> @@ -422,6 +425,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>>   	if (ret || size <= PAGE_SIZE)
>>>   		return ret;
>>>   
>>> +fallback:
>>>   	return __vmalloc_node_flags_caller(size, node, flags,
>>>   			__builtin_return_address(0));
>>>   }
>>>
>>
Michal Hocko Nov. 5, 2018, 4:57 p.m. UTC | #9
On Mon 05-11-18 19:19:28, Konstantin Khlebnikov wrote:
> 
> 
> On 05.11.2018 16:03, Vlastimil Babka wrote:
> > On 11/1/18 11:09 AM, Konstantin Khlebnikov wrote:
> > > Allocations over KMALLOC_MAX_SIZE could be served only by vmalloc.
> > > 
> > > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > 
> > Makes sense regardless of warnings stuff.
> > 
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > But it must be moved below the GFP_KERNEL check!
> 
> But kmalloc cannot handle it regardless of GFP.
> 
> Ok maybe write something like this
> 
> if (size > KMALLOC_MAX_SIZE) {
> 	if (WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL))
> 		return NULL;
> 	goto do_vmalloc;
> }

Do we really have to be so defensive? I agree with Vlastimil that the
check should be done after the GFP_KERNEL check (I should have noticed that).
kmalloc should already complain about the requested allocation size.

> or fix that uncertainty right in vmalloc
> 
> For now comment in vmalloc declares
> 
>  *	Any use of gfp flags outside of GFP_KERNEL should be consulted
>  *	with mm people.

Which is what we want. There are some exceptional cases where using a
subset of GFP_KERNEL works fine (e.g. a scoped nofs/noio context).
Patch

diff --git a/mm/util.c b/mm/util.c
index 8bf08b5b5760..f5f04fa22814 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -392,6 +392,9 @@  void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	gfp_t kmalloc_flags = flags;
 	void *ret;
 
+	if (size > KMALLOC_MAX_SIZE)
+		goto fallback;
+
 	/*
 	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
 	 * so the given set of flags has to be compatible.
@@ -422,6 +425,7 @@  void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
+fallback:
 	return __vmalloc_node_flags_caller(size, node, flags,
 			__builtin_return_address(0));
 }
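
The control flow this patch adds can be sketched as a userspace model. This is a simplification under stated assumptions, not the kernel code: the `MODEL_*` constants and `model_kvmalloc_backend()` are hypothetical, gfp flag handling is left out entirely, and `kmalloc_succeeds` stands in for the outcome of the kmalloc attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model constants; real values depend on arch and config. */
#define MODEL_PAGE_SIZE ((size_t)4096)
#define MODEL_KMALLOC_MAX_SIZE ((size_t)4 << 20)

enum model_backend {
	MODEL_BACKEND_KMALLOC,	/* served by the kmalloc attempt */
	MODEL_BACKEND_VMALLOC,	/* served (or attempted) by vmalloc */
	MODEL_BACKEND_NONE	/* kmalloc failed and no fallback is tried */
};

/* Which allocator ends up handling a request in the patched flow. */
enum model_backend model_kvmalloc_backend(size_t size, bool kmalloc_succeeds)
{
	assert(size > 0);	/* the model only covers non-zero sizes */

	/* New check: kmalloc cannot serve this size, skip it entirely. */
	if (size > MODEL_KMALLOC_MAX_SIZE)
		return MODEL_BACKEND_VMALLOC;

	if (kmalloc_succeeds)
		return MODEL_BACKEND_KMALLOC;

	/* Existing rule: no vmalloc fallback for sub-page requests. */
	if (size <= MODEL_PAGE_SIZE)
		return MODEL_BACKEND_NONE;

	return MODEL_BACKEND_VMALLOC;
}
```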