[net-next] netlink: use kvmalloc() in netlink_alloc_large_skb()

Message ID	20240224090630.605917-1-edumazet@google.com (mailing list archive)
State	Accepted
Commit	f8cbf6bde4c8d5d32330bcceafa7b139fec89f97
Delegated to:	Netdev Maintainers
Headers	Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D7BE31B7EF for <netdev@vger.kernel.org>; Sat, 24 Feb 2024 09:06:32 +0000 (UTC)
	Date: Sat, 24 Feb 2024 09:06:30 +0000
	Precedence: bulk
	Mime-Version: 1.0
	Message-ID: <20240224090630.605917-1-edumazet@google.com>
	Subject: [PATCH net-next] netlink: use kvmalloc() in netlink_alloc_large_skb()
	From: Eric Dumazet <edumazet@google.com>
	To: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
	Cc: netdev@vger.kernel.org, eric.dumazet@gmail.com, Eric Dumazet <edumazet@google.com>, Zhengchao Shao <shaozhengchao@huawei.com>
	Content-Type: text/plain; charset="UTF-8"
Series	[net-next] netlink: use kvmalloc() in netlink_alloc_large_skb()

Context	Check	Description
netdev/series_format	success	Single patches do not need cover letters
netdev/tree_selection	success	Clearly marked for net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 945 this patch: 945
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	success	CCed 4 of 4 maintainers
netdev/build_clang	success	Errors and warnings before: 958 this patch: 958
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 962 this patch: 962
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 31 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
netdev/contest	success	net-next-2024-02-25--03-00 (tests: 1457)

Eric Dumazet Feb. 24, 2024, 9:06 a.m. UTC

This is a followup of commit 234ec0b6034b ("netlink: fix potential
sleeping issue in mqueue_flush_file"), because vfree_atomic()
overhead is unfortunate for medium sized allocations.

1) If the allocation is smaller than PAGE_SIZE, do not bother
   with vmalloc() at all. Some arches have 64KB PAGE_SIZE,
   while NLMSG_GOODSIZE is smaller than 8KB.

2) Use kvmalloc(), which might allocate one high order page
   instead of vmalloc if memory is not too fragmented.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Zhengchao Shao <shaozhengchao@huawei.com>
---
 net/netlink/af_netlink.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)
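
The two points in the commit message boil down to one size test. The following is a minimal userspace sketch of that test, not the kernel code itself: the `model_*` / `MODEL_*` names are hypothetical, and the cache-line size (64) and the size of `struct skb_shared_info` (320) are assumed values chosen for illustration, since the real ones depend on architecture and configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only; the kernel's depend on arch/config. */
#define MODEL_CACHE_BYTES 64u   /* stand-in for SMP_CACHE_BYTES */
#define MODEL_SHINFO_SIZE 320u  /* stand-in for sizeof(struct skb_shared_info) */

/* Round x up to a multiple of the (assumed) cache-line size. */
#define MODEL_DATA_ALIGN(x) \
	(((x) + (MODEL_CACHE_BYTES - 1)) & ~(size_t)(MODEL_CACHE_BYTES - 1))

/* Payload plus trailing shared info, each rounded up separately. */
#define MODEL_HEAD_ALIGN(x) \
	(MODEL_DATA_ALIGN(x) + MODEL_DATA_ALIGN(MODEL_SHINFO_SIZE))

enum model_path {
	PATH_ALLOC_SKB,	/* regular alloc_skb(), no vmalloc involved */
	PATH_KVMALLOC,	/* kvmalloc(): kmalloc first, vmalloc as fallback */
};

/* Mirrors the patched condition: head_size <= PAGE_SIZE || broadcast. */
enum model_path model_pick_path(size_t size, size_t page_size, bool broadcast)
{
	size_t head_size = MODEL_HEAD_ALIGN(size);

	if (head_size <= page_size || broadcast)
		return PATH_ALLOC_SKB;
	return PATH_KVMALLOC;
}
```

Under these assumptions, an 8000-byte request goes through the kvmalloc() path with 4KB pages but stays on the plain alloc_skb() path with 64KB pages, which is the case point 1) describes.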

shaozhengchao Feb. 26, 2024, 1:33 a.m. UTC | #1

On 2024/2/24 17:06, Eric Dumazet wrote:
> This is a followup of commit 234ec0b6034b ("netlink: fix potential
> sleeping issue in mqueue_flush_file"), because vfree_atomic()
> overhead is unfortunate for medium sized allocations.
> 
> 1) If the allocation is smaller than PAGE_SIZE, do not bother
>     with vmalloc() at all. Some arches have 64KB PAGE_SIZE,
>     while NLMSG_GOODSIZE is smaller than 8KB.
> 
> 2) Use kvmalloc(), which might allocate one high order page
>     instead of vmalloc if memory is not too fragmented.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
>   net/netlink/af_netlink.c | 18 ++++++++----------
>   1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 9c962347cf859f16fc76e4d8a2fd22cdb3d142d6..90ca4e0ed9b3632bf223bf29fd864dbb76f3c89c 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -1206,23 +1206,21 @@ struct sock *netlink_getsockbyfilp(struct file *filp)
>   
>   struct sk_buff *netlink_alloc_large_skb(unsigned int size, int broadcast)
>   {
> +	size_t head_size = SKB_HEAD_ALIGN(size);
>   	struct sk_buff *skb;
>   	void *data;
>   
> -	if (size <= NLMSG_GOODSIZE || broadcast)
> +	if (head_size <= PAGE_SIZE || broadcast)
>   		return alloc_skb(size, GFP_KERNEL);
>   
> -	size = SKB_DATA_ALIGN(size) +
> -	       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> -
> -	data = vmalloc(size);
> -	if (data == NULL)
> +	data = kvmalloc(head_size, GFP_KERNEL);
> +	if (!data)
>   		return NULL;
>   
> -	skb = __build_skb(data, size);
> -	if (skb == NULL)
> -		vfree(data);
> -	else
> +	skb = __build_skb(data, head_size);
> +	if (!skb)
> +		kvfree(data);
> +	else if (is_vmalloc_addr(data))
>   		skb->destructor = netlink_skb_destructor;
>   
>   	return skb;
LGTM, thanks.

Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com>

Jakub Kicinski Feb. 27, 2024, 5:52 p.m. UTC | #2

On Sat, 24 Feb 2024 09:06:30 +0000 Eric Dumazet wrote:
>  struct sk_buff *netlink_alloc_large_skb(unsigned int size, int broadcast)
>  {
> +	size_t head_size = SKB_HEAD_ALIGN(size);
>  	struct sk_buff *skb;
>  	void *data;
>  
> -	if (size <= NLMSG_GOODSIZE || broadcast)
> +	if (head_size <= PAGE_SIZE || broadcast)
>  		return alloc_skb(size, GFP_KERNEL);
>  
> -	size = SKB_DATA_ALIGN(size) +
> -	       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> -
> -	data = vmalloc(size);
> -	if (data == NULL)
> +	data = kvmalloc(head_size, GFP_KERNEL);
> +	if (!data)
>  		return NULL;
>  
> -	skb = __build_skb(data, size);
> -	if (skb == NULL)
> -		vfree(data);
> -	else
> +	skb = __build_skb(data, head_size);

Is this going to work with KFENCE? Don't we need similar size
adjustment logic as we have in __slab_build_skb() ?

> +	if (!skb)
> +		kvfree(data);
> +	else if (is_vmalloc_addr(data))
>  		skb->destructor = netlink_skb_destructor;

Eric Dumazet Feb. 27, 2024, 6:15 p.m. UTC | #3

On Tue, Feb 27, 2024 at 6:52 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 24 Feb 2024 09:06:30 +0000 Eric Dumazet wrote:
> >  struct sk_buff *netlink_alloc_large_skb(unsigned int size, int broadcast)
> >  {
> > +     size_t head_size = SKB_HEAD_ALIGN(size);
> >       struct sk_buff *skb;
> >       void *data;
> >
> > -     if (size <= NLMSG_GOODSIZE || broadcast)
> > +     if (head_size <= PAGE_SIZE || broadcast)
> >               return alloc_skb(size, GFP_KERNEL);
> >
> > -     size = SKB_DATA_ALIGN(size) +
> > -            SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > -
> > -     data = vmalloc(size);
> > -     if (data == NULL)
> > +     data = kvmalloc(head_size, GFP_KERNEL);
> > +     if (!data)
> >               return NULL;
> >
> > -     skb = __build_skb(data, size);
> > -     if (skb == NULL)
> > -             vfree(data);
> > -     else
> > +     skb = __build_skb(data, head_size);
>
> Is this going to work with KFENCE? Don't we need similar size
> adjustment logic as we have in __slab_build_skb() ?

Note that the 2nd argument of __build_skb() has not been changed by my patch.

SKB_HEAD_ALIGN(size) == SKB_DATA_ALIGN(size) +
                        SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

I do not expect KFENCE to be a problem here.

Either data comes from vmalloc(), and the patch is a no-op,
or it comes from kmalloc(), and __build_skb() does nothing special:
the KFENCE magic has already happened.
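
The size identity stated above can be checked with a small userspace sketch. This is an illustration under assumptions, not kernel code: the `sketch_*` / `SKETCH_*` names are hypothetical, and the cache-line size (64) and shared-info size (320) are assumed values standing in for SMP_CACHE_BYTES and sizeof(struct skb_shared_info).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define SKETCH_CACHE_BYTES 64u   /* stand-in for SMP_CACHE_BYTES */
#define SKETCH_SHINFO_SIZE 320u  /* stand-in for sizeof(struct skb_shared_info) */

/* Round x up to a multiple of the (assumed) cache-line size. */
size_t sketch_data_align(size_t x)
{
	return (x + SKETCH_CACHE_BYTES - 1) & ~(size_t)(SKETCH_CACHE_BYTES - 1);
}

/* Size handed to __build_skb() before the patch: the open-coded sum. */
size_t sketch_old_build_size(size_t size)
{
	size = sketch_data_align(size) +
	       sketch_data_align(SKETCH_SHINFO_SIZE);
	return size;
}

/* Size handed to __build_skb() after the patch: the single helper. */
size_t sketch_head_align(size_t size)
{
	return sketch_data_align(size) + sketch_data_align(SKETCH_SHINFO_SIZE);
}

/* Returns 1 if old and new sizes agree for every request up to limit. */
int sketch_sizes_match(size_t limit)
{
	for (size_t size = 0; size <= limit; size++)
		if (sketch_old_build_size(size) != sketch_head_align(size))
			return 0;
	return 1;
}
```

With these assumed constants, a 5000-byte request rounds up to 5056 bytes of payload plus 320 bytes of shared info, so both the old and the new code would pass 5376 to __build_skb().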

>
> > +     if (!skb)
> > +             kvfree(data);

Note that skb->head at this point must be equal to @data

> > +     else if (is_vmalloc_addr(data))
> >               skb->destructor = netlink_skb_destructor;

patchwork-bot+netdevbpf@kernel.org Feb. 27, 2024, 7:20 p.m. UTC | #4

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 24 Feb 2024 09:06:30 +0000 you wrote:
> This is a followup of commit 234ec0b6034b ("netlink: fix potential
> sleeping issue in mqueue_flush_file"), because vfree_atomic()
> overhead is unfortunate for medium sized allocations.
> 
> 1) If the allocation is smaller than PAGE_SIZE, do not bother
>    with vmalloc() at all. Some arches have 64KB PAGE_SIZE,
>    while NLMSG_GOODSIZE is smaller than 8KB.
> 
> [...]

Here is the summary with links:
  - [net-next] netlink: use kvmalloc() in netlink_alloc_large_skb()
    https://git.kernel.org/netdev/net-next/c/f8cbf6bde4c8

You are awesome, thank you!
