diff mbox series

[net,v9] skb_expand_head() adjust skb->truesize incorrectly

Message ID 2bd9c638-3038-5aba-1dae-ad939e13c0c4@virtuozzo.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net,v9] skb_expand_head() adjust skb->truesize incorrectly | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers warning 13 maintainers not CCed: edumazet@google.com ceggers@arri.de bjorn@kernel.org aahringo@redhat.com fw@strlen.de tglx@linutronix.de cong.wang@bytedance.com willemb@google.com jonathan.lemon@gmail.com xiangxia.m.yue@gmail.com alobakin@pm.me pabeni@redhat.com yangbo.lu@nxp.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 3070 this patch: 3070
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 73 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 3154 this patch: 3154
netdev/header_inline success Link

Commit Message

Vasily Averin Oct. 4, 2021, 1 p.m. UTC
Christoph Paasch reports [1] about incorrect skb->truesize
after skb_expand_head() call in ip6_xmit.
This may happen because of two reasons:
- skb_set_owner_w() for newly cloned skb is called too early,
before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
- pskb_expand_head() does not adjust truesize in (skb->sk) case.
In this case sk->sk_wmem_alloc should be adjusted too.

[1] https://lkml.org/lkml/2021/8/20/1082

Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
v9: restored sock_edemux check
v8: clone non-wmem skb
v7 (from kuba@):
    shift more magic into helpers,
    follow Eric's advice and don't inherit non-wmem skbs for now
v6: fixed delta,
    improved comments
v5: fixed else condition, thanks to Eric
    reworked update of expanded skb,
    added corresponding comments
v4: decided to use is_skb_wmem() after pskb_expand_head() call
    fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
v3: removed __pskb_expand_head(),
    added is_skb_wmem() helper for skb with wmem-compatible destructors
    there are 2 ways to use it:
     - before pskb_expand_head(), to create skb clones
     - after successfull pskb_expand_head() to change owner on extended
       skb.
v2: based on patch version from Eric Dumazet,
    added __pskb_expand_head() function, which can be forced
    to adjust skb->truesize and sk->sk_wmem_alloc.
---
 include/net/sock.h |  1 +
 net/core/skbuff.c  | 35 ++++++++++++++++++++++-------------
 net/core/sock.c    |  8 ++++++++
 3 files changed, 31 insertions(+), 13 deletions(-)

Comments

Vasily Averin Oct. 4, 2021, 1:14 p.m. UTC | #1
Dear Eric,
could you please take look at this patch?

Original issue reported by Christoph Paasch is still not fixed,
however now buggy paches was merged into upstream.

v6 patch version [1] fixes the problem by careful change of destructor
on existing skb. I still think it is correct however I'm agree
it requires careful review.

[1] https://lkml.org/lkml/2021/9/6/584

This patch version is more simple and returns to cloning of non-wmem skb.

Both variants (i.e. this one and v6) resolves the problem.

Could you please review the patches, select one of them or propose some
better solution?

Thank you,
	Vasily Averin

On 10/4/21 4:00 PM, Vasily Averin wrote:
> Christoph Paasch reports [1] about incorrect skb->truesize
> after skb_expand_head() call in ip6_xmit.
> This may happen because of two reasons:
> - skb_set_owner_w() for newly cloned skb is called too early,
> before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
> - pskb_expand_head() does not adjust truesize in (skb->sk) case.
> In this case sk->sk_wmem_alloc should be adjusted too.
> 
> [1] https://lkml.org/lkml/2021/8/20/1082
> 
> Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
> Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
> Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
> v9: restored sock_edemux check
> v8: clone non-wmem skb
> v7 (from kuba@):
>     shift more magic into helpers,
>     follow Eric's advice and don't inherit non-wmem skbs for now
> v6: fixed delta,
>     improved comments
> v5: fixed else condition, thanks to Eric
>     reworked update of expanded skb,
>     added corresponding comments
> v4: decided to use is_skb_wmem() after pskb_expand_head() call
>     fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
> v3: removed __pskb_expand_head(),
>     added is_skb_wmem() helper for skb with wmem-compatible destructors
>     there are 2 ways to use it:
>      - before pskb_expand_head(), to create skb clones
>      - after successfull pskb_expand_head() to change owner on extended
>        skb.
> v2: based on patch version from Eric Dumazet,
>     added __pskb_expand_head() function, which can be forced
>     to adjust skb->truesize and sk->sk_wmem_alloc.
> ---
>  include/net/sock.h |  1 +
>  net/core/skbuff.c  | 35 ++++++++++++++++++++++-------------
>  net/core/sock.c    |  8 ++++++++
>  3 files changed, 31 insertions(+), 13 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 66a9a90f9558..a547651d46a7 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
>  			     gfp_t priority);
>  void __sock_wfree(struct sk_buff *skb);
>  void sock_wfree(struct sk_buff *skb);
> +bool is_skb_wmem(const struct sk_buff *skb);
>  struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
>  			     gfp_t priority);
>  void skb_orphan_partial(struct sk_buff *skb);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 2170bea2c7de..8cb6d360cda5 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1804,30 +1804,39 @@ EXPORT_SYMBOL(skb_realloc_headroom);
>  struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>  {
>  	int delta = headroom - skb_headroom(skb);
> +	int osize = skb_end_offset(skb);
> +	struct sock *sk = skb->sk;
>  
>  	if (WARN_ONCE(delta <= 0,
>  		      "%s is expecting an increase in the headroom", __func__))
>  		return skb;
>  
> -	/* pskb_expand_head() might crash, if skb is shared */
> -	if (skb_shared(skb)) {
> +	delta = SKB_DATA_ALIGN(delta);
> +	/* pskb_expand_head() might crash, if skb is shared. */
> +	if (skb_shared(skb) || !is_skb_wmem(skb)) {
>  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>  
> -		if (likely(nskb)) {
> -			if (skb->sk)
> -				skb_set_owner_w(nskb, skb->sk);
> -			consume_skb(skb);
> -		} else {
> -			kfree_skb(skb);
> -		}
> +		if (unlikely(!nskb))
> +			goto fail;
> +
> +		if (sk)
> +			skb_set_owner_w(nskb, sk);
> +		consume_skb(skb);
>  		skb = nskb;
>  	}
> -	if (skb &&
> -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
> -		kfree_skb(skb);
> -		skb = NULL;
> +	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
> +		goto fail;
> +
> +	if (sk && skb->destructor != sock_edemux) {
> +		delta = skb_end_offset(skb) - osize;
> +		refcount_add(delta, &sk->sk_wmem_alloc);
> +		skb->truesize += delta;
>  	}
>  	return skb;
> +
> +fail:
> +	kfree_skb(skb);
> +	return NULL;
>  }
>  EXPORT_SYMBOL(skb_expand_head);
>  
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 62627e868e03..1932755ae9ba 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>  }
>  EXPORT_SYMBOL(skb_set_owner_w);
>  
> +bool is_skb_wmem(const struct sk_buff *skb)
> +{
> +	return skb->destructor == sock_wfree ||
> +	       skb->destructor == __sock_wfree ||
> +	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
> +}
> +EXPORT_SYMBOL(is_skb_wmem);
> +
>  static bool can_skb_orphan_partial(const struct sk_buff *skb)
>  {
>  #ifdef CONFIG_TLS_DEVICE
>
Eric Dumazet Oct. 4, 2021, 7:26 p.m. UTC | #2
On 10/4/21 6:00 AM, Vasily Averin wrote:
> Christoph Paasch reports [1] about incorrect skb->truesize
> after skb_expand_head() call in ip6_xmit.
> This may happen because of two reasons:
> - skb_set_owner_w() for newly cloned skb is called too early,
> before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
> - pskb_expand_head() does not adjust truesize in (skb->sk) case.
> In this case sk->sk_wmem_alloc should be adjusted too.
> 
> [1] https://lkml.org/lkml/2021/8/20/1082
> 
> Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
> Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
> Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
> v9: restored sock_edemux check
> v8: clone non-wmem skb
> v7 (from kuba@):
>     shift more magic into helpers,
>     follow Eric's advice and don't inherit non-wmem skbs for now
> v6: fixed delta,
>     improved comments
> v5: fixed else condition, thanks to Eric
>     reworked update of expanded skb,
>     added corresponding comments
> v4: decided to use is_skb_wmem() after pskb_expand_head() call
>     fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
> v3: removed __pskb_expand_head(),
>     added is_skb_wmem() helper for skb with wmem-compatible destructors
>     there are 2 ways to use it:
>      - before pskb_expand_head(), to create skb clones
>      - after successfull pskb_expand_head() to change owner on extended
>        skb.
> v2: based on patch version from Eric Dumazet,
>     added __pskb_expand_head() function, which can be forced
>     to adjust skb->truesize and sk->sk_wmem_alloc.
> ---
>  include/net/sock.h |  1 +
>  net/core/skbuff.c  | 35 ++++++++++++++++++++++-------------
>  net/core/sock.c    |  8 ++++++++
>  3 files changed, 31 insertions(+), 13 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 66a9a90f9558..a547651d46a7 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
>  			     gfp_t priority);
>  void __sock_wfree(struct sk_buff *skb);
>  void sock_wfree(struct sk_buff *skb);
> +bool is_skb_wmem(const struct sk_buff *skb);
>  struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
>  			     gfp_t priority);
>  void skb_orphan_partial(struct sk_buff *skb);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 2170bea2c7de..8cb6d360cda5 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1804,30 +1804,39 @@ EXPORT_SYMBOL(skb_realloc_headroom);
>  struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>  {
>  	int delta = headroom - skb_headroom(skb);
> +	int osize = skb_end_offset(skb);
> +	struct sock *sk = skb->sk;
>  
>  	if (WARN_ONCE(delta <= 0,
>  		      "%s is expecting an increase in the headroom", __func__))
>  		return skb;
>  
> -	/* pskb_expand_head() might crash, if skb is shared */
> -	if (skb_shared(skb)) {
> +	delta = SKB_DATA_ALIGN(delta);
> +	/* pskb_expand_head() might crash, if skb is shared. */
> +	if (skb_shared(skb) || !is_skb_wmem(skb)) {
>  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>  
> -		if (likely(nskb)) {
> -			if (skb->sk)
> -				skb_set_owner_w(nskb, skb->sk);
> -			consume_skb(skb);
> -		} else {
> -			kfree_skb(skb);
> -		}
> +		if (unlikely(!nskb))
> +			goto fail;
> +
> +		if (sk)
> +			skb_set_owner_w(nskb, sk);
> +		consume_skb(skb);
>  		skb = nskb;
>  	}
> -	if (skb &&
> -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
> -		kfree_skb(skb);
> -		skb = NULL;
> +	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
> +		goto fail;
> +
> +	if (sk && skb->destructor != sock_edemux) {

    Why not re-using is_skb_wmem() here ?
    Testing != sock_edemux looks strange.
> +		delta = skb_end_offset(skb) - osize;
> +		refcount_add(delta, &sk->sk_wmem_alloc);
> +		skb->truesize += delta;
>  	}
>  	return skb;
> +
> +fail:
> +	kfree_skb(skb);
> +	return NULL;
>  }
>  EXPORT_SYMBOL(skb_expand_head);
>  
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 62627e868e03..1932755ae9ba 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>  }
>  EXPORT_SYMBOL(skb_set_owner_w);
>  
> +bool is_skb_wmem(const struct sk_buff *skb)
> +{
> +	return skb->destructor == sock_wfree ||
> +	       skb->destructor == __sock_wfree ||
> +	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
> +}
> +EXPORT_SYMBOL(is_skb_wmem);
> +

This probably should be inlined.

>  static bool can_skb_orphan_partial(const struct sk_buff *skb)
>  {
>  #ifdef CONFIG_TLS_DEVICE
>
Vasily Averin Oct. 5, 2021, 5:57 a.m. UTC | #3
On 10/4/21 10:26 PM, Eric Dumazet wrote:
> 
> 
> On 10/4/21 6:00 AM, Vasily Averin wrote:
>> Christoph Paasch reports [1] about incorrect skb->truesize
>> after skb_expand_head() call in ip6_xmit.
>> This may happen because of two reasons:
>> - skb_set_owner_w() for newly cloned skb is called too early,
>> before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
>> - pskb_expand_head() does not adjust truesize in (skb->sk) case.
>> In this case sk->sk_wmem_alloc should be adjusted too.
>>
>> [1] https://lkml.org/lkml/2021/8/20/1082
>>
>> Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
>> Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
>> Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
>> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
>> ---
>> v9: restored sock_edemux check
>> v8: clone non-wmem skb
>> v7 (from kuba@):
>>     shift more magic into helpers,
>>     follow Eric's advice and don't inherit non-wmem skbs for now
>> v6: fixed delta,
>>     improved comments
>> v5: fixed else condition, thanks to Eric
>>     reworked update of expanded skb,
>>     added corresponding comments
>> v4: decided to use is_skb_wmem() after pskb_expand_head() call
>>     fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
>> v3: removed __pskb_expand_head(),
>>     added is_skb_wmem() helper for skb with wmem-compatible destructors
>>     there are 2 ways to use it:
>>      - before pskb_expand_head(), to create skb clones
>>      - after successfull pskb_expand_head() to change owner on extended
>>        skb.
>> v2: based on patch version from Eric Dumazet,
>>     added __pskb_expand_head() function, which can be forced
>>     to adjust skb->truesize and sk->sk_wmem_alloc.
>> ---
>>  include/net/sock.h |  1 +
>>  net/core/skbuff.c  | 35 ++++++++++++++++++++++-------------
>>  net/core/sock.c    |  8 ++++++++
>>  3 files changed, 31 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 66a9a90f9558..a547651d46a7 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
>>  			     gfp_t priority);
>>  void __sock_wfree(struct sk_buff *skb);
>>  void sock_wfree(struct sk_buff *skb);
>> +bool is_skb_wmem(const struct sk_buff *skb);
>>  struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
>>  			     gfp_t priority);
>>  void skb_orphan_partial(struct sk_buff *skb);
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 2170bea2c7de..8cb6d360cda5 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -1804,30 +1804,39 @@ EXPORT_SYMBOL(skb_realloc_headroom);
>>  struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>>  {
>>  	int delta = headroom - skb_headroom(skb);
>> +	int osize = skb_end_offset(skb);
>> +	struct sock *sk = skb->sk;
>>  
>>  	if (WARN_ONCE(delta <= 0,
>>  		      "%s is expecting an increase in the headroom", __func__))
>>  		return skb;
>>  
>> -	/* pskb_expand_head() might crash, if skb is shared */
>> -	if (skb_shared(skb)) {
>> +	delta = SKB_DATA_ALIGN(delta);
>> +	/* pskb_expand_head() might crash, if skb is shared. */
>> +	if (skb_shared(skb) || !is_skb_wmem(skb)) {
>>  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>>  
>> -		if (likely(nskb)) {
>> -			if (skb->sk)
>> -				skb_set_owner_w(nskb, skb->sk);
>> -			consume_skb(skb);
>> -		} else {
>> -			kfree_skb(skb);
>> -		}
>> +		if (unlikely(!nskb))
>> +			goto fail;
>> +
>> +		if (sk)
>> +			skb_set_owner_w(nskb, sk);
>> +		consume_skb(skb);
>>  		skb = nskb;
>>  	}
>> -	if (skb &&
>> -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
>> -		kfree_skb(skb);
>> -		skb = NULL;
>> +	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
>> +		goto fail;
>> +
>> +	if (sk && skb->destructor != sock_edemux) {
> 
>     Why not re-using is_skb_wmem() here ?
>     Testing != sock_edemux looks strange.

All non-wmem skbs was cloned and then was freed already.
After pskb_expand_head() call we can have:
(1) either original wmem skbs 
(2) or cloned skbs: 
 (2a) either without sk at all,
 (2b) or with sock_edemux destructor (that was set inside skb_set_owner_w() for !sk_fullsock(sk))
 (2c) or with sock_wfree destructor (that was set inside skb_set_owner_w() for sk_fullsock(sk))

(2a) and (2b) do not require truesize/sk_wmem_alloc update, it was handled inside pskb_expand_head()
(1) and (2c) cases are processed here.

If required I can add this explanation either into patch description or as comment.

Btw I just noticed that we can avoid cloning for original skbs without sk.
How do you think should I do it?

>> +		delta = skb_end_offset(skb) - osize;
>> +		refcount_add(delta, &sk->sk_wmem_alloc);
>> +		skb->truesize += delta;
>>  	}
>>  	return skb;
>> +
>> +fail:
>> +	kfree_skb(skb);
>> +	return NULL;
>>  }
>>  EXPORT_SYMBOL(skb_expand_head);
>>  
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 62627e868e03..1932755ae9ba 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>>  }
>>  EXPORT_SYMBOL(skb_set_owner_w);
>>  
>> +bool is_skb_wmem(const struct sk_buff *skb)
>> +{
>> +	return skb->destructor == sock_wfree ||
>> +	       skb->destructor == __sock_wfree ||
>> +	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
>> +}
>> +EXPORT_SYMBOL(is_skb_wmem);
>> +
> 
> This probably should be inlined.

David Miller pointed me out in the comments to an early version of the patch
"Please do not use inline in foo.c files, let the compiler decide."

>>  static bool can_skb_orphan_partial(const struct sk_buff *skb)
>>  {
>>  #ifdef CONFIG_TLS_DEVICE
>>
Eric Dumazet Oct. 20, 2021, 4:14 p.m. UTC | #4
On 10/4/21 10:57 PM, Vasily Averin wrote:

>>>  
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 62627e868e03..1932755ae9ba 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>>>  }
>>>  EXPORT_SYMBOL(skb_set_owner_w);
>>>  
>>> +bool is_skb_wmem(const struct sk_buff *skb)
>>> +{
>>> +	return skb->destructor == sock_wfree ||
>>> +	       skb->destructor == __sock_wfree ||
>>> +	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
>>> +}
>>> +EXPORT_SYMBOL(is_skb_wmem);
>>> +
>>
>> This probably should be inlined.
> 
> David Miller pointed me out in the comments to an early version of the patch
> "Please do not use inline in foo.c files, let the compiler decide."
> 

Sure, my suggestion was to move this helper in an include file,
and use

static inline bool ....

I would not suggest add an inline in a C file, unless absolutely critical.
Eric Dumazet Oct. 20, 2021, 4:18 p.m. UTC | #5
On 10/4/21 10:57 PM, Vasily Averin wrote:
> On 10/4/21 10:26 PM, Eric Dumazet wrote:

>>
>>     Why not re-using is_skb_wmem() here ?
>>     Testing != sock_edemux looks strange.
> 
> All non-wmem skbs was cloned and then was freed already.
> After pskb_expand_head() call we can have:
> (1) either original wmem skbs 
> (2) or cloned skbs: 
>  (2a) either without sk at all,
>  (2b) or with sock_edemux destructor (that was set inside skb_set_owner_w() for !sk_fullsock(sk))
>  (2c) or with sock_wfree destructor (that was set inside skb_set_owner_w() for sk_fullsock(sk))
> 
> (2a) and (2b) do not require truesize/sk_wmem_alloc update, it was handled inside pskb_expand_head()
> (1) and (2c) cases are processed here.
> 
> If required I can add this explanation either into patch description or as comment.
> 

sock_edemux is one of the current destructors.

New ones will be added later. We can not expect that in two or three years,
at least one reviewer will remember this special case.

I would prefer you list the known destructors (allow-list, instead of disallow-list)



> Btw I just noticed that we can avoid cloning for original skbs without sk.
> How do you think should I do it?

Lets wait a bit before new features...
diff mbox series

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 66a9a90f9558..a547651d46a7 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1695,6 +1695,7 @@  struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
 			     gfp_t priority);
 void __sock_wfree(struct sk_buff *skb);
 void sock_wfree(struct sk_buff *skb);
+bool is_skb_wmem(const struct sk_buff *skb);
 struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
 			     gfp_t priority);
 void skb_orphan_partial(struct sk_buff *skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2170bea2c7de..8cb6d360cda5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1804,30 +1804,39 @@  EXPORT_SYMBOL(skb_realloc_headroom);
 struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
 {
 	int delta = headroom - skb_headroom(skb);
+	int osize = skb_end_offset(skb);
+	struct sock *sk = skb->sk;
 
 	if (WARN_ONCE(delta <= 0,
 		      "%s is expecting an increase in the headroom", __func__))
 		return skb;
 
-	/* pskb_expand_head() might crash, if skb is shared */
-	if (skb_shared(skb)) {
+	delta = SKB_DATA_ALIGN(delta);
+	/* pskb_expand_head() might crash, if skb is shared. */
+	if (skb_shared(skb) || !is_skb_wmem(skb)) {
 		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
 
-		if (likely(nskb)) {
-			if (skb->sk)
-				skb_set_owner_w(nskb, skb->sk);
-			consume_skb(skb);
-		} else {
-			kfree_skb(skb);
-		}
+		if (unlikely(!nskb))
+			goto fail;
+
+		if (sk)
+			skb_set_owner_w(nskb, sk);
+		consume_skb(skb);
 		skb = nskb;
 	}
-	if (skb &&
-	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
-		kfree_skb(skb);
-		skb = NULL;
+	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
+		goto fail;
+
+	if (sk && skb->destructor != sock_edemux) {
+		delta = skb_end_offset(skb) - osize;
+		refcount_add(delta, &sk->sk_wmem_alloc);
+		skb->truesize += delta;
 	}
 	return skb;
+
+fail:
+	kfree_skb(skb);
+	return NULL;
 }
 EXPORT_SYMBOL(skb_expand_head);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 62627e868e03..1932755ae9ba 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2227,6 +2227,14 @@  void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 }
 EXPORT_SYMBOL(skb_set_owner_w);
 
+bool is_skb_wmem(const struct sk_buff *skb)
+{
+	return skb->destructor == sock_wfree ||
+	       skb->destructor == __sock_wfree ||
+	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
+}
+EXPORT_SYMBOL(is_skb_wmem);
+
 static bool can_skb_orphan_partial(const struct sk_buff *skb)
 {
 #ifdef CONFIG_TLS_DEVICE