[net-next,1/2] net: Fix skb consume leak in sch_handle_egress

Message ID 20230825134946.31083-1-daniel@iogearbox.net (mailing list archive)
State Accepted
Commit 28d18b673ffa2d13112ddb6e4c32c60d9b0cda50
Delegated to: Netdev Maintainers
Series [net-next,1/2] net: Fix skb consume leak in sch_handle_egress

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 1338 this patch: 1338
netdev/cc_maintainers           fail     1 blamed authors not CCed: ast@kernel.org; 3 maintainers not CCed: ast@kernel.org bpf@vger.kernel.org edumazet@google.com
netdev/build_clang              success  Errors and warnings before: 1353 this patch: 1353
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1361 this patch: 1361
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Daniel Borkmann Aug. 25, 2023, 1:49 p.m. UTC
Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}:

  [...]
  unreferenced object 0xffff88818bcb4f00 (size 232):
  comm "softirq", pid 0, jiffies 4299085078 (age 134.028s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 80 70 61 81 88 ff ff 00 41 31 14 81 88 ff ff  ..pa.....A1.....
  backtrace:
    [<ffffffff9991b938>] kmem_cache_alloc_node+0x268/0x400
    [<ffffffff9b3d9231>] __alloc_skb+0x211/0x2c0
    [<ffffffff9b3f0c7e>] alloc_skb_with_frags+0xbe/0x6b0
    [<ffffffff9b3bf9a9>] sock_alloc_send_pskb+0x6a9/0x870
    [<ffffffff9b6b3f00>] __ip_append_data+0x14d0/0x3bf0
    [<ffffffff9b6ba24e>] ip_append_data+0xee/0x190
    [<ffffffff9b7e1496>] icmp_push_reply+0xa6/0x470
    [<ffffffff9b7e4030>] icmp_reply+0x900/0xa00
    [<ffffffff9b7e42e3>] icmp_echo.part.0+0x1a3/0x230
    [<ffffffff9b7e444d>] icmp_echo+0xcd/0x190
    [<ffffffff9b7e9566>] icmp_rcv+0x806/0xe10
    [<ffffffff9b699bd1>] ip_protocol_deliver_rcu+0x351/0x3d0
    [<ffffffff9b699f14>] ip_local_deliver_finish+0x2b4/0x450
    [<ffffffff9b69a234>] ip_local_deliver+0x174/0x1f0
    [<ffffffff9b69a4b2>] ip_sublist_rcv_finish+0x1f2/0x420
    [<ffffffff9b69ab56>] ip_sublist_rcv+0x466/0x920
  [...]

I was able to reproduce this via:

  ip link add dev dummy0 type dummy
  ip link set dev dummy0 up
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 egress protocol ip prio 1 u32 match ip protocol 1 0xff action mirred egress redirect dev dummy0
  ping 1.1.1.1
  <stolen>

After the fix, there are no kmemleak reports with the reproducer. This is
in line with what is also done on the ingress side. Debugging skb_unref(skb)
on both the dummy xmit and the sch_handle_egress() side shows that these are
two different skbs, and that skb_unref(skb) returns true for each of them.
Two skbs are seen because mirred does a skb_clone() internally, as
use_reinsert is false in tcf_mirred_act() for egress. This was initially
reported by Gal.

Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com
---
 net/core/dev.c | 1 +
 1 file changed, 1 insertion(+)
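
The ownership problem described in the commit message can be sketched in
user space. The following is a toy model only, not kernel code: struct
toy_skb and every toy_* function are hypothetical names invented for this
illustration, and only the reference count is modeled. It assumes, as the
commit message states, that the redirect transmits a clone with its own
reference count while the original skb is reported as stolen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for struct sk_buff: only the reference count is modeled. */
struct toy_skb {
	int users;
};

static int toy_live_skbs; /* toy skbs allocated and not yet freed */

static struct toy_skb *toy_alloc_skb(void)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		return NULL;
	skb->users = 1;
	toy_live_skbs++;
	return skb;
}

/* Like skb_clone() in one respect only: the clone has its own count of 1. */
static struct toy_skb *toy_clone_skb(const struct toy_skb *skb)
{
	(void)skb;
	return toy_alloc_skb();
}

/* Drop one reference; true means the caller held the last one. */
static bool toy_unref(struct toy_skb *skb)
{
	assert(skb->users > 0);
	return --skb->users == 0;
}

/* Drop one reference and free the toy skb if it was the last one. */
static void toy_consume_skb(struct toy_skb *skb)
{
	if (toy_unref(skb)) {
		free(skb);
		toy_live_skbs--;
	}
}

/*
 * Egress redirect in the toy model: the clone goes to the target device,
 * which consumes it. The original is reported as stolen and is only freed
 * when consume_original is true, which is the behaviour the patch adds.
 * Returns how many toy skbs this call left allocated (-1 on allocation
 * failure). With consume_original == false the original is leaked on
 * purpose to show the bug.
 */
static int toy_redirect_and_count_leaks(bool consume_original)
{
	int before = toy_live_skbs;
	struct toy_skb *orig = toy_alloc_skb();
	struct toy_skb *clone;

	if (!orig)
		return -1;
	clone = toy_clone_skb(orig);
	if (!clone) {
		toy_consume_skb(orig);
		return -1;
	}
	toy_consume_skb(clone);        /* "dummy xmit" side */
	if (consume_original)
		toy_consume_skb(orig); /* "sch_handle_egress()" side */
	return toy_live_skbs - before;
}
```

Calling toy_redirect_and_count_leaks(false) leaves one toy skb allocated,
matching the single unreferenced object in the kmemleak report, while
toy_redirect_and_count_leaks(true) leaves none. Both toy_unref() calls
return true in the fixed case because each buffer has its own count.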

Comments

Simon Horman Aug. 26, 2023, 7:57 a.m. UTC | #1
On Fri, Aug 25, 2023 at 03:49:45PM +0200, Daniel Borkmann wrote:
> [...]

Reviewed-by: Simon Horman <horms@kernel.org>
Gal Pressman Aug. 27, 2023, 1:55 p.m. UTC | #2
On 25/08/2023 16:49, Daniel Borkmann wrote:
> [...]

I suspect that this series causes our regression to time out due to some
stuck tests :\.
I'm not 100% sure yet though, verifying..
patchwork-bot+netdevbpf@kernel.org Aug. 28, 2023, 9:20 a.m. UTC | #3
Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri, 25 Aug 2023 15:49:45 +0200 you wrote:
> Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}:
> 
> [...]

Here is the summary with links:
  - [net-next,1/2] net: Fix skb consume leak in sch_handle_egress
    https://git.kernel.org/netdev/net-next/c/28d18b673ffa
  - [net-next,2/2] net: Make consumed action consistent in sch_handle_egress
    https://git.kernel.org/netdev/net-next/c/3a1e2f43985a

You are awesome, thank you!
Gal Pressman Aug. 28, 2023, 12:55 p.m. UTC | #4
On 27/08/2023 16:55, Gal Pressman wrote:
> On 25/08/2023 16:49, Daniel Borkmann wrote:
>> [...]
> 
> I suspect that this series causes our regression to time out due to some
> stuck tests :\.
> I'm not 100% sure yet though, verifying..

Seems like everything is passing now, hope it was a false alarm, will
report back if anything breaks.
Daniel Borkmann Aug. 28, 2023, 1:05 p.m. UTC | #5
On 8/28/23 2:55 PM, Gal Pressman wrote:
> On 27/08/2023 16:55, Gal Pressman wrote:
>> On 25/08/2023 16:49, Daniel Borkmann wrote:
[...]
>>
>> I suspect that this series causes our regression to time out due to some
>> stuck tests :\.
>> I'm not 100% sure yet though, verifying..
> 
> Seems like everything is passing now, hope it was a false alarm, will
> report back if anything breaks.

Sounds good, thanks for your help Gal!
Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 17e6281e408c..9f6ed6d97f89 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4061,6 +4061,7 @@  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 	case TC_ACT_TRAP:
+		consume_skb(skb);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
 	}
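
The contract the hunk above restores can be illustrated with a small
user-space model: when the handler returns NULL the caller stops using the
buffer, so the handler itself has to release it. This is a sketch under
stated assumptions, not the kernel implementation. The model_* types,
functions and enum values are hypothetical stand-ins and do not match the
kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
enum model_verdict {
	MODEL_ACT_OK,
	MODEL_ACT_STOLEN,
	MODEL_ACT_QUEUED,
	MODEL_ACT_TRAP,
};

enum { MODEL_XMIT_SUCCESS = 0 };

struct model_skb {
	bool released; /* set once the model's consume step has run */
};

static void model_consume_skb(struct model_skb *skb)
{
	assert(!skb->released); /* a second release would be a double free */
	skb->released = true;
}

/*
 * Mirrors the shape of the patched switch: for the three verdicts that end
 * processing, release the buffer, report success and return NULL so the
 * caller stops using it. Any other verdict hands the buffer back untouched.
 */
static struct model_skb *model_handle_egress(struct model_skb *skb,
					     enum model_verdict verdict,
					     int *ret)
{
	switch (verdict) {
	case MODEL_ACT_STOLEN:
	case MODEL_ACT_QUEUED:
	case MODEL_ACT_TRAP:
		model_consume_skb(skb);
		*ret = MODEL_XMIT_SUCCESS;
		return NULL;
	default:
		return skb;
	}
}
```

In this model, dropping the model_consume_skb() call would leave released
false while the function still returns NULL, which is the user-space
analogue of the leak that the one-line patch fixes.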