Message ID | 20230825134946.31083-1-daniel@iogearbox.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 28d18b673ffa2d13112ddb6e4c32c60d9b0cda50 |
Delegated to: | Netdev Maintainers |
Series | [net-next,1/2] net: Fix skb consume leak in sch_handle_egress |
On Fri, Aug 25, 2023 at 03:49:45PM +0200, Daniel Borkmann wrote:
> Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}:
>
> [...]
>
> Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
> Reported-by: Gal Pressman <gal@nvidia.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com

Reviewed-by: Simon Horman <horms@kernel.org>
On 25/08/2023 16:49, Daniel Borkmann wrote:
> Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}:
>
> [...]
>
> Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
> Reported-by: Gal Pressman <gal@nvidia.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com

I suspect that this series causes our regression to timeout due to some
stuck tests :\.
I'm not 100% sure yet though, verifying..
Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri, 25 Aug 2023 15:49:45 +0200 you wrote:
> Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}:
>
> [...]

Here is the summary with links:
  - [net-next,1/2] net: Fix skb consume leak in sch_handle_egress
    https://git.kernel.org/netdev/net-next/c/28d18b673ffa
  - [net-next,2/2] net: Make consumed action consistent in sch_handle_egress
    https://git.kernel.org/netdev/net-next/c/3a1e2f43985a

You are awesome, thank you!
On 27/08/2023 16:55, Gal Pressman wrote:
> On 25/08/2023 16:49, Daniel Borkmann wrote:
>> Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}:
>>
>> [...]
>>
>> Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
>> Reported-by: Gal Pressman <gal@nvidia.com>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com
>
> I suspect that this series causes our regression to timeout due to some
> stuck tests :\.
> I'm not 100% sure yet though, verifying..

Seems like everything is passing now, hope it was a false alarm, will
report back if anything breaks.
On 8/28/23 2:55 PM, Gal Pressman wrote:
> On 27/08/2023 16:55, Gal Pressman wrote:
>> On 25/08/2023 16:49, Daniel Borkmann wrote:
[...]
>>> After the fix, there are no kmemleak reports with the reproducer. This is
>>> in line with what is also done on the ingress side, and from debugging the
>>> skb_unref(skb) on dummy xmit and sch_handle_egress() side, it is visible
>>> that these are two different skbs with both skb_unref(skb) as true. The two
>>> seen skbs are due to mirred doing a skb_clone() internally as use_reinsert
>>> is false in tcf_mirred_act() for egress. This was initially reported by Gal.
>>>
>>> Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
>>> Reported-by: Gal Pressman <gal@nvidia.com>
>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>> Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com
>>
>> I suspect that this series causes our regression to timeout due to some
>> stuck tests :\.
>> I'm not 100% sure yet though, verifying..
>
> Seems like everything is passing now, hope it was a false alarm, will
> report back if anything breaks.

Sounds good, thanks for your help Gal!
diff --git a/net/core/dev.c b/net/core/dev.c
index 17e6281e408c..9f6ed6d97f89 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4061,6 +4061,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
 	case TC_ACT_TRAP:
+		consume_skb(skb);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
 	}
Fix a memory leak for the tc egress path with TC_ACT_{STOLEN,QUEUED,TRAP}:

  [...]
  unreferenced object 0xffff88818bcb4f00 (size 232):
    comm "softirq", pid 0, jiffies 4299085078 (age 134.028s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00 80 70 61 81 88 ff ff 00 41 31 14 81 88 ff ff  ..pa.....A1.....
    backtrace:
      [<ffffffff9991b938>] kmem_cache_alloc_node+0x268/0x400
      [<ffffffff9b3d9231>] __alloc_skb+0x211/0x2c0
      [<ffffffff9b3f0c7e>] alloc_skb_with_frags+0xbe/0x6b0
      [<ffffffff9b3bf9a9>] sock_alloc_send_pskb+0x6a9/0x870
      [<ffffffff9b6b3f00>] __ip_append_data+0x14d0/0x3bf0
      [<ffffffff9b6ba24e>] ip_append_data+0xee/0x190
      [<ffffffff9b7e1496>] icmp_push_reply+0xa6/0x470
      [<ffffffff9b7e4030>] icmp_reply+0x900/0xa00
      [<ffffffff9b7e42e3>] icmp_echo.part.0+0x1a3/0x230
      [<ffffffff9b7e444d>] icmp_echo+0xcd/0x190
      [<ffffffff9b7e9566>] icmp_rcv+0x806/0xe10
      [<ffffffff9b699bd1>] ip_protocol_deliver_rcu+0x351/0x3d0
      [<ffffffff9b699f14>] ip_local_deliver_finish+0x2b4/0x450
      [<ffffffff9b69a234>] ip_local_deliver+0x174/0x1f0
      [<ffffffff9b69a4b2>] ip_sublist_rcv_finish+0x1f2/0x420
      [<ffffffff9b69ab56>] ip_sublist_rcv+0x466/0x920
  [...]

I was able to reproduce this via:

  ip link add dev dummy0 type dummy
  ip link set dev dummy0 up
  tc qdisc add dev eth0 clsact
  tc filter add dev eth0 egress protocol ip prio 1 u32 match ip protocol 1 0xff action mirred egress redirect dev dummy0
  ping 1.1.1.1
  <stolen>

After the fix, there are no kmemleak reports with the reproducer. This is
in line with what is also done on the ingress side, and from debugging the
skb_unref(skb) on dummy xmit and sch_handle_egress() side, it is visible
that these are two different skbs with both skb_unref(skb) as true. The two
seen skbs are due to mirred doing a skb_clone() internally as use_reinsert
is false in tcf_mirred_act() for egress. This was initially reported by Gal.

Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support")
Reported-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/bdfc2640-8f65-5b56-4472-db8e2b161aab@nvidia.com
---
 net/core/dev.c | 1 +
 1 file changed, 1 insertion(+)
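
The ownership reasoning in the commit message can be illustrated with a small user-space model. This is only a sketch and is not kernel code: every toy_* name is hypothetical, the reference count stands in for the real skb refcount, and the "fixed" flag simply toggles the one-line change from the patch. It assumes, as the commit message states, that mirred on egress hands a separate clone to the target device and leaves the original skb with the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: only a reference count. */
struct toy_skb {
	int users;
};

/* Number of toy skbs allocated and not yet freed (a crude kmemleak). */
static int live_skbs;

static struct toy_skb *toy_alloc(void)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		abort();
	skb->users = 1;
	live_skbs++;
	return skb;
}

/* Drop one reference and free on the last one, like consume_skb(). */
static void toy_consume(struct toy_skb *skb)
{
	if (skb && --skb->users == 0) {
		free(skb);
		live_skbs--;
	}
}

/*
 * Models mirred redirect on egress: a clone goes to the dummy device,
 * which frees it after "transmit". The original skb is left untouched.
 */
static void toy_mirred_redirect(struct toy_skb *skb)
{
	struct toy_skb *clone = toy_alloc();

	(void)skb;
	toy_consume(clone);
}

enum toy_verdict { TOY_OK, TOY_STOLEN };

/* Models the STOLEN/QUEUED/TRAP branch of sch_handle_egress(). */
static struct toy_skb *toy_handle_egress(struct toy_skb *skb,
					 enum toy_verdict verdict, int fixed)
{
	if (verdict == TOY_STOLEN) {
		toy_mirred_redirect(skb);
		if (fixed)
			toy_consume(skb);	/* the line added by the patch */
		return NULL;			/* caller stops processing */
	}
	return skb;
}

/* Returns how many toy skbs are still live after one stolen packet. */
int toy_leaked_after_stolen(int fixed)
{
	int before = live_skbs;
	struct toy_skb *skb = toy_alloc();
	struct toy_skb *out = toy_handle_egress(skb, TOY_STOLEN, fixed);
	int leaked;

	assert(out == NULL);
	leaked = live_skbs - before;
	if (!fixed)
		toy_consume(skb);	/* tidy up so the model itself does not leak */
	return leaked;
}
```

Calling toy_leaked_after_stolen(0) models the pre-patch behaviour, where the original skb is never released once the caller receives NULL; toy_leaked_after_stolen(1) models the patched path, where both the clone and the original are released.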