diff mbox series

[net] vrf: Fix lockdep splat in output path

Message ID 20230715153605.4068066-1-idosch@nvidia.com (mailing list archive)
State Accepted
Commit 2033ab90380d46e0e9f0520fd6776a73d107fd95
Delegated to: Netdev Maintainers
Headers show
Series [net] vrf: Fix lockdep splat in output path | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1342 this patch: 1342
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1365 this patch: 1365
netdev/checkpatch warning WARNING: Commit log lines starting with '#' are dropped by git as comments
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Ido Schimmel July 15, 2023, 3:36 p.m. UTC
Cited commit converted the neighbour code to use the standard RCU
variant instead of the RCU-bh variant, but the VRF code still uses
rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup
code in its IPv4 and IPv6 output paths, resulting in lockdep splats
[1][2]. Can be reproduced using [3].

Fix by switching to rcu_read_lock() / rcu_read_unlock().

[1]
=============================
WARNING: suspicious RCU usage
6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted
-----------------------------
include/net/neighbour.h:302 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by ping/183:
 #0: ffff888105ea1d80 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0xc6c/0x33c0
 #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output+0x2e3/0x2030

stack backtrace:
CPU: 0 PID: 183 Comm: ping Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xc1/0xf0
 lockdep_rcu_suspicious+0x211/0x3b0
 vrf_output+0x1380/0x2030
 ip_push_pending_frames+0x125/0x2a0
 raw_sendmsg+0x200d/0x33c0
 inet_sendmsg+0xa2/0xe0
 __sys_sendto+0x2aa/0x420
 __x64_sys_sendto+0xe5/0x1c0
 do_syscall_64+0x38/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

[2]
=============================
WARNING: suspicious RCU usage
6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted
-----------------------------
include/net/neighbour.h:302 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by ping6/182:
 #0: ffff888114b63000 (sk_lock-AF_INET6){+.+.}-{0:0}, at: rawv6_sendmsg+0x1602/0x3e50
 #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output6+0xe9/0x1310

stack backtrace:
CPU: 0 PID: 182 Comm: ping6 Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xc1/0xf0
 lockdep_rcu_suspicious+0x211/0x3b0
 vrf_output6+0xd32/0x1310
 ip6_local_out+0xb4/0x1a0
 ip6_send_skb+0xbc/0x340
 ip6_push_pending_frames+0xe5/0x110
 rawv6_sendmsg+0x2e6e/0x3e50
 inet_sendmsg+0xa2/0xe0
 __sys_sendto+0x2aa/0x420
 __x64_sys_sendto+0xe5/0x1c0
 do_syscall_64+0x38/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

[3]
#!/bin/bash

ip link add name vrf-red up numtxqueues 2 type vrf table 10
ip link add name swp1 up master vrf-red type dummy
ip address add 192.0.2.1/24 dev swp1
ip address add 2001:db8:1::1/64 dev swp1
ip neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud perm dev swp1
ip neigh add 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud perm dev swp1
ip vrf exec vrf-red ping 192.0.2.2 -c 1 &> /dev/null
ip vrf exec vrf-red ping6 2001:db8:1::2 -c 1 &> /dev/null

Fixes: 09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lore.kernel.org/netdev/CA+G9fYtEr-=GbcXNDYo3XOkwR+uYgehVoDjsP0pFLUpZ_AZcyg@mail.gmail.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
Using the "Link" tag instead of "Closes" since there are two reports in
the link, but I can only reproduce the second.

I believe that the rcu_read_lock_bh() / rcu_read_unlock_bh() in
vrf_finish_direct() can be removed since dev_queue_xmit_nit() uses
rcu_read_lock() / rcu_read_unlock(). I will send a patch to net-next
after confirming it.
---
 drivers/net/vrf.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

David Ahern July 15, 2023, 5:20 p.m. UTC | #1
On 7/15/23 9:36 AM, Ido Schimmel wrote:
> Cited commit converted the neighbour code to use the standard RCU
> variant instead of the RCU-bh variant, but the VRF code still uses
> rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup
> code in its IPv4 and IPv6 output paths, resulting in lockdep splats
> [1][2]. Can be reproduced using [3].
> 
> Fix by switching to rcu_read_lock() / rcu_read_unlock().
> 

...

> [3]
> #!/bin/bash
> 
> ip link add name vrf-red up numtxqueues 2 type vrf table 10
> ip link add name swp1 up master vrf-red type dummy
> ip address add 192.0.2.1/24 dev swp1
> ip address add 2001:db8:1::1/64 dev swp1
> ip neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud perm dev swp1
> ip neigh add 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud perm dev swp1
> ip vrf exec vrf-red ping 192.0.2.2 -c 1 &> /dev/null
> ip vrf exec vrf-red ping6 2001:db8:1::2 -c 1 &> /dev/null
> 
> Fixes: 09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh")
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Link: https://lore.kernel.org/netdev/CA+G9fYtEr-=GbcXNDYo3XOkwR+uYgehVoDjsP0pFLUpZ_AZcyg@mail.gmail.com/
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
> Using the "Link" tag instead of "Closes" since there are two reports in
> the link, but I can only reproduce the second.
> 
> I believe that the rcu_read_lock_bh() / rcu_read_unlock_bh() in
> vrf_finish_direct() can be removed since dev_queue_xmit_nit() uses
> rcu_read_lock() / rcu_read_unlock(). I will send a patch to net-next
> after confirming it.
> ---
>  drivers/net/vrf.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>

Thanks, Ido.
Eric Dumazet July 17, 2023, 4:20 p.m. UTC | #2
On Sat, Jul 15, 2023 at 5:37 PM Ido Schimmel <idosch@nvidia.com> wrote:
>
> Cited commit converted the neighbour code to use the standard RCU
> variant instead of the RCU-bh variant, but the VRF code still uses
> rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup
> code in its IPv4 and IPv6 output paths, resulting in lockdep splats
> [1][2]. Can be reproduced using [3].
>
> Fix by switching to rcu_read_lock() / rcu_read_unlock().

We might have other bugs like that.

We should get rid of all rcu_read_lock_bh()

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks.
patchwork-bot+netdevbpf@kernel.org July 18, 2023, 11 a.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sat, 15 Jul 2023 18:36:05 +0300 you wrote:
> Cited commit converted the neighbour code to use the standard RCU
> variant instead of the RCU-bh variant, but the VRF code still uses
> rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup
> code in its IPv4 and IPv6 output paths, resulting in lockdep splats
> [1][2]. Can be reproduced using [3].
> 
> Fix by switching to rcu_read_lock() / rcu_read_unlock().
> 
> [...]

Here is the summary with links:
  - [net] vrf: Fix lockdep splat in output path
    https://git.kernel.org/netdev/net/c/2033ab90380d

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index bdb3a76a352e..6043e63b42f9 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -664,7 +664,7 @@  static int vrf_finish_output6(struct net *net, struct sock *sk,
 	skb->protocol = htons(ETH_P_IPV6);
 	skb->dev = dev;
 
-	rcu_read_lock_bh();
+	rcu_read_lock();
 	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
 	neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
 	if (unlikely(!neigh))
@@ -672,10 +672,10 @@  static int vrf_finish_output6(struct net *net, struct sock *sk,
 	if (!IS_ERR(neigh)) {
 		sock_confirm_neigh(skb, neigh);
 		ret = neigh_output(neigh, skb, false);
-		rcu_read_unlock_bh();
+		rcu_read_unlock();
 		return ret;
 	}
-	rcu_read_unlock_bh();
+	rcu_read_unlock();
 
 	IP6_INC_STATS(dev_net(dst->dev),
 		      ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
@@ -889,7 +889,7 @@  static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
 		}
 	}
 
-	rcu_read_lock_bh();
+	rcu_read_lock();
 
 	neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
 	if (!IS_ERR(neigh)) {
@@ -898,11 +898,11 @@  static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
 		sock_confirm_neigh(skb, neigh);
 		/* if crossing protocols, can not use the cached header */
 		ret = neigh_output(neigh, skb, is_v6gw);
-		rcu_read_unlock_bh();
+		rcu_read_unlock();
 		return ret;
 	}
 
-	rcu_read_unlock_bh();
+	rcu_read_unlock();
 	vrf_tx_error(skb->dev, skb);
 	return -EINVAL;
 }