diff mbox series

Revert "vrf: Remove unnecessary RCU-bh critical section"

Message ID 20240923162506.1405109-1-greearb@candelatech.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Revert "vrf: Remove unnecessary RCU-bh critical section" | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 3 maintainers not CCed: pabeni@redhat.com kuba@kernel.org edumazet@google.com
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 21 this patch: 21
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 79 this patch: 79
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-09-24--03-00 (tests: 762)

Commit Message

Ben Greear Sept. 23, 2024, 4:25 p.m. UTC
From: Ben Greear <greearb@candelatech.com>

This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.

dev_queue_xmit_nit needs to run with bh locking, otherwise
it conflicts with packets coming in from a nic in softirq
context and packets being transmitted from user context.

================================
WARNING: inconsistent lock state
6.11.0 #1 Tainted: G        W
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
{IN-SOFTIRQ-W} state was registered at:
  lock_acquire+0x19a/0x4f0
  _raw_spin_lock+0x27/0x40
  packet_rcv+0xa33/0x1320
  __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
  __netif_receive_skb_list_core+0x2c9/0x890
  netif_receive_skb_list_internal+0x610/0xcc0
  napi_complete_done+0x1c0/0x7c0
  igb_poll+0x1dbb/0x57e0 [igb]
  __napi_poll.constprop.0+0x99/0x430
  net_rx_action+0x8e7/0xe10
  handle_softirqs+0x1b7/0x800
  __irq_exit_rcu+0x91/0xc0
  irq_exit_rcu+0x5/0x10
  common_interrupt+0x7f/0xa0
  asm_common_interrupt+0x22/0x40
  cpuidle_enter_state+0x289/0x320
  cpuidle_enter+0x45/0xa0
  do_idle+0x2fe/0x3e0
  cpu_startup_entry+0x4b/0x60
  start_secondary+0x201/0x280
  common_startup_64+0x13e/0x148
irq event stamp: 467094363
hardirqs last  enabled at (467094363): [<ffffffff83dc794b>] _raw_spin_unlock_irqrestore+0x2b/0x50
hardirqs last disabled at (467094362): [<ffffffff83dc7753>] _raw_spin_lock_irqsave+0x53/0x60
softirqs last  enabled at (467094360): [<ffffffff83481213>] skb_attempt_defer_free+0x303/0x4e0
softirqs last disabled at (467094358): [<ffffffff83481188>] skb_attempt_defer_free+0x278/0x4e0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(rlock-AF_PACKET);
  <Interrupt>
    lock(rlock-AF_PACKET);

 *** DEADLOCK ***

3 locks held by btserver/134819:
 #0: ffff888136a3bf98 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_recvmsg+0xc7/0x4e0
 #1: ffffffff84e4bc20 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x59/0x1e20
 #2: ffffffff84e4bc20 (rcu_read_lock){....}-{1:2}, at: dev_queue_xmit_nit+0x2a/0xa40

stack backtrace:
CPU: 2 UID: 0 PID: 134819 Comm: btserver Tainted: G        W          6.11.0 #1
Tainted: [W]=WARN
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
Call Trace:
 <TASK>
 dump_stack_lvl+0x73/0xa0
 mark_lock+0x102e/0x16b0
 ? print_usage_bug.part.0+0x600/0x600
 ? print_usage_bug.part.0+0x600/0x600
 ? print_usage_bug.part.0+0x600/0x600
 ? lock_acquire+0x19a/0x4f0
 ? find_held_lock+0x2d/0x110
 __lock_acquire+0x9ae/0x6170
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 lock_acquire+0x19a/0x4f0
 ? tpacket_rcv+0x863/0x3b30
 ? run_filter+0x131/0x300
 ? lock_sync+0x170/0x170
 ? do_syscall_64+0x69/0x160
 ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
 ? lock_is_held_type+0xa5/0x110
 _raw_spin_lock+0x27/0x40
 ? tpacket_rcv+0x863/0x3b30
 tpacket_rcv+0x863/0x3b30
 ? packet_recvmsg+0x1340/0x1340
 ? __asan_memcpy+0x38/0x60
 ? __skb_clone+0x547/0x730
 ? packet_recvmsg+0x1340/0x1340
 dev_queue_xmit_nit+0x709/0xa40
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 vrf_finish_direct+0x26e/0x340 [vrf]
 ? vrf_ip_local_out+0x570/0x570 [vrf]
 vrf_l3_out+0x5f4/0xe80 [vrf]
 __ip_local_out+0x51e/0x7a0
 ? __ip_append_data+0x3d00/0x3d00
 ? __lock_acquire+0x1b57/0x6170
 ? ipv4_dst_check+0xd6/0x150
 ? lock_is_held_type+0xa5/0x110
 __ip_queue_xmit+0x7ff/0x1e20
 __tcp_transmit_skb+0x1699/0x3850
 ? __tcp_select_window+0xfb0/0xfb0
 ? __build_skb_around+0x22f/0x330
 ? __alloc_skb+0x13d/0x2c0
 ? __napi_build_skb+0x40/0x40
 ? __tcp_send_ack.part.0+0x5f/0x690
 ? skb_attempt_defer_free+0x303/0x4e0
 tcp_recvmsg_locked+0xdd1/0x23e0
 ? tcp_recvmsg+0xc7/0x4e0
 ? tcp_update_recv_tstamps+0x1c0/0x1c0
 tcp_recvmsg+0xe5/0x4e0
 ? tcp_recv_timestamp+0x6c0/0x6c0
 inet_recvmsg+0xf0/0x4b0
 ? inet_splice_eof+0xa0/0xa0
 ? inet_splice_eof+0xa0/0xa0
 sock_recvmsg+0xc8/0x150
 ? poll_schedule_timeout.constprop.0+0xe0/0xe0
 sock_read_iter+0x258/0x380
 ? poll_schedule_timeout.constprop.0+0xe0/0xe0
 ? sock_recvmsg+0x150/0x150
 ? rw_verify_area+0x64/0x590
 vfs_read+0x8d5/0xc20
 ? poll_schedule_timeout.constprop.0+0xe0/0xe0
 ? kernel_read+0x50/0x50
 ? __asan_memset+0x1f/0x40
 ? ktime_get_ts64+0x85/0x210
 ? __fget_light+0x4d/0x1d0
 ksys_read+0x166/0x1c0
 ? __ia32_sys_pwrite64+0x1d0/0x1d0
 ? __ia32_sys_poll+0x3e0/0x3e0
 do_syscall_64+0x69/0x160
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f6909b01b92

Signed-off-by: Ben Greear <greearb@candelatech.com>
---
 drivers/net/vrf.c | 2 ++
 net/core/dev.c    | 1 +
 2 files changed, 3 insertions(+)

Comments

Florian Westphal Sept. 23, 2024, 4:55 p.m. UTC | #1
greearb@candelatech.com <greearb@candelatech.com> wrote:
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 4d8ccaf9a2b4..4087f72f0d2b 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
>  		eth_zero_addr(eth->h_dest);
>  		eth->h_proto = skb->protocol;
>  
> +		rcu_read_lock_bh();
>  		dev_queue_xmit_nit(skb, vrf_dev);
> +		rcu_read_unlock_bh();

[..]

> + *	BH must be disabled before calling this.

Can you replace the rcu_read_lock_bh with plain local_bh_enable/disable?
I think that would make more sense.

Otherwise comment should explain why rcu read lock has to be held too,
I see no reason for it.
Willem de Bruijn Sept. 23, 2024, 7:57 p.m. UTC | #2
greearb@ wrote:
> From: Ben Greear <greearb@candelatech.com>
> 
> This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
> 
> dev_queue_xmit_nit needs to run with bh locking, otherwise
> it conflicts with packets coming in from a nic in softirq
> context and packets being transmitted from user context.
> 
> ================================
> WARNING: inconsistent lock state
> 6.11.0 #1 Tainted: G        W
> --------------------------------
> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> btserver/134819 [HC0[0]:SC0[0]:HE1:SE1] takes:
> ffff8882da30c118 (rlock-AF_PACKET){+.?.}-{2:2}, at: tpacket_rcv+0x863/0x3b30
> {IN-SOFTIRQ-W} state was registered at:
>   lock_acquire+0x19a/0x4f0
>   _raw_spin_lock+0x27/0x40
>   packet_rcv+0xa33/0x1320
>   __netif_receive_skb_core.constprop.0+0xcb0/0x3a90
>   __netif_receive_skb_list_core+0x2c9/0x890
>   netif_receive_skb_list_internal+0x610/0xcc0
>   napi_complete_done+0x1c0/0x7c0
>   igb_poll+0x1dbb/0x57e0 [igb]
>   __napi_poll.constprop.0+0x99/0x430
>   net_rx_action+0x8e7/0xe10
>   handle_softirqs+0x1b7/0x800
>   __irq_exit_rcu+0x91/0xc0
>   irq_exit_rcu+0x5/0x10
>   common_interrupt+0x7f/0xa0
>   asm_common_interrupt+0x22/0x40
>   cpuidle_enter_state+0x289/0x320
>   cpuidle_enter+0x45/0xa0
>   do_idle+0x2fe/0x3e0
>   cpu_startup_entry+0x4b/0x60
>   start_secondary+0x201/0x280
>   common_startup_64+0x13e/0x148
> irq event stamp: 467094363
> hardirqs last  enabled at (467094363): [<ffffffff83dc794b>] _raw_spin_unlock_irqrestore+0x2b/0x50
> hardirqs last disabled at (467094362): [<ffffffff83dc7753>] _raw_spin_lock_irqsave+0x53/0x60
> softirqs last  enabled at (467094360): [<ffffffff83481213>] skb_attempt_defer_free+0x303/0x4e0
> softirqs last disabled at (467094358): [<ffffffff83481188>] skb_attempt_defer_free+0x278/0x4e0
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(rlock-AF_PACKET);
>   <Interrupt>
>     lock(rlock-AF_PACKET);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by btserver/134819:
>  #0: ffff888136a3bf98 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_recvmsg+0xc7/0x4e0
>  #1: ffffffff84e4bc20 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x59/0x1e20
>  #2: ffffffff84e4bc20 (rcu_read_lock){....}-{1:2}, at: dev_queue_xmit_nit+0x2a/0xa40
> 
> stack backtrace:
> CPU: 2 UID: 0 PID: 134819 Comm: btserver Tainted: G        W          6.11.0 #1
> Tainted: [W]=WARN
> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x73/0xa0
>  mark_lock+0x102e/0x16b0
>  ? print_usage_bug.part.0+0x600/0x600
>  ? print_usage_bug.part.0+0x600/0x600
>  ? print_usage_bug.part.0+0x600/0x600
>  ? lock_acquire+0x19a/0x4f0
>  ? find_held_lock+0x2d/0x110
>  __lock_acquire+0x9ae/0x6170
>  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
>  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
>  lock_acquire+0x19a/0x4f0
>  ? tpacket_rcv+0x863/0x3b30
>  ? run_filter+0x131/0x300
>  ? lock_sync+0x170/0x170
>  ? do_syscall_64+0x69/0x160
>  ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  ? lock_is_held_type+0xa5/0x110
>  _raw_spin_lock+0x27/0x40
>  ? tpacket_rcv+0x863/0x3b30
>  tpacket_rcv+0x863/0x3b30
>  ? packet_recvmsg+0x1340/0x1340
>  ? __asan_memcpy+0x38/0x60
>  ? __skb_clone+0x547/0x730
>  ? packet_recvmsg+0x1340/0x1340
>  dev_queue_xmit_nit+0x709/0xa40
>  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
>  vrf_finish_direct+0x26e/0x340 [vrf]
>  ? vrf_ip_local_out+0x570/0x570 [vrf]
>  vrf_l3_out+0x5f4/0xe80 [vrf]
>  __ip_local_out+0x51e/0x7a0
>  ? __ip_append_data+0x3d00/0x3d00
>  ? __lock_acquire+0x1b57/0x6170
>  ? ipv4_dst_check+0xd6/0x150
>  ? lock_is_held_type+0xa5/0x110
>  __ip_queue_xmit+0x7ff/0x1e20
>  __tcp_transmit_skb+0x1699/0x3850
>  ? __tcp_select_window+0xfb0/0xfb0
>  ? __build_skb_around+0x22f/0x330
>  ? __alloc_skb+0x13d/0x2c0
>  ? __napi_build_skb+0x40/0x40
>  ? __tcp_send_ack.part.0+0x5f/0x690
>  ? skb_attempt_defer_free+0x303/0x4e0
>  tcp_recvmsg_locked+0xdd1/0x23e0
>  ? tcp_recvmsg+0xc7/0x4e0
>  ? tcp_update_recv_tstamps+0x1c0/0x1c0
>  tcp_recvmsg+0xe5/0x4e0
>  ? tcp_recv_timestamp+0x6c0/0x6c0
>  inet_recvmsg+0xf0/0x4b0
>  ? inet_splice_eof+0xa0/0xa0
>  ? inet_splice_eof+0xa0/0xa0
>  sock_recvmsg+0xc8/0x150
>  ? poll_schedule_timeout.constprop.0+0xe0/0xe0
>  sock_read_iter+0x258/0x380
>  ? poll_schedule_timeout.constprop.0+0xe0/0xe0
>  ? sock_recvmsg+0x150/0x150
>  ? rw_verify_area+0x64/0x590
>  vfs_read+0x8d5/0xc20
>  ? poll_schedule_timeout.constprop.0+0xe0/0xe0
>  ? kernel_read+0x50/0x50
>  ? __asan_memset+0x1f/0x40
>  ? ktime_get_ts64+0x85/0x210
>  ? __fget_light+0x4d/0x1d0
>  ksys_read+0x166/0x1c0
>  ? __ia32_sys_pwrite64+0x1d0/0x1d0
>  ? __ia32_sys_poll+0x3e0/0x3e0
>  do_syscall_64+0x69/0x160
>  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f6909b01b92

Can compress output to short relevant part

Please also add a Link: to the previous conversation

and a Fixes: to the commit that introduced this
and a Cc: stable@vger.kernel.org

Mark the subject PATCH net


> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
>  drivers/net/vrf.c | 2 ++
>  net/core/dev.c    | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 4d8ccaf9a2b4..4087f72f0d2b 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
>  		eth_zero_addr(eth->h_dest);
>  		eth->h_proto = skb->protocol;
>  
> +		rcu_read_lock_bh();
>  		dev_queue_xmit_nit(skb, vrf_dev);
> +		rcu_read_unlock_bh();
>  
>  		skb_pull(skb, ETH_HLEN);
>  	}
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cd479f5f22f6..566e69a38eed 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2285,6 +2285,7 @@ EXPORT_SYMBOL_GPL(dev_nit_active);
>  /*
>   *	Support routine. Sends outgoing frames to any network
>   *	taps currently in use.
> + *	BH must be disabled before calling this.
>   */
>  
>  void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
> -- 
> 2.42.0
>
Willem de Bruijn Sept. 23, 2024, 7:58 p.m. UTC | #3
Florian Westphal wrote:
> greearb@candelatech.com <greearb@candelatech.com> wrote:
> > diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> > index 4d8ccaf9a2b4..4087f72f0d2b 100644
> > --- a/drivers/net/vrf.c
> > +++ b/drivers/net/vrf.c
> > @@ -608,7 +608,9 @@ static void vrf_finish_direct(struct sk_buff *skb)
> >  		eth_zero_addr(eth->h_dest);
> >  		eth->h_proto = skb->protocol;
> >  
> > +		rcu_read_lock_bh();
> >  		dev_queue_xmit_nit(skb, vrf_dev);
> > +		rcu_read_unlock_bh();
> 
> [..]
> 
> > + *	BH must be disabled before calling this.
> 
> Can you replace the rcu_read_lock_bh with plain local_bh_enable/disable?
> I think that would make more sense.
> 
> Otherwise comment should explain why rcu read lock has to be held too,
> I see no reason for it.

This path should duplicate how __dev_queue_xmit calls
dev_queue_xmit_nit.
diff mbox series

Patch

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 4d8ccaf9a2b4..4087f72f0d2b 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -608,7 +608,9 @@  static void vrf_finish_direct(struct sk_buff *skb)
 		eth_zero_addr(eth->h_dest);
 		eth->h_proto = skb->protocol;
 
+		rcu_read_lock_bh();
 		dev_queue_xmit_nit(skb, vrf_dev);
+		rcu_read_unlock_bh();
 
 		skb_pull(skb, ETH_HLEN);
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index cd479f5f22f6..566e69a38eed 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2285,6 +2285,7 @@  EXPORT_SYMBOL_GPL(dev_nit_active);
 /*
  *	Support routine. Sends outgoing frames to any network
  *	taps currently in use.
+ *	BH must be disabled before calling this.
  */
 
 void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)