[net] veth: fix races around rq->rx_notify_masked

Message ID 20220208232822.3432213-1-eric.dumazet@gmail.com (mailing list archive)
State Accepted
Commit 68468d8c4cd4222a4ca1f185ab5a1c14480d078c
Delegated to: Netdev Maintainers
Series [net] veth: fix races around rq->rx_notify_masked

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers fail 1 blamed authors not CCed: daniel@iogearbox.net; 5 maintainers not CCed: hawk@kernel.org daniel@iogearbox.net john.fastabend@gmail.com bpf@vger.kernel.org ast@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: Possible repeated word: 'Google'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Eric Dumazet Feb. 8, 2022, 11:28 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

veth being NETIF_F_LLTX enabled, we need to be more careful
whenever we read/write rq->rx_notify_masked.
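
For illustration, one interleaving that the pre-patch __veth_xdp_flush()
permits (a sketch consistent with the code changed below, not taken from
the report):

    CPU 0                                  CPU 1
    __veth_xdp_flush()                     __veth_xdp_flush()
      reads rx_notify_masked == false        reads rx_notify_masked == false
      rx_notify_masked = true
      napi_schedule()
                                             rx_notify_masked = true
                                             napi_schedule()

Both callers observe the flag clear, so both write it and both schedule
NAPI; and because these are plain accesses, the compiler is also free to
transform them in surprising ways.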

BUG: KCSAN: data-race in veth_xmit / veth_xmit

write to 0xffff888133d9a9f8 of 1 bytes by task 23552 on cpu 0:
 __veth_xdp_flush drivers/net/veth.c:269 [inline]
 veth_xmit+0x307/0x470 drivers/net/veth.c:350
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
 NF_HOOK include/linux/netfilter.h:307 [inline]
 br_forward_finish net/bridge/br_forward.c:66 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
 br_dev_xmit+0x8b6/0x960
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 neigh_hh_output include/net/neighbour.h:525 [inline]
 neigh_output include/net/neighbour.h:539 [inline]
 ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:451 [inline]
 ip_local_out net/ipv4/ip_output.c:126 [inline]
 ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888133d9a9f8 of 1 bytes by task 23563 on cpu 1:
 __veth_xdp_flush drivers/net/veth.c:268 [inline]
 veth_xmit+0x2d6/0x470 drivers/net/veth.c:350
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 br_dev_queue_push_xmit+0x3ce/0x430 net/bridge/br_forward.c:53
 NF_HOOK include/linux/netfilter.h:307 [inline]
 br_forward_finish net/bridge/br_forward.c:66 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 __br_forward+0x2e4/0x400 net/bridge/br_forward.c:115
 br_flood+0x521/0x5c0 net/bridge/br_forward.c:242
 br_dev_xmit+0x8b6/0x960
 __netdev_start_xmit include/linux/netdevice.h:4683 [inline]
 netdev_start_xmit include/linux/netdevice.h:4697 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3473
 dev_hard_start_xmit net/core/dev.c:3489 [inline]
 __dev_queue_xmit+0x86d/0xf90 net/core/dev.c:4116
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4149
 neigh_hh_output include/net/neighbour.h:525 [inline]
 neigh_output include/net/neighbour.h:539 [inline]
 ip_finish_output2+0x6f8/0xb70 net/ipv4/ip_output.c:228
 ip_finish_output+0xfb/0x240 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:430
 dst_output include/net/dst.h:451 [inline]
 ip_local_out net/ipv4/ip_output.c:126 [inline]
 ip_send_skb+0x6e/0xe0 net/ipv4/ip_output.c:1570
 udp_send_skb+0x641/0x880 net/ipv4/udp.c:967
 udp_sendmsg+0x12ea/0x14c0 net/ipv4/udp.c:1254
 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00 -> 0x01

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 23563 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00064-gc36c04c2e132 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 948d4f214fde ("veth: Add driver XDP")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 drivers/net/veth.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org Feb. 9, 2022, 12:10 p.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue,  8 Feb 2022 15:28:22 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> veth being NETIF_F_LLTX enabled, we need to be more careful
> whenever we read/write rq->rx_notify_masked.
> 
> BUG: KCSAN: data-race in veth_xmit / veth_xmit
> 
> [...]

Here is the summary with links:
  - [net] veth: fix races around rq->rx_notify_masked
    https://git.kernel.org/netdev/net/c/68468d8c4cd4

You are awesome, thank you!
Toshiaki Makita Feb. 9, 2022, 12:36 p.m. UTC | #2
On 2022/02/09 8:28, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>

Thank you for handling this case.

> veth being NETIF_F_LLTX enabled, we need to be more careful
> whenever we read/write rq->rx_notify_masked.
> 
> BUG: KCSAN: data-race in veth_xmit / veth_xmit
> 
> [...]
>
> value changed: 0x00 -> 0x01

I'm not familiar with KCSAN.
Does this mean the rx_notify_masked value is changed while another CPU is reading it?

If so, I'm not sure there is a problem with that.
At worst we could call napi_schedule() twice, but that just causes one extra napi
poll due to NAPIF_STATE_MISSED, and it happens very rarely?

Toshiaki Makita
Eric Dumazet Feb. 10, 2022, 4:57 p.m. UTC | #3
On Wed, Feb 9, 2022 at 4:36 AM Toshiaki Makita
<toshiaki.makita1@gmail.com> wrote:
>
> On 2022/02/09 8:28, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
>
> Thank you for handling this case.
>
> > veth being NETIF_F_LLTX enabled, we need to be more careful
> > whenever we read/write rq->rx_notify_masked.
> >
> > BUG: KCSAN: data-race in veth_xmit / veth_xmit
> >
> > [...]
> > value changed: 0x00 -> 0x01
>
> I'm not familiar with KCSAN.
> Does this mean rx_notify_masked value is changed while another CPU is reading it?
>

Yes.

> If so, I'm not sure there is a problem with that.

This is a problem if not annotated properly.

> At least we could call napi_schedule() twice, but that just causes one extra napi
> poll due to NAPIF_STATE_MISSED, and it happens very rarely?
>
> Toshiaki Makita

The issue might be more problematic: a compiler might play bad games
(look up load and store tearing).
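
For reference, READ_ONCE()/WRITE_ONCE() essentially boil down to volatile
accesses; a minimal user-space approximation (not the exact kernel
definitions) looks like:

	#define READ_ONCE(x)     (*(const volatile typeof(x) *)&(x))
	#define WRITE_ONCE(x, v) (*(volatile typeof(x) *)&(x) = (v))

The volatile access forces the compiler to emit exactly one load or store
of the annotated object, so it cannot drop, repeat or invent that access
behind the programmer's back.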
Toshiaki Makita Feb. 14, 2022, 3:58 p.m. UTC | #4
On 2022/02/11 1:57, Eric Dumazet wrote:
> On Wed, Feb 9, 2022 at 4:36 AM Toshiaki Makita
> <toshiaki.makita1@gmail.com> wrote:
>>
>> On 2022/02/09 8:28, Eric Dumazet wrote:
>>> From: Eric Dumazet <edumazet@google.com>
>>
>> Thank you for handling this case.
>>
>>> veth being NETIF_F_LLTX enabled, we need to be more careful
>>> whenever we read/write rq->rx_notify_masked.
>>>
>>> BUG: KCSAN: data-race in veth_xmit / veth_xmit
>>>
> > > [...]
>>> value changed: 0x00 -> 0x01
>>
>> I'm not familiar with KCSAN.
>> Does this mean rx_notify_masked value is changed while another CPU is reading it?
>>
> 
> Yes.
> 
>> If so, I'm not sure there is a problem with that.
> 
> This is a problem if not annotated properly.
> 
>> At least we could call napi_schedule() twice, but that just causes one extra napi
>> poll due to NAPIF_STATE_MISSED, and it happens very rarely?
> 
> The issue might be more problematic, a compiler might play bad games,
> look for load and store tearing.

Thank you.
I assume you mean problems like those listed in this page,
https://lwn.net/Articles/793253/
e.g. "invented stores", not exactly load/store tearing.
(since rx_notify_masked is 0/1 value and seems not able to be teared)

Now I understand that it's pretty hard to know what exact problem will happen, as the
behavior is undefined and depends heavily on the compiler implementation.

Thank you for the fix again!

Toshiaki Makita

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 354a963075c5f15d194152ca9b38c59c66763e94..d29fb9759cc9557d62370f016fc160f3dbd16ce3 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -265,9 +265,10 @@  static void __veth_xdp_flush(struct veth_rq *rq)
 {
 	/* Write ptr_ring before reading rx_notify_masked */
 	smp_mb();
-	if (!rq->rx_notify_masked) {
-		rq->rx_notify_masked = true;
-		napi_schedule(&rq->xdp_napi);
+	if (!READ_ONCE(rq->rx_notify_masked) &&
+	    napi_schedule_prep(&rq->xdp_napi)) {
+		WRITE_ONCE(rq->rx_notify_masked, true);
+		__napi_schedule(&rq->xdp_napi);
 	}
 }
 
@@ -912,8 +913,10 @@  static int veth_poll(struct napi_struct *napi, int budget)
 		/* Write rx_notify_masked before reading ptr_ring */
 		smp_store_mb(rq->rx_notify_masked, false);
 		if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
-			rq->rx_notify_masked = true;
-			napi_schedule(&rq->xdp_napi);
+			if (napi_schedule_prep(&rq->xdp_napi)) {
+				WRITE_ONCE(rq->rx_notify_masked, true);
+				__napi_schedule(&rq->xdp_napi);
+			}
 		}
 	}
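
For illustration, here is a user-space sketch of the fixed pattern using
C11 atomics in place of the kernel primitives (the names and the
schedule_poll callback are hypothetical):

	#include <stdatomic.h>
	#include <stdbool.h>

	static atomic_bool napi_scheduled;   /* plays the role of NAPI_STATE_SCHED */
	static atomic_bool rx_notify_masked; /* plays the role of rq->rx_notify_masked */

	static void xdp_flush_sketch(void (*schedule_poll)(void))
	{
		/* ~ READ_ONCE(rq->rx_notify_masked) */
		if (!atomic_load_explicit(&rx_notify_masked, memory_order_relaxed) &&
		    /* ~ napi_schedule_prep(): atomic test-and-set, only one caller wins */
		    !atomic_exchange(&napi_scheduled, true)) {
			/* ~ WRITE_ONCE(rq->rx_notify_masked, true) */
			atomic_store_explicit(&rx_notify_masked, true,
					      memory_order_relaxed);
			schedule_poll(); /* ~ __napi_schedule() */
		}
	}

The point of switching to napi_schedule_prep()/__napi_schedule() is that
the test-and-set on NAPI_STATE_SCHED serializes racing callers, so only
the winner sets rx_notify_masked and schedules the poll, while the
READ_ONCE()/WRITE_ONCE() annotations keep the remaining lockless flag
accesses safe from compiler tearing or other transformations.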