diff mbox series

[net] virtio-net: fix race between set queues and probe

Message ID 20230725072049.617289-1-jasowang@redhat.com (mailing list archive)
State Accepted
Commit 25266128fe16d5632d43ada34c847d7b8daba539
Delegated to: Netdev Maintainers
Headers show
Series [net] virtio-net: fix race between set queues and probe | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1342 this patch: 1342
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1365 this patch: 1365
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jason Wang July 25, 2023, 7:20 a.m. UTC
A race were found where set_channels could be called after registering
but before virtnet_set_queues() in virtnet_probe(). Fixing this by
moving the virtnet_set_queues() before netdevice registering. While at
it, use _virtnet_set_queues() to avoid holding rtnl as the device is
not even registered at that time.

Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Michael S. Tsirkin July 25, 2023, 7:53 a.m. UTC | #1
On Tue, Jul 25, 2023 at 03:20:49AM -0400, Jason Wang wrote:
> A race were found where set_channels could be called after registering
> but before virtnet_set_queues() in virtnet_probe(). Fixing this by
> moving the virtnet_set_queues() before netdevice registering. While at
> it, use _virtnet_set_queues() to avoid holding rtnl as the device is
> not even registered at that time.
> 
> Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> Signed-off-by: Jason Wang <jasowang@redhat.com>


Acked-by: Michael S. Tsirkin <mst@redhat.com>

stable material I guess?

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0db14f6b87d3..1270c8d23463 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (vi->has_rss || vi->has_rss_hash_report)
>  		virtnet_init_default_rss(vi);
>  
> +	_virtnet_set_queues(vi, vi->curr_queue_pairs);
> +
>  	/* serialize netdev register + virtio_device_ready() with ndo_open() */
>  	rtnl_lock();
>  
> @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		goto free_unregister_netdev;
>  	}
>  
> -	virtnet_set_queues(vi, vi->curr_queue_pairs);
> -
>  	/* Assume link up if device can't report link status,
>  	   otherwise get link status from config. */
>  	netif_carrier_off(dev);
> -- 
> 2.39.3
Xuan Zhuo July 25, 2023, 11:10 a.m. UTC | #2
On Tue, 25 Jul 2023 03:20:49 -0400, Jason Wang <jasowang@redhat.com> wrote:
> A race were found where set_channels could be called after registering
> but before virtnet_set_queues() in virtnet_probe(). Fixing this by
> moving the virtnet_set_queues() before netdevice registering. While at
> it, use _virtnet_set_queues() to avoid holding rtnl as the device is
> not even registered at that time.
>
> Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0db14f6b87d3..1270c8d23463 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (vi->has_rss || vi->has_rss_hash_report)
>  		virtnet_init_default_rss(vi);
>
> +	_virtnet_set_queues(vi, vi->curr_queue_pairs);
> +
>  	/* serialize netdev register + virtio_device_ready() with ndo_open() */
>  	rtnl_lock();
>
> @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		goto free_unregister_netdev;
>  	}
>
> -	virtnet_set_queues(vi, vi->curr_queue_pairs);
> -
>  	/* Assume link up if device can't report link status,
>  	   otherwise get link status from config. */
>  	netif_carrier_off(dev);
> --
> 2.39.3
>
Jason Wang July 26, 2023, 1:54 a.m. UTC | #3
On Tue, Jul 25, 2023 at 3:53 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jul 25, 2023 at 03:20:49AM -0400, Jason Wang wrote:
> > A race were found where set_channels could be called after registering
> > but before virtnet_set_queues() in virtnet_probe(). Fixing this by
> > moving the virtnet_set_queues() before netdevice registering. While at
> > it, use _virtnet_set_queues() to avoid holding rtnl as the device is
> > not even registered at that time.
> >
> > Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> stable material I guess?

Yes it is.

Thanks

>
> > ---
> >  drivers/net/virtio_net.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 0db14f6b87d3..1270c8d23463 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
> >       if (vi->has_rss || vi->has_rss_hash_report)
> >               virtnet_init_default_rss(vi);
> >
> > +     _virtnet_set_queues(vi, vi->curr_queue_pairs);
> > +
> >       /* serialize netdev register + virtio_device_ready() with ndo_open() */
> >       rtnl_lock();
> >
> > @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> >               goto free_unregister_netdev;
> >       }
> >
> > -     virtnet_set_queues(vi, vi->curr_queue_pairs);
> > -
> >       /* Assume link up if device can't report link status,
> >          otherwise get link status from config. */
> >       netif_carrier_off(dev);
> > --
> > 2.39.3
>
patchwork-bot+netdevbpf@kernel.org July 27, 2023, 5:20 a.m. UTC | #4
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 25 Jul 2023 03:20:49 -0400 you wrote:
> A race were found where set_channels could be called after registering
> but before virtnet_set_queues() in virtnet_probe(). Fixing this by
> moving the virtnet_set_queues() before netdevice registering. While at
> it, use _virtnet_set_queues() to avoid holding rtnl as the device is
> not even registered at that time.
> 
> Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> [...]

Here is the summary with links:
  - [net] virtio-net: fix race between set queues and probe
    https://git.kernel.org/netdev/net/c/25266128fe16

You are awesome, thank you!
Dragos Tatulea Aug. 8, 2023, 11:35 a.m. UTC | #5
On Tue, 2023-07-25 at 03:20 -0400, Jason Wang wrote:
> A race were found where set_channels could be called after registering
> but before virtnet_set_queues() in virtnet_probe(). Fixing this by
> moving the virtnet_set_queues() before netdevice registering. While at
> it, use _virtnet_set_queues() to avoid holding rtnl as the device is
> not even registered at that time.
> 
> Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0db14f6b87d3..1270c8d23463 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>         if (vi->has_rss || vi->has_rss_hash_report)
>                 virtnet_init_default_rss(vi);
>  
> +       _virtnet_set_queues(vi, vi->curr_queue_pairs);
> +
>         /* serialize netdev register + virtio_device_ready() with ndo_open()
> */
>         rtnl_lock();
>  
> @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>                 goto free_unregister_netdev;
>         }
>  
> -       virtnet_set_queues(vi, vi->curr_queue_pairs);
> -
>         /* Assume link up if device can't report link status,
>            otherwise get link status from config. */
>         netif_carrier_off(dev);

This change seems to break mlx5_vdpa when using virtio_vdpa.
_virtnet_set_queues() hangs. Because DRIVER_OK has not yet been set, mlx5_vdpa
cvq kick handler will ignore any commands.

Output:
[  199.681445] mlx5_core 0000:08:00.0: E-Switch: Enable: mode(OFFLOADS),
nvfs(1), necvfs(0), active vports(2)
[  199.690154] mlx5_core 0000:08:00.2: firmware version: 22.38.458
[  199.862816] mlx5_core 0000:08:00.2: Rate limit: 127 rates are supported,
range: 0Mbps to 97656Mbps
[  199.895653] mlx5_core 0000:08:00.2: Assigned random MAC address
b2:a1:71:c0:28:a2
[  200.036900] mlx5_core 0000:08:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048)
RxCqeCmprss(0 enhanced)
[  202.215022] mlx5_core 0000:08:00.2: mlx5_vdpa_reset:2813:(pid 1294):
performing device reset
[  223.228309] rcu: INFO: rcu_sched self-detected stall on CPU
[  223.228870] rcu:    3-....: (5250 ticks this GP)
idle=1cdc/1/0x4000000000000000 softirq=1981/1981 fqs=2064
[  223.229686] rcu:    (t=5251 jiffies g=7061 q=4365 ncpus=10)
[  223.230165] CPU: 3 PID: 1294 Comm: modprobe Not tainted 6.5.0-rc4+ #32
[  223.230725] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-
1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  223.231657] RIP: 0010:virtqueue_get_buf_ctx+0x7/0x290
[  223.232141] Code: ...
[  223.233639] RSP: 0018:ffff888144fe7898 EFLAGS: 00000246
[  223.234108] RAX: 0000000000000000 RBX: ffff888103327940 RCX: ffff888100eca000
[  223.234711] RDX: 0000000000000000 RSI: ffff888144fe78ac RDI: ffff888108c3b300
[  223.235315] RBP: ffff888144fe78d0 R08: 0000000000000001 R09: ffff888103327940
[  223.235921] R10: 0000000000000003 R11: 0000000000000008 R12: 0000000000000002
[  223.236536] R13: 0000000000000004 R14: ffff8881037c1400 R15: ffff88810337f740
[  223.237144] FS:  00007f55572f8740(0000) GS:ffff88846f980000(0000)
knlGS:0000000000000000
[  223.237877] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  223.238401] CR2: 00007f5556d05c00 CR3: 000000012df3a002 CR4: 0000000000370ea0
[  223.239016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  223.239630] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  223.240265] Call Trace:
[  223.240570]  <IRQ>
[  223.240823]  ? rcu_dump_cpu_stacks+0xc4/0x100
[  223.241250]  ? rcu_sched_clock_irq+0x407/0xaf0
[  223.241677]  ? trigger_load_balance+0x62/0x380
[  223.242105]  ? update_process_times+0x5b/0x90
[  223.242527]  ? tick_sched_timer+0x8b/0xa0
[  223.242926]  ? tick_sched_do_timer+0x80/0x80
[  223.243346]  ? __hrtimer_run_queues+0x121/0x270
[  223.243777]  ? hrtimer_interrupt+0xf4/0x230
[  223.244193]  ? __sysvec_apic_timer_interrupt+0x52/0xf0
[  223.244684]  ? sysvec_apic_timer_interrupt+0x69/0x90
[  223.245150]  </IRQ>
[  223.245412]  <TASK>
[  223.245668]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[  223.246160]  ? virtqueue_get_buf_ctx+0x7/0x290
[  223.246588]  virtnet_send_command+0x113/0x170
[  223.247008]  ? virtnet_set_affinity+0x166/0x1b0
[  223.247441]  _virtnet_set_queues+0x9f/0x100
[  223.247850]  virtnet_probe+0xa27/0x1270
[  223.248242]  ? vdpa_get_config+0x5b/0x70 [vdpa]
[  223.248699]  ? virtio_vdpa_notify_with_data+0x30/0x30 [virtio_vdpa]
[  223.249251]  virtio_dev_probe+0x196/0x300
[  223.249646]  really_probe+0xc3/0x3d0
[  223.250012]  ? driver_probe_device+0x90/0x90
[  223.250425]  __driver_probe_device+0x80/0x160
[  223.250846]  driver_probe_device+0x1f/0x90
[  223.251247]  __device_attach_driver+0x7d/0x100
[  223.251674]  bus_for_each_drv+0x7d/0xd0
[  223.252055]  __device_attach+0xb2/0x1c0
[  223.252455]  bus_probe_device+0x86/0xa0
[  223.252839]  device_add+0x65b/0x870
[  223.253200]  ? mlx5_vdpa_set_status+0x2a/0x200 [mlx5_vdpa]
[  223.253699]  ? virtio_vdpa_notify_with_data+0x30/0x30 [virtio_vdpa]
[  223.254255]  register_virtio_device+0xb5/0xf0
[  223.254678]  virtio_vdpa_probe+0xae/0xf0 [virtio_vdpa]
[  223.255156]  really_probe+0xc3/0x3d0
[  223.255518]  ? __device_attach_driver+0x100/0x100
[  223.255962]  __driver_probe_device+0x80/0x160
[  223.256409]  driver_probe_device+0x1f/0x90
[  223.256812]  __driver_attach+0xe9/0x1b0
[  223.257190]  bus_for_each_dev+0x70/0xc0
[  223.257574]  bus_add_driver+0xf1/0x220
[  223.257954]  driver_register+0x55/0xf0
[  223.258330]  ? 0xffffffffa0647000
[  223.258675]  do_one_initcall+0x4c/0x1e0
[  223.259060]  ? kmalloc_trace+0x26/0x80
[  223.259438]  do_init_module+0x88/0x260
[  223.259813]  init_module_from_file+0x86/0xc0
[  223.260245]  __x64_sys_finit_module+0x16d/0x2e0
[  223.260695]  do_syscall_64+0x3d/0x90
[  223.261063]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  223.261532] RIP: 0033:0x7f5556d2bd2d
[  223.261894] Code: ...
[  223.263415] RSP: 002b:00007ffc1ffcd908 EFLAGS: 00000246 ORIG_RAX:
0000000000000139
[  223.264113] RAX: ffffffffffffffda RBX: 00005642fa82caf0 RCX: 00007f5556d2bd2d
[  223.264736] RDX: 0000000000000000 RSI: 00005642f8ca3fc9 RDI: 0000000000000003
[  223.265348] RBP: 00007ffc1ffcd9c0 R08: 0000000000000000 R09: 00007ffc1ffcd950
[  223.265966] R10: 0000000000000003 R11: 0000000000000246 R12: 00005642f8ca3fc9
[  223.266581] R13: 0000000000040000 R14: 00005642fa82cc20 R15: 0000000000000000
[  223.267192]  </TASK>

Thanks,
Dragos
Jason Wang Aug. 9, 2023, 1:43 a.m. UTC | #6
On Tue, Aug 8, 2023 at 7:35 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Tue, 2023-07-25 at 03:20 -0400, Jason Wang wrote:
> > A race were found where set_channels could be called after registering
> > but before virtnet_set_queues() in virtnet_probe(). Fixing this by
> > moving the virtnet_set_queues() before netdevice registering. While at
> > it, use _virtnet_set_queues() to avoid holding rtnl as the device is
> > not even registered at that time.
> >
> > Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/net/virtio_net.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 0db14f6b87d3..1270c8d23463 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
> >         if (vi->has_rss || vi->has_rss_hash_report)
> >                 virtnet_init_default_rss(vi);
> >
> > +       _virtnet_set_queues(vi, vi->curr_queue_pairs);
> > +
> >         /* serialize netdev register + virtio_device_ready() with ndo_open()
> > */
> >         rtnl_lock();
> >
> > @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> >                 goto free_unregister_netdev;
> >         }
> >
> > -       virtnet_set_queues(vi, vi->curr_queue_pairs);
> > -
> >         /* Assume link up if device can't report link status,
> >            otherwise get link status from config. */
> >         netif_carrier_off(dev);
>
> This change seems to break mlx5_vdpa when using virtio_vdpa.
> _virtnet_set_queues() hangs. Because DRIVER_OK has not yet been set, mlx5_vdpa
> cvq kick handler will ignore any commands.
>

Yes, I will post a fix soon.

Thanks
diff mbox series

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0db14f6b87d3..1270c8d23463 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -4219,6 +4219,8 @@  static int virtnet_probe(struct virtio_device *vdev)
 	if (vi->has_rss || vi->has_rss_hash_report)
 		virtnet_init_default_rss(vi);
 
+	_virtnet_set_queues(vi, vi->curr_queue_pairs);
+
 	/* serialize netdev register + virtio_device_ready() with ndo_open() */
 	rtnl_lock();
 
@@ -4257,8 +4259,6 @@  static int virtnet_probe(struct virtio_device *vdev)
 		goto free_unregister_netdev;
 	}
 
-	virtnet_set_queues(vi, vi->curr_queue_pairs);
-
 	/* Assume link up if device can't report link status,
 	   otherwise get link status from config. */
 	netif_carrier_off(dev);