| Message ID | 20240115012918.3081203-1-yanjun.zhu@intel.com (mailing list archive) |
|---|---|
| State | Changes Requested |
| Delegated to | Netdev Maintainers |
| Series | [1/1] virtio_net: Add timeout handler to avoid kernel hang |
On Mon, Jan 15, 2024 at 9:35 AM Zhu Yanjun <yanjun.zhu@intel.com> wrote:
>
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> Some devices emulate virtio_net hardware. When the virtio_net
> driver sends commands to the emulated hardware, the hardware
> normally needs time to respond, and sometimes that time is very
> long. When that happens, the soft lockup below appears and the
> whole system hangs.
> Similar problems also occur with Intel NICs and Mellanox NICs,
> so a similar solution is borrowed from them: a timeout value is
> added, set as large as possible so that the driver waits as long
> as it reasonably can for a response from the hardware.
>
> "
> [  213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! [(udev-worker):3157]
> [  213.796114] Modules linked in: virtio_net(+) net_failover failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath
> [  213.796194] irq event stamp: 67740
> [  213.796195] hardirqs last enabled at (67739): [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [  213.796203] hardirqs last disabled at (67740): [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90
> [  213.796208] softirqs last enabled at (67686): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0
> [  213.796214] softirqs last disabled at (67681): [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0
> [  213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded Not tainted 6.7.0+ #9
> [  213.796220] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022
> [  213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110
> [  213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 89 e8 5b
> [  213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246
> [  213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: 0000000000000001
> [  213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: ff2f15095896f000
> [  213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [  213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: 0000000000000002
> [  213.796239] R13: 0000000000000004 R14: 0000000000000000 R15: ff2f1509534f3000
> [  213.796240] FS:  00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) knlGS:0000000000000000
> [  213.796242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: 0000000000f71ef0
> [  213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [  213.796247] PKRU: 55555554
> [  213.796249] Call Trace:
> [  213.796250]  <IRQ>
> [  213.796252]  ? watchdog_timer_fn+0x1c0/0x220
> [  213.796258]  ? __pfx_watchdog_timer_fn+0x10/0x10
> [  213.796261]  ? __hrtimer_run_queues+0x1af/0x380
> [  213.796269]  ? hrtimer_interrupt+0xf8/0x230
> [  213.796274]  ? __sysvec_apic_timer_interrupt+0x64/0x1a0
> [  213.796279]  ? sysvec_apic_timer_interrupt+0x6d/0x90
> [  213.796282]  </IRQ>
> [  213.796284]  <TASK>
> [  213.796285]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [  213.796293]  ? virtqueue_get_buf_ctx_split+0x8d/0x110
> [  213.796297]  virtnet_send_command+0x18a/0x1f0 [virtio_net]
> [  213.796310]  _virtnet_set_queues+0xc6/0x120 [virtio_net]
> [  213.796319]  virtnet_probe+0xa06/0xd50 [virtio_net]
> [  213.796328]  virtio_dev_probe+0x195/0x230
> [  213.796333]  really_probe+0x19f/0x400
> [  213.796338]  ? __pfx___driver_attach+0x10/0x10
> [  213.796340]  __driver_probe_device+0x78/0x160
> [  213.796343]  driver_probe_device+0x1f/0x90
> [  213.796346]  __driver_attach+0xd6/0x1d0
> [  213.796349]  bus_for_each_dev+0x8c/0xe0
> [  213.796355]  bus_add_driver+0x119/0x220
> [  213.796359]  driver_register+0x59/0x100
> [  213.796362]  ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net]
> [  213.796369]  virtio_net_driver_init+0x8e/0xff0 [virtio_net]
> [  213.796375]  do_one_initcall+0x6f/0x380
> [  213.796384]  do_init_module+0x60/0x240
> [  213.796388]  init_module_from_file+0x86/0xc0
> [  213.796396]  idempotent_init_module+0x129/0x2c0
> [  213.796406]  __x64_sys_finit_module+0x5e/0xb0
> [  213.796409]  do_syscall_64+0x60/0xe0
> [  213.796415]  ? do_syscall_64+0x6f/0xe0
> [  213.796418]  ? lockdep_hardirqs_on_prepare+0xe4/0x1a0
> [  213.796424]  ? do_syscall_64+0x6f/0xe0
> [  213.796427]  ? do_syscall_64+0x6f/0xe0
> [  213.796431]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [  213.796435] RIP: 0033:0x7f7758f279cd
> [  213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 89 01 48
> [  213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [  213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: 00007f7758f279cd
> [  213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: 000000000000000f
> [  213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: 00007ffe2cad8860
> [  213.796473] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
> [  213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: 0000557f98783aa0
> [  213.796482]  </TASK>
> "
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
>  drivers/net/virtio_net.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 51b1868d2f22..28b7dd917a43 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>  {
>  	struct scatterlist *sgs[4], hdr, stat;
>  	unsigned out_num = 0, tmp;
> -	int ret;
> +	int ret, timeout = 200;

Any reason we chose this value, or how can we know it works for all
types of devices?

An easier way is to use cond_resched(), but it may have side effects as
well [1]. It still seems less intrusive than the proposal here, though.

Thanks

[1] https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html

>
>  	/* Caller should know better */
>  	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>  	 * into the hypervisor, so the request should be handled immediately.
>  	 */
>  	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -	       !virtqueue_is_broken(vi->cvq))
> +	       !virtqueue_is_broken(vi->cvq)) {
> +		if (timeout)
> +			timeout--;
> +		else
> +			return false; /* long time no response */
> +
>  		cpu_relax();
> +	}
>
>  	return vi->ctrl->status == VIRTIO_NET_OK;
>  }
> --
> 2.42.0
On 2024/1/15 10:20, Jason Wang wrote:
> On Mon, Jan 15, 2024 at 9:35 AM Zhu Yanjun <yanjun.zhu@intel.com> wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> [ commit message, soft lockup splat, and patch trimmed; see above ]
>>
>> -	int ret;
>> +	int ret, timeout = 200;
> Any reason we choose this value or how can we know it works for all
> types of devices?

Sorry, resending this because the first copy was rejected by netdev.

As I mentioned in the commit log, a similar problem also occurs in the
Intel and Mellanox NIC drivers. This commit borrows the solution from
the Intel NIC driver, so the value also comes from the Intel NIC
driver's solution.

> A more easy way is to use cond_resched() but it may have side effects
> as well[1]. But it seems less intrusive anyhow than the proposal here?

Thanks a lot for your suggestions.

I have tested the commits in the link
https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html.
Because the virtio_net driver spins waiting for the hardware's response,
the virtio_net driver cannot be unloaded and the kernel still hangs when
running "ip link" or when unloading the virtio_net module.

Zhu Yanjun

> Thanks
>
> [1] https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html
On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> [ commit message and soft lockup splat trimmed; see above ]
>
> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>  	 * into the hypervisor, so the request should be handled immediately.
>  	 */
>  	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -	       !virtqueue_is_broken(vi->cvq))
> +	       !virtqueue_is_broken(vi->cvq)) {
> +		if (timeout)
> +			timeout--;

This is not really a timeout, just a loop counter. 200 iterations could
be a very short time on reasonable H/W. I guess this avoids the soft
lockup, but it possibly (likely?) breaks the functionality when we need
to loop for some non-negligible time.

I fear we need a more complex solution, as mentioned by Michael in the
thread you quoted.

Cheers,

Paolo
On 2024/1/16 20:04, Paolo Abeni wrote:
> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
>> [ commit message, soft lockup splat, and patch trimmed; see above ]
>
> This is not really a timeout, just a loop counter. 200 iterations could
> be a very short time on reasonable H/W. I guess this avoids the soft
> lockup, but possibly (likely?) breaks the functionality when we need to
> loop for some non-negligible time.
>
> I fear we need a more complex solution, as mentioned by Michael in the
> thread you quoted.

Got it. I also look forward to the more complex solution to this problem.

Zhu Yanjun

> Cheers,
>
> Paolo
On Tue, Jan 16, 2024 at 01:04:49PM +0100, Paolo Abeni wrote:
> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:
> > From: Zhu Yanjun <yanjun.zhu@linux.dev>
> >
> > [ commit message, soft lockup splat, and patch trimmed; see above ]
do_syscall_64+0x6f/0xe0 > > [ 213.796431] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > [ 213.796435] RIP: 0033:0x7f7758f279cd > > [ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 89 01 48 > > [ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 > > [ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: 00007f7758f279cd > > [ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: 000000000000000f > > [ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: 00007ffe2cad8860 > > [ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000 > > [ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: 0000557f98783aa0 > > [ 213.796482] </TASK> > > " > > > > Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> > > --- > > drivers/net/virtio_net.c | 10 ++++++++-- > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 51b1868d2f22..28b7dd917a43 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > > { > > struct scatterlist *sgs[4], hdr, stat; > > unsigned out_num = 0, tmp; > > - int ret; > > + int ret, timeout = 200; > > > > /* Caller should know better */ > > BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > > @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > > * into the hypervisor, so the request should be handled immediately. > > */ > > while (!virtqueue_get_buf(vi->cvq, &tmp) && > > - !virtqueue_is_broken(vi->cvq)) > > + !virtqueue_is_broken(vi->cvq)) { > > + if (timeout) > > + timeout--; > > This is not really a timeout, just a loop counter. 
200 iterations could > be a very short time on reasonable H/W. I guess this avoids the soft > lockup, but possibly (likely?) breaks the functionality when we need to > loop for some non-negligible time. > > I fear we need a more complex solution, as mentioned by Michael in the > thread you quoted. > > Cheers, > > Paolo

So a bunch of devices just queue a request on a workqueue. Problem is, errors are not reported then. I really feel this is something netdev core should support better: return a "deferred" status and have netdev core wait outside a lock on a waitqueue.
On Thu, 18 Jan 2024 08:14:21 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Tue, Jan 16, 2024 at 01:04:49PM +0100, Paolo Abeni wrote: > > On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:

[quoted patch description, soft-lockup trace, and diff trimmed; see the first copy above]

> > This is not really a timeout, just a loop counter. 200 iterations could > > be a very short time on reasonable H/W. I guess this avoids the soft > > lockup, but possibly (likely?) breaks the functionality when we need to > > loop for some non-negligible time. > > > > I fear we need a more complex solution, as mentioned by Michael in the > > thread you quoted. > > > > Cheers, > > > > Paolo > > So a bunch of devices just queue a request on a workqueue. Problem is, > errors are not reported then. I really feel this is something netdev > core should support better: return a "deferred" status and have > netdev core wait outside a lock on a waitqueue.

And on the other side, more and more requests will be posted to the cvq; we should refactor this to async mode. Now the netdim feature has this request. @Heng Qi

Thanks.

> > > -- > MST >
On 2024/1/18 8:01 PM, Zhu Yanjun wrote: > > On 2024/1/16 20:04, Paolo Abeni wrote: >> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote:

[quoted patch description, soft-lockup trace, and diff trimmed; see the first copy above]

>> This is not really a timeout, just a loop counter. 200 iterations could >> be a very short time on reasonable H/W. I guess this avoids the soft >> lockup, but possibly (likely?) breaks the functionality when we need to >> loop for some non-negligible time. >> >> I fear we need a more complex solution, as mentioned by Michael in the >> thread you quoted. > > Got it. I also look forward to the more complex solution to this problem.

Can we add a device capability (new feature bit) such as ctrq_wait_timeout to get a reasonable timeout?

Thanks,
Heng

> > Zhu Yanjun > >> >> Cheers, >> >> Paolo >>
> > > > while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > - !virtqueue_is_broken(vi->cvq)) > > > > + !virtqueue_is_broken(vi->cvq)) { > > > > + if (timeout) > > > > + timeout--; > > > This is not really a timeout, just a loop counter. 200 iterations could > > > be a very short time on reasonable H/W. I guess this avoids the soft > > > lockup, but possibly (likely?) breaks the functionality when we need to > > > loop for some non-negligible time. > > > > > > I fear we need a more complex solution, as mentioned by Michael in the > > > thread you quoted. > > > > Got it. I also look forward to the more complex solution to this problem. > > Can we add a device capability (new feature bit) such as ctrq_wait_timeout > to get a reasonable timeout?

The usual solution to this is include/linux/iopoll.h. If you can sleep, use read_poll_timeout(); otherwise, read_poll_timeout_atomic().

Andrew
On 2024/1/20 1:29, Andrew Lunn wrote: >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && >>>>> - !virtqueue_is_broken(vi->cvq)) >>>>> + !virtqueue_is_broken(vi->cvq)) { >>>>> + if (timeout) >>>>> + timeout--; >>>> This is not really a timeout, just a loop counter. 200 iterations could >>>> be a very short time on reasonable H/W. I guess this avoids the soft >>>> lockup, but possibly (likely?) breaks the functionality when we need to >>>> loop for some non-negligible time. >>>> >>>> I fear we need a more complex solution, as mentioned by Michael in the >>>> thread you quoted. >>> Got it. I also look forward to the more complex solution to this problem. >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout >> to get a reasonable timeout? > The usual solution to this is include/linux/iopoll.h. If you can sleep > read_poll_timeout() otherwise read_poll_timeout_atomic().

Thanks. The two functions read_poll_timeout() and read_poll_timeout_atomic() are interesting.

Zhu Yanjun

> > Andrew
On 2024/1/20 1:29, Andrew Lunn wrote: >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && >>>>> - !virtqueue_is_broken(vi->cvq)) >>>>> + !virtqueue_is_broken(vi->cvq)) { >>>>> + if (timeout) >>>>> + timeout--; >>>> This is not really a timeout, just a loop counter. 200 iterations could >>>> be a very short time on reasonable H/W. I guess this avoids the soft >>>> lockup, but possibly (likely?) breaks the functionality when we need to >>>> loop for some non-negligible time. >>>> >>>> I fear we need a more complex solution, as mentioned by Michael in the >>>> thread you quoted. >>> Got it. I also look forward to the more complex solution to this problem. >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout >> to get a reasonable timeout? > The usual solution to this is include/linux/iopoll.h. If you can sleep > read_poll_timeout() otherwise read_poll_timeout_atomic().

I read the functions read_poll_timeout() and read_poll_timeout_atomic() carefully. The timeout is set by the caller of the two functions. As such, can we add a module parameter so that the user can customize this timeout value? Or should this timeout value be stored in a device register, so that the virtio_net driver reads it at initialization?

Zhu Yanjun

> > Andrew
On Mon, Jan 15, 2024 at 6:22 PM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > On 2024/1/15 10:20, Jason Wang wrote: > > On Mon, Jan 15, 2024 at 9:35 AM Zhu Yanjun <yanjun.zhu@intel.com> wrote: > > From: Zhu Yanjun <yanjun.zhu@linux.dev>

[quoted patch description and soft-lockup trace trimmed; see the first copy above]

> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> > --- > drivers/net/virtio_net.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 51b1868d2f22..28b7dd917a43 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > { > struct scatterlist *sgs[4], hdr, stat; > unsigned out_num = 0, tmp; > - int ret; > + int ret, timeout = 200; > > Any reason we choose this value or how can we know it works for all > types of devices? > > As I mentioned in the commit log, similar problems also occur in the Intel NIC driver and the Mellanox NIC driver. > > This commit borrows the solution from the Intel NIC driver, so the value also comes from that driver's solution.

Right, so basically I meant we need a solution that works for all vendors.

> > An easier way is to use cond_resched(), but it may have side effects > as well [1]. But it seems less intrusive anyhow than the proposal here? > > Thanks a lot for your suggestions.
I have tested the commits in the link https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html. > > Because the virtio_net driver spins waiting for the hardware's response, the driver cannot be unloaded, and the kernel hang still occurs when running "ip link" or unloading the virtio_net module. Well, yes. It aims to solve the lockups as described in this commit log. It doesn't solve the issue of infinite waiting, etc. Thanks > > Zhu Yanjun > > Thanks > > [1] https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60297.html > > /* Caller should know better */ > BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, > * into the hypervisor, so the request should be handled immediately. > */ > while (!virtqueue_get_buf(vi->cvq, &tmp) && > - !virtqueue_is_broken(vi->cvq)) > + !virtqueue_is_broken(vi->cvq)) { > + if (timeout) > + timeout--; > + else > + return false; /* long time no response */ > + > cpu_relax(); > + } > > return vi->ctrl->status == VIRTIO_NET_OK; > } > -- > 2.42.0 >
On Fri, Jan 19, 2024 at 10:27 PM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > 在 2024/1/18 下午8:01, Zhu Yanjun 写道: > > > > 在 2024/1/16 20:04, Paolo Abeni 写道: > >> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote: > >>> From: Zhu Yanjun <yanjun.zhu@linux.dev> > >>> > >>> Some devices emulate the virtio_net hardwares. When virtio_net > >>> driver sends commands to the emulated hardware, normally the > >>> hardware needs time to response. Sometimes the time is very > >>> long. Thus, the following will appear. Then the whole system > >>> will hang. > >>> The similar problems also occur in Intel NICs and Mellanox NICs. > >>> As such, the similar solution is borrowed from them. A timeout > >>> value is added and the timeout value as large as possible is set > >>> to ensure that the driver gets the maximum possible response from > >>> the hardware. > >>> > >>> " > >>> [ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! > >>> [(udev-worker):3157] > >>> [ 213.796114] Modules linked in: virtio_net(+) net_failover > >>> failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common > >>> intel_uncore_frequency intel_uncore_frequency_common intel_ifs > >>> i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp > >>> coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support > >>> vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev > >>> intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci > >>> isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus > >>> i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf > >>> ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update > >>> fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel > >>> polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 > >>> bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi > >>> pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath > >>> [ 213.796194] irq event stamp: 67740 > 
>>> [ 213.796195] hardirqs last enabled at (67739): > >>> [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 > >>> [ 213.796203] hardirqs last disabled at (67740): > >>> [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90 > >>> [ 213.796208] softirqs last enabled at (67686): > >>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > >>> [ 213.796214] softirqs last disabled at (67681): > >>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 > >>> [ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded > >>> Not tainted 6.7.0+ #9 > >>> [ 213.796220] Hardware name: Intel Corporation > >>> M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 > >>> 11/14/2022 > >>> [ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110 > >>> [ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 > >>> 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 > >>> 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 > >>> 89 e8 5b > >>> [ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246 > >>> [ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: > >>> 0000000000000001 > >>> [ 213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: > >>> ff2f15095896f000 > >>> [ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: > >>> 0000000000000000 > >>> [ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: > >>> 0000000000000002 > >>> [ 213.796239] R13: 0000000000000004 R14: 0000000000000000 R15: > >>> ff2f1509534f3000 > >>> [ 213.796240] FS: 00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) > >>> knlGS:0000000000000000 > >>> [ 213.796242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> [ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: > >>> 0000000000f71ef0 > >>> [ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > >>> 0000000000000000 > >>> [ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: > >>> 0000000000000400 > >>> [ 213.796247] PKRU: 
55555554 > >>> [ 213.796249] Call Trace: > >>> [ 213.796250] <IRQ> > >>> [ 213.796252] ? watchdog_timer_fn+0x1c0/0x220 > >>> [ 213.796258] ? __pfx_watchdog_timer_fn+0x10/0x10 > >>> [ 213.796261] ? __hrtimer_run_queues+0x1af/0x380 > >>> [ 213.796269] ? hrtimer_interrupt+0xf8/0x230 > >>> [ 213.796274] ? __sysvec_apic_timer_interrupt+0x64/0x1a0 > >>> [ 213.796279] ? sysvec_apic_timer_interrupt+0x6d/0x90 > >>> [ 213.796282] </IRQ> > >>> [ 213.796284] <TASK> > >>> [ 213.796285] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 > >>> [ 213.796293] ? virtqueue_get_buf_ctx_split+0x8d/0x110 > >>> [ 213.796297] virtnet_send_command+0x18a/0x1f0 [virtio_net] > >>> [ 213.796310] _virtnet_set_queues+0xc6/0x120 [virtio_net] > >>> [ 213.796319] virtnet_probe+0xa06/0xd50 [virtio_net] > >>> [ 213.796328] virtio_dev_probe+0x195/0x230 > >>> [ 213.796333] really_probe+0x19f/0x400 > >>> [ 213.796338] ? __pfx___driver_attach+0x10/0x10 > >>> [ 213.796340] __driver_probe_device+0x78/0x160 > >>> [ 213.796343] driver_probe_device+0x1f/0x90 > >>> [ 213.796346] __driver_attach+0xd6/0x1d0 > >>> [ 213.796349] bus_for_each_dev+0x8c/0xe0 > >>> [ 213.796355] bus_add_driver+0x119/0x220 > >>> [ 213.796359] driver_register+0x59/0x100 > >>> [ 213.796362] ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net] > >>> [ 213.796369] virtio_net_driver_init+0x8e/0xff0 [virtio_net] > >>> [ 213.796375] do_one_initcall+0x6f/0x380 > >>> [ 213.796384] do_init_module+0x60/0x240 > >>> [ 213.796388] init_module_from_file+0x86/0xc0 > >>> [ 213.796396] idempotent_init_module+0x129/0x2c0 > >>> [ 213.796406] __x64_sys_finit_module+0x5e/0xb0 > >>> [ 213.796409] do_syscall_64+0x60/0xe0 > >>> [ 213.796415] ? do_syscall_64+0x6f/0xe0 > >>> [ 213.796418] ? lockdep_hardirqs_on_prepare+0xe4/0x1a0 > >>> [ 213.796424] ? do_syscall_64+0x6f/0xe0 > >>> [ 213.796427] ? 
do_syscall_64+0x6f/0xe0 > >>> [ 213.796431] entry_SYSCALL_64_after_hwframe+0x6e/0x76 > >>> [ 213.796435] RIP: 0033:0x7f7758f279cd > >>> [ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e > >>> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 > >>> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 > >>> 89 01 48 > >>> [ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: > >>> 0000000000000139 > >>> [ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: > >>> 00007f7758f279cd > >>> [ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: > >>> 000000000000000f > >>> [ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: > >>> 00007ffe2cad8860 > >>> [ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: > >>> 0000000000020000 > >>> [ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: > >>> 0000557f98783aa0 > >>> [ 213.796482] </TASK> > >>> " > >>> > >>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> > >>> --- > >>> drivers/net/virtio_net.c | 10 ++++++++-- > >>> 1 file changed, 8 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >>> index 51b1868d2f22..28b7dd917a43 100644 > >>> --- a/drivers/net/virtio_net.c > >>> +++ b/drivers/net/virtio_net.c > >>> @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct > >>> virtnet_info *vi, u8 class, u8 cmd, > >>> { > >>> struct scatterlist *sgs[4], hdr, stat; > >>> unsigned out_num = 0, tmp; > >>> - int ret; > >>> + int ret, timeout = 200; > >>> /* Caller should know better */ > >>> BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > >>> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct > >>> virtnet_info *vi, u8 class, u8 cmd, > >>> * into the hypervisor, so the request should be handled > >>> immediately. 
> >>> */ > >>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > >>> - !virtqueue_is_broken(vi->cvq)) > >>> + !virtqueue_is_broken(vi->cvq)) { > >>> + if (timeout) > >>> + timeout--; > >> This is not really a timeout, just a loop counter. 200 iterations could > >> be a very short time on reasonable H/W. I guess this avoids the soft > >> lockup, but possibly (likely?) breaks the functionality when we need to > >> loop for some non-negligible time. > >> > >> I fear we need a more complex solution, as mentioned by Michael in the > >> thread you quoted. > > > > Got it. I also look forward to the more complex solution to this problem. > > Can we add a device capability (new feature bit) such as > ctrq_wait_timeout to get a reasonable timeout? This adds another kind of complexity for migration compatibility. And we need to make it more general, e.g.: 1) it should not be cvq specific 2) or we can have a timeout that works for all queues? Thanks > > Thanks, > Heng > > > > > Zhu Yanjun > > > >> > >> Cheers, > >> > >> Paolo > >> >
On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > On 2024/1/20 1:29, Andrew Lunn wrote: > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > >>>>> - !virtqueue_is_broken(vi->cvq)) > >>>>> + !virtqueue_is_broken(vi->cvq)) { > >>>>> + if (timeout) > >>>>> + timeout--; > >>>> This is not really a timeout, just a loop counter. 200 iterations could > >>>> be a very short time on reasonable H/W. I guess this avoids the soft > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > >>>> loop for some non-negligible time. > >>>> > >>>> I fear we need a more complex solution, as mentioned by Michael in the > >>>> thread you quoted. > >>> Got it. I also look forward to the more complex solution to this problem. > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > >> to get a reasonable timeout? > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > I read carefully the functions read_poll_timeout() and > read_poll_timeout_atomic(). The timeout is set by the caller of the two > functions. FYI, in order to avoid a switch between atomic and non-atomic, we need to convert the rx mode setting to a workqueue first: https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > As such, can we add a module parameter to customize this timeout value > by the user? Who is the "user" here, or how can the "user" know the value? > > Or this timeout value is stored in device register, virtio_net driver > will read this timeout value at initialization? See another thread. The design needs to be general, or you can post an RFC. In another thought, we've already had a tx watchdog, maybe we can have something similar to cvq and use timeout + reset in that case. Thanks > > Zhu Yanjun > > > > > Andrew >
On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > >>>>> + if (timeout) > > >>>>> + timeout--; > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > >>>> loop for some non negligible time. > > >>>> > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > >>>> thread you quoted. > > >>> Got it. I also look forward to the more complex solution to this problem. > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > >> to get a reasonable timeout? > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > I read carefully the functions read_poll_timeout() and > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > functions. > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > mode setting to workqueue first: > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > As such, can we add a module parameter to customize this timeout value > > by the user? > > Who is the "user" here, or how can the "user" know the value? > > > > > Or this timeout value is stored in device register, virtio_net driver > > will read this timeout value at initialization? > > See another thread. The design needs to be general, or you can post a RFC. > > In another thought, we've already had a tx watchdog, maybe we can have > something similar to cvq and use timeout + reset in that case. 
But we may be blocked by the reset ^_^ if the device is broken? Thanks. > > Thanks > > > > Zhu Yanjun > > > > > > > > Andrew > > >
On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > >>>>> + if (timeout) > > > >>>>> + timeout--; > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > >>>> loop for some non negligible time. > > > >>>> > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > >>>> thread you quoted. > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > >> to get a reasonable timeout? > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > I read carefully the functions read_poll_timeout() and > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > functions. > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > mode setting to workqueue first: > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > by the user? > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > will read this timeout value at initialization? > > > > See another thread. 
The design needs to be general, or you can post an RFC. > > > > In another thought, we've already had a tx watchdog, maybe we can have > > something similar to cvq and use timeout + reset in that case. > > But we may block by the reset ^_^ if the device is broken? I mean vq reset here. It looks like we have multiple goals here: 1) avoid lockups; using workqueue + cond_resched() seems to be sufficient, it has issues but nothing new 2) recover from the unresponsive device; the issue for a timeout is that it needs to deal with false positives Thanks > > Thanks. > > > > > Thanks > > > > > > > > Zhu Yanjun > > > > > > > > > > > Andrew > > > > > >
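Goal 1) can be sketched in isolation. Below is a hedged userspace analogue (names invented): yielding periodically inside the wait loop plays the role cond_resched() would in process context, so the spinning task stops monopolizing its CPU and the soft-lockup watchdog stays quiet. Note it deliberately adds no timeout, so goal 2) — an unresponsive device — is untouched, matching the point about false positives:

```c
#include <assert.h>
#include <stdbool.h>
#include <sched.h>

/*
 * Wait for cond() to become true, yielding the CPU every
 * max_spins_between_yields iterations (a stand-in for cond_resched()).
 * Like the original virtnet loop, it still waits forever if the device
 * never answers; it only stops the wait from looking like a lockup.
 */
static bool wait_yielding(bool (*cond)(void *), void *arg,
			  unsigned long max_spins_between_yields)
{
	unsigned long spins = 0;

	while (!cond(arg)) {
		if (++spins >= max_spins_between_yields) {
			spins = 0;
			sched_yield(); /* let other runnable tasks in */
		}
	}
	return true;
}
```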
在 2024/1/22 上午11:08, Jason Wang 写道: > On Fri, Jan 19, 2024 at 10:27 PM Heng Qi <hengqi@linux.alibaba.com> wrote: >> >> >> 在 2024/1/18 下午8:01, Zhu Yanjun 写道: >>> 在 2024/1/16 20:04, Paolo Abeni 写道: >>>> On Mon, 2024-01-15 at 09:29 +0800, Zhu Yanjun wrote: >>>>> From: Zhu Yanjun <yanjun.zhu@linux.dev> >>>>> >>>>> Some devices emulate the virtio_net hardwares. When virtio_net >>>>> driver sends commands to the emulated hardware, normally the >>>>> hardware needs time to response. Sometimes the time is very >>>>> long. Thus, the following will appear. Then the whole system >>>>> will hang. >>>>> The similar problems also occur in Intel NICs and Mellanox NICs. >>>>> As such, the similar solution is borrowed from them. A timeout >>>>> value is added and the timeout value as large as possible is set >>>>> to ensure that the driver gets the maximum possible response from >>>>> the hardware. >>>>> >>>>> " >>>>> [ 213.795860] watchdog: BUG: soft lockup - CPU#108 stuck for 26s! >>>>> [(udev-worker):3157] >>>>> [ 213.796114] Modules linked in: virtio_net(+) net_failover >>>>> failover qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common >>>>> intel_uncore_frequency intel_uncore_frequency_common intel_ifs >>>>> i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp >>>>> coretemp iTCO_wdt rapl intel_pmc_bxt dax_hmem iTCO_vendor_support >>>>> vfat cxl_acpi intel_cstate pmt_telemetry pmt_class intel_sdsi joydev >>>>> intel_uncore cxl_core fat pcspkr mei_me isst_if_mbox_pci >>>>> isst_if_mmio idxd i2c_i801 isst_if_common mei intel_vsec idxd_bus >>>>> i2c_smbus i2c_ismt ipmi_ssif acpi_ipmi ipmi_si ipmi_devintf >>>>> ipmi_msghandler acpi_pad acpi_power_meter pfr_telemetry pfr_update >>>>> fuse loop zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel >>>>> polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 >>>>> bnxt_en sha256_ssse3 sha1_ssse3 nvme ast nvme_core i2c_algo_bit wmi >>>>> pinctrl_emmitsburg scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath >>>>> [ 
213.796194] irq event stamp: 67740 >>>>> [ 213.796195] hardirqs last enabled at (67739): >>>>> [<ffffffff8c2015ca>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 >>>>> [ 213.796203] hardirqs last disabled at (67740): >>>>> [<ffffffff8c14108e>] sysvec_apic_timer_interrupt+0xe/0x90 >>>>> [ 213.796208] softirqs last enabled at (67686): >>>>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 >>>>> [ 213.796214] softirqs last disabled at (67681): >>>>> [<ffffffff8b12115e>] __irq_exit_rcu+0xbe/0xe0 >>>>> [ 213.796217] CPU: 108 PID: 3157 Comm: (udev-worker) Kdump: loaded >>>>> Not tainted 6.7.0+ #9 >>>>> [ 213.796220] Hardware name: Intel Corporation >>>>> M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 >>>>> 11/14/2022 >>>>> [ 213.796221] RIP: 0010:virtqueue_get_buf_ctx_split+0x8d/0x110 >>>>> [ 213.796228] Code: 89 df e8 26 fe ff ff 0f b7 43 50 83 c0 01 66 89 >>>>> 43 50 f6 43 78 01 75 12 80 7b 42 00 48 8b 4b 68 8b 53 58 74 0f 66 87 >>>>> 44 51 04 <48> 89 e8 5b 5d c3 cc cc cc cc 66 89 44 51 04 0f ae f0 48 >>>>> 89 e8 5b >>>>> [ 213.796230] RSP: 0018:ff4bbb362306f9b0 EFLAGS: 00000246 >>>>> [ 213.796233] RAX: 0000000000000000 RBX: ff2f15095896f000 RCX: >>>>> 0000000000000001 >>>>> [ 213.796235] RDX: 0000000000000000 RSI: ff4bbb362306f9cc RDI: >>>>> ff2f15095896f000 >>>>> [ 213.796236] RBP: 0000000000000000 R08: 0000000000000000 R09: >>>>> 0000000000000000 >>>>> [ 213.796237] R10: 0000000000000003 R11: ff2f15095893cc40 R12: >>>>> 0000000000000002 >>>>> [ 213.796239] R13: 0000000000000004 R14: 0000000000000000 R15: >>>>> ff2f1509534f3000 >>>>> [ 213.796240] FS: 00007f775847d0c0(0000) GS:ff2f1528bac00000(0000) >>>>> knlGS:0000000000000000 >>>>> [ 213.796242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 213.796243] CR2: 0000557f987b6e70 CR3: 0000002098602006 CR4: >>>>> 0000000000f71ef0 >>>>> [ 213.796245] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>>>> 0000000000000000 >>>>> [ 213.796246] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: >>>>> 
0000000000000400 >>>>> [ 213.796247] PKRU: 55555554 >>>>> [ 213.796249] Call Trace: >>>>> [ 213.796250] <IRQ> >>>>> [ 213.796252] ? watchdog_timer_fn+0x1c0/0x220 >>>>> [ 213.796258] ? __pfx_watchdog_timer_fn+0x10/0x10 >>>>> [ 213.796261] ? __hrtimer_run_queues+0x1af/0x380 >>>>> [ 213.796269] ? hrtimer_interrupt+0xf8/0x230 >>>>> [ 213.796274] ? __sysvec_apic_timer_interrupt+0x64/0x1a0 >>>>> [ 213.796279] ? sysvec_apic_timer_interrupt+0x6d/0x90 >>>>> [ 213.796282] </IRQ> >>>>> [ 213.796284] <TASK> >>>>> [ 213.796285] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 >>>>> [ 213.796293] ? virtqueue_get_buf_ctx_split+0x8d/0x110 >>>>> [ 213.796297] virtnet_send_command+0x18a/0x1f0 [virtio_net] >>>>> [ 213.796310] _virtnet_set_queues+0xc6/0x120 [virtio_net] >>>>> [ 213.796319] virtnet_probe+0xa06/0xd50 [virtio_net] >>>>> [ 213.796328] virtio_dev_probe+0x195/0x230 >>>>> [ 213.796333] really_probe+0x19f/0x400 >>>>> [ 213.796338] ? __pfx___driver_attach+0x10/0x10 >>>>> [ 213.796340] __driver_probe_device+0x78/0x160 >>>>> [ 213.796343] driver_probe_device+0x1f/0x90 >>>>> [ 213.796346] __driver_attach+0xd6/0x1d0 >>>>> [ 213.796349] bus_for_each_dev+0x8c/0xe0 >>>>> [ 213.796355] bus_add_driver+0x119/0x220 >>>>> [ 213.796359] driver_register+0x59/0x100 >>>>> [ 213.796362] ? __pfx_virtio_net_driver_init+0x10/0x10 [virtio_net] >>>>> [ 213.796369] virtio_net_driver_init+0x8e/0xff0 [virtio_net] >>>>> [ 213.796375] do_one_initcall+0x6f/0x380 >>>>> [ 213.796384] do_init_module+0x60/0x240 >>>>> [ 213.796388] init_module_from_file+0x86/0xc0 >>>>> [ 213.796396] idempotent_init_module+0x129/0x2c0 >>>>> [ 213.796406] __x64_sys_finit_module+0x5e/0xb0 >>>>> [ 213.796409] do_syscall_64+0x60/0xe0 >>>>> [ 213.796415] ? do_syscall_64+0x6f/0xe0 >>>>> [ 213.796418] ? lockdep_hardirqs_on_prepare+0xe4/0x1a0 >>>>> [ 213.796424] ? do_syscall_64+0x6f/0xe0 >>>>> [ 213.796427] ? 
do_syscall_64+0x6f/0xe0 >>>>> [ 213.796431] entry_SYSCALL_64_after_hwframe+0x6e/0x76 >>>>> [ 213.796435] RIP: 0033:0x7f7758f279cd >>>>> [ 213.796465] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e >>>>> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 >>>>> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 33 e4 0c 00 f7 d8 64 >>>>> 89 01 48 >>>>> [ 213.796467] RSP: 002b:00007ffe2cad8738 EFLAGS: 00000246 ORIG_RAX: >>>>> 0000000000000139 >>>>> [ 213.796469] RAX: ffffffffffffffda RBX: 0000557f987a8180 RCX: >>>>> 00007f7758f279cd >>>>> [ 213.796471] RDX: 0000000000000000 RSI: 00007f77593e5453 RDI: >>>>> 000000000000000f >>>>> [ 213.796472] RBP: 00007f77593e5453 R08: 0000000000000000 R09: >>>>> 00007ffe2cad8860 >>>>> [ 213.796473] R10: 000000000000000f R11: 0000000000000246 R12: >>>>> 0000000000020000 >>>>> [ 213.796475] R13: 0000557f9879f8e0 R14: 0000000000000000 R15: >>>>> 0000557f98783aa0 >>>>> [ 213.796482] </TASK> >>>>> " >>>>> >>>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> >>>>> --- >>>>> drivers/net/virtio_net.c | 10 ++++++++-- >>>>> 1 file changed, 8 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>>> index 51b1868d2f22..28b7dd917a43 100644 >>>>> --- a/drivers/net/virtio_net.c >>>>> +++ b/drivers/net/virtio_net.c >>>>> @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct >>>>> virtnet_info *vi, u8 class, u8 cmd, >>>>> { >>>>> struct scatterlist *sgs[4], hdr, stat; >>>>> unsigned out_num = 0, tmp; >>>>> - int ret; >>>>> + int ret, timeout = 200; >>>>> /* Caller should know better */ >>>>> BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); >>>>> @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct >>>>> virtnet_info *vi, u8 class, u8 cmd, >>>>> * into the hypervisor, so the request should be handled >>>>> immediately. 
>>>>> */ >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && >>>>> - !virtqueue_is_broken(vi->cvq)) >>>>> + !virtqueue_is_broken(vi->cvq)) { >>>>> + if (timeout) >>>>> + timeout--; >>>> This is not really a timeout, just a loop counter. 200 iterations could >>>> be a very short time on reasonable H/W. I guess this avoids the soft >>>> lockup, but possibly (likely?) breaks the functionality when we need to >>>> loop for some non-negligible time. >>>> >>>> I fear we need a more complex solution, as mentioned by Michael in the >>>> thread you quoted. >>> Got it. I also look forward to the more complex solution to this problem. >> Can we add a device capability (new feature bit) such as >> ctrq_wait_timeout to get a reasonable timeout? > This adds another kind of complexity for migration compatibility. Yes, this requires device adaptation. But the value will be more reasonable, since different devices have different timeout requirements. > > And we need to make it more general, e.g. > > 1) it should not be cvq specific Agreed. Especially considering that the avq also has this problem. > 2) or we can have a timeout that works for all queues I remember Parav mentioned that they were also making some bulk improvements to ctrlq/avq, so we should look at a combined solution. An rxq timeout can be used for our aliveness-detection needs, especially after a live migration. A txq timeout can be used with the watchdog + vq reset. Thanks, Heng > ? > > Thanks > >> Thanks, >> Heng >> >>> Zhu Yanjun >>> >>>> Cheers, >>>> >>>> Paolo >>>>
On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > >>>>> + if (timeout) > > > > >>>>> + timeout--; > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > >>>> loop for some non negligible time. > > > > >>>> > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > >>>> thread you quoted. > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > >> to get a reasonable timeout? > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > functions. > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > mode setting to workqueue first: > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > by the user? > > > > > > Who is the "user" here, or how can the "user" know the value? 
> > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > will read this timeout value at initialization? > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > something similar to cvq and use timeout + reset in that case. > > > > But we may block by the reset ^_^ if the device is broken? > > I mean vq reset here. I see. I mean when the device is broken, the vq reset may also be blocked.

void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index)
{
	struct virtio_pci_modern_common_cfg __iomem *cfg;

	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;

	vp_iowrite16(index, &cfg->cfg.queue_select);
	vp_iowrite16(1, &cfg->queue_reset);

	while (vp_ioread16(&cfg->queue_reset))
		msleep(1);

	while (vp_ioread16(&cfg->cfg.queue_enable))
		msleep(1);
}
EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset);

In this function, for a broken device, we cannot expect anything. > > It looks like we have multiple goals here > > 1) avoid lockups, using workqueue + cond_resched() seems to be > sufficient, it has issues but nothing new > 2) recover from the unresponsive device, the issue for timeout is that > it needs to deal with false positives I agree. But I want to add a new goal, cvq async. Thanks. > > Thanks > > > > Thanks. > > > > > > > > > > Thanks > > > > > > > > > > > Zhu Yanjun > > > > > > > > > > > > > > Andrew > > > > > > > > > >
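The unbounded msleep() loops quoted above are exactly where a dead device would wedge the reset path. A hedged sketch of the "best effort" idea (all names invented; `read_reg()` stands in for vp_ioread16(), and a kernel version would msleep(1) between reads): cap the polling so the caller gets an error it can act on instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Poll a device register until it reads zero or max_tries attempts are
 * exhausted. Returns false for an unresponsive device, turning an
 * infinite hang into a reportable failure.
 */
static bool poll_reg_clear_bounded(unsigned int (*read_reg)(void *),
				   void *dev, unsigned int max_tries)
{
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		if (!read_reg(dev))
			return true; /* register cleared: reset completed */
		/* a kernel version would msleep(1) here between reads */
	}
	return false; /* device never answered */
}
```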
On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > >>>>> + if (timeout) > > > > > >>>>> + timeout--; > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > >>>> loop for some non negligible time. > > > > > >>>> > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > >>>> thread you quoted. > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > >> to get a reasonable timeout? > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > functions. 
> > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > mode setting to workqueue first: > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > by the user? > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > will read this timeout value at initialization? > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > something similar to cvq and use timeout + reset in that case. > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > I mean vq reset here. > > I see. > > I mean when the deivce is broken, the vq reset also many be blocked. > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > { > struct virtio_pci_modern_common_cfg __iomem *cfg; > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > vp_iowrite16(index, &cfg->cfg.queue_select); > vp_iowrite16(1, &cfg->queue_reset); > > while (vp_ioread16(&cfg->queue_reset)) > msleep(1); > > while (vp_ioread16(&cfg->cfg.queue_enable)) > msleep(1); > } > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > In this function, for the broken device, we can not expect something. Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. Thanks > > > > > > It looks like we have multiple goals here > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > sufficient, it has issue but nothing new > > 2) recover from the unresponsive device, the issue for timeout is that > > it needs to deal with false positives > > > I agree. > > But I want to add a new goal, cvq async. 
In the netdim, we will > send many requests via the cvq, so the cvq async will be nice. > > Thanks. > > > > > > Thanks > > > > > > > > Thanks. > > > > > > > > > > > > > > Thans > > > > > > > > > > > > > > Zhu Yanjun > > > > > > > > > > > > > > > > > Andrew > > > > > > > > > > > > > > >
On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > >>>>> + if (timeout) > > > > > > >>>>> + timeout--; > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > >>>> loop for some non negligible time. > > > > > > >>>> > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > >>>> thread you quoted. > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > >> to get a reasonable timeout? > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > functions. 
> > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > mode setting to workqueue first: > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > by the user? > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > will read this timeout value at initialization? > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > I mean vq reset here. > > > > I see. > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > { > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > vp_iowrite16(1, &cfg->queue_reset); > > > > while (vp_ioread16(&cfg->queue_reset)) > > msleep(1); > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > msleep(1); > > } > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > In this function, for the broken device, we can not expect something. > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. 
> > Thanks > > > > > > > > > > > > It looks like we have multiple goals here > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > sufficient, it has issue but nothing new > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > it needs to deal with false positives > > > > > > > > I agree. > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > send many requests via the cvq, so the cvq async will be nice. Then you need an interrupt for cvq. FYI, I've posted a series that uses an interrupt for cvq in the past: https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ I haven't found time to work on this anymore; maybe we can start from this, or not. Thanks > > > > Thanks. > > > > > > > > > > Thanks > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > Thans > > > > > > > > > > > > > > > > > Zhu Yanjun > > > > > > > > > > > > > > > > > > > > Andrew > > > > > > > > > > > > > > > > > > > >
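The unbounded `msleep(1)` loops in the quoted vp_modern_set_queue_reset() are exactly where a broken device can wedge the caller forever, which is why the reset can only ever be best effort. A userspace sketch of the bounded variant follows; `wait_reg_clear()` is hypothetical, not an in-tree function, and the MMIO register read is modeled as a plain volatile load.

```c
#include <stdint.h>
#include <time.h>

/* Bound the polling: give up after max_ms milliseconds instead of spinning
 * on msleep(1) forever when the device never clears the register.
 * Returns 0 once *reg reads zero, -1 on timeout (kernel: -ETIMEDOUT). */
int wait_reg_clear(volatile const uint16_t *reg, int max_ms)
{
	for (int ms = 0; ms < max_ms; ms++) {
		if (*reg == 0)
			return 0;
		struct timespec ts = { 0, 1000000 }; /* 1 ms, modeling msleep(1) */
		nanosleep(&ts, NULL);
	}
	return *reg == 0 ? 0 : -1;
}
```

Each of the two loops in the reset path (waiting for `queue_reset` to clear, then for `queue_enable` to clear) would get its own bounded wait, and a timeout would propagate an error to the caller rather than hanging the task.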
On Mon, 22 Jan 2024 14:55:46 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > >>>>> + if (timeout) > > > > > > >>>>> + timeout--; > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > >>>> loop for some non negligible time. > > > > > > >>>> > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > >>>> thread you quoted. > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > >> to get a reasonable timeout? > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > functions. 
> > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > mode setting to workqueue first: > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > by the user? > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > will read this timeout value at initialization? > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > I mean vq reset here. > > > > I see. > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > { > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > vp_iowrite16(1, &cfg->queue_reset); > > > > while (vp_ioread16(&cfg->queue_reset)) > > msleep(1); > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > msleep(1); > > } > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > In this function, for the broken device, we can not expect something. > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > YES. I agree. Thanks. 
> Thanks > > > > > > > > > > > It looks like we have multiple goals here > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > sufficient, it has issue but nothing new > > > 2) recover from the unresponsive device, the issue for timeout is that > > > it needs to deal with false positives > > > > > > I agree. > > > > But I want to add a new goal, cvq async. In the netdim, we will > > send many requests via the cvq, so the cvq async will be nice. > > > > Thanks. > > > > > > > > > > Thanks > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > Thans > > > > > > > > > > > > > > > > > Zhu Yanjun > > > > > > > > > > > > > > > > > > > > Andrew > > > > > > > > > > > > > > > > > > > > >
On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > >>>>> + if (timeout) > > > > > > > >>>>> + timeout--; > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > >>>> loop for some non negligible time. > > > > > > > >>>> > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > >>>> thread you quoted. > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > >> to get a reasonable timeout? > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > functions. 
> > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > mode setting to workqueue first: > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > by the user? > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > I mean vq reset here. > > > > > > I see. > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > { > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > msleep(1); > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > msleep(1); > > > } > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > In this function, for the broken device, we can not expect something. > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. 
> > > > Thanks > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > sufficient, it has issue but nothing new > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > it needs to deal with false positives > > > > > > > > > I agree. > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > send many requests via the cvq, so the cvq async will be nice. > > Then you need an interrupt for cvq. > > FYI, I've posted a series that use interrupt for cvq in the past: > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ I know this. But the interrupt may not be a good solution without new space. > > Haven't found time in working on this anymore, maybe we can start from > this or not. I said async, but my aim is to put many requests into the cvq before getting the response. Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ Thanks. > > Thanks > > > > > > Thanks. > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > Thans > > > > > > > > > > > > > > > > > > > > Zhu Yanjun > > > > > > > > > > > > > > > > > > > > > > > Andrew > > > > > > > > > > > > > > > > > > > > > > > > > >
On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > >>>>> + if (timeout) > > > > > > > > >>>>> + timeout--; > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > >>>> > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > >>>> thread you quoted. > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). 
> > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > functions. > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > by the user? > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > I mean vq reset here. > > > > > > > > I see. > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. 
> > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > { > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > msleep(1); > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > msleep(1); > > > > } > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > sufficient, it has issue but nothing new > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > it needs to deal with false positives > > > > > > > > > > > > I agree. > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > send many requests via the cvq, so the cvq async will be nice. > > > > Then you need an interrupt for cvq. > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > I know this. But the interrupt maybe not a good solution without new space. What do you mean by "new space"? We can introduce something like enable_cb_delayed(), then you will only get notified after several requests. > > > > > Haven't found time in working on this anymore, maybe we can start from > > this or not. > > > I said async, but my aim is to put many requests to the cvq before getting the > response. It doesn't differ from TX/RX in this case. 
> > Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ > This seems like a hack, if interrupt is used, you can simply do that in the callback. Thanks > Thanks. > > > > > > Thanks > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > Thans > > > > > > > > > > > > > > > > > > > > > > > Zhu Yanjun > > > > > > > > > > > > > > > > > > > > > > > > > > Andrew > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > >>>>> + timeout--; > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > >>>> > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > >>>> thread you quoted. > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. 
If you can sleep > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > functions. > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > by the user? > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > > > I mean vq reset here. > > > > > > > > > > I see. > > > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. 
> > > > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > { > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > msleep(1); > > > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > msleep(1); > > > > > } > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > sufficient, it has issue but nothing new > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > it needs to deal with false positives > > > > > > > > > > > > I agree. > > > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > > Then you need an interrupt for cvq. > > > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > I know this. But the interrupt maybe not a good solution without new space. > > What do you mean by "new space"? Yes, I know the cvq can work with an interrupt per the virtio spec. But as far as I know, many hypervisors implement the cvq without interrupt support. If we let the cvq use interrupts without negotiation, then many hypervisors will hang on the new kernel. 
> > We can introduce something like enable_cb_delayed(), then you will > only get notified after several requests. > > > > > > > > > Haven't found time in working on this anymore, maybe we can start from > > > this or not. > > > > > > I said async, but my aim is to put many requests to the cvq before getting the > > response. > > It doesn't differ from TX/RX in this case. > > > > > Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ > > > > This seems like a hack, if interrupt is used, you can simply do that > in the callback. YES. I also want to change the code; I just want to say that async is a goal. For the rx mode we have introduced a work queue, and I want to move the command-sending job into the work queue. The caller just wakes up the work queue. If the caller wants the result synchronously, it can wait for it. If not, the caller can register a callback function with the work queue. And I think it will be easy to implement the timeout inside the workqueue. Thanks. > > Thanks > > > Thanks. > > > > > > > > > Thanks > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > Thans > > > > > > > > > > > > > > > > > > > Zhu Yanjun > > > > > > > > > > > > > > > > > > > > > > > Andrew > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
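The workqueue design described above — callers enqueue a control command and either wait for completion synchronously or register a callback — can be modeled in userspace with one worker thread and condition variables. All names here (`cvq_cmd`, `cvq_submit()`, and so on) are illustrative for this discussion, not the virtio_net driver's API.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct cvq_cmd {
	int opcode;
	int status;                   /* filled in by the worker ("device" reply) */
	bool done;
	void (*cb)(struct cvq_cmd *); /* optional async completion callback */
	struct cvq_cmd *next;
};

static struct cvq_cmd *q_head;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_wake = PTHREAD_COND_INITIALIZER; /* worker wakeup */
static pthread_cond_t q_done = PTHREAD_COND_INITIALIZER; /* completion broadcast */

/* The "workqueue": a single worker owns the cvq and talks to the device. */
static void *cvq_worker(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&q_lock);
		while (!q_head)
			pthread_cond_wait(&q_wake, &q_lock);
		struct cvq_cmd *cmd = q_head;
		q_head = cmd->next;
		pthread_mutex_unlock(&q_lock);

		cmd->status = 0; /* model: the device handled the command OK */

		pthread_mutex_lock(&q_lock);
		cmd->done = true;
		pthread_cond_broadcast(&q_done);
		pthread_mutex_unlock(&q_lock);
		if (cmd->cb)
			cmd->cb(cmd); /* async callers are notified here */
	}
	return NULL;
}

/* Async submit: queue the command, wake the worker, return immediately. */
void cvq_submit(struct cvq_cmd *cmd)
{
	pthread_mutex_lock(&q_lock);
	cmd->next = q_head; /* LIFO ordering is fine for a sketch */
	q_head = cmd;
	pthread_cond_signal(&q_wake);
	pthread_mutex_unlock(&q_lock);
}

/* Sync submit: queue, then sleep until the worker marks the command done. */
int cvq_submit_and_wait(struct cvq_cmd *cmd)
{
	cvq_submit(cmd);
	pthread_mutex_lock(&q_lock);
	while (!cmd->done)
		pthread_cond_wait(&q_done, &q_lock);
	pthread_mutex_unlock(&q_lock);
	return cmd->status;
}

void cvq_start_worker(void)
{
	pthread_t t;

	pthread_create(&t, NULL, cvq_worker, NULL);
	pthread_detach(t);
}
```

A netdim-style caller would use `cvq_submit()` with a callback and keep going; a path that needs the status before returning would use `cvq_submit_and_wait()`. Because the wait is a condition-variable sleep rather than a busy poll, a timeout could be added to the wait loop without risking a soft lockup.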
On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > > >>>> > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. 
> > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > functions. > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > > > > > I mean vq reset here. > > > > > > > > > > > > I see. > > > > > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. 
> > > > > > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > { > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > msleep(1); > > > > > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > msleep(1); > > > > > > } > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > sufficient, it has issue but nothing new > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > it needs to deal with false positives > > > > > > > > > > > > > > > > > > I agree. > > > > > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > > > > Then you need an interrupt for cvq. > > > > > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > > > I know this. But the interrupt maybe not a good solution without new space. > > > > What do you mean by "new space"? > > Yes, I know, the cvq can work with interrupt by the virtio spec. > But as I know, many hypervisors implement the cvq without supporting interrupt. 
It's a bug of the hypervisor that needs to be fixed. Interrupt is provided by the transport, not by virtio itself. Otherwise it could only support Linux but not other OSes. > If we let the cvq work with interrupt without negotiation then > many hypervisors will hang on the new kernel. > > > > We can introduce something like enable_cb_delayed(), then you will > > only get notified after several requests. > > > > > > > > > > > > > Haven't found time to work on this anymore, maybe we can start from > > > > this or not. > > > > > > > I said async, but my aim is to put many requests into the cvq before getting the > > > response. > > > > It doesn't differ from TX/RX in this case. > > > > > > > > Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ > > > > > > > This seems like a hack; if interrupt is used, you can simply do that > > in the callback. > > YES. > > I also want to change the code, I just want to say that async is a goal. > > For the rx mode, we have introduced a work queue; I want to move the > command-sending job to the work queue. The caller just wakes up > the work queue. > > If the caller wants the result synchronously, it can wait for it. > If not, it can register a callback function with the work queue. > > And I think it will be easy to implement the timeout inside the workqueue. Looks much more complicated than a simple interrupt + timer/watchdog etc. Thanks > > Thanks. > > > > > Thanks > > > > > Thanks. > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > Thanks. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thans > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Zhu Yanjun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Andrew > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
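The unbounded msleep() loops in the vp_modern_set_queue_reset() code quoted above are the crux of the "broken device" concern: a device that never clears queue_reset hangs the caller forever. As a rough userspace sketch of the bounded alternative (fake_ioread16(), the global register variable, and the millisecond step budget are stand-ins invented for illustration, not the real vp_ioread16()/msleep() kernel API):

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the device register; a broken device never clears it. */
static unsigned short fake_queue_reset;

static unsigned short fake_ioread16(const unsigned short *reg)
{
	return *reg;
}

/*
 * Bounded variant of the "while (vp_ioread16(...)) msleep(1);" pattern:
 * poll until the register reads back zero or the budget of 1 ms steps
 * runs out, so a broken device cannot block the caller forever.
 */
static int wait_reg_clear(const unsigned short *reg, int max_ms)
{
	int ms;

	for (ms = 0; ms < max_ms; ms++) {
		if (!fake_ioread16(reg))
			return 0;	/* device completed the reset */
		/* the kernel code would msleep(1) here */
	}
	return -ETIMEDOUT;		/* best effort: give up */
}
```

A healthy device still returns success quickly; a broken one now fails with -ETIMEDOUT instead of hanging, which matches the "best effort, no guarantee" point made above.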
On Mon, 22 Jan 2024 15:57:08 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > > > >>>> > > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > > >>> Got it. 
I also look forward to the more complex solution to this problem. > > > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > > functions. > > > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > > > > > > > I mean vq reset here. > > > > > > > > > > > > > > I see. > > > > > > > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. 
> > > > > > > > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > > { > > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > > msleep(1); > > > > > > > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > > msleep(1); > > > > > > > } > > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > > sufficient, it has issue but nothing new > > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > > it needs to deal with false positives > > > > > > > > > > > > > > > > > > > > > I agree. > > > > > > > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > > > > > > Then you need an interrupt for cvq. > > > > > > > > > > FYI, I've posted a series that use interrupt for cvq in the past: > > > > > > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > > > > > I know this. But the interrupt maybe not a good solution without new space. > > > > > > What do you mean by "new space"? > > > > Yes, I know, the cvq can work with interrupt by the virtio spec. 
> > But as I know, many hypervisors implement the cvq without supporting interrupt. > > It's a bug of the hypervisor that needs to be fix. Interrupt is > provided by transport not the virtio itself. YES. I agree. But I still think we should not work with interrupt without any negotiation directly. I more like to introduce a new feature to enable this. Thanks. > > Otherwise it can only support for Linux but not other OSes. > > > If we let the cvq work with interrupt without negotiation then > > many hypervisors will hang on the new kernel. > > > > > > > > We can introduce something like enable_cb_delayed(), then you will > > > only get notified after several requests. > > > > > > > > > > > > > > > > > Haven't found time in working on this anymore, maybe we can start from > > > > > this or not. > > > > > > > > > > > > I said async, but my aim is to put many requests to the cvq before getting the > > > > response. > > > > > > It doesn't differ from TX/RX in this case. > > > > > > > > > > > Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ > > > > > > > > > > This seems like a hack, if interrupt is used, you can simply do that > > > in the callback. > > > > YES. > > > > I also want to change the code, I just want to say the async is a goal. > > > > For the rx mode, we have introduce a work queue, I want to move the > > sending command job to the work queue. The caller just wakeup > > the work queue. > > > > If the caller want to got the result sync, then the caller can wait for it. > > If not, the caller can register an function to the work queue. > > > > And I think it will be easy to implement the timeout inside the workqueue. > > Looks much more complicated than a simple interrupt + timer/watchdog etc. > > Thanks > > > > > Thanks. > > > > > > > > > > Thanks > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > Thanks. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thans > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Zhu Yanjun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Andrew > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
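Xuan's position above is that driving the cvq by interrupt should be gated on negotiation rather than enabled unconditionally. A minimal sketch of such gating, assuming a hypothetical feature bit (FAKE_VIRTIO_NET_F_CVQ_INTR is invented here for illustration; it is not a real virtio spec bit, and an actual proposal would have to reserve a number upstream):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bit for illustration only -- not in the spec. */
#define FAKE_VIRTIO_NET_F_CVQ_INTR (1ULL << 62)

/*
 * Drive the cvq with interrupts only when both sides negotiated the
 * feature; otherwise fall back to the existing polling path, so older
 * hypervisors keep working unchanged.
 */
static bool cvq_use_interrupt(unsigned long long negotiated_features)
{
	return (negotiated_features & FAKE_VIRTIO_NET_F_CVQ_INTR) != 0;
}
```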
On Mon, Jan 22, 2024 at 4:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > On Mon, 22 Jan 2024 15:57:08 +0800, Jason Wang <jasowang@redhat.com> wrote: > > On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > > > >>>> lockup, but possibly (likely?) breaks the functionality when we need to > > > > > > > > > > > > >>>> loop for some non negligible time. 
> > > > > > > > > > > > >>>> > > > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > > > functions. > > > > > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. > > > > > > > > > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > > > something similar to cvq and use timeout + reset in that case. 
> > > > > > > > > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > > > > > > > > > I mean vq reset here. > > > > > > > > > > > > > > > > I see. > > > > > > > > > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > > > > > > > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > > > { > > > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > > > msleep(1); > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > > > msleep(1); > > > > > > > > } > > > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > > > > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > > > > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > > > sufficient, it has issue but nothing new > > > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > > > it needs to deal with false positives > > > > > > > > > > > > > > > > > > > > > > > > I agree. > > > > > > > > > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > > > > > > > > Then you need an interrupt for cvq. 
> > > > > > > > > > > > FYI, I've posted a series that uses interrupt for cvq in the past: > > > > > > > > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > > I know this. But the interrupt may not be a good solution without new space. > > > > What do you mean by "new space"? > > > Yes, I know, the cvq can work with interrupt by the virtio spec. > > > But as far as I know, many hypervisors implement the cvq without supporting interrupt. > > It's a bug of the hypervisor that needs to be fixed. Interrupt is > > provided by the transport, not by virtio itself. > > YES. I agree. > > But I still think we should not work with interrupt without any negotiation > > directly. I would rather introduce a new feature to enable this. I can hardly believe we need to work around the issue of specific hypervisors like this... Thanks > > Thanks. > >
On Mon, 22 Jan 2024 16:32:46 +0800, Jason Wang <jasowang@redhat.com> wrote: > On Mon, Jan 22, 2024 at 4:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > On Mon, 22 Jan 2024 15:57:08 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > On Mon, Jan 22, 2024 at 3:36 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > On Mon, 22 Jan 2024 15:19:12 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > On Mon, Jan 22, 2024 at 3:07 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > On Mon, Jan 22, 2024 at 2:55 PM Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > > > > > On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote: > > > > > > > > > > > > > > > > > > > > > > On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > > > > > On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 在 2024/1/20 1:29, Andrew Lunn 写道: > > > > > > > > > > > > > >>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && > > > > > > > > > > > > > >>>>> - !virtqueue_is_broken(vi->cvq)) > > > > > > > > > > > > > >>>>> + !virtqueue_is_broken(vi->cvq)) { > > > > > > > > > > > > > >>>>> + if (timeout) > > > > > > > > > > > > > >>>>> + timeout--; > > > > > > > > > > > > > >>>> This is not really a timeout, just a loop counter. 200 iterations could > > > > > > > > > > > > > >>>> be a very short time on reasonable H/W. I guess this avoid the soft > > > > > > > > > > > > > >>>> lockup, but possibly (likely?) 
breaks the functionality when we need to > > > > > > > > > > > > > >>>> loop for some non negligible time. > > > > > > > > > > > > > >>>> > > > > > > > > > > > > > >>>> I fear we need a more complex solution, as mentioned by Micheal in the > > > > > > > > > > > > > >>>> thread you quoted. > > > > > > > > > > > > > >>> Got it. I also look forward to the more complex solution to this problem. > > > > > > > > > > > > > >> Can we add a device capability (new feature bit) such as ctrq_wait_timeout > > > > > > > > > > > > > >> to get a reasonable timeout? > > > > > > > > > > > > > > The usual solution to this is include/linux/iopoll.h. If you can sleep > > > > > > > > > > > > > > read_poll_timeout() otherwise read_poll_timeout_atomic(). > > > > > > > > > > > > > > > > > > > > > > > > > > I read carefully the functions read_poll_timeout() and > > > > > > > > > > > > > read_poll_timeout_atomic(). The timeout is set by the caller of the 2 > > > > > > > > > > > > > functions. > > > > > > > > > > > > > > > > > > > > > > > > FYI, in order to avoid a swtich of atomic or not, we need convert rx > > > > > > > > > > > > mode setting to workqueue first: > > > > > > > > > > > > > > > > > > > > > > > > https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, can we add a module parameter to customize this timeout value > > > > > > > > > > > > > by the user? > > > > > > > > > > > > > > > > > > > > > > > > Who is the "user" here, or how can the "user" know the value? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Or this timeout value is stored in device register, virtio_net driver > > > > > > > > > > > > > will read this timeout value at initialization? > > > > > > > > > > > > > > > > > > > > > > > > See another thread. The design needs to be general, or you can post a RFC. 
> > > > > > > > > > > > > > > > > > > > > > > > In another thought, we've already had a tx watchdog, maybe we can have > > > > > > > > > > > > something similar to cvq and use timeout + reset in that case. > > > > > > > > > > > > > > > > > > > > > > But we may block by the reset ^_^ if the device is broken? > > > > > > > > > > > > > > > > > > > > I mean vq reset here. > > > > > > > > > > > > > > > > > > I see. > > > > > > > > > > > > > > > > > > I mean when the deivce is broken, the vq reset also many be blocked. > > > > > > > > > > > > > > > > > > void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) > > > > > > > > > { > > > > > > > > > struct virtio_pci_modern_common_cfg __iomem *cfg; > > > > > > > > > > > > > > > > > > cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; > > > > > > > > > > > > > > > > > > vp_iowrite16(index, &cfg->cfg.queue_select); > > > > > > > > > vp_iowrite16(1, &cfg->queue_reset); > > > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->queue_reset)) > > > > > > > > > msleep(1); > > > > > > > > > > > > > > > > > > while (vp_ioread16(&cfg->cfg.queue_enable)) > > > > > > > > > msleep(1); > > > > > > > > > } > > > > > > > > > EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); > > > > > > > > > > > > > > > > > > In this function, for the broken device, we can not expect something. > > > > > > > > > > > > > > > > Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. 
> > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like we have multiple goals here > > > > > > > > > > > > > > > > > > > > 1) avoid lockups, using workqueue + cond_resched() seems to be > > > > > > > > > > sufficient, it has issues but nothing new > > > > > > > > > > 2) recover from the unresponsive device, the issue for timeout is that > > > > > > > > > > it needs to deal with false positives > > > > > > > > > > > > > > > > > > > > > > > > > > > I agree. > > > > > > > > > > > > > > > > > > But I want to add a new goal, cvq async. In the netdim, we will > > > > > > > > > send many requests via the cvq, so the cvq async will be nice. > > > > > > > Then you need an interrupt for cvq. > > > > > > > > > > > > > > FYI, I've posted a series that uses interrupt for cvq in the past: > > > > > > > > > > > > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ > > > > > > > > > > > > I know this. But the interrupt may not be a good solution without new space. > > > > > > > > > > What do you mean by "new space"? > > > > > > > > Yes, I know, the cvq can work with interrupt by the virtio spec. > > > > But as far as I know, many hypervisors implement the cvq without supporting interrupt. > > > It's a bug of the hypervisor that needs to be fixed. Interrupt is > > > provided by the transport, not by virtio itself. > > > > YES. I agree. > > > > But I still think we should not work with interrupt without any negotiation > > directly. I would rather introduce a new feature to enable this. > > I can hardly believe we need to work around the issue of specific > hypervisors like this... Maybe we should hear from others. We are OK with this; I am just thinking of other hypervisors. Thanks. > > Thanks > > > > > Thanks. > > >
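The workqueue design Xuan describes in this exchange (queue the command, wake the worker, and let callers either wait synchronously or register a callback) can be modeled in a single-threaded userspace sketch. The names, the fixed-depth ring, and fake_device_exec() are assumptions of this sketch, not the driver's real API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CVQ_DEPTH 8			/* assumed queue depth for this model */

struct cvq_cmd {
	int status;			/* filled in when processed */
	bool done;
	void (*cb)(struct cvq_cmd *);	/* NULL => caller waits synchronously */
};

static struct cvq_cmd *cvq_ring[CVQ_DEPTH];
static unsigned int q_head, q_tail;

/* Stand-in for the real device round trip; always succeeds here. */
static int fake_device_exec(struct cvq_cmd *c)
{
	(void)c;
	return 0;			/* VIRTIO_NET_OK */
}

/* Caller side: enqueue the command and "wake up the work queue". */
static void cvq_queue_cmd(struct cvq_cmd *c)
{
	c->done = false;
	cvq_ring[q_tail++ % CVQ_DEPTH] = c;
}

/* Work item: drain everything queued, completing waiters or callbacks. */
static void cvq_work(void)
{
	while (q_head != q_tail) {
		struct cvq_cmd *c = cvq_ring[q_head++ % CVQ_DEPTH];

		c->status = fake_device_exec(c);
		c->done = true;
		if (c->cb)
			c->cb(c);	/* async caller gets its callback */
	}
}

/* Synchronous caller: queue, flush the work, read the result. */
static int cvq_cmd_sync(struct cvq_cmd *c)
{
	c->cb = NULL;
	cvq_queue_cmd(c);
	cvq_work();			/* the kernel would flush_work() here */
	return c->status;
}
```

In the real driver the worker would run in process context, so it could sleep, poll with a timeout, and avoid the atomic/non-atomic split discussed earlier in the thread.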
在 2024/1/26 11:11, Zhu Yanjun 写道: > > > 在 2024/1/22 15:02, Xuan Zhuo 写道: >> On Mon, 22 Jan 2024 14:58:09 +0800, Jason Wang<jasowang@redhat.com> wrote: >>> On Mon, Jan 22, 2024 at 2:55 PM Jason Wang<jasowang@redhat.com> wrote: >>>> On Mon, Jan 22, 2024 at 2:20 PM Xuan Zhuo<xuanzhuo@linux.alibaba.com> wrote: >>>>> On Mon, 22 Jan 2024 12:16:27 +0800, Jason Wang<jasowang@redhat.com> wrote: >>>>>> On Mon, Jan 22, 2024 at 12:00 PM Xuan Zhuo<xuanzhuo@linux.alibaba.com> wrote: >>>>>>> On Mon, 22 Jan 2024 11:14:30 +0800, Jason Wang<jasowang@redhat.com> wrote: >>>>>>>> On Mon, Jan 22, 2024 at 10:12 AM Zhu Yanjun<yanjun.zhu@linux.dev> wrote: >>>>>>>>> 在 2024/1/20 1:29, Andrew Lunn 写道: >>>>>>>>>>>>>> while (!virtqueue_get_buf(vi->cvq, &tmp) && >>>>>>>>>>>>>> - !virtqueue_is_broken(vi->cvq)) >>>>>>>>>>>>>> + !virtqueue_is_broken(vi->cvq)) { >>>>>>>>>>>>>> + if (timeout) >>>>>>>>>>>>>> + timeout--; >>>>>>>>>>>>> This is not really a timeout, just a loop counter. 200 iterations could >>>>>>>>>>>>> be a very short time on reasonable H/W. I guess this avoid the soft >>>>>>>>>>>>> lockup, but possibly (likely?) breaks the functionality when we need to >>>>>>>>>>>>> loop for some non negligible time. >>>>>>>>>>>>> >>>>>>>>>>>>> I fear we need a more complex solution, as mentioned by Micheal in the >>>>>>>>>>>>> thread you quoted. >>>>>>>>>>>> Got it. I also look forward to the more complex solution to this problem. >>>>>>>>>>> Can we add a device capability (new feature bit) such as ctrq_wait_timeout >>>>>>>>>>> to get a reasonable timeout? >>>>>>>>>> The usual solution to this is include/linux/iopoll.h. If you can sleep >>>>>>>>>> read_poll_timeout() otherwise read_poll_timeout_atomic(). >>>>>>>>> I read carefully the functions read_poll_timeout() and >>>>>>>>> read_poll_timeout_atomic(). The timeout is set by the caller of the 2 >>>>>>>>> functions. 
>>>>>>>> FYI, in order to avoid a swtich of atomic or not, we need convert rx >>>>>>>> mode setting to workqueue first: >>>>>>>> >>>>>>>> https://www.mail-archive.com/virtualization@lists.linux-foundation.org/msg60298.html >>>>>>>> >>>>>>>>> As such, can we add a module parameter to customize this timeout value >>>>>>>>> by the user? >>>>>>>> Who is the "user" here, or how can the "user" know the value? >>>>>>>> >>>>>>>>> Or this timeout value is stored in device register, virtio_net driver >>>>>>>>> will read this timeout value at initialization? >>>>>>>> See another thread. The design needs to be general, or you can post a RFC. >>>>>>>> >>>>>>>> In another thought, we've already had a tx watchdog, maybe we can have >>>>>>>> something similar to cvq and use timeout + reset in that case. >>>>>>> But we may block by the reset ^_^ if the device is broken? >>>>>> I mean vq reset here. >>>>> I see. >>>>> >>>>> I mean when the deivce is broken, the vq reset also many be blocked. >>>>> >>>>> void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) >>>>> { >>>>> struct virtio_pci_modern_common_cfg __iomem *cfg; >>>>> >>>>> cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common; >>>>> >>>>> vp_iowrite16(index, &cfg->cfg.queue_select); >>>>> vp_iowrite16(1, &cfg->queue_reset); >>>>> >>>>> while (vp_ioread16(&cfg->queue_reset)) >>>>> msleep(1); >>>>> >>>>> while (vp_ioread16(&cfg->cfg.queue_enable)) >>>>> msleep(1); >>>>> } >>>>> EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); >>>>> >>>>> In this function, for the broken device, we can not expect something. >>>> Yes, it's best effort, there's no guarantee then. But it doesn't harm to try. 
>>>> >>>> Thanks >>>> >>>>>> It looks like we have multiple goals here >>>>>> >>>>>> 1) avoid lockups, using workqueue + cond_resched() seems to be >>>>>> sufficient, it has issues but nothing new >>>>>> 2) recover from the unresponsive device, the issue for timeout is that >>>>>> it needs to deal with false positives >>>>> I agree. >>>>> >>>>> But I want to add a new goal, cvq async. In the netdim, we will >>>>> send many requests via the cvq, so the cvq async will be nice. >>> Then you need an interrupt for cvq. >>> >>> FYI, I've posted a series that uses interrupt for cvq in the past: >>> >>> https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/ >> I know this. But the interrupt may not be a good solution without new space. >> >>> Haven't found time to work on this anymore, maybe we can start from >>> this or not. >> I said async, but my aim is to put many requests into the cvq before getting the >> response. >> >> Heng Qi posted this https://lore.kernel.org/all/1705410693-118895-4-git-send-email-hengqi@linux.alibaba.com/ Sorry, this mail was rejected by the netdev mailing list, so I have to resend it. Thanks a lot. I read Heng Qi's commits carefully. This patch series is similar to the NIC xmit_more feature. But if a cvq command is urgent, can we let the urgent cvq command be passed ASAP? I mean, can we set a flag similar to xmit_more? If a cvq command is not urgent, it can be queued. If it is urgent, it is passed ASAP. Zhu Yanjun > Zhu Yanjun > >> Thanks. >> >> >>> Thanks >>> >>>>> Thanks. >>>>> >>>>> >>>>>> Thanks >>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>>> Zhu Yanjun >>>>>>>>> >>>>>>>>>> Andrew
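The urgency idea in the message above maps naturally onto an xmit_more-style flag: queue non-urgent commands without notifying the device, and kick immediately when a command is urgent or the batch fills up. A toy sketch of that policy (BATCH_MAX, the counters, and cvq_add_cmd() are invented for illustration, not existing driver code):

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_MAX 4		/* assumed batch limit for this sketch */

static int cvq_pending;		/* commands queued but not yet kicked */
static int cvq_kicks;		/* device notifications issued */

/*
 * xmit_more-style submission: only notify ("kick") the device when
 * the command is urgent or enough commands have accumulated, instead
 * of one notification per command.
 */
static void cvq_add_cmd(bool urgent)
{
	cvq_pending++;
	if (urgent || cvq_pending >= BATCH_MAX) {
		cvq_kicks++;	/* virtqueue_kick() in the real driver */
		cvq_pending = 0;
	}
}
```

With this shape, netdim's frequent non-urgent updates amortize one kick over several commands, while an urgent command (e.g. an rx mode change) still reaches the device immediately.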
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 51b1868d2f22..28b7dd917a43 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2468,7 +2468,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, { struct scatterlist *sgs[4], hdr, stat; unsigned out_num = 0, tmp; - int ret; + int ret, timeout = 200; /* Caller should know better */ BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); @@ -2502,8 +2502,14 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, * into the hypervisor, so the request should be handled immediately. */ while (!virtqueue_get_buf(vi->cvq, &tmp) && - !virtqueue_is_broken(vi->cvq)) + !virtqueue_is_broken(vi->cvq)) { + if (timeout) + timeout--; + else + return false; /* long time no response */ + cpu_relax(); + } return vi->ctrl->status == VIRTIO_NET_OK; }
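As the reviewers note, the patch's `timeout = 200` is a spin count, not a timeout: 200 iterations can elapse in microseconds on fast hardware. A deadline-based variant in the spirit of include/linux/iopoll.h's read_poll_timeout(), sketched in userspace with a fake tick counter standing in for jiffies (all names here are illustrative, not the driver's actual API):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static unsigned long fake_jiffies;	/* stand-in for the kernel clock */

static unsigned long get_ticks(void)
{
	return fake_jiffies++;		/* each read advances the fake clock */
}

static long cmds_until_ready;		/* fake device latency, in polls */

/* Stand-in for virtqueue_get_buf(): ready after a few polls. */
static bool fake_get_buf(void)
{
	return cmds_until_ready-- <= 0;
}

/*
 * Poll for the cvq response until a deadline rather than a spin count,
 * so the bound is expressed in time regardless of CPU speed.
 */
static int wait_for_response(unsigned long timeout_ticks)
{
	unsigned long deadline = get_ticks() + timeout_ticks;

	while (!fake_get_buf()) {
		if (get_ticks() > deadline)	/* time_after() in-kernel */
			return -ETIMEDOUT;
		/* cpu_relax() or schedule() would go here */
	}
	return 0;
}
```

This keeps the soft-lockup fix, but the budget is predictable: a slow-yet-working device gets the full wall-clock window, while a dead device fails after a bounded time instead of spinning forever or giving up too early.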