[v3,0/3] virtio-net: disable delayed refill when pausing rx

Message ID 20250415074341.12461-1-minhquangbui99@gmail.com (mailing list archive)

Message

Bui Quang Minh April 15, 2025, 7:43 a.m. UTC
Hi everyone,

This series tries to fix a deadlock in virtio-net that can occur when
binding/unbinding an XDP program or XDP socket, or when resizing the rx
queue.

When pausing rx (e.g. to set up xdp or an xsk pool, or to resize rx), we
call napi_disable() on the receive queue's napi. The delayed refill_work
also calls napi_disable() on the receive queue's napi. When
napi_disable() is called on an already disabled napi, it sleeps in
napi_disable_locked() while still holding the netdev_lock. As a result,
a later napi_enable() gets stuck too, as it cannot acquire the
netdev_lock. This leaves refill_work and the pause-then-resume tx stuck
altogether.
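
As a rough illustration of the sequence above, here is a toy userspace
model in Python. It is not the driver code: ToyNapi, demo() and the
timeouts are assumptions made only so the example terminates, whereas
the kernel path described above would keep sleeping.

```python
import threading
import time


class ToyNapi:
    """Toy stand-in for a napi instance; not the kernel API."""

    def __init__(self):
        self.netdev_lock = threading.Lock()
        self.disabled = False
        self.waiting = threading.Event()  # set once a caller waits with the lock held

    def disable(self, give_up_after):
        """Return True once disabled, False if we gave up waiting."""
        with self.netdev_lock:
            deadline = time.monotonic() + give_up_after
            while self.disabled:  # already disabled: wait with the lock held
                self.waiting.set()
                if time.monotonic() >= deadline:
                    return False
                time.sleep(0.01)
            self.disabled = True
            return True

    def enable(self, lock_timeout):
        """Return True if enabled, False if the lock could not be taken."""
        if not self.netdev_lock.acquire(timeout=lock_timeout):
            return False
        try:
            self.disabled = False
            return True
        finally:
            self.netdev_lock.release()


def demo():
    napi = ToyNapi()
    outcome = {}
    assert napi.disable(give_up_after=0.1)  # pausing rx disables the napi

    def refill_work():
        # Second disable on the already disabled napi.
        outcome["refill_work"] = napi.disable(give_up_after=0.5)

    worker = threading.Thread(target=refill_work)
    worker.start()
    napi.waiting.wait()  # refill_work now waits while holding the lock
    outcome["resume"] = napi.enable(lock_timeout=0.2)  # resume path
    worker.join()
    return outcome
```

In this model demo() returns False for both "refill_work" and "resume":
the second disable never completes, and the enable cannot take the lock
while the second disable is still waiting.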

This scenario can be reproduced by binding an XDP socket to a virtio-net
interface without setting up the fill ring. try_fill_recv() then keeps
failing until the fill ring is set up, so refill_work gets scheduled.

This fix adds virtnet_rx_(pause/resume)_all helpers and fixes up
virtnet_rx_resume so that future delayed refill_work is disabled and all
in-flight instances are cancelled before napi_disable() is called to
pause rx.
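
The ordering described above can be sketched with another toy model in
Python. Again this is only an illustration under assumptions:
ToyRxQueue and its fields and methods are hypothetical names, and the
real change lives in drivers/net/virtio_net.c.

```python
class ToyRxQueue:
    """Toy model of one receive queue; field names are illustrative only."""

    def __init__(self):
        self.napi_enabled = True
        self.refill_allowed = True
        self.refill_pending = False

    def schedule_refill(self):
        """Queue refill_work unless delayed refill is disabled."""
        if self.refill_allowed:
            self.refill_pending = True
        return self.refill_pending

    def run_refill_work(self):
        """Run the queued work, reporting what it would have done."""
        if not self.refill_pending:
            return "skipped"
        self.refill_pending = False
        if not self.napi_enabled:
            return "would block in napi_disable()"
        return "refilled"

    def pause(self):
        self.refill_allowed = False  # disable future refill_work
        self.refill_pending = False  # cancel in-flight refill_work
        self.napi_enabled = False    # only now disable the napi

    def resume(self):
        self.napi_enabled = True
        self.refill_allowed = True
```

With this ordering, refill_work that was queued before pause() is
cancelled and nothing new can be queued until resume(), so the model
never reaches the "would block" branch.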

Version 3 changes:
- Patch 1: refactor to avoid code duplication

Version 2 changes:
- Add selftest for deadlock scenario

Thanks,
Quang Minh.

Bui Quang Minh (3):
  virtio-net: disable delayed refill when pausing rx
  selftests: net: move xdp_helper to net/lib
  selftests: net: add a virtio_net deadlock selftest

 drivers/net/virtio_net.c                      | 69 +++++++++++++++----
 tools/testing/selftests/Makefile              |  2 +-
 tools/testing/selftests/drivers/net/Makefile  |  2 -
 tools/testing/selftests/drivers/net/queues.py |  4 +-
 .../selftests/drivers/net/virtio_net/Makefile |  2 +
 .../selftests/drivers/net/virtio_net/config   |  1 +
 .../drivers/net/virtio_net/lib/py/__init__.py | 16 +++++
 .../drivers/net/virtio_net/xsk_pool.py        | 52 ++++++++++++++
 tools/testing/selftests/net/lib/.gitignore    |  1 +
 tools/testing/selftests/net/lib/Makefile      |  1 +
 .../{drivers/net => net/lib}/xdp_helper.c     |  0
 11 files changed, 133 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/drivers/net/virtio_net/lib/py/__init__.py
 create mode 100755 tools/testing/selftests/drivers/net/virtio_net/xsk_pool.py
 rename tools/testing/selftests/{drivers/net => net/lib}/xdp_helper.c (100%)

Comments

Michael S. Tsirkin April 15, 2025, 2:04 p.m. UTC | #1
On Tue, Apr 15, 2025 at 02:43:38PM +0700, Bui Quang Minh wrote:
> Hi everyone,
> 
> This series tries to fix a deadlock in virtio-net that can occur when
> binding/unbinding an XDP program or XDP socket, or when resizing the rx
> queue.
> 
> When pausing rx (e.g. to set up xdp or an xsk pool, or to resize rx), we
> call napi_disable() on the receive queue's napi. The delayed refill_work
> also calls napi_disable() on the receive queue's napi. When
> napi_disable() is called on an already disabled napi, it sleeps in
> napi_disable_locked() while still holding the netdev_lock. As a result,
> a later napi_enable() gets stuck too, as it cannot acquire the
> netdev_lock. This leaves refill_work and the pause-then-resume tx stuck
> altogether.
> 
> This scenario can be reproduced by binding an XDP socket to a virtio-net
> interface without setting up the fill ring. try_fill_recv() then keeps
> failing until the fill ring is set up, so refill_work gets scheduled.
> 
> This fix adds virtnet_rx_(pause/resume)_all helpers and fixes up
> virtnet_rx_resume so that future delayed refill_work is disabled and all
> in-flight instances are cancelled before napi_disable() is called to
> pause rx.
> 
> Version 3 changes:
> - Patch 1: refactor to avoid code duplication
> 
> Version 2 changes:
> - Add selftest for deadlock scenario
> 
> Thanks,
> Quang Minh.


Acked-by: Michael S. Tsirkin <mst@redhat.com>

> Bui Quang Minh (3):
>   virtio-net: disable delayed refill when pausing rx
>   selftests: net: move xdp_helper to net/lib
>   selftests: net: add a virtio_net deadlock selftest
> 
>  drivers/net/virtio_net.c                      | 69 +++++++++++++++----
>  tools/testing/selftests/Makefile              |  2 +-
>  tools/testing/selftests/drivers/net/Makefile  |  2 -
>  tools/testing/selftests/drivers/net/queues.py |  4 +-
>  .../selftests/drivers/net/virtio_net/Makefile |  2 +
>  .../selftests/drivers/net/virtio_net/config   |  1 +
>  .../drivers/net/virtio_net/lib/py/__init__.py | 16 +++++
>  .../drivers/net/virtio_net/xsk_pool.py        | 52 ++++++++++++++
>  tools/testing/selftests/net/lib/.gitignore    |  1 +
>  tools/testing/selftests/net/lib/Makefile      |  1 +
>  .../{drivers/net => net/lib}/xdp_helper.c     |  0
>  11 files changed, 133 insertions(+), 17 deletions(-)
>  create mode 100644 tools/testing/selftests/drivers/net/virtio_net/lib/py/__init__.py
>  create mode 100755 tools/testing/selftests/drivers/net/virtio_net/xsk_pool.py
>  rename tools/testing/selftests/{drivers/net => net/lib}/xdp_helper.c (100%)
> 
> -- 
> 2.43.0