Message ID | 20210526082423.47837-5-mst@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | virtio net: spurious interrupt related fixes | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | success | CCed 6 of 6 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | fail | Errors and warnings before: 0 this patch: 1 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 31 lines checked |
netdev/build_allmodconfig_warn | fail | Errors and warnings before: 0 this patch: 1 |
netdev/header_inline | success | Link |
On 5/26/21 10:24 AM, Michael S. Tsirkin wrote:
> There are currently two cases where we poll TX vq not in response to a
> callback: start xmit and rx napi. We currently do this with callbacks
> enabled which can cause extra interrupts from the card. Used not to be
> a big issue as we run with interrupts disabled but that is no longer the
> case, and in some cases the rate of spurious interrupts is so high
> linux detects this and actually kills the interrupt.
>
> Fix up by disabling the callbacks before polling the tx vq.

It is not clear why we want to poll TX completions from ndo_start_xmit() in napi mode?

This seems not needed: it adds cost to the sender thread and might reduce the ability to use a different cpu for tx completions. Also this will likely conflict with the BQL model if we want to use BQL at some point.

> This probably needs a Fixes: tag
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/virtio_net.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>                 return;
>
>         if (__netif_tx_trylock(txq)) {
> -               free_old_xmit_skbs(sq, true);
> +               do {
> +                       virtqueue_disable_cb(sq->vq);
> +                       free_old_xmit_skbs(sq, true);
> +               } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
>                 if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>                         netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>         struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>         bool kick = !netdev_xmit_more();
>         bool use_napi = sq->napi.weight;
> +       unsigned int bytes = skb->len;
>
>         /* Free up any pending old buffers before queueing new ones. */
> -       free_old_xmit_skbs(sq, false);
> +       do {
> +               if (use_napi)
> +                       virtqueue_disable_cb(sq->vq);
>
> -       if (use_napi && kick)
> -               virtqueue_enable_cb_delayed(sq->vq);
> +               free_old_xmit_skbs(sq, false);
> +
> +       } while (use_napi && kick &&
> +                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
>         /* timestamp packet in software */
>         skb_tx_timestamp(skb);
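For context on the BQL remark above: BQL (byte queue limits) pairs byte accounting in the transmit path with matching accounting in the completion path, usually driven from TX NAPI. The sketch below is a hypothetical illustration of that pairing, not code from virtio_net or from this patch; example_xmit() and example_tx_complete() are made-up names, while netdev_tx_sent_queue() and netdev_tx_completed_queue() are the real BQL helpers.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/*
 * Hypothetical sketch of BQL accounting, not part of virtio_net or of
 * this patch: bytes handed to the device are reported in the xmit path
 * and the same bytes are reported back from the completion path.
 */
static netdev_tx_t example_xmit(struct sk_buff *skb, struct netdev_queue *txq)
{
        unsigned int bytes = skb->len;

        /* ...hand the skb to the device here... */

        netdev_tx_sent_queue(txq, bytes);       /* BQL: bytes queued */
        return NETDEV_TX_OK;
}

static void example_tx_complete(struct netdev_queue *txq,
                                unsigned int pkts, unsigned int bytes)
{
        /* BQL: bytes completed, ideally reported from one context (TX NAPI) */
        netdev_tx_completed_queue(txq, pkts, bytes);
}

With such accounting in place, reclaiming completions from both ndo_start_xmit() and NAPI would split completion reporting across contexts, which is one reading of the conflict being pointed out.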
On Wed, 26 May 2021 04:24:43 -0400 Michael S. Tsirkin wrote:
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>         struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>         bool kick = !netdev_xmit_more();
>         bool use_napi = sq->napi.weight;
> +       unsigned int bytes = skb->len;

FWIW GCC says bytes is unused.
On Wed, May 26, 2021 at 11:15 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On 5/26/21 10:24 AM, Michael S. Tsirkin wrote:
> > There are currently two cases where we poll TX vq not in response to a
> > callback: start xmit and rx napi. We currently do this with callbacks
> > enabled which can cause extra interrupts from the card. Used not to be
> > a big issue as we run with interrupts disabled but that is no longer the
> > case, and in some cases the rate of spurious interrupts is so high
> > linux detects this and actually kills the interrupt.

Temporarily disabling interrupts during free_old_xmit_skbs in virtnet_poll_cleantx might reduce the spurious interrupt rate by avoiding an additional Tx interrupt from being scheduled during virtnet_poll_cleantx.

It probably does not address all spurious interrupts, as virtnet_poll_cleantx might also run in between the scheduling of the Tx interrupt and the call to virtnet_poll_tx, right? That is, the Tx and Rx interrupts racing.

If I can reproduce the report, I can also test how much this helps in practice.

> > Fix up by disabling the callbacks before polling the tx vq.
>
> It is not clear why we want to poll TX completions from ndo_start_xmit() in napi mode?

Yes, we can simply exclude that. The original napi-tx patch did not make that change, but not for any strong reason.
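A rough timeline of the residual race described above, as an illustration only (not driver code):

/*
 * Illustration of the remaining race, not actual driver code:
 *
 *   1. device marks a TX buffer used and raises a TX interrupt
 *      (callbacks were still enabled at that point)
 *   2. RX NAPI runs virtnet_poll_cleantx(), which reclaims all
 *      completed TX buffers
 *   3. the TX interrupt from step 1 is delivered and virtnet_poll_tx()
 *      finds nothing left to do -> counted as a spurious interrupt
 *
 * Disabling callbacks while virtnet_poll_cleantx() polls prevents new
 * interrupts from being requested during the poll, but cannot recall
 * an interrupt that was already raised before the poll ran.
 */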
On 2021/5/26 4:24 PM, Michael S. Tsirkin wrote:
> There are currently two cases where we poll TX vq not in response to a
> callback: start xmit and rx napi. We currently do this with callbacks
> enabled which can cause extra interrupts from the card. Used not to be
> a big issue as we run with interrupts disabled but that is no longer the
> case, and in some cases the rate of spurious interrupts is so high
> linux detects this and actually kills the interrupt.
>
> Fix up by disabling the callbacks before polling the tx vq.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/virtio_net.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>                 return;
>
>         if (__netif_tx_trylock(txq)) {
> -               free_old_xmit_skbs(sq, true);
> +               do {
> +                       virtqueue_disable_cb(sq->vq);
> +                       free_old_xmit_skbs(sq, true);
> +               } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
>                 if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>                         netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>         struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>         bool kick = !netdev_xmit_more();
>         bool use_napi = sq->napi.weight;
> +       unsigned int bytes = skb->len;
>
>         /* Free up any pending old buffers before queueing new ones. */
> -       free_old_xmit_skbs(sq, false);
> +       do {
> +               if (use_napi)
> +                       virtqueue_disable_cb(sq->vq);
>
> -       if (use_napi && kick)
> -               virtqueue_enable_cb_delayed(sq->vq);
> +               free_old_xmit_skbs(sq, false);
> +
> +       } while (use_napi && kick &&
> +                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
>         /* timestamp packet in software */
>         skb_tx_timestamp(skb);

I wonder whether we can simply disable the cb during ndo_start_xmit(), or is there a way to make xmit and napi work in parallel?

Thanks
Hi Michael,

On 5/26/21 10:24, Michael S. Tsirkin wrote:
> There are currently two cases where we poll TX vq not in response to a
> callback: start xmit and rx napi. We currently do this with callbacks
> enabled which can cause extra interrupts from the card. Used not to be
> a big issue as we run with interrupts disabled but that is no longer the
> case, and in some cases the rate of spurious interrupts is so high
> linux detects this and actually kills the interrupt.
>
> Fix up by disabling the callbacks before polling the tx vq.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/virtio_net.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>                 return;
>
>         if (__netif_tx_trylock(txq)) {
> -               free_old_xmit_skbs(sq, true);
> +               do {
> +                       virtqueue_disable_cb(sq->vq);
> +                       free_old_xmit_skbs(sq, true);
> +               } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
>                 if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>                         netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>         struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>         bool kick = !netdev_xmit_more();
>         bool use_napi = sq->napi.weight;
> +       unsigned int bytes = skb->len;
>
>         /* Free up any pending old buffers before queueing new ones. */
> -       free_old_xmit_skbs(sq, false);
> +       do {
> +               if (use_napi)
> +                       virtqueue_disable_cb(sq->vq);
>
> -       if (use_napi && kick)
> -               virtqueue_enable_cb_delayed(sq->vq);
> +               free_old_xmit_skbs(sq, false);
> +
> +       } while (use_napi && kick &&
> +                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
>         /* timestamp packet in software */
>         skb_tx_timestamp(skb);

This patch seems to introduce a problem with QEMU connected to passt using the netdev stream backend.

When I run an iperf3 test, after 1 or 2 seconds of test I get:

[  254.035559] NETDEV WATCHDOG: ens3 (virtio_net): transmit queue 0 timed out
...
[  254.060962] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 8856000 usecs ago
[  259.155150] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 13951000 usecs ago

In QEMU, I can see in virtio_net_tx_bh() that virtio_net_flush_tx() has flushed all the queue entries, re-enabled the queue notification with virtio_queue_set_notification(), and tried to flush the queue again; as the queue is empty it does nothing more and then relies on a kernel notification to re-enable the bottom half function. As this notification never comes, the queue is stuck: the kernel adds entries but QEMU doesn't remove them:

2812 static void virtio_net_tx_bh(void *opaque)
2813 {
...
2833     ret = virtio_net_flush_tx(q);

-> flushes the queue; ret is not an error and not n->tx_burst (which would re-schedule the function)

...
2850     virtio_queue_set_notification(q->tx_vq, 1);

-> re-enables the queue notification

2851     ret = virtio_net_flush_tx(q);
2852     if (ret == -EINVAL) {
2853         return;
2854     } else if (ret > 0) {
2855         virtio_queue_set_notification(q->tx_vq, 0);
2856         qemu_bh_schedule(q->tx_bh);
2857         q->tx_waiting = 1;
2858     }

-> ret is 0, so the function exits without re-scheduling itself.

...
2859 }

If I revert this patch in the kernel (a7766ef18b33 ("virtio_net: disable cb aggressively")), it works fine.

How to reproduce it:

I start passt (https://passt.top/passt):

  passt -f

and then QEMU:

  qemu-system-x86_64 ... --netdev stream,id=netdev0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0

Host side:

  sysctl -w net.core.rmem_max=134217728
  sysctl -w net.core.wmem_max=134217728
  iperf3 -s

Guest side:

  sysctl -w net.core.rmem_max=536870912
  sysctl -w net.core.wmem_max=536870912
  ip link set dev $DEV mtu 256
  iperf3 -c $HOST -t10 -i0 -Z -P 8 -l 1M --pacing-timer 1000000 -w 4M

Any idea what the problem is?

Thanks,
Laurent
On Mon, Jan 16, 2023 at 9:41 PM Laurent Vivier <lvivier@redhat.com> wrote:
>
> Hi Michael,
>
> On 5/26/21 10:24, Michael S. Tsirkin wrote:
> > There are currently two cases where we poll TX vq not in response to a
> > callback: start xmit and rx napi. We currently do this with callbacks
> > enabled which can cause extra interrupts from the card. Used not to be
> > a big issue as we run with interrupts disabled but that is no longer the
> > case, and in some cases the rate of spurious interrupts is so high
> > linux detects this and actually kills the interrupt.
> >
> > Fix up by disabling the callbacks before polling the tx vq.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/net/virtio_net.c | 16 ++++++++++++----
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c29f42d1e04f..a83dc038d8af 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >                 return;
> >
> >         if (__netif_tx_trylock(txq)) {
> > -               free_old_xmit_skbs(sq, true);
> > +               do {
> > +                       virtqueue_disable_cb(sq->vq);
> > +                       free_old_xmit_skbs(sq, true);
> > +               } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
> >                 if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> >                         netif_tx_wake_queue(txq);
> > @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >         struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> >         bool kick = !netdev_xmit_more();
> >         bool use_napi = sq->napi.weight;
> > +       unsigned int bytes = skb->len;
> >
> >         /* Free up any pending old buffers before queueing new ones. */
> > -       free_old_xmit_skbs(sq, false);
> > +       do {
> > +               if (use_napi)
> > +                       virtqueue_disable_cb(sq->vq);
> >
> > -       if (use_napi && kick)
> > -               virtqueue_enable_cb_delayed(sq->vq);
> > +               free_old_xmit_skbs(sq, false);
> > +
> > +       } while (use_napi && kick &&
> > +                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
> >         /* timestamp packet in software */
> >         skb_tx_timestamp(skb);
>
> This patch seems to introduce a problem with QEMU connected to passt using the netdev stream backend.
>
> When I run an iperf3 test, after 1 or 2 seconds of test I get:
>
> [  254.035559] NETDEV WATCHDOG: ens3 (virtio_net): transmit queue 0 timed out
> ...
> [  254.060962] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 8856000 usecs ago
> [  259.155150] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 13951000 usecs ago
>
> In QEMU, I can see in virtio_net_tx_bh() that virtio_net_flush_tx() has flushed all the queue entries, re-enabled the queue notification with virtio_queue_set_notification(), and tried to flush the queue again; as the queue is empty it does nothing more and then relies on a kernel notification to re-enable the bottom half function. As this notification never comes, the queue is stuck: the kernel adds entries but QEMU doesn't remove them:
>
> 2812 static void virtio_net_tx_bh(void *opaque)
> 2813 {
> ...
> 2833     ret = virtio_net_flush_tx(q);
>
> -> flushes the queue; ret is not an error and not n->tx_burst (which would re-schedule the function)
>
> ...
> 2850     virtio_queue_set_notification(q->tx_vq, 1);
>
> -> re-enables the queue notification
>
> 2851     ret = virtio_net_flush_tx(q);
> 2852     if (ret == -EINVAL) {
> 2853         return;
> 2854     } else if (ret > 0) {
> 2855         virtio_queue_set_notification(q->tx_vq, 0);
> 2856         qemu_bh_schedule(q->tx_bh);
> 2857         q->tx_waiting = 1;
> 2858     }
>
> -> ret is 0, so the function exits without re-scheduling itself.
>
> ...
> 2859 }
>
> If I revert this patch in the kernel (a7766ef18b33 ("virtio_net: disable cb aggressively")), it works fine.
>
> How to reproduce it:
>
> I start passt (https://passt.top/passt):
>
>   passt -f
>
> and then QEMU:
>
>   qemu-system-x86_64 ... --netdev stream,id=netdev0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0
>
> Host side:
>
>   sysctl -w net.core.rmem_max=134217728
>   sysctl -w net.core.wmem_max=134217728
>   iperf3 -s
>
> Guest side:
>
>   sysctl -w net.core.rmem_max=536870912
>   sysctl -w net.core.wmem_max=536870912
>   ip link set dev $DEV mtu 256
>   iperf3 -c $HOST -t10 -i0 -Z -P 8 -l 1M --pacing-timer 1000000 -w 4M
>
> Any idea what the problem is?

This looks similar to what I spotted and tried to fix in:

[PATCH net V3] virtio-net: correctly enable callback during start_xmit

(I've cc'ed you on this version.)

Thanks

> Thanks,
> Laurent
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c29f42d1e04f..a83dc038d8af 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
                return;
 
        if (__netif_tx_trylock(txq)) {
-               free_old_xmit_skbs(sq, true);
+               do {
+                       virtqueue_disable_cb(sq->vq);
+                       free_old_xmit_skbs(sq, true);
+               } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
                if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
                        netif_tx_wake_queue(txq);
@@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
        struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
        bool kick = !netdev_xmit_more();
        bool use_napi = sq->napi.weight;
+       unsigned int bytes = skb->len;
 
        /* Free up any pending old buffers before queueing new ones. */
-       free_old_xmit_skbs(sq, false);
+       do {
+               if (use_napi)
+                       virtqueue_disable_cb(sq->vq);
 
-       if (use_napi && kick)
-               virtqueue_enable_cb_delayed(sq->vq);
+               free_old_xmit_skbs(sq, false);
+
+       } while (use_napi && kick &&
+                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
        /* timestamp packet in software */
        skb_tx_timestamp(skb);
There are currently two cases where we poll TX vq not in response to a
callback: start xmit and rx napi. We currently do this with callbacks
enabled which can cause extra interrupts from the card. Used not to be
a big issue as we run with interrupts disabled but that is no longer the
case, and in some cases the rate of spurious interrupts is so high
linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
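The core of the change, used in both hunks above, is the same re-check loop. Below is a minimal sketch of that pattern; the wrapper poll_tx_with_cb_disabled() and the reclaim callback are illustrative names and not code from the patch, with reclaim() standing in for the driver's free_old_xmit_skbs().

#include <linux/compiler.h>
#include <linux/virtio.h>

/*
 * Sketch of the pattern this patch applies: poll the TX virtqueue with
 * callbacks (interrupts) disabled, then re-enable them in delayed mode.
 *
 * virtqueue_enable_cb_delayed() returns false if more used buffers
 * appeared while callbacks were off, so the loop polls again instead of
 * waiting for an interrupt that may never be raised for those buffers.
 * The "delayed" variant also hints to the device to hold back the next
 * interrupt until a batch of buffers has been used, which keeps the
 * interrupt rate down once callbacks are re-enabled.
 */
static void poll_tx_with_cb_disabled(struct virtqueue *vq,
                                     void (*reclaim)(struct virtqueue *vq))
{
        do {
                virtqueue_disable_cb(vq);       /* suppress TX callbacks */
                reclaim(vq);                    /* reap completed TX buffers */
        } while (unlikely(!virtqueue_enable_cb_delayed(vq)));
}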