
[net] virtio-net: suppress bad irq warning for tx napi

Message ID 20210129002136.70865-1-weiwan@google.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success
netdev/cc_maintainers warning 2 maintainers not CCed: jasowang@redhat.com mst@redhat.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 49 this patch: 49
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/verify_fixes success
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 102 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 40 this patch: 40
netdev/header_inline success
netdev/stable success Stable not CCed

Commit Message

Wei Wang Jan. 29, 2021, 12:21 a.m. UTC
With the implementation of napi-tx in the virtio driver, we clean tx
descriptors from the rx napi handler to reduce tx completion
interrupts. But this can introduce a race where a tx completion
interrupt has been raised, yet the handler finds no work to do because
the work was already done by the previous rx interrupt handler. This
can lead to the following warning message:
[ 3588.010778] irq 38: nobody cared (try booting with the
"irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  <IRQ>
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  </IRQ>
[ 3588.017989] handlers:
[ 3588.020374] [<000000001b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38

This patch adds a new field to struct vring_virtqueue, set for tx
virtqueues when napi-tx is enabled, to suppress the warning in that
case.

Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
Reported-by: Rick Jones <jonesrick@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/virtio_net.c     | 19 ++++++++++++++-----
 drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
 include/linux/virtio.h       |  2 ++
 3 files changed, 32 insertions(+), 5 deletions(-)
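The race the commit message describes can be sketched as a small user-space model (hypothetical names, not kernel code): the rx-side napi poll reclaims the tx used ring first, so the subsequent tx completion interrupt finds nothing and returns IRQ_NONE, unless the new flag turns that into IRQ_HANDLED.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified user-space model of the race. Names mirror the driver,
 * but this is an illustration, not the kernel implementation. */
enum irqreturn { IRQ_NONE, IRQ_HANDLED };

struct model_txq {
	int used_pending;  /* completed tx descriptors awaiting reclaim */
	bool work_steal;   /* set when the rx side may clean tx work */
};

/* rx napi poll also reclaims tx completions (virtnet_poll_cleantx()) */
static void model_rx_poll(struct model_txq *txq)
{
	txq->used_pending = 0;
}

/* tx completion interrupt (vring_interrupt() for the tx vq) */
static enum irqreturn model_tx_interrupt(struct model_txq *txq)
{
	if (!txq->used_pending)
		/* work was stolen: spurious unless work_steal is set */
		return txq->work_steal ? IRQ_HANDLED : IRQ_NONE;
	txq->used_pending = 0;
	return IRQ_HANDLED;
}
```

With work_steal false, the tx interrupt is counted as spurious by the genirq core; with it true, the interrupt is acknowledged even though the rx side already did the work.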

Comments

Jakub Kicinski Feb. 2, 2021, 2:24 a.m. UTC | #1
On Thu, 28 Jan 2021 16:21:36 -0800 Wei Wang wrote:
> With the implementation of napi-tx in virtio driver, we clean tx
> descriptors from rx napi handler, for the purpose of reducing tx
> complete interrupts. But this could introduce a race where tx complete
> interrupt has been raised, but the handler found there is no work to do
> because we have done the work in the previous rx interrupt handler.
> This could lead to the following warning msg:
> [ 3588.010778] irq 38: nobody cared (try booting with the
> "irqpoll" option)
> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> 5.3.0-19-generic #20~18.04.2-Ubuntu
> [ 3588.017940] Call Trace:
> [ 3588.017942]  <IRQ>
> [ 3588.017951]  dump_stack+0x63/0x85
> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> [ 3588.017957]  handle_irq_event+0x3b/0x60
> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> [ 3588.017961]  handle_irq+0x20/0x30
> [ 3588.017964]  do_IRQ+0x50/0xe0
> [ 3588.017966]  common_interrupt+0xf/0xf
> [ 3588.017966]  </IRQ>
> [ 3588.017989] handlers:
> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> [ 3588.025099] Disabling IRQ #38
> 
> This patch adds a new param to struct vring_virtqueue, and we set it for
> tx virtqueues if napi-tx is enabled, to suppress the warning in such
> case.
> 
> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> Reported-by: Rick Jones <jonesrick@google.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Michael, Jason, does this one look okay to you?
Jason Wang Feb. 2, 2021, 3:06 a.m. UTC | #2
On 2021/1/29 8:21 AM, Wei Wang wrote:
> With the implementation of napi-tx in virtio driver, we clean tx
> descriptors from rx napi handler, for the purpose of reducing tx
> complete interrupts. But this could introduce a race where tx complete
> interrupt has been raised, but the handler found there is no work to do
> because we have done the work in the previous rx interrupt handler.
> This could lead to the following warning msg:
> [ 3588.010778] irq 38: nobody cared (try booting with the
> "irqpoll" option)
> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> 5.3.0-19-generic #20~18.04.2-Ubuntu
> [ 3588.017940] Call Trace:
> [ 3588.017942]  <IRQ>
> [ 3588.017951]  dump_stack+0x63/0x85
> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> [ 3588.017957]  handle_irq_event+0x3b/0x60
> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> [ 3588.017961]  handle_irq+0x20/0x30
> [ 3588.017964]  do_IRQ+0x50/0xe0
> [ 3588.017966]  common_interrupt+0xf/0xf
> [ 3588.017966]  </IRQ>
> [ 3588.017989] handlers:
> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> [ 3588.025099] Disabling IRQ #38
>
> This patch adds a new param to struct vring_virtqueue, and we set it for
> tx virtqueues if napi-tx is enabled, to suppress the warning in such
> case.
>
> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> Reported-by: Rick Jones <jonesrick@google.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>


Please use get_maintainer.pl to make sure Michael and I are CCed.


> ---
>   drivers/net/virtio_net.c     | 19 ++++++++++++++-----
>   drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
>   include/linux/virtio.h       |  2 ++
>   3 files changed, 32 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 508408fbe78f..e9a3f30864e8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
>   		return;
>   	}
>   
> +	/* With napi_tx enabled, free_old_xmit_skbs() could be called from
> +	 * rx napi handler. Set work_steal to suppress bad irq warning for
> +	 * IRQ_NONE case from tx complete interrupt handler.
> +	 */
> +	virtqueue_set_work_steal(vq, true);
> +
>   	return virtnet_napi_enable(vq, napi);


Do we need to force the ordering between steal set and napi enable?


>   }
>   
> -static void virtnet_napi_tx_disable(struct napi_struct *napi)
> +static void virtnet_napi_tx_disable(struct virtqueue *vq,
> +				    struct napi_struct *napi)
>   {
> -	if (napi->weight)
> +	if (napi->weight) {
>   		napi_disable(napi);
> +		virtqueue_set_work_steal(vq, false);
> +	}
>   }
>   
>   static void refill_work(struct work_struct *work)
> @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev)
>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>   		xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>   		napi_disable(&vi->rq[i].napi);
> -		virtnet_napi_tx_disable(&vi->sq[i].napi);
> +		virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
>   	}
>   
>   	return 0;
> @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>   	if (netif_running(vi->dev)) {
>   		for (i = 0; i < vi->max_queue_pairs; i++) {
>   			napi_disable(&vi->rq[i].napi);
> -			virtnet_napi_tx_disable(&vi->sq[i].napi);
> +			virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
>   		}
>   	}
>   }
> @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   	if (netif_running(dev)) {
>   		for (i = 0; i < vi->max_queue_pairs; i++) {
>   			napi_disable(&vi->rq[i].napi);
> -			virtnet_napi_tx_disable(&vi->sq[i].napi);
> +			virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
>   		}
>   	}
>   
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71e16b53e9c1..f7c5d697c302 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -105,6 +105,9 @@ struct vring_virtqueue {
>   	/* Host publishes avail event idx */
>   	bool event;
>   
> +	/* Tx side napi work could be done from rx side. */
> +	bool work_steal;


So vring_virtqueue is a general structure, let's avoid mentioning 
network-specific stuff here. And we need a better name like 
"no_interrupt_check"?

And we need a separate patch for virtio core changes.


> +
>   	/* Head of free buffer list. */
>   	unsigned int free_head;
>   	/* Number we've added since last sync. */
> @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>   	vq->notify = notify;
>   	vq->weak_barriers = weak_barriers;
>   	vq->broken = false;
> +	vq->work_steal = false;
>   	vq->last_used_idx = 0;
>   	vq->num_added = 0;
>   	vq->packed_ring = true;
> @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>   
>   	if (!more_used(vq)) {
>   		pr_debug("virtqueue interrupt with no work for %p\n", vq);


Do we still need to keep this warning?


> +		if (vq->work_steal)
> +			return IRQ_HANDLED;


So I wonder, instead of doing tricks like this, maybe it's time to unify 
TX/RX NAPI with the help of [1] (virtio-net uses queue pairs).

Thanks

[1] https://lkml.org/lkml/2014/12/25/169

> +
>   		return IRQ_NONE;
>   	}
>   
> @@ -2082,6 +2089,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   	vq->notify = notify;
>   	vq->weak_barriers = weak_barriers;
>   	vq->broken = false;
> +	vq->work_steal = false;
>   	vq->last_used_idx = 0;
>   	vq->num_added = 0;
>   	vq->use_dma_api = vring_use_dma_api(vdev);
> @@ -2266,6 +2274,14 @@ bool virtqueue_is_broken(struct virtqueue *_vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_is_broken);
>   
> +void virtqueue_set_work_steal(struct virtqueue *_vq, bool val)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	vq->work_steal = val;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_set_work_steal);
> +
>   /*
>    * This should prevent the device from being used, allowing drivers to
>    * recover.  You may need to grab appropriate locks to flush.
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 55ea329fe72a..091c30f21ff9 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -84,6 +84,8 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
>   
>   bool virtqueue_is_broken(struct virtqueue *vq);
>   
> +void virtqueue_set_work_steal(struct virtqueue *vq, bool val);
> +
>   const struct vring *virtqueue_get_vring(struct virtqueue *vq);
>   dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq);
>   dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq);
Willem de Bruijn Feb. 2, 2021, 2:37 p.m. UTC | #3
On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/1/29 8:21 AM, Wei Wang wrote:
> > With the implementation of napi-tx in virtio driver, we clean tx
> > descriptors from rx napi handler, for the purpose of reducing tx
> > complete interrupts. But this could introduce a race where tx complete
> > interrupt has been raised, but the handler found there is no work to do
> > because we have done the work in the previous rx interrupt handler.
> > This could lead to the following warning msg:
> > [ 3588.010778] irq 38: nobody cared (try booting with the
> > "irqpoll" option)
> > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > [ 3588.017940] Call Trace:
> > [ 3588.017942]  <IRQ>
> > [ 3588.017951]  dump_stack+0x63/0x85
> > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > [ 3588.017961]  handle_irq+0x20/0x30
> > [ 3588.017964]  do_IRQ+0x50/0xe0
> > [ 3588.017966]  common_interrupt+0xf/0xf
> > [ 3588.017966]  </IRQ>
> > [ 3588.017989] handlers:
> > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > [ 3588.025099] Disabling IRQ #38
> >
> > This patch adds a new param to struct vring_virtqueue, and we set it for
> > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > case.
> >
> > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > Reported-by: Rick Jones <jonesrick@google.com>
> > Signed-off-by: Wei Wang <weiwan@google.com>
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
>
>
> Please use get_maintainer.pl to make sure Michael and I are CCed.

Will do. Sorry about that. I suggested just the virtualization list, my bad.

>
> > ---
> >   drivers/net/virtio_net.c     | 19 ++++++++++++++-----
> >   drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
> >   include/linux/virtio.h       |  2 ++
> >   3 files changed, 32 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 508408fbe78f..e9a3f30864e8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
> >               return;
> >       }
> >
> > +     /* With napi_tx enabled, free_old_xmit_skbs() could be called from
> > +      * rx napi handler. Set work_steal to suppress bad irq warning for
> > +      * IRQ_NONE case from tx complete interrupt handler.
> > +      */
> > +     virtqueue_set_work_steal(vq, true);
> > +
> >       return virtnet_napi_enable(vq, napi);
>
>
> Do we need to force the ordering between steal set and napi enable?

The warning only occurs after one hundred spurious interrupts, so not
really.
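Willem's point — that the warning requires a sustained storm of spurious interrupts, not a one-off race — follows from the detection logic in kernel/irq/spurious.c. A rough sketch (thresholds as in the mainline code of that era; this is a model, not the kernel source, and it omits the rule that unhandled interrupts more than ~100 ms apart reset the count):

```c
#include <assert.h>
#include <stdbool.h>

/* Rough model of the "nobody cared" detection in kernel/irq/spurious.c.
 * Illustration only; the real note_interrupt() also rate-limits how
 * unhandled interrupts are counted. */
struct model_irq_desc {
	unsigned int irq_count;      /* all interrupts seen in window */
	unsigned int irqs_unhandled; /* IRQ_NONE results in window */
	bool disabled;
};

static void model_note_interrupt(struct model_irq_desc *desc, bool handled)
{
	if (!handled)
		desc->irqs_unhandled++;
	if (++desc->irq_count < 100000)
		return;
	/* every 100000 interrupts: complain and disable the line if
	 * almost none were handled ("irq N: nobody cared") */
	if (desc->irqs_unhandled > 99900)
		desc->disabled = true;
	desc->irq_count = 0;
	desc->irqs_unhandled = 0;
}
```

So an occasional stolen tx completion is harmless; the line is only disabled when nearly an entire window of interrupts goes unhandled.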

>
> >   }
> >
> > -static void virtnet_napi_tx_disable(struct napi_struct *napi)
> > +static void virtnet_napi_tx_disable(struct virtqueue *vq,
> > +                                 struct napi_struct *napi)
> >   {
> > -     if (napi->weight)
> > +     if (napi->weight) {
> >               napi_disable(napi);
> > +             virtqueue_set_work_steal(vq, false);
> > +     }
> >   }
> >
> >   static void refill_work(struct work_struct *work)
> > @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev)
> >       for (i = 0; i < vi->max_queue_pairs; i++) {
> >               xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >               napi_disable(&vi->rq[i].napi);
> > -             virtnet_napi_tx_disable(&vi->sq[i].napi);
> > +             virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
> >       }
> >
> >       return 0;
> > @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
> >       if (netif_running(vi->dev)) {
> >               for (i = 0; i < vi->max_queue_pairs; i++) {
> >                       napi_disable(&vi->rq[i].napi);
> > -                     virtnet_napi_tx_disable(&vi->sq[i].napi);
> > +                     virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
> >               }
> >       }
> >   }
> > @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> >       if (netif_running(dev)) {
> >               for (i = 0; i < vi->max_queue_pairs; i++) {
> >                       napi_disable(&vi->rq[i].napi);
> > -                     virtnet_napi_tx_disable(&vi->sq[i].napi);
> > +                     virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
> >               }
> >       }
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 71e16b53e9c1..f7c5d697c302 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -105,6 +105,9 @@ struct vring_virtqueue {
> >       /* Host publishes avail event idx */
> >       bool event;
> >
> > +     /* Tx side napi work could be done from rx side. */
> > +     bool work_steal;
>
>
> So vring_virtqueue is a general structure, let's avoid mentioning
> network-specific stuff here. And we need a better name like
> "no_interrupt_check"?
>
> And we need a separate patch for virtio core changes.

Ack. Will change.

>
> > +
> >       /* Head of free buffer list. */
> >       unsigned int free_head;
> >       /* Number we've added since last sync. */
> > @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >       vq->notify = notify;
> >       vq->weak_barriers = weak_barriers;
> >       vq->broken = false;
> > +     vq->work_steal = false;
> >       vq->last_used_idx = 0;
> >       vq->num_added = 0;
> >       vq->packed_ring = true;
> > @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> >
> >       if (!more_used(vq)) {
> >               pr_debug("virtqueue interrupt with no work for %p\n", vq);
>
>
> Do we still need to keep this warning?

Come to think of it, I would say no, in this case.

>
>
> > +             if (vq->work_steal)
> > +                     return IRQ_HANDLED;
>
>
> So I wonder, instead of doing tricks like this, maybe it's time to unify
> TX/RX NAPI with the help of [1] (virtio-net uses queue pairs).
>
> Thanks
>
> [1] https://lkml.org/lkml/2014/12/25/169

Interesting idea. It does sound like a good fit for this model. The
patch in the Fixes line proved effective at suppressing unnecessary TX
interrupts when processing in the RX interrupt handler, so I'm not sure
how much this would help in practice. Might be a nice project to
evaluate separately for net-next at some point.

Thanks for the review!
Michael S. Tsirkin Feb. 2, 2021, 11:11 p.m. UTC | #4
On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> With the implementation of napi-tx in virtio driver, we clean tx
> descriptors from rx napi handler, for the purpose of reducing tx
> complete interrupts. But this could introduce a race where tx complete
> interrupt has been raised, but the handler found there is no work to do
> because we have done the work in the previous rx interrupt handler.
> This could lead to the following warning msg:
> [ 3588.010778] irq 38: nobody cared (try booting with the
> "irqpoll" option)
> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> 5.3.0-19-generic #20~18.04.2-Ubuntu
> [ 3588.017940] Call Trace:
> [ 3588.017942]  <IRQ>
> [ 3588.017951]  dump_stack+0x63/0x85
> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> [ 3588.017957]  handle_irq_event+0x3b/0x60
> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> [ 3588.017961]  handle_irq+0x20/0x30
> [ 3588.017964]  do_IRQ+0x50/0xe0
> [ 3588.017966]  common_interrupt+0xf/0xf
> [ 3588.017966]  </IRQ>
> [ 3588.017989] handlers:
> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> [ 3588.025099] Disabling IRQ #38
> 
> This patch adds a new param to struct vring_virtqueue, and we set it for
> tx virtqueues if napi-tx is enabled, to suppress the warning in such
> case.
> 
> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> Reported-by: Rick Jones <jonesrick@google.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>


This description does not make sense to me.

irq X: nobody cared
only triggers after an interrupt is unhandled repeatedly.

So something causes a storm of useless tx interrupts here.

Let's find out what it is, please. What you are doing is
just preventing Linux from complaining.



> ---
>  drivers/net/virtio_net.c     | 19 ++++++++++++++-----
>  drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
>  include/linux/virtio.h       |  2 ++
>  3 files changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 508408fbe78f..e9a3f30864e8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
>  		return;
>  	}
>  
> +	/* With napi_tx enabled, free_old_xmit_skbs() could be called from
> +	 * rx napi handler. Set work_steal to suppress bad irq warning for
> +	 * IRQ_NONE case from tx complete interrupt handler.
> +	 */
> +	virtqueue_set_work_steal(vq, true);
> +
>  	return virtnet_napi_enable(vq, napi);
>  }
>  
> -static void virtnet_napi_tx_disable(struct napi_struct *napi)
> +static void virtnet_napi_tx_disable(struct virtqueue *vq,
> +				    struct napi_struct *napi)
>  {
> -	if (napi->weight)
> +	if (napi->weight) {
>  		napi_disable(napi);
> +		virtqueue_set_work_steal(vq, false);
> +	}
>  }
>  
>  static void refill_work(struct work_struct *work)
> @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev)
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>  		napi_disable(&vi->rq[i].napi);
> -		virtnet_napi_tx_disable(&vi->sq[i].napi);
> +		virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
>  	}
>  
>  	return 0;
> @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  	if (netif_running(vi->dev)) {
>  		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			napi_disable(&vi->rq[i].napi);
> -			virtnet_napi_tx_disable(&vi->sq[i].napi);
> +			virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
>  		}
>  	}
>  }
> @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  	if (netif_running(dev)) {
>  		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			napi_disable(&vi->rq[i].napi);
> -			virtnet_napi_tx_disable(&vi->sq[i].napi);
> +			virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
>  		}
>  	}
>  
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71e16b53e9c1..f7c5d697c302 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -105,6 +105,9 @@ struct vring_virtqueue {
>  	/* Host publishes avail event idx */
>  	bool event;
>  
> +	/* Tx side napi work could be done from rx side. */
> +	bool work_steal;
> +
>  	/* Head of free buffer list. */
>  	unsigned int free_head;
>  	/* Number we've added since last sync. */
> @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>  	vq->notify = notify;
>  	vq->weak_barriers = weak_barriers;
>  	vq->broken = false;
> +	vq->work_steal = false;
>  	vq->last_used_idx = 0;
>  	vq->num_added = 0;
>  	vq->packed_ring = true;
> @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>  
>  	if (!more_used(vq)) {
>  		pr_debug("virtqueue interrupt with no work for %p\n", vq);
> +		if (vq->work_steal)
> +			return IRQ_HANDLED;
> +
>  		return IRQ_NONE;
>  	}
>  
> @@ -2082,6 +2089,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  	vq->notify = notify;
>  	vq->weak_barriers = weak_barriers;
>  	vq->broken = false;
> +	vq->work_steal = false;
>  	vq->last_used_idx = 0;
>  	vq->num_added = 0;
>  	vq->use_dma_api = vring_use_dma_api(vdev);
> @@ -2266,6 +2274,14 @@ bool virtqueue_is_broken(struct virtqueue *_vq)
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_is_broken);
>  
> +void virtqueue_set_work_steal(struct virtqueue *_vq, bool val)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	vq->work_steal = val;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_set_work_steal);
> +
>  /*
>   * This should prevent the device from being used, allowing drivers to
>   * recover.  You may need to grab appropriate locks to flush.
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 55ea329fe72a..091c30f21ff9 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -84,6 +84,8 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
>  
>  bool virtqueue_is_broken(struct virtqueue *vq);
>  
> +void virtqueue_set_work_steal(struct virtqueue *vq, bool val);
> +
>  const struct vring *virtqueue_get_vring(struct virtqueue *vq);
>  dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq);
>  dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq);
> -- 
> 2.30.0.365.g02bc693789-goog
>
Wei Wang Feb. 2, 2021, 11:47 p.m. UTC | #5
On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > With the implementation of napi-tx in virtio driver, we clean tx
> > descriptors from rx napi handler, for the purpose of reducing tx
> > complete interrupts. But this could introduce a race where tx complete
> > interrupt has been raised, but the handler found there is no work to do
> > because we have done the work in the previous rx interrupt handler.
> > This could lead to the following warning msg:
> > [ 3588.010778] irq 38: nobody cared (try booting with the
> > "irqpoll" option)
> > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > [ 3588.017940] Call Trace:
> > [ 3588.017942]  <IRQ>
> > [ 3588.017951]  dump_stack+0x63/0x85
> > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > [ 3588.017961]  handle_irq+0x20/0x30
> > [ 3588.017964]  do_IRQ+0x50/0xe0
> > [ 3588.017966]  common_interrupt+0xf/0xf
> > [ 3588.017966]  </IRQ>
> > [ 3588.017989] handlers:
> > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > [ 3588.025099] Disabling IRQ #38
> >
> > This patch adds a new param to struct vring_virtqueue, and we set it for
> > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > case.
> >
> > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > Reported-by: Rick Jones <jonesrick@google.com>
> > Signed-off-by: Wei Wang <weiwan@google.com>
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
>
>
> This description does not make sense to me.
>
> irq X: nobody cared
> only triggers after an interrupt is unhandled repeatedly.
>
> So something causes a storm of useless tx interrupts here.
>
> Let's find out what it was please. What you are doing is
> just preventing linux from complaining.

The traffic that causes this warning is a netperf tcp_stream with at
least 128 flows between 2 hosts. And the warning gets triggered on the
receiving host, which has a lot of rx interrupts firing on all queues,
and a few tx interrupts.
And I think the scenario is: when the tx interrupt fires, it gets
coalesced with the rx interrupt. Basically, the rx and tx interrupts
get triggered very close to each other and get handled in one round
of do_IRQ(). The rx irq handler gets called first, which calls
virtnet_poll(). virtnet_poll() in turn calls virtnet_poll_cleantx()
to do the work on the corresponding tx queue as well. That's why,
when the tx interrupt handler gets called, it sees no work to do.
And the reason for the rx handler to handle the tx work is here:
https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
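The ordering Wei describes — rx serviced first within the same do_IRQ() round — is what makes the tx interrupt look spurious. A toy model (illustrative names only, not kernel APIs):

```c
#include <assert.h>

/* Toy model of one coalesced do_IRQ() round: both interrupts are
 * pending; if the rx handler runs first it also reclaims tx work. */
enum irqreturn { IRQ_NONE, IRQ_HANDLED };

static int tx_used_pending;

static enum irqreturn model_rx_irq(void)
{
	tx_used_pending = 0;  /* virtnet_poll() -> virtnet_poll_cleantx() */
	return IRQ_HANDLED;
}

static enum irqreturn model_tx_irq(void)
{
	if (!tx_used_pending)
		return IRQ_NONE;  /* counted as spurious by the genirq core */
	tx_used_pending = 0;
	return IRQ_HANDLED;
}
```

Run tx first and it is handled normally; run rx first and the tx interrupt is spurious — exactly the coalescing order seen on the receiver under heavy rx load.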

>
>
>
> > ---
> >  drivers/net/virtio_net.c     | 19 ++++++++++++++-----
> >  drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
> >  include/linux/virtio.h       |  2 ++
> >  3 files changed, 32 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 508408fbe78f..e9a3f30864e8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
> >               return;
> >       }
> >
> > +     /* With napi_tx enabled, free_old_xmit_skbs() could be called from
> > +      * rx napi handler. Set work_steal to suppress bad irq warning for
> > +      * IRQ_NONE case from tx complete interrupt handler.
> > +      */
> > +     virtqueue_set_work_steal(vq, true);
> > +
> >       return virtnet_napi_enable(vq, napi);
> >  }
> >
> > -static void virtnet_napi_tx_disable(struct napi_struct *napi)
> > +static void virtnet_napi_tx_disable(struct virtqueue *vq,
> > +                                 struct napi_struct *napi)
> >  {
> > -     if (napi->weight)
> > +     if (napi->weight) {
> >               napi_disable(napi);
> > +             virtqueue_set_work_steal(vq, false);
> > +     }
> >  }
> >
> >  static void refill_work(struct work_struct *work)
> > @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev)
> >       for (i = 0; i < vi->max_queue_pairs; i++) {
> >               xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >               napi_disable(&vi->rq[i].napi);
> > -             virtnet_napi_tx_disable(&vi->sq[i].napi);
> > +             virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
> >       }
> >
> >       return 0;
> > @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
> >       if (netif_running(vi->dev)) {
> >               for (i = 0; i < vi->max_queue_pairs; i++) {
> >                       napi_disable(&vi->rq[i].napi);
> > -                     virtnet_napi_tx_disable(&vi->sq[i].napi);
> > +                     virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
> >               }
> >       }
> >  }
> > @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> >       if (netif_running(dev)) {
> >               for (i = 0; i < vi->max_queue_pairs; i++) {
> >                       napi_disable(&vi->rq[i].napi);
> > -                     virtnet_napi_tx_disable(&vi->sq[i].napi);
> > +                     virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
> >               }
> >       }
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 71e16b53e9c1..f7c5d697c302 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -105,6 +105,9 @@ struct vring_virtqueue {
> >       /* Host publishes avail event idx */
> >       bool event;
> >
> > +     /* Tx side napi work could be done from rx side. */
> > +     bool work_steal;
> > +
> >       /* Head of free buffer list. */
> >       unsigned int free_head;
> >       /* Number we've added since last sync. */
> > @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >       vq->notify = notify;
> >       vq->weak_barriers = weak_barriers;
> >       vq->broken = false;
> > +     vq->work_steal = false;
> >       vq->last_used_idx = 0;
> >       vq->num_added = 0;
> >       vq->packed_ring = true;
> > @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> >
> >       if (!more_used(vq)) {
> >               pr_debug("virtqueue interrupt with no work for %p\n", vq);
> > +             if (vq->work_steal)
> > +                     return IRQ_HANDLED;
> > +
> >               return IRQ_NONE;
> >       }
> >
> > @@ -2082,6 +2089,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >       vq->notify = notify;
> >       vq->weak_barriers = weak_barriers;
> >       vq->broken = false;
> > +     vq->work_steal = false;
> >       vq->last_used_idx = 0;
> >       vq->num_added = 0;
> >       vq->use_dma_api = vring_use_dma_api(vdev);
> > @@ -2266,6 +2274,14 @@ bool virtqueue_is_broken(struct virtqueue *_vq)
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_is_broken);
> >
> > +void virtqueue_set_work_steal(struct virtqueue *_vq, bool val)
> > +{
> > +     struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +     vq->work_steal = val;
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_set_work_steal);
> > +
> >  /*
> >   * This should prevent the device from being used, allowing drivers to
> >   * recover.  You may need to grab appropriate locks to flush.
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 55ea329fe72a..091c30f21ff9 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -84,6 +84,8 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
> >
> >  bool virtqueue_is_broken(struct virtqueue *vq);
> >
> > +void virtqueue_set_work_steal(struct virtqueue *vq, bool val);
> > +
> >  const struct vring *virtqueue_get_vring(struct virtqueue *vq);
> >  dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq);
> >  dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq);
> > --
> > 2.30.0.365.g02bc693789-goog
> >
>
Willem de Bruijn Feb. 2, 2021, 11:53 p.m. UTC | #6
On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
>
> On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > With the implementation of napi-tx in virtio driver, we clean tx
> > > descriptors from rx napi handler, for the purpose of reducing tx
> > > complete interrupts. But this could introduce a race where tx complete
> > > interrupt has been raised, but the handler found there is no work to do
> > > because we have done the work in the previous rx interrupt handler.
> > > This could lead to the following warning msg:
> > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > "irqpoll" option)
> > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > [ 3588.017940] Call Trace:
> > > [ 3588.017942]  <IRQ>
> > > [ 3588.017951]  dump_stack+0x63/0x85
> > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > [ 3588.017961]  handle_irq+0x20/0x30
> > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > [ 3588.017966]  </IRQ>
> > > [ 3588.017989] handlers:
> > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > [ 3588.025099] Disabling IRQ #38
> > >
> > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > case.
> > >
> > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > Reported-by: Rick Jones <jonesrick@google.com>
> > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> >
> >
> > This description does not make sense to me.
> >
> > irq X: nobody cared
> > only triggers after an interrupt is unhandled repeatedly.
> >
> > So something causes a storm of useless tx interrupts here.
> >
> > Let's find out what it was please. What you are doing is
> > just preventing linux from complaining.
>
> The traffic that causes this warning is a netperf tcp_stream with at
> least 128 flows between 2 hosts. And the warning gets triggered on the
> receiving host, which has a lot of rx interrupts firing on all queues,
> and a few tx interrupts.
> And I think the scenario is: when the tx interrupt gets fired, it gets
> coalesced with the rx interrupt. Basically, the rx and tx interrupts
> get triggered very close to each other, and gets handled in one round
> of do_IRQ(). And the rx irq handler gets called first, which calls
> virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> to try to do the work on the corresponding tx queue as well. That's
> why when tx interrupt handler gets called, it sees no work to do.
> And the reason for the rx handler to handle the tx work is here:
> https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html

Indeed. It's not necessarily a storm. The warning occurs after one
hundred such events since boot, which is a small number compared to the
real interrupt load.

Occasionally seeing an interrupt with no work is expected after
7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
long as this rate of events is very low compared to useful interrupts,
and total interrupt count is greatly reduced vs not having work
stealing, it is a net win.
Willem de Bruijn Feb. 3, 2021, 12:06 a.m. UTC | #7
On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
>
> On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> >
> > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > complete interrupts. But this could introduce a race where tx complete
> > > > interrupt has been raised, but the handler found there is no work to do
> > > > because we have done the work in the previous rx interrupt handler.
> > > > This could lead to the following warning msg:
> > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > "irqpoll" option)
> > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > [ 3588.017940] Call Trace:
> > > > [ 3588.017942]  <IRQ>
> > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > [ 3588.017966]  </IRQ>
> > > > [ 3588.017989] handlers:
> > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > [ 3588.025099] Disabling IRQ #38
> > > >
> > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > case.
> > > >
> > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > >
> > >
> > > This description does not make sense to me.
> > >
> > > irq X: nobody cared
> > > only triggers after an interrupt is unhandled repeatedly.
> > >
> > > So something causes a storm of useless tx interrupts here.
> > >
> > > Let's find out what it was please. What you are doing is
> > > just preventing linux from complaining.
> >
> > The traffic that causes this warning is a netperf tcp_stream with at
> > least 128 flows between 2 hosts. And the warning gets triggered on the
> > receiving host, which has a lot of rx interrupts firing on all queues,
> > and a few tx interrupts.
> > And I think the scenario is: when the tx interrupt gets fired, it gets
> > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > get triggered very close to each other, and gets handled in one round
> > of do_IRQ(). And the rx irq handler gets called first, which calls
> > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > to try to do the work on the corresponding tx queue as well. That's
> > why when tx interrupt handler gets called, it sees no work to do.
> > And the reason for the rx handler to handle the tx work is here:
> > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
>
> Indeed. It's not a storm necessarily. The warning occurs after one
> hundred such events, since boot, which is a small number compared real
> interrupt load.

Sorry, this is wrong. It is the other call to __report_bad_irq from
note_interrupt that applies here.

> Occasionally seeing an interrupt with no work is expected after
> 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> long as this rate of events is very low compared to useful interrupts,
> and total interrupt count is greatly reduced vs not having work
> stealing, it is a net win.
Jason Wang Feb. 3, 2021, 5:33 a.m. UTC | #8
On 2021/2/2 10:37 PM, Willem de Bruijn wrote:
> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/1/29 8:21 AM, Wei Wang wrote:
>>> With the implementation of napi-tx in virtio driver, we clean tx
>>> descriptors from rx napi handler, for the purpose of reducing tx
>>> complete interrupts. But this could introduce a race where tx complete
>>> interrupt has been raised, but the handler found there is no work to do
>>> because we have done the work in the previous rx interrupt handler.
>>> This could lead to the following warning msg:
>>> [ 3588.010778] irq 38: nobody cared (try booting with the
>>> "irqpoll" option)
>>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
>>> 5.3.0-19-generic #20~18.04.2-Ubuntu
>>> [ 3588.017940] Call Trace:
>>> [ 3588.017942]  <IRQ>
>>> [ 3588.017951]  dump_stack+0x63/0x85
>>> [ 3588.017953]  __report_bad_irq+0x35/0xc0
>>> [ 3588.017955]  note_interrupt+0x24b/0x2a0
>>> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
>>> [ 3588.017957]  handle_irq_event+0x3b/0x60
>>> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
>>> [ 3588.017961]  handle_irq+0x20/0x30
>>> [ 3588.017964]  do_IRQ+0x50/0xe0
>>> [ 3588.017966]  common_interrupt+0xf/0xf
>>> [ 3588.017966]  </IRQ>
>>> [ 3588.017989] handlers:
>>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
>>> [ 3588.025099] Disabling IRQ #38
>>>
>>> This patch adds a new param to struct vring_virtqueue, and we set it for
>>> tx virtqueues if napi-tx is enabled, to suppress the warning in such
>>> case.
>>>
>>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
>>> Reported-by: Rick Jones <jonesrick@google.com>
>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>>
>> Please use get_maintainer.pl to make sure Michael and me were cced.
> Will do. Sorry about that. I suggested just the virtualization list, my bad.
>
>>> ---
>>>    drivers/net/virtio_net.c     | 19 ++++++++++++++-----
>>>    drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
>>>    include/linux/virtio.h       |  2 ++
>>>    3 files changed, 32 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 508408fbe78f..e9a3f30864e8 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
>>>                return;
>>>        }
>>>
>>> +     /* With napi_tx enabled, free_old_xmit_skbs() could be called from
>>> +      * rx napi handler. Set work_steal to suppress bad irq warning for
>>> +      * IRQ_NONE case from tx complete interrupt handler.
>>> +      */
>>> +     virtqueue_set_work_steal(vq, true);
>>> +
>>>        return virtnet_napi_enable(vq, napi);
>>
>> Do we need to force the ordering between steal set and napi enable?
> The warning only occurs after one hundred spurious interrupts, so not
> really.


Ok, so it looks like a hint. Then I wonder how much value there is in 
introducing a helper like virtqueue_set_work_steal() that lets the 
caller toggle the flag. How about disabling the check permanently 
during virtqueue initialization instead?


>
>>>    }
>>>
>>> -static void virtnet_napi_tx_disable(struct napi_struct *napi)
>>> +static void virtnet_napi_tx_disable(struct virtqueue *vq,
>>> +                                 struct napi_struct *napi)
>>>    {
>>> -     if (napi->weight)
>>> +     if (napi->weight) {
>>>                napi_disable(napi);
>>> +             virtqueue_set_work_steal(vq, false);
>>> +     }
>>>    }
>>>
>>>    static void refill_work(struct work_struct *work)
>>> @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev)
>>>        for (i = 0; i < vi->max_queue_pairs; i++) {
>>>                xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>>>                napi_disable(&vi->rq[i].napi);
>>> -             virtnet_napi_tx_disable(&vi->sq[i].napi);
>>> +             virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
>>>        }
>>>
>>>        return 0;
>>> @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>>>        if (netif_running(vi->dev)) {
>>>                for (i = 0; i < vi->max_queue_pairs; i++) {
>>>                        napi_disable(&vi->rq[i].napi);
>>> -                     virtnet_napi_tx_disable(&vi->sq[i].napi);
>>> +                     virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
>>>                }
>>>        }
>>>    }
>>> @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>>        if (netif_running(dev)) {
>>>                for (i = 0; i < vi->max_queue_pairs; i++) {
>>>                        napi_disable(&vi->rq[i].napi);
>>> -                     virtnet_napi_tx_disable(&vi->sq[i].napi);
>>> +                     virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
>>>                }
>>>        }
>>>
>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>> index 71e16b53e9c1..f7c5d697c302 100644
>>> --- a/drivers/virtio/virtio_ring.c
>>> +++ b/drivers/virtio/virtio_ring.c
>>> @@ -105,6 +105,9 @@ struct vring_virtqueue {
>>>        /* Host publishes avail event idx */
>>>        bool event;
>>>
>>> +     /* Tx side napi work could be done from rx side. */
>>> +     bool work_steal;
>>
>> So vring_vritqueue is a general structure, let's avoid mentioning
>> network specific stuffs here. And we need a better name like
>> "no_interrupt_check"?
>>
>> And we need a separate patch for virtio core changes.
> Ack. Will change.
>
>>> +
>>>        /* Head of free buffer list. */
>>>        unsigned int free_head;
>>>        /* Number we've added since last sync. */
>>> @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>>>        vq->notify = notify;
>>>        vq->weak_barriers = weak_barriers;
>>>        vq->broken = false;
>>> +     vq->work_steal = false;
>>>        vq->last_used_idx = 0;
>>>        vq->num_added = 0;
>>>        vq->packed_ring = true;
>>> @@ -2038,6 +2042,9 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>>>
>>>        if (!more_used(vq)) {
>>>                pr_debug("virtqueue interrupt with no work for %p\n", vq);
>>
>> Do we still need to keep this warning?
> Come to think of it, I would say no, in this case.
>
>>
>>> +             if (vq->work_steal)
>>> +                     return IRQ_HANDLED;
>>
>> So I wonder instead of doing trick like this, maybe it's time to unify
>> TX/RX NAPI with the help of[1] (virtio-net use queue pairs).
>>
>> Thanks
>>
>> [1] https://lkml.org/lkml/2014/12/25/169
> Interesting idea. It does sound like a good fit for this model. The
> patch in the Fixes line proved effective at suppressing unnecessary TX
> interrupts when processing in RX interrupt handler. So not sure how
> much will help in practice. Might be a nice project to evaluate
> separate for net-next at some point.


Right, basically the idea is that if an irq is shared among several 
virtqueues, there's no need to check for more_used() there.

Yes, we can try that sometime in the future (or, e.g., when we have 
more than 128 queue pairs).

Thanks


>
> Thanks for the review!
>
Michael S. Tsirkin Feb. 3, 2021, 10:38 a.m. UTC | #9
On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> >
> > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > >
> > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > because we have done the work in the previous rx interrupt handler.
> > > > > This could lead to the following warning msg:
> > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > "irqpoll" option)
> > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > [ 3588.017940] Call Trace:
> > > > > [ 3588.017942]  <IRQ>
> > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > [ 3588.017966]  </IRQ>
> > > > > [ 3588.017989] handlers:
> > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > [ 3588.025099] Disabling IRQ #38
> > > > >
> > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > case.
> > > > >
> > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > >
> > > >
> > > > This description does not make sense to me.
> > > >
> > > > irq X: nobody cared
> > > > only triggers after an interrupt is unhandled repeatedly.
> > > >
> > > > So something causes a storm of useless tx interrupts here.
> > > >
> > > > Let's find out what it was please. What you are doing is
> > > > just preventing linux from complaining.
> > >
> > > The traffic that causes this warning is a netperf tcp_stream with at
> > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > and a few tx interrupts.
> > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > get triggered very close to each other, and gets handled in one round
> > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > to try to do the work on the corresponding tx queue as well. That's
> > > why when tx interrupt handler gets called, it sees no work to do.
> > > And the reason for the rx handler to handle the tx work is here:
> > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> >
> > Indeed. It's not a storm necessarily. The warning occurs after one
> > hundred such events, since boot, which is a small number compared real
> > interrupt load.
> 
> Sorry, this is wrong. It is the other call to __report_bad_irq from
> note_interrupt that applies here.
> 
> > Occasionally seeing an interrupt with no work is expected after
> > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > long as this rate of events is very low compared to useful interrupts,
> > and total interrupt count is greatly reduced vs not having work
> > stealing, it is a net win.

Right, but if 99900 out of 100000 interrupts were wasted, then it is
surely an even greater win to disable interrupts while polling like
this.  Might be tricky to detect: disabling/enabling aggressively every
time, even if there's nothing in the queue, is sure to cause lots of
cache line bounces, and we don't want to enable callbacks if they were
not enabled e.g. by start_xmit ...  Some kind of counter?
Willem de Bruijn Feb. 3, 2021, 6:24 p.m. UTC | #10
On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > >
> > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > >
> > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > This could lead to the following warning msg:
> > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > "irqpoll" option)
> > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > [ 3588.017940] Call Trace:
> > > > > > [ 3588.017942]  <IRQ>
> > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > [ 3588.017966]  </IRQ>
> > > > > > [ 3588.017989] handlers:
> > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > >
> > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > case.
> > > > > >
> > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > >
> > > > >
> > > > > This description does not make sense to me.
> > > > >
> > > > > irq X: nobody cared
> > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > >
> > > > > So something causes a storm of useless tx interrupts here.
> > > > >
> > > > > Let's find out what it was please. What you are doing is
> > > > > just preventing linux from complaining.
> > > >
> > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > and a few tx interrupts.
> > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > get triggered very close to each other, and gets handled in one round
> > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > to try to do the work on the corresponding tx queue as well. That's
> > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > And the reason for the rx handler to handle the tx work is here:
> > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > >
> > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > hundred such events, since boot, which is a small number compared real
> > > interrupt load.
> >
> > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > note_interrupt that applies here.
> >
> > > Occasionally seeing an interrupt with no work is expected after
> > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > long as this rate of events is very low compared to useful interrupts,
> > > and total interrupt count is greatly reduced vs not having work
> > > stealing, it is a net win.
>
> Right, but if 99900 out of 100000 interrupts were wasted, then it is
> surely an even greater win to disable interrupts while polling like
> this.  Might be tricky to detect, disabling/enabling aggressively every
> time even if there's nothing in the queue is sure to cause lots of cache
> line bounces, and we don't want to enable callbacks if they were not
> enabled e.g. by start_xmit ...  Some kind of counter?

Yes. It was known that the work stealing is more effective in some
workloads than others. But a 99% spurious rate I had not anticipated.

Most interesting is the number of interrupts suppressed as a result of
the feature. That is not captured by this statistic.

In any case, we'll take a step back to better understand behavior. And
especially why this high spurious rate exhibits in this workload with
many concurrent flows.
Willem de Bruijn Feb. 3, 2021, 6:28 p.m. UTC | #11
On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/2/2 10:37 PM, Willem de Bruijn wrote:
> > On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2021/1/29 8:21 AM, Wei Wang wrote:
> >>> With the implementation of napi-tx in virtio driver, we clean tx
> >>> descriptors from rx napi handler, for the purpose of reducing tx
> >>> complete interrupts. But this could introduce a race where tx complete
> >>> interrupt has been raised, but the handler found there is no work to do
> >>> because we have done the work in the previous rx interrupt handler.
> >>> This could lead to the following warning msg:
> >>> [ 3588.010778] irq 38: nobody cared (try booting with the
> >>> "irqpoll" option)
> >>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> >>> 5.3.0-19-generic #20~18.04.2-Ubuntu
> >>> [ 3588.017940] Call Trace:
> >>> [ 3588.017942]  <IRQ>
> >>> [ 3588.017951]  dump_stack+0x63/0x85
> >>> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> >>> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> >>> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> >>> [ 3588.017957]  handle_irq_event+0x3b/0x60
> >>> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> >>> [ 3588.017961]  handle_irq+0x20/0x30
> >>> [ 3588.017964]  do_IRQ+0x50/0xe0
> >>> [ 3588.017966]  common_interrupt+0xf/0xf
> >>> [ 3588.017966]  </IRQ>
> >>> [ 3588.017989] handlers:
> >>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> >>> [ 3588.025099] Disabling IRQ #38
> >>>
> >>> This patch adds a new param to struct vring_virtqueue, and we set it for
> >>> tx virtqueues if napi-tx is enabled, to suppress the warning in such
> >>> case.
> >>>
> >>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> >>> Reported-by: Rick Jones <jonesrick@google.com>
> >>> Signed-off-by: Wei Wang <weiwan@google.com>
> >>> Signed-off-by: Willem de Bruijn <willemb@google.com>
> >>
> >> Please use get_maintainer.pl to make sure Michael and me were cced.
> > Will do. Sorry about that. I suggested just the virtualization list, my bad.
> >
> >>> ---
> >>>    drivers/net/virtio_net.c     | 19 ++++++++++++++-----
> >>>    drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
> >>>    include/linux/virtio.h       |  2 ++
> >>>    3 files changed, 32 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>> index 508408fbe78f..e9a3f30864e8 100644
> >>> --- a/drivers/net/virtio_net.c
> >>> +++ b/drivers/net/virtio_net.c
> >>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
> >>>                return;
> >>>        }
> >>>
> >>> +     /* With napi_tx enabled, free_old_xmit_skbs() could be called from
> >>> +      * rx napi handler. Set work_steal to suppress bad irq warning for
> >>> +      * IRQ_NONE case from tx complete interrupt handler.
> >>> +      */
> >>> +     virtqueue_set_work_steal(vq, true);
> >>> +
> >>>        return virtnet_napi_enable(vq, napi);
> >>
> >> Do we need to force the ordering between steal set and napi enable?
> > The warning only occurs after one hundred spurious interrupts, so not
> > really.
>
>
> Ok, so it looks like a hint. Then I wonder how much value do we need to
> introduce helper like virtqueue_set_work_steal() that allows the caller
> to toggle. How about disable the check forever during virtqueue
> initialization?

Yes, that is even simpler.

We still need the helper, as the internal variables of vring_virtqueue
are not accessible from virtio-net. An earlier patch added the
variable to virtqueue itself, but I think it belongs in
vring_virtqueue. And the helper is not a lot of code.
Michael S. Tsirkin Feb. 3, 2021, 11:09 p.m. UTC | #12
On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > >
> > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > >
> > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > > This could lead to the following warning msg:
> > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > "irqpoll" option)
> > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > [ 3588.017940] Call Trace:
> > > > > > > [ 3588.017942]  <IRQ>
> > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > [ 3588.017966]  </IRQ>
> > > > > > > [ 3588.017989] handlers:
> > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > >
> > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > > case.
> > > > > > >
> > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > >
> > > > > >
> > > > > > This description does not make sense to me.
> > > > > >
> > > > > > irq X: nobody cared
> > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > >
> > > > > > So something causes a storm of useless tx interrupts here.
> > > > > >
> > > > > > Let's find out what it was please. What you are doing is
> > > > > > just preventing linux from complaining.
> > > > >
> > > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > > and a few tx interrupts.
> > > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > > get triggered very close to each other, and get handled in one round
> > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > > to try to do the work on the corresponding tx queue as well. That's
> > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > >
> > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > hundred such events, since boot, which is a small number compared to real
> > > > interrupt load.
> > >
> > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > note_interrupt that applies here.
> > >
> > > > Occasionally seeing an interrupt with no work is expected after
> > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > long as this rate of events is very low compared to useful interrupts,
> > > > and total interrupt count is greatly reduced vs not having work
> > > > stealing, it is a net win.
> >
> > Right, but if 99900 out of 100000 interrupts were wasted, then it is
> > surely an even greater win to disable interrupts while polling like
> > this.  Might be tricky to detect, disabling/enabling aggressively every
> > time even if there's nothing in the queue is sure to cause lots of cache
> > line bounces, and we don't want to enable callbacks if they were not
> > enabled e.g. by start_xmit ...  Some kind of counter?
> 
> Yes. It was known that the work stealing is more effective in some
> workloads than others. But a 99% spurious rate I had not anticipated.
> 
> Most interesting is the number of interrupts suppressed as a result of
> the feature. That is not captured by this statistic.
> 
> In any case, we'll take a step back to better understand behavior. And
> especially why this high spurious rate manifests in this workload with
> many concurrent flows.


I've been thinking about it. Imagine work stealing working perfectly.
Each time we xmit a packet, it is stolen and freed.
Since xmit enables callbacks (just in case!) we also
get an interrupt, which is automatically spurious.

My conclusion is that we shouldn't just work around it but instead
(or additionally?)
reduce the number of interrupts by disabling callbacks e.g. when
a. we are currently stealing packets
or
b. we stole all packets

This should be enough to reduce the chances below 99% ;)

One annoying thing is that with split and event index, we do not disable
interrupts. Could be worth revisiting, for now maybe just disable the
event index feature? I am not sure it is actually worth it with
stealing.
Wei Wang Feb. 3, 2021, 11:52 p.m. UTC | #13
On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > > [...]
>
>
> I've been thinking about it. Imagine work stealing working perfectly.
> Each time we xmit a packet, it is stolen and freed.
> Since xmit enables callbacks (just in case!) we also
> get an interrupt, which is automatically spurious.
>
> My conclusion is that we shouldn't just work around it but instead
> (or additionally?)
> reduce the number of interrupts by disabling callbacks e.g. when
> a. we are currently stealing packets
> or
> b. we stole all packets
>
Thinking along this line, that probably means we should disable cb on
the tx virtqueue, when scheduling the napi work on the rx side, and
reenable it after the rx napi work is done?
Also, I wonder if it is too late to disable cb at the point we start
to steal pkts or have stolen all pkts. Because the steal work is done
in the napi handler of the rx queue. But the tx interrupt must have
been raised before that. Will we come back to process the tx interrupt
again after we re-enable the cb on the tx side?

> This should be enough to reduce the chances below 99% ;)
>
> One annoying thing is that with split and event index, we do not disable
> interrupts. Could be worth revisiting, for now maybe just disable the
> event index feature? I am not sure it is actually worth it with
> stealing.
>
> --
> MST
>
Jason Wang Feb. 4, 2021, 3:06 a.m. UTC | #14
On 2021/2/4 2:28 a.m., Willem de Bruijn wrote:
> On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/2/2 10:37 p.m., Willem de Bruijn wrote:
>>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2021/1/29 8:21 a.m., Wei Wang wrote:
>>>>> [...]
>>>> Please use get_maintainer.pl to make sure Michael and me were cced.
>>> Will do. Sorry about that. I suggested just the virtualization list, my bad.
>>>
>>>>> ---
>>>>>     drivers/net/virtio_net.c     | 19 ++++++++++++++-----
>>>>>     drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
>>>>>     include/linux/virtio.h       |  2 ++
>>>>>     3 files changed, 32 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index 508408fbe78f..e9a3f30864e8 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
>>>>>                 return;
>>>>>         }
>>>>>
>>>>> +     /* With napi_tx enabled, free_old_xmit_skbs() could be called from
>>>>> +      * rx napi handler. Set work_steal to suppress bad irq warning for
>>>>> +      * IRQ_NONE case from tx complete interrupt handler.
>>>>> +      */
>>>>> +     virtqueue_set_work_steal(vq, true);
>>>>> +
>>>>>         return virtnet_napi_enable(vq, napi);
>>>> Do we need to force the ordering between steal set and napi enable?
>>> The warning only occurs after one hundred spurious interrupts, so not
>>> really.
>>
>> Ok, so it looks like a hint. Then I wonder how much value do we need to
>> introduce helper like virtqueue_set_work_steal() that allows the caller
>> to toggle. How about disable the check forever during virtqueue
>> initialization?
> Yes, that is even simpler.
>
> We still need the helper, as the internal variables of vring_virtqueue
> are not accessible from virtio-net. An earlier patch added the
> variable to virtqueue itself, but I think it belongs in
> vring_virtqueue. And the helper is not a lot of code.


It's better to do this before allocating the irq. But that does not look 
easy unless we extend find_vqs().

Thanks


>
Willem de Bruijn Feb. 4, 2021, 8:47 p.m. UTC | #15
On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote:
>
> On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > > > >
> > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > >
> > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > [...]
> >
> >
> > I've been thinking about it. Imagine work stealing working perfectly.
> > Each time we xmit a packet, it is stolen and freed.
> > Since xmit enables callbacks (just in case!) we also
> > get an interrupt, which is automatically spurious.
> >
> > My conclusion is that we shouldn't just work around it but instead
> > (or additionally?)
> > reduce the number of interrupts by disabling callbacks e.g. when
> > a. we are currently stealing packets
> > or
> > b. we stole all packets

Agreed. This might prove a significant performance gain at the same time :)

> >
> Thinking along this line, that probably means, we should disable cb on
> the tx virtqueue, when scheduling the napi work on the rx side, and
> reenable it after the rx napi work is done?
> Also, I wonder if it is too late to disable cb at the point we start
> to steal pkts or have stolen all pkts.

The earlier the better. I see no benefit to delay until the rx handler
actually runs.

> Because the steal work is done
> in the napi handler of the rx queue. But the tx interrupt must have
> been raised before that. Will we come back to process the tx interrupt
> again after we re-enable the cb on the tx side?
>
> > This should be enough to reduce the chances below 99% ;)
> >
> > One annoying thing is that with split and event index, we do not disable
> > interrupts. Could be worth revisiting, for now maybe just disable the
> > event index feature? I am not sure it is actually worth it with
> > stealing.

With event index, we suppress interrupts when another interrupt is
already pending from a previous packet, right? When the previous
position of the producer is already beyond the consumer. It doesn't
matter whether the previous packet triggered a tx interrupt or
deferred to an already scheduled rx interrupt? From that perspective, it
seems fine to leave it out.
Willem de Bruijn Feb. 4, 2021, 8:50 p.m. UTC | #16
On Wed, Feb 3, 2021 at 10:06 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/2/4 2:28 a.m., Willem de Bruijn wrote:
> > On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2021/2/2 10:37 p.m., Willem de Bruijn wrote:
> >>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2021/1/29 8:21 a.m., Wei Wang wrote:
> >>>>> [...]
>
>
> It's better to do this before allocating the irq. But that does not look
> easy unless we extend find_vqs().

Can you elaborate why that is better? At virtnet_open the interrupts
are not firing either.

I have no preference. Just curious, especially if it complicates the patch.
Wei Wang Feb. 5, 2021, 10:28 p.m. UTC | #17
On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote:
> >
> > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > > > > >
> > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > > [...]
>
> Agreed. This might prove a significant performance gain at the same time :)
>
> > >
> > Thinking along this line, that probably means we should disable cb on
> > the tx virtqueue when scheduling the napi work on the rx side, and
> > re-enable it after the rx napi work is done?
> > Also, I wonder if it is too late to disable cb at the point we start
> > to steal pkts or have stolen all pkts.
>
> The earlier the better. I see no benefit to delay until the rx handler
> actually runs.
>

I've been thinking more on this. I think the fundamental issue here is
that the rx napi handler virtnet_poll() does the tx side work by
calling virtnet_poll_cleantx() without any notification to the tx
side.
I am thinking: in virtnet_poll(), instead of directly calling
virtnet_poll_cleantx(), why not call virtqueue_napi_schedule() to
schedule the tx side napi, and let the tx napi handler do the cleaning
work? This way, we automatically call virtqueue_disable_cb() on the tx
vq, and after the tx work is done, virtqueue_napi_complete() is called
to re-enable the cb on the tx side. The tx side then knows what
has been done, which should reduce the number of spurious tx interrupts.
And I don't think there is much cost in doing that, since
napi_schedule() basically queues the tx napi to the back of its
napi_list, and serves it right after the rx napi handler is done.
What do you guys think? I could quickly test it to see if it solves
the issue.

> > Because the steal work is done
> > in the napi handler of the rx queue. But the tx interrupt must have
> > been raised before that. Will we come back to process the tx interrupt
> > again after we re-enabled the cb on the tx side?
> >
> > > This should be enough to reduce the chances below 99% ;)
> > >
> > > One annoying thing is that with split and event index, we do not disable
> > > interrupts. Could be worth revisiting, for now maybe just disable the
> > > event index feature? I am not sure it is actually worth it with
> > > stealing.
>
> With event index, we suppress interrupts when another interrupt is
> already pending from a previous packet, right? When the previous
> position of the producer is already beyond the consumer. It doesn't
> matter whether the previous packet triggered a tx interrupt or
> deferred to an already scheduled rx interrupt? From that seems fine to
> leave it out.
Jason Wang Feb. 8, 2021, 3:29 a.m. UTC | #18
On 2021/2/5 上午4:50, Willem de Bruijn wrote:
> On Wed, Feb 3, 2021 at 10:06 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/2/4 上午2:28, Willem de Bruijn wrote:
>>> On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2021/2/2 下午10:37, Willem de Bruijn wrote:
>>>>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2021/1/29 上午8:21, Wei Wang wrote:
>>>>>>> With the implementation of napi-tx in virtio driver, we clean tx
>>>>>>> descriptors from rx napi handler, for the purpose of reducing tx
>>>>>>> complete interrupts. But this could introduce a race where tx complete
>>>>>>> interrupt has been raised, but the handler found there is no work to do
>>>>>>> because we have done the work in the previous rx interrupt handler.
>>>>>>> This could lead to the following warning msg:
>>>>>>> [ 3588.010778] irq 38: nobody cared (try booting with the
>>>>>>> "irqpoll" option)
>>>>>>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
>>>>>>> 5.3.0-19-generic #20~18.04.2-Ubuntu
>>>>>>> [ 3588.017940] Call Trace:
>>>>>>> [ 3588.017942]  <IRQ>
>>>>>>> [ 3588.017951]  dump_stack+0x63/0x85
>>>>>>> [ 3588.017953]  __report_bad_irq+0x35/0xc0
>>>>>>> [ 3588.017955]  note_interrupt+0x24b/0x2a0
>>>>>>> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
>>>>>>> [ 3588.017957]  handle_irq_event+0x3b/0x60
>>>>>>> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
>>>>>>> [ 3588.017961]  handle_irq+0x20/0x30
>>>>>>> [ 3588.017964]  do_IRQ+0x50/0xe0
>>>>>>> [ 3588.017966]  common_interrupt+0xf/0xf
>>>>>>> [ 3588.017966]  </IRQ>
>>>>>>> [ 3588.017989] handlers:
>>>>>>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
>>>>>>> [ 3588.025099] Disabling IRQ #38
>>>>>>>
>>>>>>> This patch adds a new param to struct vring_virtqueue, and we set it for
>>>>>>> tx virtqueues if napi-tx is enabled, to suppress the warning in such
>>>>>>> case.
>>>>>>>
>>>>>>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
>>>>>>> Reported-by: Rick Jones <jonesrick@google.com>
>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>>>>>> Please use get_maintainer.pl to make sure Michael and me were cced.
>>>>> Will do. Sorry about that. I suggested just the virtualization list, my bad.
>>>>>
>>>>>>> ---
>>>>>>>      drivers/net/virtio_net.c     | 19 ++++++++++++++-----
>>>>>>>      drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
>>>>>>>      include/linux/virtio.h       |  2 ++
>>>>>>>      3 files changed, 32 insertions(+), 5 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>> index 508408fbe78f..e9a3f30864e8 100644
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
>>>>>>>                  return;
>>>>>>>          }
>>>>>>>
>>>>>>> +     /* With napi_tx enabled, free_old_xmit_skbs() could be called from
>>>>>>> +      * rx napi handler. Set work_steal to suppress bad irq warning for
>>>>>>> +      * IRQ_NONE case from tx complete interrupt handler.
>>>>>>> +      */
>>>>>>> +     virtqueue_set_work_steal(vq, true);
>>>>>>> +
>>>>>>>          return virtnet_napi_enable(vq, napi);
>>>>>> Do we need to force the ordering between steal set and napi enable?
>>>>> The warning only occurs after one hundred spurious interrupts, so not
>>>>> really.
>>>> Ok, so it looks like a hint. Then I wonder how much value do we need to
>>>> introduce helper like virtqueue_set_work_steal() that allows the caller
>>>> to toggle. How about disable the check forever during virtqueue
>>>> initialization?
>>> Yes, that is even simpler.
>>>
>>> We still need the helper, as the internal variables of vring_virtqueue
>>> are not accessible from virtio-net. An earlier patch added the
>>> variable to virtqueue itself, but I think it belongs in
>>> vring_virtqueue. And the helper is not a lot of code.
>>
>> It's better to do this before the allocating the irq. But it looks not
>> easy unless we extend find_vqs().
> Can you elaborate why that is better? At virtnet_open the interrupts
> are not firing either.


I think you meant NAPI actually?


>
> I have no preference. Just curious, especially if it complicates the patch.
>

My understanding is that it's probably ok for net. But we probably need 
to document the assumptions to make sure they are not abused in other drivers.

Introducing new parameters for find_vqs() can help to eliminate the subtle 
stuff, but I agree it looks like overkill.

(Btw, I forget the numbers, but I wonder how much difference it would make 
if we simply removed the free_old_xmit_skbs() call from the rx NAPI path?)

Thanks
Willem de Bruijn Feb. 8, 2021, 7:08 p.m. UTC | #19
On Sun, Feb 7, 2021 at 10:29 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/2/5 上午4:50, Willem de Bruijn wrote:
> > On Wed, Feb 3, 2021 at 10:06 PM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On 2021/2/4 上午2:28, Willem de Bruijn wrote:
> >>> On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote:
> >>>> On 2021/2/2 下午10:37, Willem de Bruijn wrote:
> >>>>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> On 2021/1/29 上午8:21, Wei Wang wrote:
> >>>>>>> With the implementation of napi-tx in virtio driver, we clean tx
> >>>>>>> descriptors from rx napi handler, for the purpose of reducing tx
> >>>>>>> complete interrupts. But this could introduce a race where tx complete
> >>>>>>> interrupt has been raised, but the handler found there is no work to do
> >>>>>>> because we have done the work in the previous rx interrupt handler.
> >>>>>>> This could lead to the following warning msg:
> >>>>>>> [ 3588.010778] irq 38: nobody cared (try booting with the
> >>>>>>> "irqpoll" option)
> >>>>>>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> >>>>>>> 5.3.0-19-generic #20~18.04.2-Ubuntu
> >>>>>>> [ 3588.017940] Call Trace:
> >>>>>>> [ 3588.017942]  <IRQ>
> >>>>>>> [ 3588.017951]  dump_stack+0x63/0x85
> >>>>>>> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> >>>>>>> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> >>>>>>> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> >>>>>>> [ 3588.017957]  handle_irq_event+0x3b/0x60
> >>>>>>> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> >>>>>>> [ 3588.017961]  handle_irq+0x20/0x30
> >>>>>>> [ 3588.017964]  do_IRQ+0x50/0xe0
> >>>>>>> [ 3588.017966]  common_interrupt+0xf/0xf
> >>>>>>> [ 3588.017966]  </IRQ>
> >>>>>>> [ 3588.017989] handlers:
> >>>>>>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> >>>>>>> [ 3588.025099] Disabling IRQ #38
> >>>>>>>
> >>>>>>> This patch adds a new param to struct vring_virtqueue, and we set it for
> >>>>>>> tx virtqueues if napi-tx is enabled, to suppress the warning in such
> >>>>>>> case.
> >>>>>>>
> >>>>>>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> >>>>>>> Reported-by: Rick Jones <jonesrick@google.com>
> >>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
> >>>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
> >>>>>> Please use get_maintainer.pl to make sure Michael and me were cced.
> >>>>> Will do. Sorry about that. I suggested just the virtualization list, my bad.
> >>>>>
> >>>>>>> ---
> >>>>>>>      drivers/net/virtio_net.c     | 19 ++++++++++++++-----
> >>>>>>>      drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
> >>>>>>>      include/linux/virtio.h       |  2 ++
> >>>>>>>      3 files changed, 32 insertions(+), 5 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>>>>> index 508408fbe78f..e9a3f30864e8 100644
> >>>>>>> --- a/drivers/net/virtio_net.c
> >>>>>>> +++ b/drivers/net/virtio_net.c
> >>>>>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
> >>>>>>>                  return;
> >>>>>>>          }
> >>>>>>>
> >>>>>>> +     /* With napi_tx enabled, free_old_xmit_skbs() could be called from
> >>>>>>> +      * rx napi handler. Set work_steal to suppress bad irq warning for
> >>>>>>> +      * IRQ_NONE case from tx complete interrupt handler.
> >>>>>>> +      */
> >>>>>>> +     virtqueue_set_work_steal(vq, true);
> >>>>>>> +
> >>>>>>>          return virtnet_napi_enable(vq, napi);
> >>>>>> Do we need to force the ordering between steal set and napi enable?
> >>>>> The warning only occurs after one hundred spurious interrupts, so not
> >>>>> really.
> >>>> Ok, so it looks like a hint. Then I wonder how much value do we need to
> >>>> introduce helper like virtqueue_set_work_steal() that allows the caller
> >>>> to toggle. How about disable the check forever during virtqueue
> >>>> initialization?
> >>> Yes, that is even simpler.
> >>>
> >>> We still need the helper, as the internal variables of vring_virtqueue
> >>> are not accessible from virtio-net. An earlier patch added the
> >>> variable to virtqueue itself, but I think it belongs in
> >>> vring_virtqueue. And the helper is not a lot of code.
> >>
> >> It's better to do this before the allocating the irq. But it looks not
> >> easy unless we extend find_vqs().
> > Can you elaborate why that is better? At virtnet_open the interrupts
> > are not firing either.
>
>
> I think you meant NAPI actually?

I meant interrupt: we don't have to worry about the spurious interrupt
warning when no interrupts will be firing. Until virtnet_open
completes, the device is down.


>
> >
> > I have no preference. Just curious, especially if it complicates the patch.
> >
>
> My understanding is that. It's probably ok for net. But we probably need
> to document the assumptions to make sure it was not abused in other drivers.
>
> Introduce new parameters for find_vqs() can help to eliminate the subtle
> stuffs but I agree it looks like a overkill.
>
> (Btw, I forget the numbers but wonder how much difference if we simple
> remove the free_old_xmits() from the rx NAPI path?)

The committed patchset did not record those numbers, but I found them
in an earlier iteration:

  [PATCH net-next 0/3] virtio-net tx napi
  https://lists.openwall.net/netdev/2017/04/02/55

It did seem to significantly reduce compute cycles ("Gcyc") at the
time. For instance:

    TCP_RR Latency (us):
    1x:
      p50              24       24       21
      p99              27       27       27
      Gcycles         299      432      308

I'm concerned that removing it now may cause a regression report in a
few months. That is higher risk than the spurious interrupt warning
that was only reported after years of use.
Jason Wang Feb. 9, 2021, 3:27 a.m. UTC | #20
On 2021/2/9 上午3:08, Willem de Bruijn wrote:
> On Sun, Feb 7, 2021 at 10:29 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/2/5 上午4:50, Willem de Bruijn wrote:
>>> On Wed, Feb 3, 2021 at 10:06 PM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2021/2/4 上午2:28, Willem de Bruijn wrote:
>>>>> On Wed, Feb 3, 2021 at 12:33 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2021/2/2 下午10:37, Willem de Bruijn wrote:
>>>>>>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>> On 2021/1/29 上午8:21, Wei Wang wrote:
>>>>>>>>> With the implementation of napi-tx in virtio driver, we clean tx
>>>>>>>>> descriptors from rx napi handler, for the purpose of reducing tx
>>>>>>>>> complete interrupts. But this could introduce a race where tx complete
>>>>>>>>> interrupt has been raised, but the handler found there is no work to do
>>>>>>>>> because we have done the work in the previous rx interrupt handler.
>>>>>>>>> This could lead to the following warning msg:
>>>>>>>>> [ 3588.010778] irq 38: nobody cared (try booting with the
>>>>>>>>> "irqpoll" option)
>>>>>>>>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
>>>>>>>>> 5.3.0-19-generic #20~18.04.2-Ubuntu
>>>>>>>>> [ 3588.017940] Call Trace:
>>>>>>>>> [ 3588.017942]  <IRQ>
>>>>>>>>> [ 3588.017951]  dump_stack+0x63/0x85
>>>>>>>>> [ 3588.017953]  __report_bad_irq+0x35/0xc0
>>>>>>>>> [ 3588.017955]  note_interrupt+0x24b/0x2a0
>>>>>>>>> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
>>>>>>>>> [ 3588.017957]  handle_irq_event+0x3b/0x60
>>>>>>>>> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
>>>>>>>>> [ 3588.017961]  handle_irq+0x20/0x30
>>>>>>>>> [ 3588.017964]  do_IRQ+0x50/0xe0
>>>>>>>>> [ 3588.017966]  common_interrupt+0xf/0xf
>>>>>>>>> [ 3588.017966]  </IRQ>
>>>>>>>>> [ 3588.017989] handlers:
>>>>>>>>> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
>>>>>>>>> [ 3588.025099] Disabling IRQ #38
>>>>>>>>>
>>>>>>>>> This patch adds a new param to struct vring_virtqueue, and we set it for
>>>>>>>>> tx virtqueues if napi-tx is enabled, to suppress the warning in such
>>>>>>>>> case.
>>>>>>>>>
>>>>>>>>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
>>>>>>>>> Reported-by: Rick Jones <jonesrick@google.com>
>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>>>>>>>> Please use get_maintainer.pl to make sure Michael and me were cced.
>>>>>>> Will do. Sorry about that. I suggested just the virtualization list, my bad.
>>>>>>>
>>>>>>>>> ---
>>>>>>>>>       drivers/net/virtio_net.c     | 19 ++++++++++++++-----
>>>>>>>>>       drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
>>>>>>>>>       include/linux/virtio.h       |  2 ++
>>>>>>>>>       3 files changed, 32 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>>> index 508408fbe78f..e9a3f30864e8 100644
>>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info *vi,
>>>>>>>>>                   return;
>>>>>>>>>           }
>>>>>>>>>
>>>>>>>>> +     /* With napi_tx enabled, free_old_xmit_skbs() could be called from
>>>>>>>>> +      * rx napi handler. Set work_steal to suppress bad irq warning for
>>>>>>>>> +      * IRQ_NONE case from tx complete interrupt handler.
>>>>>>>>> +      */
>>>>>>>>> +     virtqueue_set_work_steal(vq, true);
>>>>>>>>> +
>>>>>>>>>           return virtnet_napi_enable(vq, napi);
>>>>>>>> Do we need to force the ordering between steal set and napi enable?
>>>>>>> The warning only occurs after one hundred spurious interrupts, so not
>>>>>>> really.
>>>>>> Ok, so it looks like a hint. Then I wonder how much value do we need to
>>>>>> introduce helper like virtqueue_set_work_steal() that allows the caller
>>>>>> to toggle. How about disable the check forever during virtqueue
>>>>>> initialization?
>>>>> Yes, that is even simpler.
>>>>>
>>>>> We still need the helper, as the internal variables of vring_virtqueue
>>>>> are not accessible from virtio-net. An earlier patch added the
>>>>> variable to virtqueue itself, but I think it belongs in
>>>>> vring_virtqueue. And the helper is not a lot of code.
>>>> It's better to do this before the allocating the irq. But it looks not
>>>> easy unless we extend find_vqs().
>>> Can you elaborate why that is better? At virtnet_open the interrupts
>>> are not firing either.
>>
>> I think you meant NAPI actually?
> I meant interrupt: we don't have to worry about the spurious interrupt
> warning when no interrupts will be firing. Until virtnet_open
> completes, the device is down.


Ok.


>
>
>>> I have no preference. Just curious, especially if it complicates the patch.
>>>
>> My understanding is that. It's probably ok for net. But we probably need
>> to document the assumptions to make sure it was not abused in other drivers.
>>
>> Introduce new parameters for find_vqs() can help to eliminate the subtle
>> stuffs but I agree it looks like a overkill.
>>
>> (Btw, I forget the numbers but wonder how much difference if we simple
>> remove the free_old_xmits() from the rx NAPI path?)
> The committed patchset did not record those numbers, but I found them
> in an earlier iteration:
>
>    [PATCH net-next 0/3] virtio-net tx napi
>    https://lists.openwall.net/netdev/2017/04/02/55
>
> It did seem to significantly reduce compute cycles ("Gcyc") at the
> time. For instance:
>
>      TCP_RR Latency (us):
>      1x:
>        p50              24       24       21
>        p99              27       27       27
>        Gcycles         299      432      308
>
> I'm concerned that removing it now may cause a regression report in a
> few months. That is higher risk than the spurious interrupt warning
> that was only reported after years of use.


Right.

So if Michael is fine with this approach, I'm ok with it. But we 
probably need to add a TODO to invent interrupt handlers that can be 
used for more than one virtqueue. When MSI-X is enabled, the interrupt 
handler (vring_interrupt()) assumes the interrupt is used by a single 
virtqueue.

Thanks


>
Willem de Bruijn Feb. 9, 2021, 2:58 p.m. UTC | #21
> >>> I have no preference. Just curious, especially if it complicates the patch.
> >>>
> >> My understanding is that. It's probably ok for net. But we probably need
> >> to document the assumptions to make sure it was not abused in other drivers.
> >>
> >> Introduce new parameters for find_vqs() can help to eliminate the subtle
> >> stuffs but I agree it looks like a overkill.
> >>
> >> (Btw, I forget the numbers but wonder how much difference if we simple
> >> remove the free_old_xmits() from the rx NAPI path?)
> > The committed patchset did not record those numbers, but I found them
> > in an earlier iteration:
> >
> >    [PATCH net-next 0/3] virtio-net tx napi
> >    https://lists.openwall.net/netdev/2017/04/02/55
> >
> > It did seem to significantly reduce compute cycles ("Gcyc") at the
> > time. For instance:
> >
> >      TCP_RR Latency (us):
> >      1x:
> >        p50              24       24       21
> >        p99              27       27       27
> >        Gcycles         299      432      308
> >
> > I'm concerned that removing it now may cause a regression report in a
> > few months. That is higher risk than the spurious interrupt warning
> > that was only reported after years of use.
>
>
> Right.
>
> So if Michael is fine with this approach, I'm ok with it. But we
> probably need to a TODO to invent the interrupt handlers that can be
> used for more than one virtqueues. When MSI-X is enabled, the interrupt
> handler (vring_interrup()) assumes the interrupt is used by a single
> virtqueue.

Thanks.

The approach to schedule tx-napi from virtnet_poll_cleantx instead of
cleaning directly in this rx-napi function was not effective at
suppressing the warning, I understand.

It should be easy to drop the spurious rate to a little under 99
percent, if only to suppress the warning, by probabilistically
cleaning in virtnet_poll_cleantx only 98 times out of 100, say. But that
really is a hack.

There does seem to be a huge potential for cycle savings if we can
really avoid the many spurious interrupts.

As scheduling napi_tx from virtnet_poll_cleantx is not effective,
agreed that we should probably look at the more complete solution to
handle both tx and rx virtqueues from the same IRQ.

And revert this explicit warning suppression patch once we have that.
Wei Wang Feb. 9, 2021, 6 p.m. UTC | #22
On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> > >>> I have no preference. Just curious, especially if it complicates the patch.
> > >>>
> > >> My understanding is that. It's probably ok for net. But we probably need
> > >> to document the assumptions to make sure it was not abused in other drivers.
> > >>
> > >> Introduce new parameters for find_vqs() can help to eliminate the subtle
> > >> stuffs but I agree it looks like a overkill.
> > >>
> > >> (Btw, I forget the numbers but wonder how much difference if we simple
> > >> remove the free_old_xmits() from the rx NAPI path?)
> > > The committed patchset did not record those numbers, but I found them
> > > in an earlier iteration:
> > >
> > >    [PATCH net-next 0/3] virtio-net tx napi
> > >    https://lists.openwall.net/netdev/2017/04/02/55
> > >
> > > It did seem to significantly reduce compute cycles ("Gcyc") at the
> > > time. For instance:
> > >
> > >      TCP_RR Latency (us):
> > >      1x:
> > >        p50              24       24       21
> > >        p99              27       27       27
> > >        Gcycles         299      432      308
> > >
> > > I'm concerned that removing it now may cause a regression report in a
> > > few months. That is higher risk than the spurious interrupt warning
> > > that was only reported after years of use.
> >
> >
> > Right.
> >
> > So if Michael is fine with this approach, I'm ok with it. But we
> > probably need to a TODO to invent the interrupt handlers that can be
> > used for more than one virtqueues. When MSI-X is enabled, the interrupt
> > handler (vring_interrup()) assumes the interrupt is used by a single
> > virtqueue.
>
> Thanks.
>
> The approach to schedule tx-napi from virtnet_poll_cleantx instead of
> cleaning directly in this rx-napi function was not effective at
> suppressing the warning, I understand.

Correct. I tried the approach of scheduling tx napi instead of directly
calling free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning
still happens.

>
> It should be easy to drop the spurious rate to a little under 99%
> percent, if only to suppress the warning. By probabilistically
> cleaning in virtnet_poll_cleantx only every 98/100, say. But that
> really is a hack.
>
> There does seem to be a huge potential for cycle savings if we can
> really avoid the many spurious interrupts.
>
> As scheduling napi_tx from virtnet_poll_cleantx is not effective,
> agreed that we should probably look at the more complete solution to
> handle both tx and rx virtqueues from the same IRQ.
>
> And revert this explicit warning suppression patch once we have that.

If everyone agrees, I will submit a new version of this patch to
address previous comments, before a more complete solution is
proposed.
Michael S. Tsirkin Feb. 10, 2021, 9:14 a.m. UTC | #23
On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote:
> On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > > >>> I have no preference. Just curious, especially if it complicates the patch.
> > > >>>
> > > >> My understanding is that. It's probably ok for net. But we probably need
> > > >> to document the assumptions to make sure it was not abused in other drivers.
> > > >>
> > > >> Introduce new parameters for find_vqs() can help to eliminate the subtle
> > > >> stuffs but I agree it looks like a overkill.
> > > >>
> > > >> (Btw, I forget the numbers but wonder how much difference if we simple
> > > >> remove the free_old_xmits() from the rx NAPI path?)
> > > > The committed patchset did not record those numbers, but I found them
> > > > in an earlier iteration:
> > > >
> > > >    [PATCH net-next 0/3] virtio-net tx napi
> > > >    https://lists.openwall.net/netdev/2017/04/02/55
> > > >
> > > > It did seem to significantly reduce compute cycles ("Gcyc") at the
> > > > time. For instance:
> > > >
> > > >      TCP_RR Latency (us):
> > > >      1x:
> > > >        p50              24       24       21
> > > >        p99              27       27       27
> > > >        Gcycles         299      432      308
> > > >
> > > > I'm concerned that removing it now may cause a regression report in a
> > > > few months. That is higher risk than the spurious interrupt warning
> > > > that was only reported after years of use.
> > >
> > >
> > > Right.
> > >
> > > So if Michael is fine with this approach, I'm ok with it. But we
> > > probably need to a TODO to invent the interrupt handlers that can be
> > > used for more than one virtqueues. When MSI-X is enabled, the interrupt
> > > handler (vring_interrup()) assumes the interrupt is used by a single
> > > virtqueue.
> >
> > Thanks.
> >
> > The approach to schedule tx-napi from virtnet_poll_cleantx instead of
> > cleaning directly in this rx-napi function was not effective at
> > suppressing the warning, I understand.
> 
> Correct. I tried the approach to schedule tx napi instead of directly
> do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning
> still happens.

Two questions here: is the device using packed or split vqs?
And is event index enabled?

I think one issue is that at the moment with split and event index we
don't actually disable events at all.

static void virtqueue_disable_cb_split(struct virtqueue *_vq)
{
        struct vring_virtqueue *vq = to_vvq(_vq);

        if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
                vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
                if (!vq->event)
                        vq->split.vring.avail->flags =
                                cpu_to_virtio16(_vq->vdev,
                                                vq->split.avail_flags_shadow);
        }
}

Can you try your napi patch + disable event index?
Wei Wang Feb. 11, 2021, 12:13 a.m. UTC | #24
On Wed, Feb 10, 2021 at 1:14 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote:
> > On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > > >>> I have no preference. Just curious, especially if it complicates the patch.
> > > > >>>
> > > > >> My understanding is that. It's probably ok for net. But we probably need
> > > > >> to document the assumptions to make sure it was not abused in other drivers.
> > > > >>
> > > > >> Introduce new parameters for find_vqs() can help to eliminate the subtle
> > > > >> stuffs but I agree it looks like a overkill.
> > > > >>
> > > > >> (Btw, I forget the numbers but wonder how much difference if we simple
> > > > >> remove the free_old_xmits() from the rx NAPI path?)
> > > > > The committed patchset did not record those numbers, but I found them
> > > > > in an earlier iteration:
> > > > >
> > > > >    [PATCH net-next 0/3] virtio-net tx napi
> > > > >    https://lists.openwall.net/netdev/2017/04/02/55
> > > > >
> > > > > It did seem to significantly reduce compute cycles ("Gcyc") at the
> > > > > time. For instance:
> > > > >
> > > > >      TCP_RR Latency (us):
> > > > >      1x:
> > > > >        p50              24       24       21
> > > > >        p99              27       27       27
> > > > >        Gcycles         299      432      308
> > > > >
> > > > > I'm concerned that removing it now may cause a regression report in a
> > > > > few months. That is higher risk than the spurious interrupt warning
> > > > > that was only reported after years of use.
> > > >
> > > >
> > > > Right.
> > > >
> > > > So if Michael is fine with this approach, I'm ok with it. But we
> > > > probably need to a TODO to invent the interrupt handlers that can be
> > > > used for more than one virtqueues. When MSI-X is enabled, the interrupt
> > > > handler (vring_interrup()) assumes the interrupt is used by a single
> > > > virtqueue.
> > >
> > > Thanks.
> > >
> > > The approach to schedule tx-napi from virtnet_poll_cleantx instead of
> > > cleaning directly in this rx-napi function was not effective at
> > > suppressing the warning, I understand.
> >
> > Correct. I tried the approach to schedule tx napi instead of directly
> > do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning
> > still happens.
>
> Two questions here: is the device using packed or split vqs?
> And is event index enabled?
>

The device is indeed using split vqs with event index enabled.

> I think one issue is that at the moment with split and event index we
> don't actually disable events at all.
>
You mean we don't disable 'interrupts' right?
What is the reason for that?

> static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> {
>         struct vring_virtqueue *vq = to_vvq(_vq);
>
>         if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
>                 vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
>                 if (!vq->event)
>                         vq->split.vring.avail->flags =
>                                 cpu_to_virtio16(_vq->vdev,
>                                                 vq->split.avail_flags_shadow);
>         }
> }
>
> Can you try your napi patch + disable event index?
>

Thanks for the suggestion.
I've run the reproducer with the napi patch + event index disabled, and so
far I did not see the warning getting triggered. Will keep it running
for a bit longer.

>
> --
> MST
>
Jason Wang Feb. 18, 2021, 5:39 a.m. UTC | #25
On 2021/2/10 下午5:14, Michael S. Tsirkin wrote:
> On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote:
>> On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>>>>>> I have no preference. Just curious, especially if it complicates the patch.
>>>>>>>
>>>>>> My understanding is that it's probably ok for net. But we probably need
>>>>>> to document the assumptions to make sure it is not abused in other drivers.
>>>>>>
>>>>>> Introducing new parameters for find_vqs() could help eliminate the subtle
>>>>>> stuff, but I agree it looks like overkill.
>>>>>>
>>>>>> (Btw, I forget the numbers, but I wonder how much difference it makes if we simply
>>>>>> remove the free_old_xmits() from the rx NAPI path?)
>>>>> The committed patchset did not record those numbers, but I found them
>>>>> in an earlier iteration:
>>>>>
>>>>>     [PATCH net-next 0/3] virtio-net tx napi
>>>>>     https://lists.openwall.net/netdev/2017/04/02/55
>>>>>
>>>>> It did seem to significantly reduce compute cycles ("Gcyc") at the
>>>>> time. For instance:
>>>>>
>>>>>       TCP_RR Latency (us):
>>>>>       1x:
>>>>>         p50              24       24       21
>>>>>         p99              27       27       27
>>>>>         Gcycles         299      432      308
>>>>>
>>>>> I'm concerned that removing it now may cause a regression report in a
>>>>> few months. That is higher risk than the spurious interrupt warning
>>>>> that was only reported after years of use.
>>>>
>>>> Right.
>>>>
>>>> So if Michael is fine with this approach, I'm ok with it. But we
>>>> probably need to add a TODO to invent interrupt handlers that can be
>>>> used for more than one virtqueue. When MSI-X is enabled, the interrupt
>>>> handler (vring_interrupt()) assumes the interrupt is used by a single
>>>> virtqueue.
>>> Thanks.
>>>
>>> The approach to schedule tx-napi from virtnet_poll_cleantx instead of
>>> cleaning directly in this rx-napi function was not effective at
>>> suppressing the warning, I understand.
>> Correct. I tried the approach to schedule tx napi instead of directly
>> do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning
>> still happens.
> Two questions here: is the device using packed or split vqs?
> And is event index enabled?
>
> I think one issue is that at the moment with split and event index we
> don't actually disable events at all.


Do we really have a way to disable that? (We don't have a flag like the
packed virtqueue does.)

Or do you mean the trick [1] from when I posted the tx interrupt RFC?

Thanks

[1] https://lkml.org/lkml/2015/2/9/113


>
> static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> {
>          struct vring_virtqueue *vq = to_vvq(_vq);
>
>          if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
>                  vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
>                  if (!vq->event)
>                          vq->split.vring.avail->flags =
>                                  cpu_to_virtio16(_vq->vdev,
>                                                  vq->split.avail_flags_shadow);
>          }
> }
>
> Can you try your napi patch + disable event index?
>
>
Michael S. Tsirkin Feb. 21, 2021, 11:33 a.m. UTC | #26
On Thu, Feb 18, 2021 at 01:39:19PM +0800, Jason Wang wrote:
> 
> On 2021/2/10 下午5:14, Michael S. Tsirkin wrote:
> > On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote:
> > > On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > > > > I have no preference. Just curious, especially if it complicates the patch.
> > > > > > > > 
> > > > > > > My understanding is that it's probably ok for net. But we probably need
> > > > > > > to document the assumptions to make sure it is not abused in other drivers.
> > > > > > > 
> > > > > > > Introducing new parameters for find_vqs() could help eliminate the subtle
> > > > > > > stuff, but I agree it looks like overkill.
> > > > > > > 
> > > > > > > (Btw, I forget the numbers, but I wonder how much difference it makes if we simply
> > > > > > > remove the free_old_xmits() from the rx NAPI path?)
> > > > > > The committed patchset did not record those numbers, but I found them
> > > > > > in an earlier iteration:
> > > > > > 
> > > > > >     [PATCH net-next 0/3] virtio-net tx napi
> > > > > >     https://lists.openwall.net/netdev/2017/04/02/55
> > > > > > 
> > > > > > It did seem to significantly reduce compute cycles ("Gcyc") at the
> > > > > > time. For instance:
> > > > > > 
> > > > > >       TCP_RR Latency (us):
> > > > > >       1x:
> > > > > >         p50              24       24       21
> > > > > >         p99              27       27       27
> > > > > >         Gcycles         299      432      308
> > > > > > 
> > > > > > I'm concerned that removing it now may cause a regression report in a
> > > > > > few months. That is higher risk than the spurious interrupt warning
> > > > > > that was only reported after years of use.
> > > > > 
> > > > > Right.
> > > > > 
> > > > > So if Michael is fine with this approach, I'm ok with it. But we
> > > > > probably need to add a TODO to invent interrupt handlers that can be
> > > > > used for more than one virtqueue. When MSI-X is enabled, the interrupt
> > > > > handler (vring_interrupt()) assumes the interrupt is used by a single
> > > > > virtqueue.
> > > > Thanks.
> > > > 
> > > > The approach to schedule tx-napi from virtnet_poll_cleantx instead of
> > > > cleaning directly in this rx-napi function was not effective at
> > > > suppressing the warning, I understand.
> > > Correct. I tried the approach to schedule tx napi instead of directly
> > > do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning
> > > still happens.
> > Two questions here: is the device using packed or split vqs?
> > And is event index enabled?
> > 
> > I think one issue is that at the moment with split and event index we
> > don't actually disable events at all.
> 
> 
> Do we really have a way to disable that? (We don't have a flag like the
> packed virtqueue does.)
> 
> Or do you mean the trick [1] from when I posted the tx interrupt RFC?
> 
> Thanks
> 
> [1] https://lkml.org/lkml/2015/2/9/113

Something like this. Or basically any other value will do,
e.g. move the index back to a value just signalled ...

> 
> > 
> > static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> > {
> >          struct vring_virtqueue *vq = to_vvq(_vq);
> > 
> >          if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> >                  vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> >                  if (!vq->event)
> >                          vq->split.vring.avail->flags =
> >                                  cpu_to_virtio16(_vq->vdev,
> >                                                  vq->split.avail_flags_shadow);
> >          }
> > }
> > 
> > Can you try your napi patch + disable event index?
> > 
> >
Michael S. Tsirkin April 12, 2021, 10:08 p.m. UTC | #27
OK I started looking at this again. My idea is simple.
A. disable callbacks before we try to drain skbs
B. actually do disable callbacks even with event idx

To make B not regress, we need to
C. detect the common case of disable after event triggering and skip the write then.

I added a new event_triggered flag for that.
Completely untested - but then I could not see the warnings either.
Would be very much interested to know whether this patch helps
resolve the spurious interrupt problem at all ...


Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..a91a2d6d1ee3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1429,6 +1429,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 		return;
 
 	if (__netif_tx_trylock(txq)) {
+		virtqueue_disable_cb(vq);
 		free_old_xmit_skbs(sq, true);
 		__netif_tx_unlock(txq);
 	}
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..213bfe8b6051 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -113,6 +113,9 @@ struct vring_virtqueue {
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
+	/* Hint for event idx: already triggered no need to disable. */
+	bool event_triggered;
+
 	union {
 		/* Available for split ring */
 		struct {
@@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
 
 	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
 		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
+		if (vq->event)
+			/* TODO: this is a hack. Figure out a cleaner value to write. */
+			vring_used_event(&vq->split.vring) = 0x0;
+		else
 			vq->split.vring.avail->flags =
 				cpu_to_virtio16(_vq->vdev,
 						vq->split.avail_flags_shadow);
@@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
+	vq->event_triggered = false;
 	vq->num_added = 0;
 	vq->packed_ring = true;
 	vq->use_dma_api = vring_use_dma_api(vdev);
@@ -1919,6 +1926,14 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	/* If device triggered an event already it won't trigger one again:
+	 * no need to disable.
+	 */
+	if (vq->event_triggered) {
+		vq->event_triggered = false;
+		return;
+	}
+
 	if (vq->packed_ring)
 		virtqueue_disable_cb_packed(_vq);
 	else
@@ -2044,6 +2059,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 	if (unlikely(vq->broken))
 		return IRQ_HANDLED;
 
+	/* Just a hint for performance: so it's ok that this can be racy! */
+	if (vq->event)
+		vq->event_triggered = true;
+
 	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
 	if (vq->vq.callback)
 		vq->vq.callback(&vq->vq);
@@ -2083,6 +2102,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
+	vq->event_triggered = false;
 	vq->num_added = 0;
 	vq->use_dma_api = vring_use_dma_api(vdev);
 #ifdef DEBUG
Michael S. Tsirkin April 12, 2021, 10:33 p.m. UTC | #28
On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote:
> OK I started looking at this again. My idea is simple.
> A. disable callbacks before we try to drain skbs
> B. actually do disable callbacks even with event idx
> 
> To make B not regress, we need to
> C. detect the common case of disable after event triggering and skip the write then.
> 
> I added a new event_triggered flag for that.
> Completely untested - but then I could not see the warnings either.
> Would be very much interested to know whether this patch helps
> resolve the spurious interrupt problem at all ...
> 
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Hmm a slightly cleaner alternative is to clear the flag when enabling interrupts ...
I wonder which cacheline it's best to use for this.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..c23341b18eb5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1429,6 +1429,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 		return;
 
 	if (__netif_tx_trylock(txq)) {
+		virtqueue_disable_cb(vq);
 		free_old_xmit_skbs(sq, true);
 		__netif_tx_unlock(txq);
 	}
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..88f0b16b11b8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -113,6 +113,9 @@ struct vring_virtqueue {
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
+	/* Hint for event idx: already triggered no need to disable. */
+	bool event_triggered;
+
 	union {
 		/* Available for split ring */
 		struct {
@@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
 
 	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
 		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
+		if (vq->event)
+			/* TODO: this is a hack. Figure out a cleaner value to write. */
+			vring_used_event(&vq->split.vring) = 0x0;
+		else
 			vq->split.vring.avail->flags =
 				cpu_to_virtio16(_vq->vdev,
 						vq->split.avail_flags_shadow);
@@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
+	vq->event_triggered = false;
 	vq->num_added = 0;
 	vq->packed_ring = true;
 	vq->use_dma_api = vring_use_dma_api(vdev);
@@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	/* If device triggered an event already it won't trigger one again:
+	 * no need to disable.
+	 */
+	if (vq->event_triggered)
+		return;
+
 	if (vq->packed_ring)
 		virtqueue_disable_cb_packed(_vq);
 	else
@@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	if (vq->event_triggered)
+		vq->event_triggered = false;
+
 	return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
 				 virtqueue_enable_cb_prepare_split(_vq);
 }
@@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	if (vq->event_triggered)
+		vq->event_triggered = false;
+
 	return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
 				 virtqueue_enable_cb_delayed_split(_vq);
 }
@@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 	if (unlikely(vq->broken))
 		return IRQ_HANDLED;
 
+	/* Just a hint for performance: so it's ok that this can be racy! */
+	if (vq->event)
+		vq->event_triggered = true;
+
 	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
 	if (vq->vq.callback)
 		vq->vq.callback(&vq->vq);
@@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
+	vq->event_triggered = false;
 	vq->num_added = 0;
 	vq->use_dma_api = vring_use_dma_api(vdev);
 #ifdef DEBUG
David Miller April 12, 2021, 11:14 p.m. UTC | #29
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Mon, 12 Apr 2021 18:33:45 -0400

> On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote:
>> OK I started looking at this again. My idea is simple.
>> A. disable callbacks before we try to drain skbs
>> B. actually do disable callbacks even with event idx
>> 
>> To make B not regress, we need to
>> C. detect the common case of disable after event triggering and skip the write then.
>> 
>> I added a new event_triggered flag for that.
>> Completely untested - but then I could not see the warnings either.
>> Would be very much interested to know whether this patch helps
>> resolve the spurious interrupt problem at all ...
>> 
>> 
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Hmm a slightly cleaner alternative is to clear the flag when enabling interrupts ...
> I wonder which cacheline it's best to use for this.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Please make a fresh new submission if you want to use this approach, thanks.
Michael S. Tsirkin April 13, 2021, 4:24 a.m. UTC | #30
On Mon, Apr 12, 2021 at 04:14:58PM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Mon, 12 Apr 2021 18:33:45 -0400
> 
> > On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote:
> >> OK I started looking at this again. My idea is simple.
> >> A. disable callbacks before we try to drain skbs
> >> B. actually do disable callbacks even with event idx
> >> 
> >> To make B not regress, we need to
> >> C. detect the common case of disable after event triggering and skip the write then.
> >> 
> >> I added a new event_triggered flag for that.
> >> Completely untested - but then I could not see the warnings either.
> >> Would be very much interested to know whether this patch helps
> >> resolve the spurious interrupt problem at all ...
> >> 
> >> 
> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > Hmm a slightly cleaner alternative is to clear the flag when enabling interrupts ...
> > I wonder which cacheline it's best to use for this.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Please make a fresh new submission if you want to use this approach, thanks.

Absolutely. This is untested, so I just sent this idea out for early feedback
and hopefully to get help with testing on real hardware.
Sorry about being unclear.
Michael S. Tsirkin April 13, 2021, 5:15 a.m. UTC | #31
On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote:
> > >
> > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > > > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > [ 3588.017942]  <IRQ>
> > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > [ 3588.017966]  </IRQ>
> > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > >
> > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > > > > > > case.
> > > > > > > > > > >
> > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This description does not make sense to me.
> > > > > > > > > >
> > > > > > > > > > irq X: nobody cared
> > > > > > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > > > > > >
> > > > > > > > > > So something causes a storm of useless tx interrupts here.
> > > > > > > > > >
> > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > just preventing linux from complaining.
> > > > > > > > >
> > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > > > > > > and a few tx interrupts.
> > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > > > > > > get triggered very close to each other, and gets handled in one round
> > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > > > > > > to try to do the work on the corresponding tx queue as well. That's
> > > > > > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > > > > > >
> > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > > > > > hundred such events, since boot, which is a small number compared to real
> > > > > > > > interrupt load.
> > > > > > >
> > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > > > > > note_interrupt that applies here.
> > > > > > >
> > > > > > > > Occasionally seeing an interrupt with no work is expected after
> > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > > > > > long as this rate of events is very low compared to useful interrupts,
> > > > > > > > and total interrupt count is greatly reduced vs not having work
> > > > > > > > stealing, it is a net win.
> > > > > >
> > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is
> > > > > > surely an even greater win to disable interrupts while polling like
> > > > > > this.  Might be tricky to detect, disabling/enabling aggressively every
> > > > > > time even if there's nothing in the queue is sure to cause lots of cache
> > > > > > line bounces, and we don't want to enable callbacks if they were not
> > > > > > enabled e.g. by start_xmit ...  Some kind of counter?
> > > > >
> > > > > Yes. It was known that the work stealing is more effective in some
> > > > > workloads than others. But a 99% spurious rate I had not anticipated.
> > > > >
> > > > > Most interesting is the number of interrupts suppressed as a result of
> > > > > the feature. That is not captured by this statistic.
> > > > >
> > > > > In any case, we'll take a step back to better understand behavior. And
> > > > > especially why this high spurious rate exhibits in this workload with
> > > > > many concurrent flows.
> > > >
> > > >
> > > > I've been thinking about it. Imagine work stealing working perfectly.
> > > > Each time we xmit a packet, it is stolen and freed.
> > > > Since xmit enables callbacks (just in case!) we also
> > > > get an interrupt, which is automatically spurious.
> > > >
> > > > My conclusion is that we shouldn't just work around it but instead
> > > > (or additionally?)
> > > > reduce the number of interrupts by disabling callbacks e.g. when
> > > > a. we are currently stealing packets
> > > > or
> > > > b. we stole all packets
> >
> > Agreed. This might prove a significant performance gain at the same time :)
> >
> > > >
> > > Thinking along this line, that probably means we should disable cb on
> > > the tx virtqueue, when scheduling the napi work on the rx side, and
> > > reenable it after the rx napi work is done?
> > > Also, I wonder if it is too late to disable cb at the point we start
> > > to steal pkts or have stolen all pkts.
> >
> > The earlier the better. I see no benefit to delay until the rx handler
> > actually runs.
> >
> 
> I've been thinking more on this. I think the fundamental issue here is
> that the rx napi handler virtnet_poll() does the tx side work by
> calling virtnet_poll_cleantx() without any notification to the tx
> side.
> I am thinking, in virtnet_poll(), instead of directly calling
> virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to
> schedule the tx side napi, and let the tx napi handler do the cleaning
> work. This way, we automatically call virtqueue_disable_cb() on the tx
> vq, and after the tx work is done, virtqueue_napi_complete() is called
> to re-enable the cb on the tx side. This way, the tx side knows what
> has been done, and will likely reduce the # of spurious tx interrupts?
> And I don't think there is much cost in doing that, since
> napi_schedule() basically queues the tx napi to the back of its
> napi_list, and serves it right after the rx napi handler is done.
> What do you guys think? I could quickly test it up to see if it solves
> the issue.


Sure, please test. I think you will want to disable the event index
for now to make sure disable cb is not a nop (I am working on
fixing that).

> > > Because the steal work is done
> > > in the napi handler of the rx queue. But the tx interrupt must have
> > > been raised before that. Will we come back to process the tx interrupt
> > > again after we re-enabled the cb on the tx side?
> > >
> > > > This should be enough to reduce the chances below 99% ;)
> > > >
> > > > One annoying thing is that with split and event index, we do not disable
> > > > interrupts. Could be worth revisiting, for now maybe just disable the
> > > > event index feature? I am not sure it is actually worth it with
> > > > stealing.
> >
> > With event index, we suppress interrupts when another interrupt is
> > already pending from a previous packet, right? When the previous
> > position of the producer is already beyond the consumer. It doesn't
> > matter whether the previous packet triggered a tx interrupt or
> > deferred to an already scheduled rx interrupt? From that, it seems fine to
> > leave it out.
Wei Wang Sept. 29, 2021, 8:21 p.m. UTC | #32
On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote:
> > > >
> > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > > > > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > [ 3588.017942]  <IRQ>
> > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > [ 3588.017966]  </IRQ>
> > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > >
> > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > > > > > > > case.
> > > > > > > > > > > >
> > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > >
> > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > > > > > > >
> > > > > > > > > > > So something causes a storm of useless tx interrupts here.
> > > > > > > > > > >
> > > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > >
> > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > > > > > > > and a few tx interrupts.
> > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > > > > > > > get triggered very close to each other, and gets handled in one round
> > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's
> > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > > > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > > > > > > >
> > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > > > > > > hundred such events, since boot, which is a small number compared to real
> > > > > > > > > interrupt load.
> > > > > > > >
> > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > > > > > > note_interrupt that applies here.
> > > > > > > >
> > > > > > > > > Occasionally seeing an interrupt with no work is expected after
> > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > > > > > > long as this rate of events is very low compared to useful interrupts,
> > > > > > > > > and total interrupt count is greatly reduced vs not having work
> > > > > > > > > stealing, it is a net win.
> > > > > > >
> > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is
> > > > > > > surely an even greater win to disable interrupts while polling like
> > > > > > > this.  Might be tricky to detect, disabling/enabling aggressively every
> > > > > > > time even if there's nothing in the queue is sure to cause lots of cache
> > > > > > > line bounces, and we don't want to enable callbacks if they were not
> > > > > > > enabled e.g. by start_xmit ...  Some kind of counter?
> > > > > >
> > > > > > Yes. It was known that the work stealing is more effective in some
> > > > > > workloads than others. But a 99% spurious rate I had not anticipated.
> > > > > >
> > > > > > Most interesting is the number of interrupts suppressed as a result of
> > > > > > the feature. That is not captured by this statistic.
> > > > > >
> > > > > > In any case, we'll take a step back to better understand behavior. And
> > > > > > especially why this high spurious rate exhibits in this workload with
> > > > > > many concurrent flows.
> > > > >
> > > > >
> > > > > I've been thinking about it. Imagine work stealing working perfectly.
> > > > > Each time we xmit a packet, it is stolen and freed.
> > > > > Since xmit enables callbacks (just in case!) we also
> > > > > get an interrupt, which is automatically spurious.
> > > > >
> > > > > My conclusion is that we shouldn't just work around it but instead
> > > > > (or additionally?)
> > > > > reduce the number of interrupts by disabling callbacks e.g. when
> > > > > a. we are currently stealing packets
> > > > > or
> > > > > b. we stole all packets
> > >
> > > Agreed. This might prove a significant performance gain at the same time :)
> > >
> > > > >
> > > > Thinking along this line, that probably means, we should disable cb on
> > > > the tx virtqueue, when scheduling the napi work on the rx side, and
> > > > reenable it after the rx napi work is done?
> > > > Also, I wonder if it is too late to disable cb at the point we start
> > > > to steal pkts or have stolen all pkts.
> > >
> > > The earlier the better. I see no benefit to delay until the rx handler
> > > actually runs.
> > >
> >
> > I've been thinking more on this. I think the fundamental issue here is
> > that the rx napi handler virtnet_poll() does the tx side work by
> > calling virtnet_poll_cleantx() without any notification to the tx
> > side.
> > I am thinking, in virtnet_poll(), instead of directly calling
> > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to
> > schedule the tx side napi, and let the tx napi handler do the cleaning
> > work. This way, we automatically call virtqueue_disable_cb() on the tx
> > vq, and after the tx work is done, virtqueue_napi_complete() is called
> > to re-enable the cb on the tx side. This way, the tx side knows what
> > has been done, and will likely reduce the # of spurious tx interrupts?
> > And I don't think there is much cost in doing that, since
> > napi_schedule() basically queues the tx napi to the back of its
> > napi_list, and serves it right after the rx napi handler is done.
> > What do you guys think? I could quickly test it out to see if it solves
> > the issue.
>
>
> Sure pls test. I think you will want to disable event index
> for now to make sure disable cb is not a nop (I am working on
> fixing that).
>

Hi Michael and Jason,

I'd like to follow up on this issue a bit more.
I've done some more investigation into this issue:
1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb
aggressively"), we are still seeing this issue with a tcp_stream test
with 240 flows.
2. We've tried with the following patch to suppress cleaning tx queue
from rx napi handler for 10% of the time:
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 79bd2585ec6b..711768dbc617 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
                return;

        if (__netif_tx_trylock(txq)) {
+               if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10))
+                       goto unlock;
                do {
                        virtqueue_disable_cb(sq->vq);
                        free_old_xmit_skbs(sq, true);
@@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
                if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
                        netif_tx_wake_queue(txq);

+unlock:
                __netif_tx_unlock(txq);
        }
 }
This also does not help. It turns out skipping 10% of the time is just
not enough; we have to skip 50% of the time for the warning to be
suppressed.
And this does not seem to be a viable solution, since the skip fraction
needed will likely depend on the traffic pattern.

My questions here:
1. Michael mentioned that if we use split queues with event idx, the
interrupts are not actually being disabled. Is this still the case? If
so, is that also the cause for so many spurious interrupts?
2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up
virtio_disable_cb"). I am not quite sure whether that change would help
reduce the number of spurious interrupts we see if we use split queues with
event idx? From my limited understanding, that patch skips calling
virtqueue_disable_cb_split() if event_trigger is set for split queues.

BTW, I have the setup to reproduce this issue easily. So do let me
know if you have other ideas on how to fix it.

Thanks.
Wei


> > > > Because the steal work is done
> > > > in the napi handler of the rx queue. But the tx interrupt must have
> > > > been raised before that. Will we come back to process the tx interrupt
> > > > again after we re-enabled the cb on the tx side?
> > > >
> > > > > This should be enough to reduce the chances below 99% ;)
> > > > >
> > > > > One annoying thing is that with split and event index, we do not disable
> > > > > interrupts. Could be worth revisiting, for now maybe just disable the
> > > > > event index feature? I am not sure it is actually worth it with
> > > > > stealing.
> > >
> > > With event index, we suppress interrupts when another interrupt is
> > > already pending from a previous packet, right? When the previous
> > > position of the producer is already beyond the consumer. It doesn't
> > > matter whether the previous packet triggered a tx interrupt or
> > > deferred to an already scheduled rx interrupt? From that, it seems fine to
> > > leave it out.
>
Michael S. Tsirkin Sept. 29, 2021, 9:53 p.m. UTC | #33
On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote:
> On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote:
> > > > >
> > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > > > > > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > > [ 3588.017942]  <IRQ>
> > > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > > [ 3588.017966]  </IRQ>
> > > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > > >
> > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > > > > > > > > case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > > >
> > > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > > > > > > > >
> > > > > > > > > > > > So something causes a storm of useless tx interrupts here.
> > > > > > > > > > > >
> > > > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > > >
> > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > > > > > > > > and a few tx interrupts.
> > > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > > > > > > > > get triggered very close to each other, and get handled in one round
> > > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's
> > > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > > > > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > > > > > > > >
> > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > > > > > > > hundred such events, since boot, which is a small number compared to real
> > > > > > > > > > interrupt load.
> > > > > > > > >
> > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > > > > > > > note_interrupt that applies here.
> > > > > > > > >
> > > > > > > > > > Occasionally seeing an interrupt with no work is expected after
> > > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > > > > > > > long as this rate of events is very low compared to useful interrupts,
> > > > > > > > > > and total interrupt count is greatly reduced vs not having work
> > > > > > > > > > stealing, it is a net win.
> > > > > > > >
> > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is
> > > > > > > > surely an even greater win to disable interrupts while polling like
> > > > > > > > this.  Might be tricky to detect, disabling/enabling aggressively every
> > > > > > > > time even if there's nothing in the queue is sure to cause lots of cache
> > > > > > > > line bounces, and we don't want to enable callbacks if they were not
> > > > > > > > enabled e.g. by start_xmit ...  Some kind of counter?
> > > > > > >
> > > > > > > Yes. It was known that the work stealing is more effective in some
> > > > > > > workloads than others. But a 99% spurious rate I had not anticipated.
> > > > > > >
> > > > > > > Most interesting is the number of interrupts suppressed as a result of
> > > > > > > the feature. That is not captured by this statistic.
> > > > > > >
> > > > > > > In any case, we'll take a step back to better understand behavior. And
> > > > > > > especially why this high spurious rate exhibits in this workload with
> > > > > > > many concurrent flows.
> > > > > >
> > > > > >
> > > > > > I've been thinking about it. Imagine work stealing working perfectly.
> > > > > > Each time we xmit a packet, it is stolen and freed.
> > > > > > Since xmit enables callbacks (just in case!) we also
> > > > > > get an interrupt, which is automatically spurious.
> > > > > >
> > > > > > My conclusion is that we shouldn't just work around it but instead
> > > > > > (or additionally?)
> > > > > > reduce the number of interrupts by disabling callbacks e.g. when
> > > > > > a. we are currently stealing packets
> > > > > > or
> > > > > > b. we stole all packets
> > > >
> > > > Agreed. This might prove a significant performance gain at the same time :)
> > > >
> > > > > >
> > > > > Thinking along this line, that probably means, we should disable cb on
> > > > > the tx virtqueue, when scheduling the napi work on the rx side, and
> > > > > reenable it after the rx napi work is done?
> > > > > Also, I wonder if it is too late to disable cb at the point we start
> > > > > to steal pkts or have stolen all pkts.
> > > >
> > > > The earlier the better. I see no benefit to delay until the rx handler
> > > > actually runs.
> > > >
> > >
> > > I've been thinking more on this. I think the fundamental issue here is
> > > that the rx napi handler virtnet_poll() does the tx side work by
> > > calling virtnet_poll_cleantx() without any notification to the tx
> > > side.
> > > I am thinking, in virtnet_poll(), instead of directly calling
> > > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to
> > > schedule the tx side napi, and let the tx napi handler do the cleaning
> > > work. This way, we automatically call virtqueue_disable_cb() on the tx
> > > vq, and after the tx work is done, virtqueue_napi_complete() is called
> > > to re-enable the cb on the tx side. This way, the tx side knows what
> > > has been done, and will likely reduce the # of spurious tx interrupts?
> > > And I don't think there is much cost in doing that, since
> > > napi_schedule() basically queues the tx napi to the back of its
> > > napi_list, and serves it right after the rx napi handler is done.
> > > What do you guys think? I could quickly test it out to see if it solves
> > > the issue.
> >
> >
> > Sure pls test. I think you will want to disable event index
> > for now to make sure disable cb is not a nop (I am working on
> > fixing that).
> >
> 
> Hi Michael and Jason,
> 
> I'd like to follow up on this issue a bit more.
> I've done some more investigation into this issue:
> 1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb
> aggressively"), we are still seeing this issue with a tcp_stream test
> with 240 flows.
> 2. We've tried with the following patch to suppress cleaning tx queue
> from rx napi handler for 10% of the time:
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 79bd2585ec6b..711768dbc617 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>                 return;
> 
>         if (__netif_tx_trylock(txq)) {
> +               if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10))
> +                       goto unlock;
>                 do {
>                         virtqueue_disable_cb(sq->vq);
>                         free_old_xmit_skbs(sq, true);
> @@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>                 if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>                         netif_tx_wake_queue(txq);
> 
> +unlock:
>                 __netif_tx_unlock(txq);
>         }
>  }
> This also does not help. It turns out skipping 10% is just not enough.
> We have to skip for 50% of the time in order for the warning to be
> suppressed.
> And this does not seem to be a viable solution since how much we skip
> probably will depend on the traffic pattern.
> 
> My questions here:
> 1. Michael mentioned that if we use split queues with event idx, the
> interrupts are not actually being disabled. Is this still the case? If
> so, is that also the cause for so many spurious interrupts?
> 2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up
> virtio_disable_cb"). I am not quite sure, would that change help
> reduce the # of spurious interrupts we see if we use split queues with
> event idx? From my limited understanding, that patch skips calling
> virtqueue_disable_cb_split() if event_trigger is set for split queues.
> 
> BTW, I have the setup to reproduce this issue easily. So do let me
> know if you have other ideas on how to fix it.
> 
> Thanks.
> Wei

I think that commit is needed to fix the issue, yes.

My suggestion is to try v5.14 in its entirety
rather than cherry-picking.

If you see that the issue is fixed there, I can point you to a list of
commits to backport.

> 
> > > > > Because the steal work is done
> > > > > in the napi handler of the rx queue. But the tx interrupt must have
> > > > > been raised before that. Will we come back to process the tx interrupt
> > > > > again after we re-enabled the cb on the tx side?
> > > > >
> > > > > > This should be enough to reduce the chances below 99% ;)
> > > > > >
> > > > > > One annoying thing is that with split and event index, we do not disable
> > > > > > interrupts. Could be worth revisiting, for now maybe just disable the
> > > > > > event index feature? I am not sure it is actually worth it with
> > > > > > stealing.
> > > >
> > > > With event index, we suppress interrupts when another interrupt is
> > > > already pending from a previous packet, right? When the previous
> > > > position of the producer is already beyond the consumer. It doesn't
> > > > matter whether the previous packet triggered a tx interrupt or
> > > > deferred to an already scheduled rx interrupt? From that, it seems fine to
> > > > leave it out.
> >
Wei Wang Sept. 29, 2021, 11:08 p.m. UTC | #34
On Wed, Sep 29, 2021 at 2:53 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote:
> > On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > >
> > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote:
> > > > > >
> > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > > > > > > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > > > [ 3588.017942]  <IRQ>
> > > > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > > > [ 3588.017966]  </IRQ>
> > > > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > > > > > > > > > case.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > > > >
> > > > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So something causes a storm of useless tx interrupts here.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > > > >
> > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > > > > > > > > > and a few tx interrupts.
> > > > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > > > > > > > > > get triggered very close to each other, and get handled in one round
> > > > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's
> > > > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > > > > > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > > > > > > > > >
> > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > > > > > > > > hundred such events, since boot, which is a small number compared to real
> > > > > > > > > > > interrupt load.
> > > > > > > > > >
> > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > > > > > > > > note_interrupt that applies here.
> > > > > > > > > >
> > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after
> > > > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > > > > > > > > long as this rate of events is very low compared to useful interrupts,
> > > > > > > > > > > and total interrupt count is greatly reduced vs not having work
> > > > > > > > > > > stealing, it is a net win.
> > > > > > > > >
> > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is
> > > > > > > > > surely an even greater win to disable interrupts while polling like
> > > > > > > > > this.  Might be tricky to detect, disabling/enabling aggressively every
> > > > > > > > > time even if there's nothing in the queue is sure to cause lots of cache
> > > > > > > > > line bounces, and we don't want to enable callbacks if they were not
> > > > > > > > > enabled e.g. by start_xmit ...  Some kind of counter?
> > > > > > > >
> > > > > > > > Yes. It was known that the work stealing is more effective in some
> > > > > > > > workloads than others. But a 99% spurious rate I had not anticipated.
> > > > > > > >
> > > > > > > > Most interesting is the number of interrupts suppressed as a result of
> > > > > > > > the feature. That is not captured by this statistic.
> > > > > > > >
> > > > > > > > In any case, we'll take a step back to better understand behavior. And
> > > > > > > > especially why this high spurious rate exhibits in this workload with
> > > > > > > > many concurrent flows.
> > > > > > >
> > > > > > >
> > > > > > > I've been thinking about it. Imagine work stealing working perfectly.
> > > > > > > Each time we xmit a packet, it is stolen and freed.
> > > > > > > Since xmit enables callbacks (just in case!) we also
> > > > > > > get an interrupt, which is automatically spurious.
> > > > > > >
> > > > > > > My conclusion is that we shouldn't just work around it but instead
> > > > > > > (or additionally?)
> > > > > > > reduce the number of interrupts by disabling callbacks e.g. when
> > > > > > > a. we are currently stealing packets
> > > > > > > or
> > > > > > > b. we stole all packets
> > > > >
> > > > > Agreed. This might prove a significant performance gain at the same time :)
> > > > >
> > > > > > >
> > > > > > Thinking along this line, that probably means, we should disable cb on
> > > > > > the tx virtqueue, when scheduling the napi work on the rx side, and
> > > > > > reenable it after the rx napi work is done?
> > > > > > Also, I wonder if it is too late to disable cb at the point we start
> > > > > > to steal pkts or have stolen all pkts.
> > > > >
> > > > > The earlier the better. I see no benefit to delay until the rx handler
> > > > > actually runs.
> > > > >
> > > >
> > > > I've been thinking more on this. I think the fundamental issue here is
> > > > that the rx napi handler virtnet_poll() does the tx side work by
> > > > calling virtnet_poll_cleantx() without any notification to the tx
> > > > side.
> > > > I am thinking, in virtnet_poll(), instead of directly calling
> > > > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to
> > > > schedule the tx side napi, and let the tx napi handler do the cleaning
> > > > work. This way, we automatically call virtqueue_disable_cb() on the tx
> > > > vq, and after the tx work is done, virtqueue_napi_complete() is called
> > > > to re-enable the cb on the tx side. This way, the tx side knows what
> > > > has been done, and will likely reduce the # of spurious tx interrupts?
> > > > And I don't think there is much cost in doing that, since
> > > > napi_schedule() basically queues the tx napi to the back of its
> > > > napi_list, and serves it right after the rx napi handler is done.
> > > > What do you guys think? I could quickly test it out to see if it solves
> > > > the issue.
> > >
> > >
> > > Sure pls test. I think you will want to disable event index
> > > for now to make sure disable cb is not a nop (I am working on
> > > fixing that).
> > >
> >
> > Hi Michael and Jason,
> >
> > I'd like to follow up on this issue a bit more.
> > I've done some more investigation into this issue:
> > 1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb
> > aggressively"), we are still seeing this issue with a tcp_stream test
> > with 240 flows.
> > 2. We've tried with the following patch to suppress cleaning tx queue
> > from rx napi handler for 10% of the time:
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 79bd2585ec6b..711768dbc617 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >                 return;
> >
> >         if (__netif_tx_trylock(txq)) {
> > +               if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10))
> > +                       goto unlock;
> >                 do {
> >                         virtqueue_disable_cb(sq->vq);
> >                         free_old_xmit_skbs(sq, true);
> > @@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >                 if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> >                         netif_tx_wake_queue(txq);
> >
> > +unlock:
> >                 __netif_tx_unlock(txq);
> >         }
> >  }
> > This also does not help. It turns out skipping 10% is just not enough.
> > We have to skip for 50% of the time in order for the warning to be
> > suppressed.
> > And this does not seem to be a viable solution since how much we skip
> > probably will depend on the traffic pattern.
> >
> > My questions here:
> > 1. Michael mentioned that if we use split queues with event idx, the
> > interrupts are not actually being disabled. Is this still the case? If
> > so, is that also the cause for so many spurious interrupts?
> > 2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up
> > virtio_disable_cb"). I am not quite sure, would that change help
> > reduce the # of spurious interrupts we see if we use split queues with
> > event idx? From my limited understanding, that patch skips calling
> > virtqueue_disable_cb_split() if event_trigger is set for split queues.
> >
> > BTW, I have the setup to reproduce this issue easily. So do let me
> > know if you have other ideas on how to fix it.
> >
> > Thanks.
> > Wei
>
> I think that commit is needed to fix the issue, yes.
>
> My suggestion is to try v5.14 in its entirety
> rather than cherry-picking.
>
> If you see that the issue is
> fixed there, I can point you to a list of commits to backport.
>

Thanks Michael. It does seem that with the 5.14 kernel, I no longer see
this issue happening, which is great!
Could you point me to a list of commits that fixed it?

> >
> > > > > > Because the steal work is done
> > > > > > in the napi handler of the rx queue. But the tx interrupt must have
> > > > > > been raised before that. Will we come back to process the tx interrupt
> > > > > > again after we re-enabled the cb on the tx side?
> > > > > >
> > > > > > > This should be enough to reduce the chances below 99% ;)
> > > > > > >
> > > > > > > One annoying thing is that with split and event index, we do not disable
> > > > > > > interrupts. Could be worth revisiting, for now maybe just disable the
> > > > > > > event index feature? I am not sure it is actually worth it with
> > > > > > > stealing.
> > > > >
> > > > > With event index, we suppress interrupts when another interrupt is
> > > > > already pending from a previous packet, right? When the previous
> > > > > position of the producer is already beyond the consumer. It doesn't
> > > > > matter whether the previous packet triggered a tx interrupt or
> > > > > deferred to an already scheduled rx interrupt? From that seems fine to
> > > > > leave it out.
> > >
>
Michael S. Tsirkin Sept. 30, 2021, 5:40 a.m. UTC | #35
On Wed, Sep 29, 2021 at 04:08:29PM -0700, Wei Wang wrote:
> On Wed, Sep 29, 2021 at 2:53 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote:
> > > On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > > > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > >
> > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > > > > > > > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > > > > [ 3588.017942]  <IRQ>
> > > > > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > > > > [ 3588.017966]  </IRQ>
> > > > > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > > > > > > > > > > and a few tx interrupts.
> > > > > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > > > > > > > > > > get triggered very close to each other, and gets handled in one round
> > > > > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's
> > > > > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > > > > > > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > > > > > > > > > >
> > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > > > > > > > > > hundred such events, since boot, which is a small number compared to real
> > > > > > > > > > > > interrupt load.
> > > > > > > > > > >
> > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > > > > > > > > > note_interrupt that applies here.
> > > > > > > > > > >
> > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after
> > > > > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > > > > > > > > > long as this rate of events is very low compared to useful interrupts,
> > > > > > > > > > > > and total interrupt count is greatly reduced vs not having work
> > > > > > > > > > > > stealing, it is a net win.
> > > > > > > > > >
> > > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is
> > > > > > > > > > surely an even greater win to disable interrupts while polling like
> > > > > > > > > > this.  Might be tricky to detect, disabling/enabling aggressively every
> > > > > > > > > > time even if there's nothing in the queue is sure to cause lots of cache
> > > > > > > > > > line bounces, and we don't want to enable callbacks if they were not
> > > > > > > > > > enabled e.g. by start_xmit ...  Some kind of counter?
> > > > > > > > >
> > > > > > > > > Yes. It was known that the work stealing is more effective in some
> > > > > > > > > workloads than others. But a 99% spurious rate I had not anticipated.
> > > > > > > > >
> > > > > > > > > Most interesting is the number of interrupts suppressed as a result of
> > > > > > > > > the feature. That is not captured by this statistic.
> > > > > > > > >
> > > > > > > > > In any case, we'll take a step back to better understand behavior. And
> > > > > > > > > especially why this high spurious rate exhibits in this workload with
> > > > > > > > > many concurrent flows.
> > > > > > > >
> > > > > > > >
> > > > > > > > I've been thinking about it. Imagine work stealing working perfectly.
> > > > > > > > Each time we xmit a packet, it is stolen and freed.
> > > > > > > > Since xmit enables callbacks (just in case!) we also
> > > > > > > > get an interrupt, which is automatically spurious.
> > > > > > > >
> > > > > > > > My conclusion is that we shouldn't just work around it but instead
> > > > > > > > (or additionally?)
> > > > > > > > reduce the number of interrupts by disabling callbacks e.g. when
> > > > > > > > a. we are currently stealing packets
> > > > > > > > or
> > > > > > > > b. we stole all packets
> > > > > >
> > > > > > Agreed. This might prove a significant performance gain at the same time :)
> > > > > >
> > > > > > > >
> > > > > > > Thinking along this line, that probably means, we should disable cb on
> > > > > > > the tx virtqueue, when scheduling the napi work on the rx side, and
> > > > > > > reenable it after the rx napi work is done?
> > > > > > > Also, I wonder if it is too late to disable cb at the point we start
> > > > > > > to steal pkts or have stolen all pkts.
> > > > > >
> > > > > > The earlier the better. I see no benefit to delay until the rx handler
> > > > > > actually runs.
> > > > > >
> > > > >
> > > > > I've been thinking more on this. I think the fundamental issue here is
> > > > > that the rx napi handler virtnet_poll() does the tx side work by
> > > > > calling virtnet_poll_cleantx() without any notification to the tx
> > > > > side.
> > > > > I am thinking, in virtnet_poll(), instead of directly call
> > > > > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to
> > > > > schedule the tx side napi, and let the tx napi handler do the cleaning
> > > > > work. This way, we automatically call virtqueue_disable_cb() on the tx
> > > > > vq, and after the tx work is done, virtqueue_napi_complete() is called
> > > > > to re-enable the cb on the tx side. This way, the tx side knows what
> > > > > has been done, and will likely reduce the # of spurious tx interrupts?
> > > > > And I don't think there is much cost in doing that, since
> > > > > napi_schedule() basically queues the tx napi to the back of its
> > > > > napi_list, and serves it right after the rx napi handler is done.
> > > > > What do you guys think? I could quickly test it up to see if it solves
> > > > > the issue.
> > > >
> > > >
> > > > Sure pls test. I think you will want to disable event index
> > > > for now to make sure disable cb is not a nop (I am working on
> > > > fixing that).
> > > >
> > >
> > > Hi Michael and Jason,
> > >
> > > I'd like to follow up on this issue a bit more.
> > > I've done some more investigation into this issue:
> > > 1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb
> > > aggressively"), we are still seeing this issue with a tcp_stream test
> > > with 240 flows.
> > > 2. We've tried with the following patch to suppress cleaning tx queue
> > > from rx napi handler for 10% of the time:
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 79bd2585ec6b..711768dbc617 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > >                 return;
> > >
> > >         if (__netif_tx_trylock(txq)) {
> > > +               if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10))
> > > +                       goto unlock;
> > >                 do {
> > >                         virtqueue_disable_cb(sq->vq);
> > >                         free_old_xmit_skbs(sq, true);
> > > @@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > >                 if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> > >                         netif_tx_wake_queue(txq);
> > >
> > > +unlock:
> > >                 __netif_tx_unlock(txq);
> > >         }
> > >  }
> > > This also does not help. It turns out skipping 10% is just not
> > > enough: we have to skip 50% of the time in order for the warning to
> > > be suppressed.
> > > And this does not seem to be a viable solution, since how much we
> > > need to skip will likely depend on the traffic pattern.
> > >
> > > My questions here:
> > > 1. Michael mentioned that if we use split queues with event idx, the
> > > interrupts are not actually being disabled. Is this still the case? If
> > > so, is that also the cause for so many spurious interrupts?
> > > 2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up
> > > virtio_disable_cb"). I am not quite sure, would that change help
> > > reduce the # of spurious interrupts we see if we use split queues with
> > > event idx? From my limited understanding, that patch skips calling
> > > virtqueue_disable_cb_split() if event_trigger is set for split queues.
> > >
> > > BTW, I have the setup to reproduce this issue easily. So do let me
> > > know if you have other ideas on how to fix it.
> > >
> > > Thanks.
> > > Wei
> >
> > I think that commit is needed to fix the issue, yes.
> >
> > My suggestion is to try v5.14 in its entirety
> > rather than cherry-picking.
> >
> > If you see that the issue is
> > fixed there, I can point you to a list of commits to backport.
> >
> 
> Thanks Michael. It does seem that with the 5.14 kernel, I no longer see
> this issue happening, which is great!
> Could you point me to a list of commits that fixed it?

Try this:

a7766ef18b33 virtio_net: disable cb aggressively
8d622d21d248 virtio: fix up virtio_disable_cb
22bc63c58e87 virtio_net: move txq wakeups under tx q lock
5a2f966d0f3f virtio_net: move tx vq operation under tx queue lock


> > >
> > > > > > > Because the steal work is done
> > > > > > > in the napi handler of the rx queue. But the tx interrupt must have
> > > > > > > been raised before that. Will we come back to process the tx interrupt
> > > > > > > again after we re-enabled the cb on the tx side?
> > > > > > >
> > > > > > > > This should be enough to reduce the chances below 99% ;)
> > > > > > > >
> > > > > > > > One annoying thing is that with split and event index, we do not disable
> > > > > > > > interrupts. Could be worth revisiting, for now maybe just disable the
> > > > > > > > event index feature? I am not sure it is actually worth it with
> > > > > > > > stealing.
> > > > > >
> > > > > > With event index, we suppress interrupts when another interrupt is
> > > > > > already pending from a previous packet, right? When the previous
> > > > > > position of the producer is already beyond the consumer. It doesn't
> > > > > > matter whether the previous packet triggered a tx interrupt or
> > > > > > deferred to an already scheduled rx interrupt? From that seems fine to
> > > > > > leave it out.
> > > >
> >
Dave Taht Sept. 30, 2021, 6:10 p.m. UTC | #36
To me, the symptoms of this problem (a packet storm) smelt like the
lack of BQL leading to very bursty TCP behavior and enormous TCP RTT
inflation in this driver, which I observed 6+ months back.

Now that it seems to be fixed (?), could you re-run the netperf
128-stream test and share a packet capture? Try with pfifo_fast or
fq_codel as the qdisc, and sch_fq as a control (which was working
correctly).

thx.

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 508408fbe78f..e9a3f30864e8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1303,13 +1303,22 @@  static void virtnet_napi_tx_enable(struct virtnet_info *vi,
 		return;
 	}
 
+	/* With napi_tx enabled, free_old_xmit_skbs() could be called from
+	 * rx napi handler. Set work_steal to suppress bad irq warning for
+	 * IRQ_NONE case from tx complete interrupt handler.
+	 */
+	virtqueue_set_work_steal(vq, true);
+
 	return virtnet_napi_enable(vq, napi);
 }
 
-static void virtnet_napi_tx_disable(struct napi_struct *napi)
+static void virtnet_napi_tx_disable(struct virtqueue *vq,
+				    struct napi_struct *napi)
 {
-	if (napi->weight)
+	if (napi->weight) {
 		napi_disable(napi);
+		virtqueue_set_work_steal(vq, false);
+	}
 }
 
 static void refill_work(struct work_struct *work)
@@ -1835,7 +1844,7 @@  static int virtnet_close(struct net_device *dev)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
 		napi_disable(&vi->rq[i].napi);
-		virtnet_napi_tx_disable(&vi->sq[i].napi);
+		virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
 	}
 
 	return 0;
@@ -2315,7 +2324,7 @@  static void virtnet_freeze_down(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
-			virtnet_napi_tx_disable(&vi->sq[i].napi);
+			virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
 		}
 	}
 }
@@ -2440,7 +2449,7 @@  static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	if (netif_running(dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
-			virtnet_napi_tx_disable(&vi->sq[i].napi);
+			virtnet_napi_tx_disable(vi->sq[i].vq, &vi->sq[i].napi);
 		}
 	}
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..f7c5d697c302 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -105,6 +105,9 @@  struct vring_virtqueue {
 	/* Host publishes avail event idx */
 	bool event;
 
+	/* Tx side napi work could be done from rx side. */
+	bool work_steal;
+
 	/* Head of free buffer list. */
 	unsigned int free_head;
 	/* Number we've added since last sync. */
@@ -1604,6 +1607,7 @@  static struct virtqueue *vring_create_virtqueue_packed(
 	vq->notify = notify;
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
+	vq->work_steal = false;
 	vq->last_used_idx = 0;
 	vq->num_added = 0;
 	vq->packed_ring = true;
@@ -2038,6 +2042,9 @@  irqreturn_t vring_interrupt(int irq, void *_vq)
 
 	if (!more_used(vq)) {
 		pr_debug("virtqueue interrupt with no work for %p\n", vq);
+		if (vq->work_steal)
+			return IRQ_HANDLED;
+
 		return IRQ_NONE;
 	}
 
@@ -2082,6 +2089,7 @@  struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->notify = notify;
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
+	vq->work_steal = false;
 	vq->last_used_idx = 0;
 	vq->num_added = 0;
 	vq->use_dma_api = vring_use_dma_api(vdev);
@@ -2266,6 +2274,14 @@  bool virtqueue_is_broken(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_is_broken);
 
+void virtqueue_set_work_steal(struct virtqueue *_vq, bool val)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	vq->work_steal = val;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_work_steal);
+
 /*
  * This should prevent the device from being used, allowing drivers to
  * recover.  You may need to grab appropriate locks to flush.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 55ea329fe72a..091c30f21ff9 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -84,6 +84,8 @@  unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
 
 bool virtqueue_is_broken(struct virtqueue *vq);
 
+void virtqueue_set_work_steal(struct virtqueue *vq, bool val);
+
 const struct vring *virtqueue_get_vring(struct virtqueue *vq);
 dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq);
 dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq);