
[net] virtio-net: correctly enable callback during start_xmit

Message ID 20221212091029.54390-1-jasowang@redhat.com
State Superseded
Delegated to: Netdev Maintainers
Series [net] virtio-net: correctly enable callback during start_xmit

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jason Wang Dec. 12, 2022, 9:10 a.m. UTC
Commit a7766ef18b33 ("virtio_net: disable cb aggressively") enables the
virtqueue callback via the following statement:

        do {
                ......
        } while (use_napi && kick &&
                 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));

This skips the call to virtqueue_enable_cb_delayed() whenever kick is
false, since && short-circuits before reaching it. Fix this by dropping
kick from the loop condition so that the callback is always re-enabled
in NAPI mode.

Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
The patch is needed for -stable.
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Michael S. Tsirkin Dec. 12, 2022, 9:25 a.m. UTC | #1
On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> virtqueue callback via the following statement:
> 
>         do {
>            ......
> 	} while (use_napi && kick &&
>                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> 
> This will cause a missing call to virtqueue_enable_cb_delayed() when
> kick is false. Fixing this by removing the checking of the kick from
> the condition to make sure callback is enabled correctly.
> 
> Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> The patch is needed for -stable.

stable rules don't allow for theoretical fixes. Was a problem observed?

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 86e52454b5b5..44d7daf0267b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  		free_old_xmit_skbs(sq, false);
>  
> -	} while (use_napi && kick &&
> -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> +	} while (use_napi &&
> +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>

A bit more explanation, please. kick simply means !netdev_xmit_more():
if it is false, we know there will be another packet, and transmitting
that packet will invoke virtqueue_enable_cb_delayed(). No?

>  	/* timestamp packet in software */
>  	skb_tx_timestamp(skb);
> -- 
> 2.25.1
Xuan Zhuo Dec. 13, 2022, 3:33 a.m. UTC | #2
On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > virtqueue callback via the following statement:
> >
> >         do {
> >            ......
> > 	} while (use_napi && kick &&
> >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
> > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > kick is false. Fixing this by removing the checking of the kick from
> > the condition to make sure callback is enabled correctly.
> >
> > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > The patch is needed for -stable.
>
> stable rules don't allow for theoretical fixes. Was a problem observed?
>
> > ---
> >  drivers/net/virtio_net.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 86e52454b5b5..44d7daf0267b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >
> >  		free_old_xmit_skbs(sq, false);
> >
> > -	} while (use_napi && kick &&
> > -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > +	} while (use_napi &&
> > +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
>
> A bit more explanation pls.  kick simply means !netdev_xmit_more -
> if it's false we know there will be another packet, then transmissing
> that packet will invoke virtqueue_enable_cb_delayed. No?

It only means there may be a next packet; in fact, there may not be one.
For example, the vq is full and the driver stops the queue.

Thanks.

Jason Wang Dec. 13, 2022, 3:43 a.m. UTC | #3
On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > virtqueue callback via the following statement:
> > >
> > >         do {
> > >            ......
> > >     } while (use_napi && kick &&
> > >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > >
> > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > kick is false. Fixing this by removing the checking of the kick from
> > > the condition to make sure callback is enabled correctly.
> > >
> > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > > The patch is needed for -stable.
> >
> > stable rules don't allow for theoretical fixes. Was a problem observed?

Yes, running a pktgen sample script can lead to a tx timeout.

> >
> > > ---
> > >  drivers/net/virtio_net.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 86e52454b5b5..44d7daf0267b 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >
> > >             free_old_xmit_skbs(sq, false);
> > >
> > > -   } while (use_napi && kick &&
> > > -          unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > +   } while (use_napi &&
> > > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > >
> >
> > A bit more explanation pls.  kick simply means !netdev_xmit_more -
> > if it's false we know there will be another packet, then transmissing
> > that packet will invoke virtqueue_enable_cb_delayed. No?
>
> It's just that there may be a next packet, but in fact there may not be.
> For example, the vq is full, and the driver stops the queue.

Exactly. When the queue is about to become full, we stop the tx queue
and wait for the next tx interrupt to re-enable it.

Thanks
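The stop-and-wait-for-interrupt scenario described above can be played through in a small user-space model. Everything here is hypothetical and not driver code: toy_sq, reclaim(), enable_cb(), toy_start_xmit(), device_complete_all(), queue_stalls(), and the RING_SIZE/STOP_THRESH values are illustrative. The sketch assumes NAPI mode, that the device still gets notified about the posted buffers, and that it completes them all once, after the senders stop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes; the driver stops the queue when fewer than
 * 2 + MAX_SKB_FRAGS descriptors are free. */
#define RING_SIZE   8
#define STOP_THRESH 3

/* Toy model of one send queue in NAPI mode (not kernel code). */
struct toy_sq {
	int num_free;    /* free ring slots */
	int in_flight;   /* posted, not yet completed by the device */
	int used;        /* completed by the device, not yet reclaimed */
	bool cb_enabled; /* tx interrupts enabled */
	bool stopped;    /* tx subqueue stopped */
};

/* ~ free_old_xmit_skbs(): return completed slots to the ring. */
static void reclaim(struct toy_sq *sq)
{
	sq->num_free += sq->used;
	sq->used = 0;
}

/* ~ virtqueue_enable_cb_delayed(): enable, report "nothing new used". */
static bool enable_cb(struct toy_sq *sq)
{
	sq->cb_enabled = true;
	return sq->used == 0;
}

/* One start_xmit() call; `fixed` selects the patched loop condition. */
static void toy_start_xmit(struct toy_sq *sq, bool kick, bool fixed)
{
	assert(!sq->stopped && sq->num_free > 0);
	do {
		sq->cb_enabled = false;            /* virtqueue_disable_cb() */
		reclaim(sq);
	} while ((fixed || kick) && !enable_cb(sq));

	sq->num_free--;                            /* xmit_skb() */
	sq->in_flight++;
	if (sq->num_free < STOP_THRESH)
		sq->stopped = true;                /* netif_stop_subqueue() */
	/* NAPI mode: nothing on this path re-enables callbacks. */
}

/* The device completes every posted buffer. Only if callbacks are
 * enabled does an interrupt run tx NAPI, which reclaims and wakes. */
static void device_complete_all(struct toy_sq *sq)
{
	sq->used += sq->in_flight;
	sq->in_flight = 0;
	if (sq->cb_enabled) {
		reclaim(sq);
		if (sq->num_free >= STOP_THRESH)
			sq->stopped = false;       /* wake the tx queue */
	}
}

/* Send npkts skbs in groups of `burst`; xmit_more is set on all but the
 * last skb of a group, so kick is true only for that last one. Returns
 * true if the queue ends up stopped with no interrupt coming, which the
 * tx watchdog would eventually report as a timeout. */
static bool queue_stalls(int npkts, int burst, bool fixed)
{
	struct toy_sq sq = { .num_free = RING_SIZE };

	for (int i = 0; i < npkts && !sq.stopped; i++) {
		bool kick = (i + 1) % burst == 0 || i == npkts - 1;
		toy_start_xmit(&sq, kick, fixed);
	}
	device_complete_all(&sq);
	return sq.stopped;
}
```

In this model, queue_stalls(8, 8, false) stops the queue on an skb with xmit_more set and never wakes it, while the patched condition, or burst = 1 (every skb kicks), lets the completion interrupt restart the queue. The burst = 1 case lines up with the explanation later in the thread of why the hang was not seen outside pktgen.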

Michael S. Tsirkin Dec. 13, 2022, 6:38 a.m. UTC | #4
On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > virtqueue callback via the following statement:
> > > >
> > > >         do {
> > > >            ......
> > > >     } while (use_napi && kick &&
> > > >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > >
> > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > kick is false. Fixing this by removing the checking of the kick from
> > > > the condition to make sure callback is enabled correctly.
> > > >
> > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > > The patch is needed for -stable.
> > >
> > > stable rules don't allow for theoretical fixes. Was a problem observed?
> 
> Yes, running a pktgen sample script can lead to a tx timeout.

Since April 2021 and we only noticed now? Are you sure it's the
right Fixes tag?

> > >
> > > > ---
> > > >  drivers/net/virtio_net.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > >
> > > >             free_old_xmit_skbs(sq, false);
> > > >
> > > > -   } while (use_napi && kick &&
> > > > -          unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > +   } while (use_napi &&
> > > > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > >
> > >
> > > A bit more explanation pls.  kick simply means !netdev_xmit_more -
> > > if it's false we know there will be another packet, then transmissing
> > > that packet will invoke virtqueue_enable_cb_delayed. No?
> >
> > It's just that there may be a next packet, but in fact there may not be.
> > For example, the vq is full, and the driver stops the queue.
> 
> Exactly, when the queue is about to be full we disable tx and wait for
> the next tx interrupt to re-enable tx.
> 
> Thanks

OK, it's a good idea to document that.
And we should enable callbacks at that point, not here on the data path.


Jason Wang Dec. 13, 2022, 6:57 a.m. UTC | #5
On Tue, Dec 13, 2022 at 2:38 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> > On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > > virtqueue callback via the following statement:
> > > > >
> > > > >         do {
> > > > >            ......
> > > > >     } while (use_napi && kick &&
> > > > >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > >
> > > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > > kick is false. Fixing this by removing the checking of the kick from
> > > > > the condition to make sure callback is enabled correctly.
> > > > >
> > > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > ---
> > > > > The patch is needed for -stable.
> > > >
> > > > stable rules don't allow for theoretical fixes. Was a problem observed?
> >
> > Yes, running a pktgen sample script can lead to a tx timeout.
>
> Since April 2021 and we only noticed now? Are you sure it's the
> right Fixes tag?

Well, reverting a7766ef18b33 makes pktgen work again.

We probably didn't notice it before because:

1) We don't support BQL, so there is no bulk dequeuing (skb list) in normal traffic.
2) When burst is enabled, pktgen does bulk xmit via an skb list on its own.

>
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > >
> > > > >             free_old_xmit_skbs(sq, false);
> > > > >
> > > > > -   } while (use_napi && kick &&
> > > > > -          unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > +   } while (use_napi &&
> > > > > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > >
> > > >
> > > > A bit more explanation pls.  kick simply means !netdev_xmit_more -
> > > > if it's false we know there will be another packet, then transmissing
> > > > that packet will invoke virtqueue_enable_cb_delayed. No?
> > >
> > > It's just that there may be a next packet, but in fact there may not be.
> > > For example, the vq is full, and the driver stops the queue.
> >
> > Exactly, when the queue is about to be full we disable tx and wait for
> > the next tx interrupt to re-enable tx.
> >
> > Thanks
>
> OK, it's a good idea to document that.

Will do.

> And we should enable callbacks at that point, not here on data path.

I'm not sure I understand. Are you suggesting removing the
!use_napi check here?

                if (!use_napi &&
                    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
                        /* More just got used, free them then recheck. */
                        free_old_xmit_skbs(sq, false);
                        if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
                                netif_start_subqueue(dev, qnum);
                                virtqueue_disable_cb(sq->vq);
                        }
                }

Btw, it doesn't make much difference, since kick is always true without
pktgen, and that approach may need even more comments or make the code
harder to read. We need a patch for -stable at least, so I prefer to let
this patch go in first and do the optimization on top.

Thanks

Michael S. Tsirkin Dec. 13, 2022, 3:15 p.m. UTC | #6
On Tue, Dec 13, 2022 at 02:57:54PM +0800, Jason Wang wrote:
> On Tue, Dec 13, 2022 at 2:38 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> > > On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > > > virtqueue callback via the following statement:
> > > > > >
> > > > > >         do {
> > > > > >            ......
> > > > > >     } while (use_napi && kick &&
> > > > > >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > >
> > > > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > > > kick is false. Fixing this by removing the checking of the kick from
> > > > > > the condition to make sure callback is enabled correctly.
> > > > > >
> > > > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > ---
> > > > > > The patch is needed for -stable.
> > > > >
> > > > > stable rules don't allow for theoretical fixes. Was a problem observed?
> > >
> > > Yes, running a pktgen sample script can lead to a tx timeout.
> >
> > Since April 2021 and we only noticed now? Are you sure it's the
> > right Fixes tag?
> 
> Well, reverting a7766ef18b33 makes pktgen work again.
> 
> The reason we doesn't notice is probably because:
> 
> 1) We don't support BQL, so no bulk dequeuing (skb list) in normal traffic
> 2) When burst is enabled for pktgen, it can do bulk xmit via skb list by its own
> 
> >
> > > > >
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >
> > > > > >             free_old_xmit_skbs(sq, false);
> > > > > >
> > > > > > -   } while (use_napi && kick &&
> > > > > > -          unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > > +   } while (use_napi &&
> > > > > > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > >
> > > > >
> > > > > A bit more explanation pls.  kick simply means !netdev_xmit_more -
> > > > > if it's false we know there will be another packet, then transmissing
> > > > > that packet will invoke virtqueue_enable_cb_delayed. No?
> > > >
> > > > It's just that there may be a next packet, but in fact there may not be.
> > > > For example, the vq is full, and the driver stops the queue.
> > >
> > > Exactly, when the queue is about to be full we disable tx and wait for
> > > the next tx interrupt to re-enable tx.
> > >
> > > Thanks
> >
> > OK, it's a good idea to document that.
> 
> Will do.
> 
> > And we should enable callbacks at that point, not here on data path.
> 
> I'm not sure I understand here. Are you suggesting removing the
> !user_napi check here?
> 
>                 if (!use_napi &&
>                     unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>                         /* More just got used, free them then recheck. */
>                         free_old_xmit_skbs(sq, false);
>                         if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>                                 netif_start_subqueue(dev, qnum);
>                                 virtqueue_disable_cb(sq->vq);
>                         }
>                 }

At minimum, I suggest calling virtqueue_enable_cb_delayed() around
this area of code. I have not really thought this whole path through,
or how all the corner cases interact.

> Btw, it doesn't differ too much as kick is always true without pktgen
> and that may even need more comments or make the code even harder to
> read. We need a patch for -stable at least so I prefer to let this
> patch go first and do optimization on top.
> 
> Thanks

There's a chance of a perf regression here too. Let's write the full
patch first. If you want to make it a 2-patch series, that is fine, but
the bug has been there since 2021, so I don't see why we should rush a
fix. Worry about backporting later.
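The suggestion to call virtqueue_enable_cb_delayed() near the queue-stop code, rather than dropping kick from the per-packet loop, can be sketched in the same kind of toy model. This is one possible shape under stated assumptions, not the patch that was eventually applied: toy_sq, reclaim(), enable_cb_delayed(), toy_start_xmit(), device_complete_all(), burst_stalls(), RING_SIZE and STOP_THRESH are all hypothetical, and the model cannot say anything about the performance concern raised here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes; the driver's real threshold is 2 + MAX_SKB_FRAGS. */
#define RING_SIZE   8
#define STOP_THRESH 3

/* Toy send-queue model in NAPI mode (not kernel code). */
struct toy_sq {
	int num_free, in_flight, used;
	bool cb_enabled, stopped;
};

/* ~ free_old_xmit_skbs() */
static void reclaim(struct toy_sq *sq)
{
	sq->num_free += sq->used;
	sq->used = 0;
}

/* ~ virtqueue_enable_cb_delayed(): false would mean "more got used". */
static bool enable_cb_delayed(struct toy_sq *sq)
{
	sq->cb_enabled = true;
	return sq->used == 0;
}

static void toy_start_xmit(struct toy_sq *sq, bool kick)
{
	assert(!sq->stopped && sq->num_free > 0);
	do {
		sq->cb_enabled = false;            /* virtqueue_disable_cb() */
		reclaim(sq);
	} while (kick && !enable_cb_delayed(sq));  /* pre-patch condition kept */

	sq->num_free--;
	sq->in_flight++;
	if (sq->num_free < STOP_THRESH) {
		sq->stopped = true;
		/* Alternative: enable callbacks on the queue-stop path in NAPI
		 * mode too, so a completion is guaranteed to wake the queue.
		 * On a false return, reclaim and recheck as the existing
		 * !use_napi branch does. */
		if (!enable_cb_delayed(sq)) {
			reclaim(sq);
			if (sq->num_free >= STOP_THRESH) {
				sq->stopped = false;
				sq->cb_enabled = false;
			}
		}
	}
}

/* Device completes everything; only an enabled callback runs tx NAPI. */
static void device_complete_all(struct toy_sq *sq)
{
	sq->used += sq->in_flight;
	sq->in_flight = 0;
	if (sq->cb_enabled) {
		reclaim(sq);
		if (sq->num_free >= STOP_THRESH)
			sq->stopped = false;
	}
}

/* One pktgen-style burst, kick only on the last skb. True = stalled. */
static bool burst_stalls(int burst)
{
	struct toy_sq sq = { .num_free = RING_SIZE };

	for (int i = 0; i < burst && !sq.stopped; i++)
		toy_start_xmit(&sq, i == burst - 1);
	device_complete_all(&sq);
	return sq.stopped;
}
```

This keeps virtqueue_enable_cb_delayed() off the per-packet path for skbs sent with xmit_more, and pays for one extra call only when the queue is stopped. Whether that trade-off matters in practice is exactly the kind of question that would need the full patch and real measurements.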


Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 86e52454b5b5..44d7daf0267b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		free_old_xmit_skbs(sq, false);
 
-	} while (use_napi && kick &&
-	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
+	} while (use_napi &&
+		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);