[net-next,5/5] virtio_net: sq support premapped mode

Message ID	20240116075924.42798-6-xuanzhuo@linux.alibaba.com (mailing list archive)
State	Deferred
Delegated to:	Netdev Maintainers
Headers	show Received: from out30-99.freemail.mail.aliyun.com (out30-99.freemail.mail.aliyun.com [115.124.30.99]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3627812B80; Tue, 16 Jan 2024 07:59:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com X-Alimail-AntiSpam: AC=PASS;BC=-1\|-1;BR=01201311R371e4;CH=green;DM=\|\|false\|;DS=\|\|;FP=0\|-1\|-1\|-1\|0\|-1\|-1\|-1;HT=ay29a033018046049;MF=xuanzhuo@linux.alibaba.com;NM=1;PH=DS;RN=14;SR=0;TI=SMTPD_---0W-lNYPz_1705391970; Received: from localhost(mailfrom:xuanzhuo@linux.alibaba.com fp:SMTPD_---0W-lNYPz_1705391970) by smtp.aliyun-inc.com; Tue, 16 Jan 2024 15:59:31 +0800 From: Xuan Zhuo <xuanzhuo@linux.alibaba.com> To: netdev@vger.kernel.org Cc: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Xuan Zhuo <xuanzhuo@linux.alibaba.com>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Jesper Dangaard Brouer <hawk@kernel.org>, John Fastabend <john.fastabend@gmail.com>, virtualization@lists.linux.dev, bpf@vger.kernel.org Subject: [PATCH net-next 5/5] virtio_net: sq support premapped mode Date: Tue, 16 Jan 2024 15:59:24 +0800 Message-Id: <20240116075924.42798-6-xuanzhuo@linux.alibaba.com> X-Mailer: git-send-email 2.32.0.3.g01195cf9f In-Reply-To: <20240116075924.42798-1-xuanzhuo@linux.alibaba.com> References: <20240116075924.42798-1-xuanzhuo@linux.alibaba.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: <bpf.vger.kernel.org> List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org> MIME-Version: 1.0 X-Git-Hash: e56718642c76 Content-Transfer-Encoding: 8bit X-Patchwork-Delegate: kuba@kernel.org
Series	virtio-net: sq support premapped mode \| expand [net-next,0/5] virtio-net: sq support premapped mode [net-next,1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma() [net-next,2/5] virtio_ring: virtqueue_disable_and_recycle let the callback detach bufs [net-next,3/5] virtio_ring: introduce virtqueue_detach_unused_buf_dma() [net-next,4/5] virtio_ring: introduce virtqueue_get_dma_premapped() [net-next,5/5] virtio_net: sq support premapped mode

Context	Check	Description
netdev/tree_selection	success	Clearly marked for net-next
netdev/apply	fail	Patch does not apply to net-next

Xuan Zhuo Jan. 16, 2024, 7:59 a.m. UTC

If the xsk is enabling, the xsk tx will share the send queue.
But the xsk requires that the send queue use the premapped mode.
So the send queue must support premapped mode.

command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
machine:  ecs.ebmg6e.26xlarge of Aliyun
cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt

                      |        iommu off           |        iommu on
----------------------|-----------------------------------------------------
                      | 16         |  1400         | 16         | 1400
----------------------|-----------------------------------------------------
Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
 drivers/net/virtio/virtio_net.h |  10 ++-
 2 files changed, 116 insertions(+), 13 deletions(-)

Jason Wang Jan. 25, 2024, 3:39 a.m. UTC | #1

On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> If the xsk is enabling, the xsk tx will share the send queue.

Any reason for this? Technically, virtio-net can work as other NIC
like 256 queues. There could be some work like optimizing the
interrupt allocations etc.

> But the xsk requires that the send queue use the premapped mode.
> So the send queue must support premapped mode.
>
> command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> machine:  ecs.ebmg6e.26xlarge of Aliyun
> cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
>
>                       |        iommu off           |        iommu on
> ----------------------|-----------------------------------------------------
>                       | 16         |  1400         | 16         | 1400
> ----------------------|-----------------------------------------------------
> Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
>  drivers/net/virtio/virtio_net.h |  10 ++-
>  2 files changed, 116 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> index 4fbf612da235..53143f95a3a0 100644
> --- a/drivers/net/virtio/main.c
> +++ b/drivers/net/virtio/main.c
> @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
>         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
>  }
>
> +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> +{
> +       int i;
> +
> +       if (!dma)
> +               return;
> +
> +       for (i = 0; i < dma->next; ++i)
> +               virtqueue_dma_unmap_single_attrs(sq->vq,
> +                                                dma->items[i].addr,
> +                                                dma->items[i].length,
> +                                                DMA_TO_DEVICE, 0);
> +       dma->next = 0;
> +}
> +
>  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
>                             u64 *bytes, u64 *packets)
>  {
> +       struct virtio_dma_head *dma;
>         unsigned int len;
>         void *ptr;
>
> -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +       if (virtqueue_get_dma_premapped(sq->vq)) {

Any chance this.can be false?

> +               dma = &sq->dma.head;
> +               dma->num = ARRAY_SIZE(sq->dma.items);
> +               dma->next = 0;

Btw, I found in the case of RX we have:

virtnet_rq_alloc():

                        alloc_frag->offset = sizeof(*dma);

This seems to defeat frag coalescing when the memory is highly
fragmented or high order allocation is disallowed.

Any idea to solve this?

> +       } else {
> +               dma = NULL;
> +       }
> +
> +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> +               virtnet_sq_unmap_buf(sq, dma);
> +
>                 if (!is_xdp_frame(ptr)) {
>                         struct sk_buff *skb = ptr;
>
> @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
>         return buf;
>  }
>
> -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> +static void virtnet_set_premapped(struct virtnet_info *vi)
>  {
>         int i;
>
> -       /* disable for big mode */
> -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> -               return;
> +       for (i = 0; i < vi->max_queue_pairs; i++) {
> +               virtqueue_set_dma_premapped(vi->sq[i].vq);
>
> -       for (i = 0; i < vi->max_queue_pairs; i++)
> -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> +               /* TODO for big mode */

Btw, how hard to support big mode? If we can do premapping for that
code could be simplified.

(There are vendors that doesn't support mergeable rx buffers).

> +               if (vi->mergeable_rx_bufs || !vi->big_packets)
> +                       virtqueue_set_dma_premapped(vi->rq[i].vq);
> +       }
> +}
> +
> +static void virtnet_sq_unmap_sg(struct virtnet_sq *sq, u32 num)
> +{
> +       struct scatterlist *sg;
> +       u32 i;
> +
> +       for (i = 0; i < num; ++i) {
> +               sg = &sq->sg[i];
> +
> +               virtqueue_dma_unmap_single_attrs(sq->vq,
> +                                                sg->dma_address,
> +                                                sg->length,
> +                                                DMA_TO_DEVICE, 0);
> +       }
> +}
> +
> +static int virtnet_sq_map_sg(struct virtnet_sq *sq, u32 num)
> +{
> +       struct scatterlist *sg;
> +       u32 i;
> +
> +       for (i = 0; i < num; ++i) {
> +               sg = &sq->sg[i];
> +               sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
> +                                                                sg->length,
> +                                                                DMA_TO_DEVICE, 0);
> +               if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
> +                       goto err;
> +       }
> +

This seems nothing virtio-net specific, let's move it to the core?

Thanks


> +       return 0;
> +
> +err:
> +       virtnet_sq_unmap_sg(sq, i);
> +       return -ENOMEM;
> +}
> +
> +static int virtnet_add_outbuf(struct virtnet_sq *sq, u32 num, void *data)
> +{
> +       int ret;
> +
> +       if (virtqueue_get_dma_premapped(sq->vq)) {
> +               ret = virtnet_sq_map_sg(sq, num);
> +               if (ret)
> +                       return -ENOMEM;
> +       }
> +
> +       ret = virtqueue_add_outbuf(sq->vq, sq->sg, num, data, GFP_ATOMIC);
> +       if (ret && virtqueue_get_dma_premapped(sq->vq))
> +               virtnet_sq_unmap_sg(sq, num);
> +
> +       return ret;
>  }
>
>  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> @@ -687,8 +767,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>                             skb_frag_size(frag), skb_frag_off(frag));
>         }
>
> -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
>         if (unlikely(err))
>                 return -ENOSPC; /* Caller handle free/refcnt */
>
> @@ -2154,7 +2233,7 @@ static int xmit_skb(struct virtnet_sq *sq, struct sk_buff *skb)
>                         return num_sg;
>                 num_sg++;
>         }
> -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> +       return virtnet_add_outbuf(sq, num_sg, skb);
>  }
>
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> @@ -4011,9 +4090,25 @@ static void free_receive_page_frags(struct virtnet_info *vi)
>
>  static void virtnet_sq_free_unused_bufs(struct virtqueue *vq)
>  {
> +       struct virtnet_info *vi = vq->vdev->priv;
> +       struct virtio_dma_head *dma;
> +       struct virtnet_sq *sq;
> +       int i = vq2txq(vq);
>         void *buf;
>
> -       while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> +       sq = &vi->sq[i];
> +
> +       if (virtqueue_get_dma_premapped(sq->vq)) {
> +               dma = &sq->dma.head;
> +               dma->num = ARRAY_SIZE(sq->dma.items);
> +               dma->next = 0;
> +       } else {
> +               dma = NULL;
> +       }
> +
> +       while ((buf = virtqueue_detach_unused_buf_dma(vq, dma)) != NULL) {
> +               virtnet_sq_unmap_buf(sq, dma);
> +
>                 if (!is_xdp_frame(buf))
>                         dev_kfree_skb(buf);
>                 else
> @@ -4228,7 +4323,7 @@ static int init_vqs(struct virtnet_info *vi)
>         if (ret)
>                 goto err_free;
>
> -       virtnet_rq_set_premapped(vi);
> +       virtnet_set_premapped(vi);
>
>         cpus_read_lock();
>         virtnet_set_affinity(vi);
> diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> index 066a2b9d2b3c..dda144cc91c7 100644
> --- a/drivers/net/virtio/virtio_net.h
> +++ b/drivers/net/virtio/virtio_net.h
> @@ -48,13 +48,21 @@ struct virtnet_rq_dma {
>         u16 need_sync;
>  };
>
> +struct virtnet_sq_dma {
> +       struct virtio_dma_head head;
> +       struct virtio_dma_item items[MAX_SKB_FRAGS + 2];
> +};
> +
>  /* Internal representation of a send virtqueue */
>  struct virtnet_sq {
>         /* Virtqueue associated with this virtnet_sq */
>         struct virtqueue *vq;
>
>         /* TX: fragments + linear part + virtio header */
> -       struct scatterlist sg[MAX_SKB_FRAGS + 2];
> +       union {
> +               struct scatterlist sg[MAX_SKB_FRAGS + 2];
> +               struct virtnet_sq_dma dma;
> +       };
>
>         /* Name of the send queue: output.$index */
>         char name[16];
> --
> 2.32.0.3.g01195cf9f
>

Xuan Zhuo Jan. 25, 2024, 5:58 a.m. UTC | #2

On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > If the xsk is enabling, the xsk tx will share the send queue.
>
> Any reason for this? Technically, virtio-net can work as other NIC
> like 256 queues. There could be some work like optimizing the
> interrupt allocations etc.

Just like the logic of XDP_TX.

Now the virtio spec does not allow to add new dynamic queues.
As I know, most hypervisors just support few queues. The num of
queues is not bigger than the cpu num. So the best way is
to share the send queues.

Parav and I tried to introduce dynamic queues. But that is dropped.
Before that I think we can share the send queues.


>
> > But the xsk requires that the send queue use the premapped mode.
> > So the send queue must support premapped mode.
> >
> > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> >
> >                       |        iommu off           |        iommu on
> > ----------------------|-----------------------------------------------------
> >                       | 16         |  1400         | 16         | 1400
> > ----------------------|-----------------------------------------------------
> > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> >  drivers/net/virtio/virtio_net.h |  10 ++-
> >  2 files changed, 116 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > index 4fbf612da235..53143f95a3a0 100644
> > --- a/drivers/net/virtio/main.c
> > +++ b/drivers/net/virtio/main.c
> > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> >  }
> >
> > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > +{
> > +       int i;
> > +
> > +       if (!dma)
> > +               return;
> > +
> > +       for (i = 0; i < dma->next; ++i)
> > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > +                                                dma->items[i].addr,
> > +                                                dma->items[i].length,
> > +                                                DMA_TO_DEVICE, 0);
> > +       dma->next = 0;
> > +}
> > +
> >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> >                             u64 *bytes, u64 *packets)
> >  {
> > +       struct virtio_dma_head *dma;
> >         unsigned int len;
> >         void *ptr;
> >
> > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > +       if (virtqueue_get_dma_premapped(sq->vq)) {
>
> Any chance this.can be false?

__free_old_xmit is the common path.

The virtqueue_get_dma_premapped() is used to check whether the sq is premapped
mode.

>
> > +               dma = &sq->dma.head;
> > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > +               dma->next = 0;
>
> Btw, I found in the case of RX we have:
>
> virtnet_rq_alloc():
>
>                         alloc_frag->offset = sizeof(*dma);
>
> This seems to defeat frag coalescing when the memory is highly
> fragmented or high order allocation is disallowed.
>
> Any idea to solve this?


On the rq premapped pathset, I answered this.

http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com

>
> > +       } else {
> > +               dma = NULL;
> > +       }
> > +
> > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > +               virtnet_sq_unmap_buf(sq, dma);
> > +
> >                 if (!is_xdp_frame(ptr)) {
> >                         struct sk_buff *skb = ptr;
> >
> > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> >         return buf;
> >  }
> >
> > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > +static void virtnet_set_premapped(struct virtnet_info *vi)
> >  {
> >         int i;
> >
> > -       /* disable for big mode */
> > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > -               return;
> > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> >
> > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > +               /* TODO for big mode */
>
> Btw, how hard to support big mode? If we can do premapping for that
> code could be simplified.
>
> (There are vendors that doesn't support mergeable rx buffers).

I will do that after these patch-sets

>
> > +               if (vi->mergeable_rx_bufs || !vi->big_packets)
> > +                       virtqueue_set_dma_premapped(vi->rq[i].vq);
> > +       }
> > +}
> > +
> > +static void virtnet_sq_unmap_sg(struct virtnet_sq *sq, u32 num)
> > +{
> > +       struct scatterlist *sg;
> > +       u32 i;
> > +
> > +       for (i = 0; i < num; ++i) {
> > +               sg = &sq->sg[i];
> > +
> > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > +                                                sg->dma_address,
> > +                                                sg->length,
> > +                                                DMA_TO_DEVICE, 0);
> > +       }
> > +}
> > +
> > +static int virtnet_sq_map_sg(struct virtnet_sq *sq, u32 num)
> > +{
> > +       struct scatterlist *sg;
> > +       u32 i;
> > +
> > +       for (i = 0; i < num; ++i) {
> > +               sg = &sq->sg[i];
> > +               sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
> > +                                                                sg->length,
> > +                                                                DMA_TO_DEVICE, 0);
> > +               if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
> > +                       goto err;
> > +       }
> > +
>
> This seems nothing virtio-net specific, let's move it to the core?


This is the dma api style.

And the caller can not judge it by the return value of
virtqueue_dma_map_single_attrs.

Thanks


>
> Thanks
>
>
> > +       return 0;
> > +
> > +err:
> > +       virtnet_sq_unmap_sg(sq, i);
> > +       return -ENOMEM;
> > +}
> > +
> > +static int virtnet_add_outbuf(struct virtnet_sq *sq, u32 num, void *data)
> > +{
> > +       int ret;
> > +
> > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > +               ret = virtnet_sq_map_sg(sq, num);
> > +               if (ret)
> > +                       return -ENOMEM;
> > +       }
> > +
> > +       ret = virtqueue_add_outbuf(sq->vq, sq->sg, num, data, GFP_ATOMIC);
> > +       if (ret && virtqueue_get_dma_premapped(sq->vq))
> > +               virtnet_sq_unmap_sg(sq, num);
> > +
> > +       return ret;
> >  }
> >
> >  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> > @@ -687,8 +767,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
> >                             skb_frag_size(frag), skb_frag_off(frag));
> >         }
> >
> > -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> > -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> > +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
> >         if (unlikely(err))
> >                 return -ENOSPC; /* Caller handle free/refcnt */
> >
> > @@ -2154,7 +2233,7 @@ static int xmit_skb(struct virtnet_sq *sq, struct sk_buff *skb)
> >                         return num_sg;
> >                 num_sg++;
> >         }
> > -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> > +       return virtnet_add_outbuf(sq, num_sg, skb);
> >  }
> >
> >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > @@ -4011,9 +4090,25 @@ static void free_receive_page_frags(struct virtnet_info *vi)
> >
> >  static void virtnet_sq_free_unused_bufs(struct virtqueue *vq)
> >  {
> > +       struct virtnet_info *vi = vq->vdev->priv;
> > +       struct virtio_dma_head *dma;
> > +       struct virtnet_sq *sq;
> > +       int i = vq2txq(vq);
> >         void *buf;
> >
> > -       while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > +       sq = &vi->sq[i];
> > +
> > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > +               dma = &sq->dma.head;
> > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > +               dma->next = 0;
> > +       } else {
> > +               dma = NULL;
> > +       }
> > +
> > +       while ((buf = virtqueue_detach_unused_buf_dma(vq, dma)) != NULL) {
> > +               virtnet_sq_unmap_buf(sq, dma);
> > +
> >                 if (!is_xdp_frame(buf))
> >                         dev_kfree_skb(buf);
> >                 else
> > @@ -4228,7 +4323,7 @@ static int init_vqs(struct virtnet_info *vi)
> >         if (ret)
> >                 goto err_free;
> >
> > -       virtnet_rq_set_premapped(vi);
> > +       virtnet_set_premapped(vi);
> >
> >         cpus_read_lock();
> >         virtnet_set_affinity(vi);
> > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > index 066a2b9d2b3c..dda144cc91c7 100644
> > --- a/drivers/net/virtio/virtio_net.h
> > +++ b/drivers/net/virtio/virtio_net.h
> > @@ -48,13 +48,21 @@ struct virtnet_rq_dma {
> >         u16 need_sync;
> >  };
> >
> > +struct virtnet_sq_dma {
> > +       struct virtio_dma_head head;
> > +       struct virtio_dma_item items[MAX_SKB_FRAGS + 2];
> > +};
> > +
> >  /* Internal representation of a send virtqueue */
> >  struct virtnet_sq {
> >         /* Virtqueue associated with this virtnet_sq */
> >         struct virtqueue *vq;
> >
> >         /* TX: fragments + linear part + virtio header */
> > -       struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > +       union {
> > +               struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > +               struct virtnet_sq_dma dma;
> > +       };
> >
> >         /* Name of the send queue: output.$index */
> >         char name[16];
> > --
> > 2.32.0.3.g01195cf9f
> >
>

Jason Wang Jan. 29, 2024, 3:06 a.m. UTC | #3

On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > If the xsk is enabling, the xsk tx will share the send queue.
> >
> > Any reason for this? Technically, virtio-net can work as other NIC
> > like 256 queues. There could be some work like optimizing the
> > interrupt allocations etc.
>
> Just like the logic of XDP_TX.
>
> Now the virtio spec does not allow to add new dynamic queues.
> As I know, most hypervisors just support few queues.

When multiqueue is developed in Qemu, it support as least 256 queue
pairs if my memory is correct.

> The num of
> queues is not bigger than the cpu num. So the best way is
> to share the send queues.
>
> Parav and I tried to introduce dynamic queues.

Virtio-net doesn't differ from real NIC where most of them can create
queue dynamically. It's more about the resource allocation, if mgmt
can start with 256 queues, then we probably fine.

But I think we can leave this question now.

> But that is dropped.
> Before that I think we can share the send queues.
>
>
> >
> > > But the xsk requires that the send queue use the premapped mode.
> > > So the send queue must support premapped mode.
> > >
> > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > >
> > >                       |        iommu off           |        iommu on
> > > ----------------------|-----------------------------------------------------
> > >                       | 16         |  1400         | 16         | 1400
> > > ----------------------|-----------------------------------------------------
> > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > index 4fbf612da235..53143f95a3a0 100644
> > > --- a/drivers/net/virtio/main.c
> > > +++ b/drivers/net/virtio/main.c
> > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > >  }
> > >
> > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > +{
> > > +       int i;
> > > +
> > > +       if (!dma)
> > > +               return;
> > > +
> > > +       for (i = 0; i < dma->next; ++i)
> > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > +                                                dma->items[i].addr,
> > > +                                                dma->items[i].length,
> > > +                                                DMA_TO_DEVICE, 0);
> > > +       dma->next = 0;
> > > +}
> > > +
> > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > >                             u64 *bytes, u64 *packets)
> > >  {
> > > +       struct virtio_dma_head *dma;
> > >         unsigned int len;
> > >         void *ptr;
> > >
> > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> >
> > Any chance this.can be false?
>
> __free_old_xmit is the common path.

Did you mean the XDP path doesn't work with this? If yes, we need to
change that.

>
> The virtqueue_get_dma_premapped() is used to check whether the sq is premapped
> mode.
>
> >
> > > +               dma = &sq->dma.head;
> > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > +               dma->next = 0;
> >
> > Btw, I found in the case of RX we have:
> >
> > virtnet_rq_alloc():
> >
> >                         alloc_frag->offset = sizeof(*dma);
> >
> > This seems to defeat frag coalescing when the memory is highly
> > fragmented or high order allocation is disallowed.
> >
> > Any idea to solve this?
>
>
> On the rq premapped pathset, I answered this.
>
> http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com

Oops, I forget that.

>
> >
> > > +       } else {
> > > +               dma = NULL;
> > > +       }
> > > +
> > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > +               virtnet_sq_unmap_buf(sq, dma);
> > > +
> > >                 if (!is_xdp_frame(ptr)) {
> > >                         struct sk_buff *skb = ptr;
> > >
> > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > >         return buf;
> > >  }
> > >
> > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > >  {
> > >         int i;
> > >
> > > -       /* disable for big mode */
> > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > -               return;
> > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > >
> > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > +               /* TODO for big mode */
> >
> > Btw, how hard to support big mode? If we can do premapping for that
> > code could be simplified.
> >
> > (There are vendors that doesn't support mergeable rx buffers).
>
> I will do that after these patch-sets

If it's not too hard, I'd suggest to do it now.

>
> >
> > > +               if (vi->mergeable_rx_bufs || !vi->big_packets)
> > > +                       virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > +       }
> > > +}
> > > +
> > > +static void virtnet_sq_unmap_sg(struct virtnet_sq *sq, u32 num)
> > > +{
> > > +       struct scatterlist *sg;
> > > +       u32 i;
> > > +
> > > +       for (i = 0; i < num; ++i) {
> > > +               sg = &sq->sg[i];
> > > +
> > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > +                                                sg->dma_address,
> > > +                                                sg->length,
> > > +                                                DMA_TO_DEVICE, 0);
> > > +       }
> > > +}
> > > +
> > > +static int virtnet_sq_map_sg(struct virtnet_sq *sq, u32 num)
> > > +{
> > > +       struct scatterlist *sg;
> > > +       u32 i;
> > > +
> > > +       for (i = 0; i < num; ++i) {
> > > +               sg = &sq->sg[i];
> > > +               sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
> > > +                                                                sg->length,
> > > +                                                                DMA_TO_DEVICE, 0);
> > > +               if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
> > > +                       goto err;
> > > +       }
> > > +
> >
> > This seems nothing virtio-net specific, let's move it to the core?
>
>
> This is the dma api style.
>
> And the caller can not judge it by the return value of
> virtqueue_dma_map_single_attrs.

I meant, if e.g virtio-fs want to use premapped, the code will for
sure be duplicated there as well.

Thanks


>
> Thanks
>
>
> >
> > Thanks
> >
> >
> > > +       return 0;
> > > +
> > > +err:
> > > +       virtnet_sq_unmap_sg(sq, i);
> > > +       return -ENOMEM;
> > > +}
> > > +
> > > +static int virtnet_add_outbuf(struct virtnet_sq *sq, u32 num, void *data)
> > > +{
> > > +       int ret;
> > > +
> > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > +               ret = virtnet_sq_map_sg(sq, num);
> > > +               if (ret)
> > > +                       return -ENOMEM;
> > > +       }
> > > +
> > > +       ret = virtqueue_add_outbuf(sq->vq, sq->sg, num, data, GFP_ATOMIC);
> > > +       if (ret && virtqueue_get_dma_premapped(sq->vq))
> > > +               virtnet_sq_unmap_sg(sq, num);
> > > +
> > > +       return ret;
> > >  }
> > >
> > >  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> > > @@ -687,8 +767,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
> > >                             skb_frag_size(frag), skb_frag_off(frag));
> > >         }
> > >
> > > -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> > > -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> > > +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
> > >         if (unlikely(err))
> > >                 return -ENOSPC; /* Caller handle free/refcnt */
> > >
> > > @@ -2154,7 +2233,7 @@ static int xmit_skb(struct virtnet_sq *sq, struct sk_buff *skb)
> > >                         return num_sg;
> > >                 num_sg++;
> > >         }
> > > -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> > > +       return virtnet_add_outbuf(sq, num_sg, skb);
> > >  }
> > >
> > >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > @@ -4011,9 +4090,25 @@ static void free_receive_page_frags(struct virtnet_info *vi)
> > >
> > >  static void virtnet_sq_free_unused_bufs(struct virtqueue *vq)
> > >  {
> > > +       struct virtnet_info *vi = vq->vdev->priv;
> > > +       struct virtio_dma_head *dma;
> > > +       struct virtnet_sq *sq;
> > > +       int i = vq2txq(vq);
> > >         void *buf;
> > >
> > > -       while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > > +       sq = &vi->sq[i];
> > > +
> > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > +               dma = &sq->dma.head;
> > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > +               dma->next = 0;
> > > +       } else {
> > > +               dma = NULL;
> > > +       }
> > > +
> > > +       while ((buf = virtqueue_detach_unused_buf_dma(vq, dma)) != NULL) {
> > > +               virtnet_sq_unmap_buf(sq, dma);
> > > +
> > >                 if (!is_xdp_frame(buf))
> > >                         dev_kfree_skb(buf);
> > >                 else
> > > @@ -4228,7 +4323,7 @@ static int init_vqs(struct virtnet_info *vi)
> > >         if (ret)
> > >                 goto err_free;
> > >
> > > -       virtnet_rq_set_premapped(vi);
> > > +       virtnet_set_premapped(vi);
> > >
> > >         cpus_read_lock();
> > >         virtnet_set_affinity(vi);
> > > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > > index 066a2b9d2b3c..dda144cc91c7 100644
> > > --- a/drivers/net/virtio/virtio_net.h
> > > +++ b/drivers/net/virtio/virtio_net.h
> > > @@ -48,13 +48,21 @@ struct virtnet_rq_dma {
> > >         u16 need_sync;
> > >  };
> > >
> > > +struct virtnet_sq_dma {
> > > +       struct virtio_dma_head head;
> > > +       struct virtio_dma_item items[MAX_SKB_FRAGS + 2];
> > > +};
> > > +
> > >  /* Internal representation of a send virtqueue */
> > >  struct virtnet_sq {
> > >         /* Virtqueue associated with this virtnet_sq */
> > >         struct virtqueue *vq;
> > >
> > >         /* TX: fragments + linear part + virtio header */
> > > -       struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > > +       union {
> > > +               struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > > +               struct virtnet_sq_dma dma;
> > > +       };
> > >
> > >         /* Name of the send queue: output.$index */
> > >         char name[16];
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
>

Xuan Zhuo Jan. 29, 2024, 3:11 a.m. UTC | #4

On Mon, 29 Jan 2024 11:06:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > If the xsk is enabling, the xsk tx will share the send queue.
> > >
> > > Any reason for this? Technically, virtio-net can work as other NIC
> > > like 256 queues. There could be some work like optimizing the
> > > interrupt allocations etc.
> >
> > Just like the logic of XDP_TX.
> >
> > Now the virtio spec does not allow to add new dynamic queues.
> > As I know, most hypervisors just support few queues.
>
> When multiqueue is developed in Qemu, it support as least 256 queue
> pairs if my memory is correct.
>


YES, but that is configured by the hypervisor.

For the user on any platform, when he got a vm, the queue num is fixed.
As I know, on most case, the num is less.
If we want the af-xdp/xdp-tx has the the independent queues
I think the dynamic queue is good way.


> > The num of
> > queues is not bigger than the cpu num. So the best way is
> > to share the send queues.
> >
> > Parav and I tried to introduce dynamic queues.
>
> Virtio-net doesn't differ from real NIC where most of them can create
> queue dynamically. It's more about the resource allocation, if mgmt
> can start with 256 queues, then we probably fine.

But now, if the devices has 256, we will enable the 256 queues by default.
that is too much.

So, the dynamic queue is not to create a new queue out of the resource.

The device may tell the driver, the max queue resource is 256,
but let we start from 8. If the driver need more, then we can
enable more.

But for me, the xdp tx can share the sq queue, so let we start
the af-xdp from sharing sq queue.


>
> But I think we can leave this question now.
>
> > But that is dropped.
> > Before that I think we can share the send queues.
> >
> >
> > >
> > > > But the xsk requires that the send queue use the premapped mode.
> > > > So the send queue must support premapped mode.
> > > >
> > > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > > >
> > > >                       |        iommu off           |        iommu on
> > > > ----------------------|-----------------------------------------------------
> > > >                       | 16         |  1400         | 16         | 1400
> > > > ----------------------|-----------------------------------------------------
> > > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > index 4fbf612da235..53143f95a3a0 100644
> > > > --- a/drivers/net/virtio/main.c
> > > > +++ b/drivers/net/virtio/main.c
> > > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > > >  }
> > > >
> > > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > > +{
> > > > +       int i;
> > > > +
> > > > +       if (!dma)
> > > > +               return;
> > > > +
> > > > +       for (i = 0; i < dma->next; ++i)
> > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > +                                                dma->items[i].addr,
> > > > +                                                dma->items[i].length,
> > > > +                                                DMA_TO_DEVICE, 0);
> > > > +       dma->next = 0;
> > > > +}
> > > > +
> > > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > >                             u64 *bytes, u64 *packets)
> > > >  {
> > > > +       struct virtio_dma_head *dma;
> > > >         unsigned int len;
> > > >         void *ptr;
> > > >
> > > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > >
> > > Any chance this.can be false?
> >
> > __free_old_xmit is the common path.
>
> Did you mean the XDP path doesn't work with this? If yes, we need to
> change that.


NO. If the virtio core use_dma_api is false, the dma premapped
can not be ture.

>
> >
> > The virtqueue_get_dma_premapped() is used to check whether the sq is premapped
> > mode.
> >
> > >
> > > > +               dma = &sq->dma.head;
> > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > +               dma->next = 0;
> > >
> > > Btw, I found in the case of RX we have:
> > >
> > > virtnet_rq_alloc():
> > >
> > >                         alloc_frag->offset = sizeof(*dma);
> > >
> > > This seems to defeat frag coalescing when the memory is highly
> > > fragmented or high order allocation is disallowed.
> > >
> > > Any idea to solve this?
> >
> >
> > On the rq premapped pathset, I answered this.
> >
> > http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com
>
> Oops, I forget that.
>
> >
> > >
> > > > +       } else {
> > > > +               dma = NULL;
> > > > +       }
> > > > +
> > > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > +
> > > >                 if (!is_xdp_frame(ptr)) {
> > > >                         struct sk_buff *skb = ptr;
> > > >
> > > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > >         return buf;
> > > >  }
> > > >
> > > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > > >  {
> > > >         int i;
> > > >
> > > > -       /* disable for big mode */
> > > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > -               return;
> > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > > >
> > > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > +               /* TODO for big mode */
> > >
> > > Btw, how hard to support big mode? If we can do premapping for that
> > > code could be simplified.
> > >
> > > (There are vendors that doesn't support mergeable rx buffers).
> >
> > I will do that after these patch-sets
>
> If it's not too hard, I'd suggest to do it now.


YES. Is not too hard, but I was doing too much.

* virtio-net + device stats
* virtio-net + af-xdp, this patch set has about 27 commits

And I was pushing this too long, I just want to finish the work.
Then I can work on the next (premapped big mode, af-xdp multi-buf....).

So, let we step by step.


>
> >
> > >
> > > > +               if (vi->mergeable_rx_bufs || !vi->big_packets)
> > > > +                       virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > +       }
> > > > +}
> > > > +
> > > > +static void virtnet_sq_unmap_sg(struct virtnet_sq *sq, u32 num)
> > > > +{
> > > > +       struct scatterlist *sg;
> > > > +       u32 i;
> > > > +
> > > > +       for (i = 0; i < num; ++i) {
> > > > +               sg = &sq->sg[i];
> > > > +
> > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > +                                                sg->dma_address,
> > > > +                                                sg->length,
> > > > +                                                DMA_TO_DEVICE, 0);
> > > > +       }
> > > > +}
> > > > +
> > > > +static int virtnet_sq_map_sg(struct virtnet_sq *sq, u32 num)
> > > > +{
> > > > +       struct scatterlist *sg;
> > > > +       u32 i;
> > > > +
> > > > +       for (i = 0; i < num; ++i) {
> > > > +               sg = &sq->sg[i];
> > > > +               sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
> > > > +                                                                sg->length,
> > > > +                                                                DMA_TO_DEVICE, 0);
> > > > +               if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
> > > > +                       goto err;
> > > > +       }
> > > > +
> > >
> > > This seems nothing virtio-net specific, let's move it to the core?
> >
> >
> > This is the dma api style.
> >
> > And the caller can not judge it by the return value of
> > virtqueue_dma_map_single_attrs.
>
> I meant, if e.g virtio-fs want to use premapped, the code will for
> sure be duplicated there as well.

If you mean this function virtnet_sq_map_sg, I think you are right.

I will put it to the virtio core.

Thanks.




>
> Thanks
>
>
> >
> > Thanks
> >
> >
> > >
> > > Thanks
> > >
> > >
> > > > +       return 0;
> > > > +
> > > > +err:
> > > > +       virtnet_sq_unmap_sg(sq, i);
> > > > +       return -ENOMEM;
> > > > +}
> > > > +
> > > > +static int virtnet_add_outbuf(struct virtnet_sq *sq, u32 num, void *data)
> > > > +{
> > > > +       int ret;
> > > > +
> > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > > +               ret = virtnet_sq_map_sg(sq, num);
> > > > +               if (ret)
> > > > +                       return -ENOMEM;
> > > > +       }
> > > > +
> > > > +       ret = virtqueue_add_outbuf(sq->vq, sq->sg, num, data, GFP_ATOMIC);
> > > > +       if (ret && virtqueue_get_dma_premapped(sq->vq))
> > > > +               virtnet_sq_unmap_sg(sq, num);
> > > > +
> > > > +       return ret;
> > > >  }
> > > >
> > > >  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> > > > @@ -687,8 +767,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
> > > >                             skb_frag_size(frag), skb_frag_off(frag));
> > > >         }
> > > >
> > > > -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> > > > -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> > > > +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
> > > >         if (unlikely(err))
> > > >                 return -ENOSPC; /* Caller handle free/refcnt */
> > > >
> > > > @@ -2154,7 +2233,7 @@ static int xmit_skb(struct virtnet_sq *sq, struct sk_buff *skb)
> > > >                         return num_sg;
> > > >                 num_sg++;
> > > >         }
> > > > -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> > > > +       return virtnet_add_outbuf(sq, num_sg, skb);
> > > >  }
> > > >
> > > >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > @@ -4011,9 +4090,25 @@ static void free_receive_page_frags(struct virtnet_info *vi)
> > > >
> > > >  static void virtnet_sq_free_unused_bufs(struct virtqueue *vq)
> > > >  {
> > > > +       struct virtnet_info *vi = vq->vdev->priv;
> > > > +       struct virtio_dma_head *dma;
> > > > +       struct virtnet_sq *sq;
> > > > +       int i = vq2txq(vq);
> > > >         void *buf;
> > > >
> > > > -       while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > > > +       sq = &vi->sq[i];
> > > > +
> > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > > +               dma = &sq->dma.head;
> > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > +               dma->next = 0;
> > > > +       } else {
> > > > +               dma = NULL;
> > > > +       }
> > > > +
> > > > +       while ((buf = virtqueue_detach_unused_buf_dma(vq, dma)) != NULL) {
> > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > +
> > > >                 if (!is_xdp_frame(buf))
> > > >                         dev_kfree_skb(buf);
> > > >                 else
> > > > @@ -4228,7 +4323,7 @@ static int init_vqs(struct virtnet_info *vi)
> > > >         if (ret)
> > > >                 goto err_free;
> > > >
> > > > -       virtnet_rq_set_premapped(vi);
> > > > +       virtnet_set_premapped(vi);
> > > >
> > > >         cpus_read_lock();
> > > >         virtnet_set_affinity(vi);
> > > > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > > > index 066a2b9d2b3c..dda144cc91c7 100644
> > > > --- a/drivers/net/virtio/virtio_net.h
> > > > +++ b/drivers/net/virtio/virtio_net.h
> > > > @@ -48,13 +48,21 @@ struct virtnet_rq_dma {
> > > >         u16 need_sync;
> > > >  };
> > > >
> > > > +struct virtnet_sq_dma {
> > > > +       struct virtio_dma_head head;
> > > > +       struct virtio_dma_item items[MAX_SKB_FRAGS + 2];
> > > > +};
> > > > +
> > > >  /* Internal representation of a send virtqueue */
> > > >  struct virtnet_sq {
> > > >         /* Virtqueue associated with this virtnet_sq */
> > > >         struct virtqueue *vq;
> > > >
> > > >         /* TX: fragments + linear part + virtio header */
> > > > -       struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > > > +       union {
> > > > +               struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > > > +               struct virtnet_sq_dma dma;
> > > > +       };
> > > >
> > > >         /* Name of the send queue: output.$index */
> > > >         char name[16];
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
> >
>

Jason Wang Jan. 30, 2024, 2:56 a.m. UTC | #5

On Mon, Jan 29, 2024 at 11:28 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 29 Jan 2024 11:06:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > If the xsk is enabling, the xsk tx will share the send queue.
> > > >
> > > > Any reason for this? Technically, virtio-net can work as other NIC
> > > > like 256 queues. There could be some work like optimizing the
> > > > interrupt allocations etc.
> > >
> > > Just like the logic of XDP_TX.
> > >
> > > Now the virtio spec does not allow to add new dynamic queues.
> > > As I know, most hypervisors just support few queues.
> >
> > When multiqueue is developed in Qemu, it support as least 256 queue
> > pairs if my memory is correct.
> >
>
>
> YES, but that is configured by the hypervisor.
>
> For the user on any platform, when he got a vm, the queue num is fixed.
> As I know, on most case, the num is less.
> If we want the af-xdp/xdp-tx has the the independent queues
> I think the dynamic queue is good way.

Yes, we can start from this.

>
>
> > > The num of
> > > queues is not bigger than the cpu num. So the best way is
> > > to share the send queues.
> > >
> > > Parav and I tried to introduce dynamic queues.
> >
> > Virtio-net doesn't differ from real NIC where most of them can create
> > queue dynamically. It's more about the resource allocation, if mgmt
> > can start with 256 queues, then we probably fine.
>
> But now, if the devices has 256, we will enable the 256 queues by default.
> that is too much.

It doesn't differ from the other NIC. E.g currently the active #qps is
determined by the number of cpus. this is only true if we have 256
cpus.

>
> So, the dynamic queue is not to create a new queue out of the resource.
>
> The device may tell the driver, the max queue resource is 256,
> but let we start from 8. If the driver need more, then we can
> enable more.

This is the policy we used now.

>
> But for me, the xdp tx can share the sq queue, so let we start
> the af-xdp from sharing sq queue.
>
>
> >
> > But I think we can leave this question now.
> >
> > > But that is dropped.
> > > Before that I think we can share the send queues.
> > >
> > >
> > > >
> > > > > But the xsk requires that the send queue use the premapped mode.
> > > > > So the send queue must support premapped mode.
> > > > >
> > > > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > > > >
> > > > >                       |        iommu off           |        iommu on
> > > > > ----------------------|-----------------------------------------------------
> > > > >                       | 16         |  1400         | 16         | 1400
> > > > > ----------------------|-----------------------------------------------------
> > > > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > > > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > > > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > > index 4fbf612da235..53143f95a3a0 100644
> > > > > --- a/drivers/net/virtio/main.c
> > > > > +++ b/drivers/net/virtio/main.c
> > > > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > > > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > > > >  }
> > > > >
> > > > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > > > +{
> > > > > +       int i;
> > > > > +
> > > > > +       if (!dma)
> > > > > +               return;
> > > > > +
> > > > > +       for (i = 0; i < dma->next; ++i)
> > > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > > +                                                dma->items[i].addr,
> > > > > +                                                dma->items[i].length,
> > > > > +                                                DMA_TO_DEVICE, 0);
> > > > > +       dma->next = 0;
> > > > > +}
> > > > > +
> > > > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > > >                             u64 *bytes, u64 *packets)
> > > > >  {
> > > > > +       struct virtio_dma_head *dma;
> > > > >         unsigned int len;
> > > > >         void *ptr;
> > > > >
> > > > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > >
> > > > Any chance this.can be false?
> > >
> > > __free_old_xmit is the common path.
> >
> > Did you mean the XDP path doesn't work with this? If yes, we need to
> > change that.
>
>
> NO. If the virtio core use_dma_api is false, the dma premapped
> can not be ture.

Ok, I see.

>
> >
> > >
> > > The virtqueue_get_dma_premapped() is used to check whether the sq is premapped
> > > mode.
> > >
> > > >
> > > > > +               dma = &sq->dma.head;
> > > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > > +               dma->next = 0;
> > > >
> > > > Btw, I found in the case of RX we have:
> > > >
> > > > virtnet_rq_alloc():
> > > >
> > > >                         alloc_frag->offset = sizeof(*dma);
> > > >
> > > > This seems to defeat frag coalescing when the memory is highly
> > > > fragmented or high order allocation is disallowed.
> > > >
> > > > Any idea to solve this?
> > >
> > >
> > > On the rq premapped pathset, I answered this.
> > >
> > > http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com
> >
> > Oops, I forget that.
> >
> > >
> > > >
> > > > > +       } else {
> > > > > +               dma = NULL;
> > > > > +       }
> > > > > +
> > > > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > > +
> > > > >                 if (!is_xdp_frame(ptr)) {
> > > > >                         struct sk_buff *skb = ptr;
> > > > >
> > > > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > > >         return buf;
> > > > >  }
> > > > >
> > > > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > > > >  {
> > > > >         int i;
> > > > >
> > > > > -       /* disable for big mode */
> > > > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > -               return;
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > > > >
> > > > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > > +               /* TODO for big mode */
> > > >
> > > > Btw, how hard to support big mode? If we can do premapping for that
> > > > code could be simplified.
> > > >
> > > > (There are vendors that doesn't support mergeable rx buffers).
> > >
> > > I will do that after these patch-sets
> >
> > If it's not too hard, I'd suggest to do it now.
>
>
> YES. Is not too hard, but I was doing too much.
>
> * virtio-net + device stats
> * virtio-net + af-xdp, this patch set has about 27 commits
>
> And I was pushing this too long, I just want to finish the work.
> Then I can work on the next (premapped big mode, af-xdp multi-buf....).
>
> So, let we step by step.

That's fine.

Thanks

Xuan Zhuo Jan. 30, 2024, 3:15 a.m. UTC | #6

On Tue, 30 Jan 2024 10:56:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jan 29, 2024 at 11:28 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 29 Jan 2024 11:06:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > If the xsk is enabling, the xsk tx will share the send queue.
> > > > >
> > > > > Any reason for this? Technically, virtio-net can work as other NIC
> > > > > like 256 queues. There could be some work like optimizing the
> > > > > interrupt allocations etc.
> > > >
> > > > Just like the logic of XDP_TX.
> > > >
> > > > Now the virtio spec does not allow to add new dynamic queues.
> > > > As I know, most hypervisors just support few queues.
> > >
> > > When multiqueue is developed in Qemu, it support as least 256 queue
> > > pairs if my memory is correct.
> > >
> >
> >
> > YES, but that is configured by the hypervisor.
> >
> > For the user on any platform, when he got a vm, the queue num is fixed.
> > As I know, on most case, the num is less.
> > If we want the af-xdp/xdp-tx has the the independent queues
> > I think the dynamic queue is good way.
>
> Yes, we can start from this.


My plan is start from sharing send queues.

After that I will push the dynamic queues rfc to the virtio spec.

If the new feature is negotiated, then we can support xdp/af-xdp
with independent send queues, if the feature is not supported,
xdp/af-xdp can work with sharing send queue.

I think that will not conflict.


>
> >
> >
> > > > The num of
> > > > queues is not bigger than the cpu num. So the best way is
> > > > to share the send queues.
> > > >
> > > > Parav and I tried to introduce dynamic queues.
> > >
> > > Virtio-net doesn't differ from real NIC where most of them can create
> > > queue dynamically. It's more about the resource allocation, if mgmt
> > > can start with 256 queues, then we probably fine.
> >
> > But now, if the devices has 256, we will enable the 256 queues by default.
> > that is too much.
>
> It doesn't differ from the other NIC. E.g currently the active #qps is
> determined by the number of cpus. this is only true if we have 256
> cpus.


YES. But now, the normal devices just have few queues (such as 8, 32).

Thanks.


>
> >
> > So, the dynamic queue is not to create a new queue out of the resource.
> >
> > The device may tell the driver, the max queue resource is 256,
> > but let we start from 8. If the driver need more, then we can
> > enable more.
>
> This is the policy we used now.
>
> >
> > But for me, the xdp tx can share the sq queue, so let we start
> > the af-xdp from sharing sq queue.
> >
> >
> > >
> > > But I think we can leave this question now.
> > >
> > > > But that is dropped.
> > > > Before that I think we can share the send queues.
> > > >
> > > >
> > > > >
> > > > > > But the xsk requires that the send queue use the premapped mode.
> > > > > > So the send queue must support premapped mode.
> > > > > >
> > > > > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > > > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > > > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > > > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > > > > >
> > > > > >                       |        iommu off           |        iommu on
> > > > > > ----------------------|-----------------------------------------------------
> > > > > >                       | 16         |  1400         | 16         | 1400
> > > > > > ----------------------|-----------------------------------------------------
> > > > > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > > > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > > > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > > > > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > > > > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > > > index 4fbf612da235..53143f95a3a0 100644
> > > > > > --- a/drivers/net/virtio/main.c
> > > > > > +++ b/drivers/net/virtio/main.c
> > > > > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > > > > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > > > > >  }
> > > > > >
> > > > > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > > > > +{
> > > > > > +       int i;
> > > > > > +
> > > > > > +       if (!dma)
> > > > > > +               return;
> > > > > > +
> > > > > > +       for (i = 0; i < dma->next; ++i)
> > > > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > > > +                                                dma->items[i].addr,
> > > > > > +                                                dma->items[i].length,
> > > > > > +                                                DMA_TO_DEVICE, 0);
> > > > > > +       dma->next = 0;
> > > > > > +}
> > > > > > +
> > > > > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > > > >                             u64 *bytes, u64 *packets)
> > > > > >  {
> > > > > > +       struct virtio_dma_head *dma;
> > > > > >         unsigned int len;
> > > > > >         void *ptr;
> > > > > >
> > > > > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > > >
> > > > > Any chance this.can be false?
> > > >
> > > > __free_old_xmit is the common path.
> > >
> > > Did you mean the XDP path doesn't work with this? If yes, we need to
> > > change that.
> >
> >
> > NO. If the virtio core use_dma_api is false, the dma premapped
> > can not be ture.
>
> Ok, I see.
>
> >
> > >
> > > >
> > > > The virtqueue_get_dma_premapped() is used to check whether the sq is premapped
> > > > mode.
> > > >
> > > > >
> > > > > > +               dma = &sq->dma.head;
> > > > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > > > +               dma->next = 0;
> > > > >
> > > > > Btw, I found in the case of RX we have:
> > > > >
> > > > > virtnet_rq_alloc():
> > > > >
> > > > >                         alloc_frag->offset = sizeof(*dma);
> > > > >
> > > > > This seems to defeat frag coalescing when the memory is highly
> > > > > fragmented or high order allocation is disallowed.
> > > > >
> > > > > Any idea to solve this?
> > > >
> > > >
> > > > On the rq premapped pathset, I answered this.
> > > >
> > > > http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com
> > >
> > > Oops, I forget that.
> > >
> > > >
> > > > >
> > > > > > +       } else {
> > > > > > +               dma = NULL;
> > > > > > +       }
> > > > > > +
> > > > > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > > > +
> > > > > >                 if (!is_xdp_frame(ptr)) {
> > > > > >                         struct sk_buff *skb = ptr;
> > > > > >
> > > > > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > > > >         return buf;
> > > > > >  }
> > > > > >
> > > > > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > > > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > > > > >  {
> > > > > >         int i;
> > > > > >
> > > > > > -       /* disable for big mode */
> > > > > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > -               return;
> > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > > > > >
> > > > > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > > > +               /* TODO for big mode */
> > > > >
> > > > > Btw, how hard to support big mode? If we can do premapping for that
> > > > > code could be simplified.
> > > > >
> > > > > (There are vendors that doesn't support mergeable rx buffers).
> > > >
> > > > I will do that after these patch-sets
> > >
> > > If it's not too hard, I'd suggest to do it now.
> >
> >
> > YES. Is not too hard, but I was doing too much.
> >
> > * virtio-net + device stats
> > * virtio-net + af-xdp, this patch set has about 27 commits
> >
> > And I was pushing this too long, I just want to finish the work.
> > Then I can work on the next (premapped big mode, af-xdp multi-buf....).
> >
> > So, let we step by step.
>
> That's fine.
>
> Thanks
>

Xuan Zhuo Jan. 30, 2024, 3:27 a.m. UTC | #7

On Tue, 30 Jan 2024 11:15:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Tue, 30 Jan 2024 10:56:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Jan 29, 2024 at 11:28 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 29 Jan 2024 11:06:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > If the xsk is enabling, the xsk tx will share the send queue.
> > > > > >
> > > > > > Any reason for this? Technically, virtio-net can work as other NIC
> > > > > > like 256 queues. There could be some work like optimizing the
> > > > > > interrupt allocations etc.
> > > > >
> > > > > Just like the logic of XDP_TX.
> > > > >
> > > > > Now the virtio spec does not allow to add new dynamic queues.
> > > > > As I know, most hypervisors just support few queues.
> > > >
> > > > When multiqueue is developed in Qemu, it support as least 256 queue
> > > > pairs if my memory is correct.
> > > >
> > >
> > >
> > > YES, but that is configured by the hypervisor.
> > >
> > > For the user on any platform, when he got a vm, the queue num is fixed.
> > > As I know, on most case, the num is less.
> > > If we want the af-xdp/xdp-tx has the the independent queues
> > > I think the dynamic queue is good way.
> >
> > Yes, we can start from this.
>
>
> My plan is start from sharing send queues.
>
> After that I will push the dynamic queues rfc to the virtio spec.
>
> If the new feature is negotiated, then we can support xdp/af-xdp
> with independent send queues, if the feature is not supported,
> xdp/af-xdp can work with sharing send queue.
>
> I think that will not conflict.

cc Parav.


>
>
> >
> > >
> > >
> > > > > The num of
> > > > > queues is not bigger than the cpu num. So the best way is
> > > > > to share the send queues.
> > > > >
> > > > > Parav and I tried to introduce dynamic queues.
> > > >
> > > > Virtio-net doesn't differ from real NIC where most of them can create
> > > > queue dynamically. It's more about the resource allocation, if mgmt
> > > > can start with 256 queues, then we probably fine.
> > >
> > > But now, if the devices has 256, we will enable the 256 queues by default.
> > > that is too much.
> >
> > It doesn't differ from the other NIC. E.g currently the active #qps is
> > determined by the number of cpus. this is only true if we have 256
> > cpus.
>
>
> YES. But now, the normal devices just have few queues (such as 8, 32).
>
> Thanks.
>
>
> >
> > >
> > > So, the dynamic queue is not to create a new queue out of the resource.
> > >
> > > The device may tell the driver, the max queue resource is 256,
> > > but let we start from 8. If the driver need more, then we can
> > > enable more.
> >
> > This is the policy we used now.
> >
> > >
> > > But for me, the xdp tx can share the sq queue, so let we start
> > > the af-xdp from sharing sq queue.
> > >
> > >
> > > >
> > > > But I think we can leave this question now.
> > > >
> > > > > But that is dropped.
> > > > > Before that I think we can share the send queues.
> > > > >
> > > > >
> > > > > >
> > > > > > > But the xsk requires that the send queue use the premapped mode.
> > > > > > > So the send queue must support premapped mode.
> > > > > > >
> > > > > > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > > > > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > > > > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > > > > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > > > > > >
> > > > > > >                       |        iommu off           |        iommu on
> > > > > > > ----------------------|-----------------------------------------------------
> > > > > > >                       | 16         |  1400         | 16         | 1400
> > > > > > > ----------------------|-----------------------------------------------------
> > > > > > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > > > > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > > > > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > > > > > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > > > > > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > > > > index 4fbf612da235..53143f95a3a0 100644
> > > > > > > --- a/drivers/net/virtio/main.c
> > > > > > > +++ b/drivers/net/virtio/main.c
> > > > > > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > > > > > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > > > > > +{
> > > > > > > +       int i;
> > > > > > > +
> > > > > > > +       if (!dma)
> > > > > > > +               return;
> > > > > > > +
> > > > > > > +       for (i = 0; i < dma->next; ++i)
> > > > > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > > > > +                                                dma->items[i].addr,
> > > > > > > +                                                dma->items[i].length,
> > > > > > > +                                                DMA_TO_DEVICE, 0);
> > > > > > > +       dma->next = 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > > > > >                             u64 *bytes, u64 *packets)
> > > > > > >  {
> > > > > > > +       struct virtio_dma_head *dma;
> > > > > > >         unsigned int len;
> > > > > > >         void *ptr;
> > > > > > >
> > > > > > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > > > >
> > > > > > Any chance this.can be false?
> > > > >
> > > > > __free_old_xmit is the common path.
> > > >
> > > > Did you mean the XDP path doesn't work with this? If yes, we need to
> > > > change that.
> > >
> > >
> > > NO. If the virtio core use_dma_api is false, the dma premapped
> > > can not be ture.
> >
> > Ok, I see.
> >
> > >
> > > >
> > > > >
> > > > > The virtqueue_get_dma_premapped() is used to check whether the sq is premapped
> > > > > mode.
> > > > >
> > > > > >
> > > > > > > +               dma = &sq->dma.head;
> > > > > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > > > > +               dma->next = 0;
> > > > > >
> > > > > > Btw, I found in the case of RX we have:
> > > > > >
> > > > > > virtnet_rq_alloc():
> > > > > >
> > > > > >                         alloc_frag->offset = sizeof(*dma);
> > > > > >
> > > > > > This seems to defeat frag coalescing when the memory is highly
> > > > > > fragmented or high order allocation is disallowed.
> > > > > >
> > > > > > Any idea to solve this?
> > > > >
> > > > >
> > > > > On the rq premapped pathset, I answered this.
> > > > >
> > > > > http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com
> > > >
> > > > Oops, I forget that.
> > > >
> > > > >
> > > > > >
> > > > > > > +       } else {
> > > > > > > +               dma = NULL;
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > > > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > > > > +
> > > > > > >                 if (!is_xdp_frame(ptr)) {
> > > > > > >                         struct sk_buff *skb = ptr;
> > > > > > >
> > > > > > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > > > > >         return buf;
> > > > > > >  }
> > > > > > >
> > > > > > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > > > > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > > > > > >  {
> > > > > > >         int i;
> > > > > > >
> > > > > > > -       /* disable for big mode */
> > > > > > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > -               return;
> > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > > > > > >
> > > > > > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > > > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > > > > +               /* TODO for big mode */
> > > > > >
> > > > > > Btw, how hard to support big mode? If we can do premapping for that
> > > > > > code could be simplified.
> > > > > >
> > > > > > (There are vendors that doesn't support mergeable rx buffers).
> > > > >
> > > > > I will do that after these patch-sets
> > > >
> > > > If it's not too hard, I'd suggest to do it now.
> > >
> > >
> > > YES. Is not too hard, but I was doing too much.
> > >
> > > * virtio-net + device stats
> > > * virtio-net + af-xdp, this patch set has about 27 commits
> > >
> > > And I was pushing this too long, I just want to finish the work.
> > > Then I can work on the next (premapped big mode, af-xdp multi-buf....).
> > >
> > > So, let we step by step.
> >
> > That's fine.
> >
> > Thanks
> >
>

[net-next,5/5] virtio_net: sq support premapped mode

Checks

Commit Message

Comments

Patch