
[net-next,04/13] virtio_ring: perform premapped operations based on per-buffer

Message ID 20240820073330.9161-5-xuanzhuo@linux.alibaba.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series virtio-net: support AF_XDP zero copy (tx)

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 22 this patch: 22
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-20--15-00 (tests: 712)

Commit Message

Xuan Zhuo Aug. 20, 2024, 7:33 a.m. UTC
The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of DMA information for
vq size * 17 entries, which can be substantial (for example, a queue of
256 entries would mean 256 * 17 = 4352 address/length pairs to track for
that queue alone). However, if the premapped mode were applied on a
per-buffer basis, the complexity would be greatly reduced. With AF_XDP
enabled, AF_XDP buffers would become premapped, while kernel skb
buffers could remain unmapped.

We can distinguish them by sg_page(sg): when sg_page(sg) is NULL, it
indicates that the driver has performed the DMA mapping in advance,
allowing the virtio core to use sg_dma_address(sg) directly without
doing any internal DMA mapping. Additionally, DMA unmap operations
for this buffer will be bypassed.
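
As a rough illustration (not part of the patch itself), a driver could
submit one normal buffer and one premapped buffer in the same call by
filling the scatterlist as below. This is only a sketch: the helper
name and the hdr/xsk_dma_addr/data_len/token parameters are
placeholders, and the DMA address is assumed to come from a mapping
the driver has already done (e.g. an AF_XDP pool).

#include <linux/scatterlist.h>
#include <linux/virtio.h>

/* Hypothetical driver-side helper, for illustration only. */
static int example_add_mixed_buffers(struct virtqueue *vq,
				     void *hdr, unsigned int hdr_len,
				     dma_addr_t xsk_dma_addr,
				     unsigned int data_len, void *token)
{
	struct scatterlist sg[2];

	sg_init_table(sg, 2);

	/* Normal kernel buffer: sg_page() is non-NULL, so the virtio
	 * core performs (and later undoes) the DMA mapping itself.
	 */
	sg_set_buf(&sg[0], hdr, hdr_len);

	/* Premapped buffer: leave sg_page() as NULL and hand over the
	 * DMA address and length the driver already mapped.
	 */
	sg[1].dma_address = xsk_dma_addr;
	sg[1].length = data_len;

	return virtqueue_add_outbuf(vq, sg, 2, token, GFP_ATOMIC);
}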

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 70 +++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 29 deletions(-)

Comments

Jason Wang Sept. 11, 2024, 3:54 a.m. UTC | #1
On Tue, Aug 20, 2024 at 3:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> The current configuration sets the virtqueue (vq) to premapped mode,
> implying that all buffers submitted to this queue must be mapped ahead
> of time. This presents a challenge for the virtnet send queue (sq): the
> virtnet driver would be required to keep track of dma information for vq
> size * 17, which can be substantial. However, if the premapped mode were
> applied on a per-buffer basis, the complexity would be greatly reduced.
> With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
> skb buffers could remain unmapped.

Is this applied only to TX, or to both TX and RX?

>
> We can distinguish them by sg_page(sg), When sg_page(sg) is NULL, this
> indicates that the driver has performed DMA mapping in advance, allowing
> the Virtio core to directly utilize sg_dma_address(sg) without
> conducting any internal DMA mapping.

This seems to conflict with the code below?

#define sg_is_premapped(sg) (!sg_page(sg))

And rethinking the design, a question: is there a chance that a
VM_PFNMAP area could be used for virtio-net? If so, the trick
with sg_page() cannot work.

A quick glance told me AF_XDP seems to be safe as it uses
pin_user_pages(), but we need to check other possibilities.

Or we need to fall back to our previous idea of having new APIs.

> Additionally, DMA unmap operations
> for this buffer will be bypassed.
>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 70 +++++++++++++++++++++---------------
>  1 file changed, 41 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index b43eca93015c..7efddc71af67 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -235,6 +235,7 @@ static void vring_free(struct virtqueue *_vq);
>   */
>
>  #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
> +#define sg_is_premapped(sg) (!sg_page(sg))
>
>  static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
>                                    unsigned int total_sg)
> @@ -292,9 +293,10 @@ static bool vring_use_dma_api(const struct virtio_device *vdev)
>         return false;
>  }
>
> -static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring)
> +static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring,
> +                                   const struct vring_desc_extra *extra)
>  {
> -       return vring->use_dma_api && !vring->premapped;
> +       return vring->use_dma_api && (extra->addr != DMA_MAPPING_ERROR);
>  }
>
>  size_t virtio_max_dma_size(const struct virtio_device *vdev)
> @@ -366,7 +368,7 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
>  static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
>                             enum dma_data_direction direction, dma_addr_t *addr)
>  {
> -       if (vq->premapped) {
> +       if (sg_is_premapped(sg)) {
>                 *addr = sg_dma_address(sg);
>                 return 0;
>         }
> @@ -457,7 +459,7 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>                                  (flags & VRING_DESC_F_WRITE) ?
>                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
>         } else {
> -               if (!vring_need_unmap_buffer(vq))
> +               if (!vring_need_unmap_buffer(vq, extra))
>                         goto out;
>
>                 dma_unmap_page(vring_dma_dev(vq),
> @@ -510,7 +512,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
>                                                     dma_addr_t addr,
>                                                     unsigned int len,
>                                                     u16 flags,
> -                                                   bool indirect)
> +                                                   bool indirect, bool premapped)
>  {
>         u16 next;
>
> @@ -518,7 +520,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
>         desc[i].addr = cpu_to_virtio64(vq->vdev, addr);
>         desc[i].len = cpu_to_virtio32(vq->vdev, len);
>
> -       extra[i].addr = addr;
> +       extra[i].addr = premapped ? DMA_MAPPING_ERROR : addr;
>         extra[i].len = len;
>         extra[i].flags = flags;
>
> @@ -611,7 +613,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                          */
>                         i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, sg->length,
>                                                      VRING_DESC_F_NEXT,
> -                                                    indirect);
> +                                                    indirect, sg_is_premapped(sg));
>                 }
>         }
>         for (; n < (out_sgs + in_sgs); n++) {
> @@ -629,12 +631,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                                                      sg->length,
>                                                      VRING_DESC_F_NEXT |
>                                                      VRING_DESC_F_WRITE,
> -                                                    indirect);
> +                                                    indirect, sg_is_premapped(sg));
>                 }
>         }
>         /* Last one doesn't continue. */
>         desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> -       if (!indirect && vring_need_unmap_buffer(vq))
> +       if (!indirect && vring_need_unmap_buffer(vq, &extra[prev]))
>                 vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
>                         ~VRING_DESC_F_NEXT;
>
> @@ -643,19 +645,15 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                 dma_addr_t addr = vring_map_single(
>                         vq, desc, total_sg * sizeof(struct vring_desc),
>                         DMA_TO_DEVICE);
> -               if (vring_mapping_error(vq, addr)) {
> -                       if (vq->premapped)
> -                               goto free_indirect;
> -
> +               if (vring_mapping_error(vq, addr))
>                         goto unmap_release;
> -               }
>
>                 virtqueue_add_desc_split(_vq, vq->split.vring.desc,
>                                          vq->split.desc_extra,
>                                          head, addr,
>                                          total_sg * sizeof(struct vring_desc),
>                                          VRING_DESC_F_INDIRECT,
> -                                        false);
> +                                        false, false);
>         }
>
>         /* We're using some buffers from the free list. */
> @@ -712,7 +710,6 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                 i = vring_unmap_one_split(vq, &extra[i]);
>         }
>
> -free_indirect:
>         if (indirect)
>                 kfree(desc);
>
> @@ -794,7 +791,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>                                 VRING_DESC_F_INDIRECT));
>                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>
> -               if (vring_need_unmap_buffer(vq)) {
> +               if (vq->use_dma_api) {
>                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
>                                 vring_unmap_one_split(vq, &extra[j]);
>                 }
> @@ -1228,7 +1225,7 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>                                  (flags & VRING_DESC_F_WRITE) ?
>                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
>         } else {
> -               if (!vring_need_unmap_buffer(vq))
> +               if (!vring_need_unmap_buffer(vq, extra))
>                         return;
>
>                 dma_unmap_page(vring_dma_dev(vq),
> @@ -1309,7 +1306,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>                         i++;
>
>                         if (unlikely(vq->use_dma_api)) {
> -                               extra[i].addr = addr;
> +                               extra[i].addr = sg_is_premapped(sg) ? DMA_MAPPING_ERROR : addr;
>                                 extra[i].len = sg->length;
>                                 extra[i].flags = n < out_sgs ?  0 : VRING_DESC_F_WRITE;
>                         }
> @@ -1320,12 +1317,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>         addr = vring_map_single(vq, desc,
>                         total_sg * sizeof(struct vring_packed_desc),
>                         DMA_TO_DEVICE);
> -       if (vring_mapping_error(vq, addr)) {
> -               if (vq->premapped)
> -                       goto free_desc;
> -
> +       if (vring_mapping_error(vq, addr))
>                 goto unmap_release;
> -       }
>
>         vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
>         vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
> @@ -1383,7 +1376,6 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>         for (i = 0; i < err_idx; i++)
>                 vring_unmap_extra_packed(vq, &extra[i]);
>
> -free_desc:
>         kfree(desc);
>
>         END_USE(vq);
> @@ -1474,7 +1466,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>                         desc[i].id = cpu_to_le16(id);
>
>                         if (unlikely(vq->use_dma_api)) {
> -                               vq->packed.desc_extra[curr].addr = addr;
> +                               vq->packed.desc_extra[curr].addr = sg_is_premapped(sg) ?
> +                                       DMA_MAPPING_ERROR : addr;
>                                 vq->packed.desc_extra[curr].len = sg->length;
>                                 vq->packed.desc_extra[curr].flags =
>                                         le16_to_cpu(flags);
> @@ -1625,10 +1618,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>                 if (!extra)
>                         return;
>
> -               if (vring_need_unmap_buffer(vq)) {
> +               if (vq->use_dma_api) {
>                         len = vq->packed.desc_extra[id].len;
> -                       for (i = 0; i < len / sizeof(struct vring_packed_desc);
> -                                       i++)
> +                       for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)

Unnecessary changes?


>                                 vring_unmap_extra_packed(vq, &extra[i]);
>                 }
>                 kfree(desc);
> @@ -2212,6 +2204,11 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   * @data: the token identifying the buffer.
>   * @gfp: how to do memory allocations (if necessary).
>   *
> + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> + * mapping in advance, allowing the virtio core to directly utilize
> + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> + * DMA unmap operations for this buffer will be bypassed.
> + *
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> @@ -2246,6 +2243,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
>   * @data: the token identifying the buffer.
>   * @gfp: how to do memory allocations (if necessary).
>   *
> + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> + * mapping in advance, allowing the virtio core to directly utilize
> + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> + * DMA unmap operations for this buffer will be bypassed.
> + *
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> @@ -2268,6 +2270,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
>   * @data: the token identifying the buffer.
>   * @gfp: how to do memory allocations (if necessary).
>   *
> + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> + * mapping in advance, allowing the virtio core to directly utilize
> + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> + * DMA unmap operations for this buffer will be bypassed.
> + *
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> @@ -2291,6 +2298,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
>   * @ctx: extra context for the token
>   * @gfp: how to do memory allocations (if necessary).
>   *
> + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> + * mapping in advance, allowing the virtio core to directly utilize
> + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> + * DMA unmap operations for this buffer will be bypassed.
> + *
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> --
> 2.32.0.3.g01195cf9f
>

Thanks
Xuan Zhuo Sept. 12, 2024, 7:36 a.m. UTC | #2
On Wed, 11 Sep 2024 11:54:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 20, 2024 at 3:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > The current configuration sets the virtqueue (vq) to premapped mode,
> > implying that all buffers submitted to this queue must be mapped ahead
> > of time. This presents a challenge for the virtnet send queue (sq): the
> > virtnet driver would be required to keep track of dma information for vq
> > size * 17, which can be substantial. However, if the premapped mode were
> > applied on a per-buffer basis, the complexity would be greatly reduced.
> > With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
> > skb buffers could remain unmapped.
>
> Is this only applied to TX or both TX and RX.


For rx, if you mean the per-buffer DMA mode, then yes, rx can
reuse this. If you mean whether we should use premapped mode for
the normal rx buffers, I think we should; that can reduce the
number of DMA map operations.


>
> >
> > We can distinguish them by sg_page(sg), When sg_page(sg) is NULL, this
> > indicates that the driver has performed DMA mapping in advance, allowing
> > the Virtio core to directly utilize sg_dma_address(sg) without
> > conducting any internal DMA mapping.
>
> This seems conflict with the code below?
>
> #define sg_is_premapped(sg) (!sg_page(sg))

Sorry, I do not follow you.

The key point is that sg->page is set by the driver.

So I mean that if the driver sets sg->page = NULL, then for this sg
the virtio core can skip the DMA mapping. If the driver sets
sg->page to the page of the buffer, then the virtio core should
do the DMA mapping for this sg.
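
Conceptually, the per-sg decision in the core then boils down to the
sketch below. It is simplified pseudo-logic of this patch, written as
if it lived inside virtio_ring.c, not the literal code:

static dma_addr_t map_one_sg_sketch(const struct vring_virtqueue *vq,
				    struct scatterlist *sg,
				    enum dma_data_direction dir,
				    struct vring_desc_extra *extra)
{
	dma_addr_t addr;

	if (!sg_page(sg)) {
		/* premapped by the driver: trust sg_dma_address() and
		 * record DMA_MAPPING_ERROR so nothing is unmapped later
		 */
		addr = sg_dma_address(sg);
		extra->addr = DMA_MAPPING_ERROR;
	} else {
		/* normal buffer: the core maps it and records the
		 * address so that it also unmaps it on detach
		 */
		addr = dma_map_page(vring_dma_dev(vq), sg_page(sg),
				    sg->offset, sg->length, dir);
		extra->addr = addr;
	}

	return addr;
}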



>
> And rethink of the design, a question is that is there a chance that
> VM_PFNMAP area could be used for virtio-net? If it is true, the trick
> for sg_page() can not work?
>
> A quick glance told me AF_XDP seems to be safe as it uses
> pin_user_pages(), but we need to check other possibilities.
>
> Or we need to fall back to our previous idea, having new APIs.
>
> > Additionally, DMA unmap operations
> > for this buffer will be bypassed.
> >
> > Suggested-by: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 70 +++++++++++++++++++++---------------
> >  1 file changed, 41 insertions(+), 29 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index b43eca93015c..7efddc71af67 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -235,6 +235,7 @@ static void vring_free(struct virtqueue *_vq);
> >   */
> >
> >  #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
> > +#define sg_is_premapped(sg) (!sg_page(sg))
> >
> >  static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> >                                    unsigned int total_sg)
> > @@ -292,9 +293,10 @@ static bool vring_use_dma_api(const struct virtio_device *vdev)
> >         return false;
> >  }
> >
> > -static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring)
> > +static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring,
> > +                                   const struct vring_desc_extra *extra)
> >  {
> > -       return vring->use_dma_api && !vring->premapped;
> > +       return vring->use_dma_api && (extra->addr != DMA_MAPPING_ERROR);
> >  }
> >
> >  size_t virtio_max_dma_size(const struct virtio_device *vdev)
> > @@ -366,7 +368,7 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> >  static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> >                             enum dma_data_direction direction, dma_addr_t *addr)
> >  {
> > -       if (vq->premapped) {
> > +       if (sg_is_premapped(sg)) {
> >                 *addr = sg_dma_address(sg);
> >                 return 0;
> >         }
> > @@ -457,7 +459,7 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >                                  (flags & VRING_DESC_F_WRITE) ?
> >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >         } else {
> > -               if (!vring_need_unmap_buffer(vq))
> > +               if (!vring_need_unmap_buffer(vq, extra))
> >                         goto out;
> >
> >                 dma_unmap_page(vring_dma_dev(vq),
> > @@ -510,7 +512,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
> >                                                     dma_addr_t addr,
> >                                                     unsigned int len,
> >                                                     u16 flags,
> > -                                                   bool indirect)
> > +                                                   bool indirect, bool premapped)
> >  {
> >         u16 next;
> >
> > @@ -518,7 +520,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
> >         desc[i].addr = cpu_to_virtio64(vq->vdev, addr);
> >         desc[i].len = cpu_to_virtio32(vq->vdev, len);
> >
> > -       extra[i].addr = addr;
> > +       extra[i].addr = premapped ? DMA_MAPPING_ERROR : addr;
> >         extra[i].len = len;
> >         extra[i].flags = flags;
> >
> > @@ -611,7 +613,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >                          */
> >                         i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, sg->length,
> >                                                      VRING_DESC_F_NEXT,
> > -                                                    indirect);
> > +                                                    indirect, sg_is_premapped(sg));
> >                 }
> >         }
> >         for (; n < (out_sgs + in_sgs); n++) {
> > @@ -629,12 +631,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >                                                      sg->length,
> >                                                      VRING_DESC_F_NEXT |
> >                                                      VRING_DESC_F_WRITE,
> > -                                                    indirect);
> > +                                                    indirect, sg_is_premapped(sg));
> >                 }
> >         }
> >         /* Last one doesn't continue. */
> >         desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > -       if (!indirect && vring_need_unmap_buffer(vq))
> > +       if (!indirect && vring_need_unmap_buffer(vq, &extra[prev]))
> >                 vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> >                         ~VRING_DESC_F_NEXT;
> >
> > @@ -643,19 +645,15 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >                 dma_addr_t addr = vring_map_single(
> >                         vq, desc, total_sg * sizeof(struct vring_desc),
> >                         DMA_TO_DEVICE);
> > -               if (vring_mapping_error(vq, addr)) {
> > -                       if (vq->premapped)
> > -                               goto free_indirect;
> > -
> > +               if (vring_mapping_error(vq, addr))
> >                         goto unmap_release;
> > -               }
> >
> >                 virtqueue_add_desc_split(_vq, vq->split.vring.desc,
> >                                          vq->split.desc_extra,
> >                                          head, addr,
> >                                          total_sg * sizeof(struct vring_desc),
> >                                          VRING_DESC_F_INDIRECT,
> > -                                        false);
> > +                                        false, false);
> >         }
> >
> >         /* We're using some buffers from the free list. */
> > @@ -712,7 +710,6 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >                 i = vring_unmap_one_split(vq, &extra[i]);
> >         }
> >
> > -free_indirect:
> >         if (indirect)
> >                 kfree(desc);
> >
> > @@ -794,7 +791,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >                                 VRING_DESC_F_INDIRECT));
> >                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> >
> > -               if (vring_need_unmap_buffer(vq)) {
> > +               if (vq->use_dma_api) {
> >                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
> >                                 vring_unmap_one_split(vq, &extra[j]);
> >                 }
> > @@ -1228,7 +1225,7 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> >                                  (flags & VRING_DESC_F_WRITE) ?
> >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >         } else {
> > -               if (!vring_need_unmap_buffer(vq))
> > +               if (!vring_need_unmap_buffer(vq, extra))
> >                         return;
> >
> >                 dma_unmap_page(vring_dma_dev(vq),
> > @@ -1309,7 +1306,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >                         i++;
> >
> >                         if (unlikely(vq->use_dma_api)) {
> > -                               extra[i].addr = addr;
> > +                               extra[i].addr = sg_is_premapped(sg) ? DMA_MAPPING_ERROR : addr;
> >                                 extra[i].len = sg->length;
> >                                 extra[i].flags = n < out_sgs ?  0 : VRING_DESC_F_WRITE;
> >                         }
> > @@ -1320,12 +1317,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >         addr = vring_map_single(vq, desc,
> >                         total_sg * sizeof(struct vring_packed_desc),
> >                         DMA_TO_DEVICE);
> > -       if (vring_mapping_error(vq, addr)) {
> > -               if (vq->premapped)
> > -                       goto free_desc;
> > -
> > +       if (vring_mapping_error(vq, addr))
> >                 goto unmap_release;
> > -       }
> >
> >         vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
> >         vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
> > @@ -1383,7 +1376,6 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >         for (i = 0; i < err_idx; i++)
> >                 vring_unmap_extra_packed(vq, &extra[i]);
> >
> > -free_desc:
> >         kfree(desc);
> >
> >         END_USE(vq);
> > @@ -1474,7 +1466,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >                         desc[i].id = cpu_to_le16(id);
> >
> >                         if (unlikely(vq->use_dma_api)) {
> > -                               vq->packed.desc_extra[curr].addr = addr;
> > +                               vq->packed.desc_extra[curr].addr = sg_is_premapped(sg) ?
> > +                                       DMA_MAPPING_ERROR : addr;
> >                                 vq->packed.desc_extra[curr].len = sg->length;
> >                                 vq->packed.desc_extra[curr].flags =
> >                                         le16_to_cpu(flags);
> > @@ -1625,10 +1618,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >                 if (!extra)
> >                         return;
> >
> > -               if (vring_need_unmap_buffer(vq)) {
> > +               if (vq->use_dma_api) {
> >                         len = vq->packed.desc_extra[id].len;
> > -                       for (i = 0; i < len / sizeof(struct vring_packed_desc);
> > -                                       i++)
> > +                       for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
>
> Unnecessary changes?


Will fix.

Thanks.

>
>
> >                                 vring_unmap_extra_packed(vq, &extra[i]);
> >                 }
> >                 kfree(desc);
> > @@ -2212,6 +2204,11 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> >   * @data: the token identifying the buffer.
> >   * @gfp: how to do memory allocations (if necessary).
> >   *
> > + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> > + * mapping in advance, allowing the virtio core to directly utilize
> > + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> > + * DMA unmap operations for this buffer will be bypassed.
> > + *
> >   * Caller must ensure we don't call this with other virtqueue operations
> >   * at the same time (except where noted).
> >   *
> > @@ -2246,6 +2243,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
> >   * @data: the token identifying the buffer.
> >   * @gfp: how to do memory allocations (if necessary).
> >   *
> > + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> > + * mapping in advance, allowing the virtio core to directly utilize
> > + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> > + * DMA unmap operations for this buffer will be bypassed.
> > + *
> >   * Caller must ensure we don't call this with other virtqueue operations
> >   * at the same time (except where noted).
> >   *
> > @@ -2268,6 +2270,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
> >   * @data: the token identifying the buffer.
> >   * @gfp: how to do memory allocations (if necessary).
> >   *
> > + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> > + * mapping in advance, allowing the virtio core to directly utilize
> > + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> > + * DMA unmap operations for this buffer will be bypassed.
> > + *
> >   * Caller must ensure we don't call this with other virtqueue operations
> >   * at the same time (except where noted).
> >   *
> > @@ -2291,6 +2298,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
> >   * @ctx: extra context for the token
> >   * @gfp: how to do memory allocations (if necessary).
> >   *
> > + * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
> > + * mapping in advance, allowing the virtio core to directly utilize
> > + * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
> > + * DMA unmap operations for this buffer will be bypassed.
> > + *
> >   * Caller must ensure we don't call this with other virtqueue operations
> >   * at the same time (except where noted).
> >   *
> > --
> > 2.32.0.3.g01195cf9f
> >
>
> Thanks
>
Jason Wang Sept. 13, 2024, 3:36 a.m. UTC | #3
On Thu, Sep 12, 2024 at 3:43 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 11 Sep 2024 11:54:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Aug 20, 2024 at 3:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > The current configuration sets the virtqueue (vq) to premapped mode,
> > > implying that all buffers submitted to this queue must be mapped ahead
> > > of time. This presents a challenge for the virtnet send queue (sq): the
> > > virtnet driver would be required to keep track of dma information for vq
> > > size * 17, which can be substantial. However, if the premapped mode were
> > > applied on a per-buffer basis, the complexity would be greatly reduced.
> > > With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
> > > skb buffers could remain unmapped.
> >
> > Is this only applied to TX or both TX and RX.
>
>
> For rx, if you mean per-buffer dma buffer, I think it is yes,
> rx can reuse this. If you mean should we do premapped for the
> normal rx buffers, I think we should, that can reduce the
> dma map operations.
>
>
> >
> > >
> > > We can distinguish them by sg_page(sg), When sg_page(sg) is NULL, this
> > > indicates that the driver has performed DMA mapping in advance, allowing
> > > the Virtio core to directly utilize sg_dma_address(sg) without
> > > conducting any internal DMA mapping.
> >
> > This seems conflict with the code below?
> >
> > #define sg_is_premapped(sg) (!sg_page(sg))
>
> Sorry, I do not get for you.
>
> The key point is that the sg->page is setted by driver.

Ok, I forgot that, but let's document this assumption in the changelog.

>
> So I mean if the driver sets sg->page = NULL, then for this sg,
> the virtio core can skip dma mapping. If the driver sets
> sg->page to the page of the buffer, then the virtio core should
> do dma mapping for this sg.
>

Ok, let's describe this in the changelog.

Thanks

Patch

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b43eca93015c..7efddc71af67 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -235,6 +235,7 @@  static void vring_free(struct virtqueue *_vq);
  */
 
 #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
+#define sg_is_premapped(sg) (!sg_page(sg))
 
 static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
 				   unsigned int total_sg)
@@ -292,9 +293,10 @@  static bool vring_use_dma_api(const struct virtio_device *vdev)
 	return false;
 }
 
-static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring)
+static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring,
+				    const struct vring_desc_extra *extra)
 {
-	return vring->use_dma_api && !vring->premapped;
+	return vring->use_dma_api && (extra->addr != DMA_MAPPING_ERROR);
 }
 
 size_t virtio_max_dma_size(const struct virtio_device *vdev)
@@ -366,7 +368,7 @@  static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
 			    enum dma_data_direction direction, dma_addr_t *addr)
 {
-	if (vq->premapped) {
+	if (sg_is_premapped(sg)) {
 		*addr = sg_dma_address(sg);
 		return 0;
 	}
@@ -457,7 +459,7 @@  static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
-		if (!vring_need_unmap_buffer(vq))
+		if (!vring_need_unmap_buffer(vq, extra))
 			goto out;
 
 		dma_unmap_page(vring_dma_dev(vq),
@@ -510,7 +512,7 @@  static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
 						    dma_addr_t addr,
 						    unsigned int len,
 						    u16 flags,
-						    bool indirect)
+						    bool indirect, bool premapped)
 {
 	u16 next;
 
@@ -518,7 +520,7 @@  static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
 	desc[i].addr = cpu_to_virtio64(vq->vdev, addr);
 	desc[i].len = cpu_to_virtio32(vq->vdev, len);
 
-	extra[i].addr = addr;
+	extra[i].addr = premapped ? DMA_MAPPING_ERROR : addr;
 	extra[i].len = len;
 	extra[i].flags = flags;
 
@@ -611,7 +613,7 @@  static inline int virtqueue_add_split(struct virtqueue *_vq,
 			 */
 			i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, sg->length,
 						     VRING_DESC_F_NEXT,
-						     indirect);
+						     indirect, sg_is_premapped(sg));
 		}
 	}
 	for (; n < (out_sgs + in_sgs); n++) {
@@ -629,12 +631,12 @@  static inline int virtqueue_add_split(struct virtqueue *_vq,
 						     sg->length,
 						     VRING_DESC_F_NEXT |
 						     VRING_DESC_F_WRITE,
-						     indirect);
+						     indirect, sg_is_premapped(sg));
 		}
 	}
 	/* Last one doesn't continue. */
 	desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
-	if (!indirect && vring_need_unmap_buffer(vq))
+	if (!indirect && vring_need_unmap_buffer(vq, &extra[prev]))
 		vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
 			~VRING_DESC_F_NEXT;
 
@@ -643,19 +645,15 @@  static inline int virtqueue_add_split(struct virtqueue *_vq,
 		dma_addr_t addr = vring_map_single(
 			vq, desc, total_sg * sizeof(struct vring_desc),
 			DMA_TO_DEVICE);
-		if (vring_mapping_error(vq, addr)) {
-			if (vq->premapped)
-				goto free_indirect;
-
+		if (vring_mapping_error(vq, addr))
 			goto unmap_release;
-		}
 
 		virtqueue_add_desc_split(_vq, vq->split.vring.desc,
 					 vq->split.desc_extra,
 					 head, addr,
 					 total_sg * sizeof(struct vring_desc),
 					 VRING_DESC_F_INDIRECT,
-					 false);
+					 false, false);
 	}
 
 	/* We're using some buffers from the free list. */
@@ -712,7 +710,6 @@  static inline int virtqueue_add_split(struct virtqueue *_vq,
 		i = vring_unmap_one_split(vq, &extra[i]);
 	}
 
-free_indirect:
 	if (indirect)
 		kfree(desc);
 
@@ -794,7 +791,7 @@  static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 				VRING_DESC_F_INDIRECT));
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-		if (vring_need_unmap_buffer(vq)) {
+		if (vq->use_dma_api) {
 			for (j = 0; j < len / sizeof(struct vring_desc); j++)
 				vring_unmap_one_split(vq, &extra[j]);
 		}
@@ -1228,7 +1225,7 @@  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
-		if (!vring_need_unmap_buffer(vq))
+		if (!vring_need_unmap_buffer(vq, extra))
 			return;
 
 		dma_unmap_page(vring_dma_dev(vq),
@@ -1309,7 +1306,7 @@  static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 			i++;
 
 			if (unlikely(vq->use_dma_api)) {
-				extra[i].addr = addr;
+				extra[i].addr = sg_is_premapped(sg) ? DMA_MAPPING_ERROR : addr;
 				extra[i].len = sg->length;
 				extra[i].flags = n < out_sgs ?  0 : VRING_DESC_F_WRITE;
 			}
@@ -1320,12 +1317,8 @@  static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	addr = vring_map_single(vq, desc,
 			total_sg * sizeof(struct vring_packed_desc),
 			DMA_TO_DEVICE);
-	if (vring_mapping_error(vq, addr)) {
-		if (vq->premapped)
-			goto free_desc;
-
+	if (vring_mapping_error(vq, addr))
 		goto unmap_release;
-	}
 
 	vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
 	vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
@@ -1383,7 +1376,6 @@  static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	for (i = 0; i < err_idx; i++)
 		vring_unmap_extra_packed(vq, &extra[i]);
 
-free_desc:
 	kfree(desc);
 
 	END_USE(vq);
@@ -1474,7 +1466,8 @@  static inline int virtqueue_add_packed(struct virtqueue *_vq,
 			desc[i].id = cpu_to_le16(id);
 
 			if (unlikely(vq->use_dma_api)) {
-				vq->packed.desc_extra[curr].addr = addr;
+				vq->packed.desc_extra[curr].addr = sg_is_premapped(sg) ?
+					DMA_MAPPING_ERROR : addr;
 				vq->packed.desc_extra[curr].len = sg->length;
 				vq->packed.desc_extra[curr].flags =
 					le16_to_cpu(flags);
@@ -1625,10 +1618,9 @@  static void detach_buf_packed(struct vring_virtqueue *vq,
 		if (!extra)
 			return;
 
-		if (vring_need_unmap_buffer(vq)) {
+		if (vq->use_dma_api) {
 			len = vq->packed.desc_extra[id].len;
-			for (i = 0; i < len / sizeof(struct vring_packed_desc);
-					i++)
+			for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
 				vring_unmap_extra_packed(vq, &extra[i]);
 		}
 		kfree(desc);
@@ -2212,6 +2204,11 @@  static inline int virtqueue_add(struct virtqueue *_vq,
  * @data: the token identifying the buffer.
  * @gfp: how to do memory allocations (if necessary).
  *
+ * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
+ * mapping in advance, allowing the virtio core to directly utilize
+ * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
+ * DMA unmap operations for this buffer will be bypassed.
+ *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
@@ -2246,6 +2243,11 @@  EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
  * @data: the token identifying the buffer.
  * @gfp: how to do memory allocations (if necessary).
  *
+ * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
+ * mapping in advance, allowing the virtio core to directly utilize
+ * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
+ * DMA unmap operations for this buffer will be bypassed.
+ *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
@@ -2268,6 +2270,11 @@  EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
  * @data: the token identifying the buffer.
  * @gfp: how to do memory allocations (if necessary).
  *
+ * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
+ * mapping in advance, allowing the virtio core to directly utilize
+ * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
+ * DMA unmap operations for this buffer will be bypassed.
+ *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
@@ -2291,6 +2298,11 @@  EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
  * @ctx: extra context for the token
  * @gfp: how to do memory allocations (if necessary).
  *
+ * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
+ * mapping in advance, allowing the virtio core to directly utilize
+ * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
+ * DMA unmap operations for this buffer will be bypassed.
+ *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *