Message ID: 20240820073330.9161-5-xuanzhuo@linux.alibaba.com (mailing list archive)
State: Changes Requested
Delegated to: Netdev Maintainers
Series: virtio-net: support AF_XDP zero copy (tx)
On Tue, Aug 20, 2024 at 3:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> The current configuration sets the virtqueue (vq) to premapped mode,
> implying that all buffers submitted to this queue must be mapped ahead
> of time. This presents a challenge for the virtnet send queue (sq): the
> virtnet driver would be required to keep track of dma information for vq
> size * 17, which can be substantial. However, if the premapped mode were
> applied on a per-buffer basis, the complexity would be greatly reduced.
> With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
> skb buffers could remain unmapped.

Is this only applied to TX, or to both TX and RX?

> We can distinguish them by sg_page(sg). When sg_page(sg) is NULL, this
> indicates that the driver has performed DMA mapping in advance, allowing
> the virtio core to directly utilize sg_dma_address(sg) without
> conducting any internal DMA mapping.

This seems to conflict with the code below?

#define sg_is_premapped(sg) (!sg_page(sg))

And rethinking the design, one question is whether there is a chance that
a VM_PFNMAP area could be used for virtio-net. If so, the sg_page() trick
cannot work?

A quick glance told me AF_XDP seems to be safe as it uses
pin_user_pages(), but we need to check other possibilities.

Or we need to fall back to our previous idea of having new APIs.

> Additionally, DMA unmap operations
> for this buffer will be bypassed.
>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 70 +++++++++++++++++++++---------------
>  1 file changed, 41 insertions(+), 29 deletions(-)

[...]

> @@ -1625,10 +1618,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>         if (!extra)
>                 return;
>
> -       if (vring_need_unmap_buffer(vq)) {
> +       if (vq->use_dma_api) {
>                 len = vq->packed.desc_extra[id].len;
> -               for (i = 0; i < len / sizeof(struct vring_packed_desc);
> -                               i++)
> +               for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)

Unnecessary changes?

>                         vring_unmap_extra_packed(vq, &extra[i]);
>         }
>         kfree(desc);

[...]

Thanks
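To make the convention under discussion concrete, here is a minimal
driver-side sketch of how a premapped AF_XDP frame could be handed to the
core under this patch. The function submit_premapped_xsk_frame() is
hypothetical; xsk_buff_raw_get_dma() and virtqueue_add_outbuf() are
existing kernel helpers. The key point is that sg_init_table() leaves
sg_page() NULL while the driver fills in the DMA address itself.

#include <linux/scatterlist.h>
#include <linux/virtio.h>
#include <net/xdp_sock_drv.h>

/* Hypothetical TX helper: submit one already-mapped AF_XDP frame. */
static int submit_premapped_xsk_frame(struct virtqueue *vq,
                                      struct xsk_buff_pool *pool,
                                      struct xdp_desc *desc, void *data)
{
        struct scatterlist sg;

        /*
         * sg_init_table() zeroes the entry (so sg_page(&sg) is NULL) and
         * marks it as the end of the list. Under this patch, the NULL
         * page alone tells the virtio core the buffer is premapped.
         */
        sg_init_table(&sg, 1);
        sg.dma_address = xsk_buff_raw_get_dma(pool, desc->addr);
        sg.length = desc->len;

        /* The core uses sg.dma_address as-is and skips map/unmap. */
        return virtqueue_add_outbuf(vq, &sg, 1, data, GFP_ATOMIC);
}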
On Wed, 11 Sep 2024 11:54:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 20, 2024 at 3:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > The current configuration sets the virtqueue (vq) to premapped mode,
> > implying that all buffers submitted to this queue must be mapped ahead
> > of time. This presents a challenge for the virtnet send queue (sq): the
> > virtnet driver would be required to keep track of dma information for vq
> > size * 17, which can be substantial. However, if the premapped mode were
> > applied on a per-buffer basis, the complexity would be greatly reduced.
> > With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
> > skb buffers could remain unmapped.
>
> Is this only applied to TX, or to both TX and RX?

For rx, if you mean per-buffer dma mapping, I think the answer is yes:
rx can reuse this. If you mean whether we should do premapped for the
normal rx buffers, I think we should; that can reduce the dma map
operations.

> > We can distinguish them by sg_page(sg). When sg_page(sg) is NULL, this
> > indicates that the driver has performed DMA mapping in advance, allowing
> > the virtio core to directly utilize sg_dma_address(sg) without
> > conducting any internal DMA mapping.
>
> This seems to conflict with the code below?
>
> #define sg_is_premapped(sg) (!sg_page(sg))

Sorry, I do not get your point.

The key point is that sg->page is set by the driver.

So I mean if the driver sets sg->page = NULL, then for this sg,
the virtio core can skip dma mapping. If the driver sets
sg->page to the page of the buffer, then the virtio core should
do dma mapping for this sg.

> And rethinking the design, one question is whether there is a chance that
> a VM_PFNMAP area could be used for virtio-net. If so, the sg_page() trick
> cannot work?
>
> A quick glance told me AF_XDP seems to be safe as it uses
> pin_user_pages(), but we need to check other possibilities.
>
> Or we need to fall back to our previous idea of having new APIs.

[...]

> > @@ -1625,10 +1618,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >         if (!extra)
> >                 return;
> >
> > -       if (vring_need_unmap_buffer(vq)) {
> > +       if (vq->use_dma_api) {
> >                 len = vq->packed.desc_extra[id].len;
> > -               for (i = 0; i < len / sizeof(struct vring_packed_desc);
> > -                               i++)
> > +               for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
>
> Unnecessary changes?

Will fix. Thanks.

[...]
On Thu, Sep 12, 2024 at 3:43 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 11 Sep 2024 11:54:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Aug 20, 2024 at 3:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > The current configuration sets the virtqueue (vq) to premapped mode,
[...]
> >
> > Is this only applied to TX, or to both TX and RX?
>
> For rx, if you mean per-buffer dma mapping, I think the answer is yes:
> rx can reuse this. If you mean whether we should do premapped for the
> normal rx buffers, I think we should; that can reduce the dma map
> operations.
>
> > This seems to conflict with the code below?
> >
> > #define sg_is_premapped(sg) (!sg_page(sg))
>
> Sorry, I do not get your point.
>
> The key point is that sg->page is set by the driver.

Ok, I forgot that, but let's document this assumption in the changelog.

> So I mean if the driver sets sg->page = NULL, then for this sg,
> the virtio core can skip dma mapping. If the driver sets
> sg->page to the page of the buffer, then the virtio core should
> do dma mapping for this sg.

Ok, let's describe this in the changelog.

Thanks
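A minimal sketch of Xuan's point above: the same send queue can carry
both kinds of buffer, distinguished purely by sg_page(). This is
illustrative only, not code from the patch; vq, skb, premapped_dma, len
and token are assumed to exist in the caller.

#include <linux/scatterlist.h>
#include <linux/skbuff.h>
#include <linux/virtio.h>

static int add_two_kinds(struct virtqueue *vq, struct sk_buff *skb,
                         dma_addr_t premapped_dma, u32 len, void *token)
{
        struct scatterlist sg;
        int err;

        /* Normal kernel buffer: sg records the page, so the virtio core
         * dma_map()s it on add and dma_unmap()s it on detach. */
        sg_init_one(&sg, skb->data, skb_headlen(skb));
        err = virtqueue_add_outbuf(vq, &sg, 1, skb, GFP_ATOMIC);
        if (err)
                return err;

        /* Premapped buffer: no page is set, so the core trusts
         * sg.dma_address and skips both map and unmap. */
        sg_init_table(&sg, 1);          /* sg_page(&sg) == NULL */
        sg.dma_address = premapped_dma;
        sg.length = len;
        return virtqueue_add_outbuf(vq, &sg, 1, token, GFP_ATOMIC);
}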
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b43eca93015c..7efddc71af67 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -235,6 +235,7 @@ static void vring_free(struct virtqueue *_vq);
  */
 
 #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
+#define sg_is_premapped(sg) (!sg_page(sg))
 
 static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
                                    unsigned int total_sg)
@@ -292,9 +293,10 @@ static bool vring_use_dma_api(const struct virtio_device *vdev)
         return false;
 }
 
-static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring)
+static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring,
+                                    const struct vring_desc_extra *extra)
 {
-        return vring->use_dma_api && !vring->premapped;
+        return vring->use_dma_api && (extra->addr != DMA_MAPPING_ERROR);
 }
 
 size_t virtio_max_dma_size(const struct virtio_device *vdev)
@@ -366,7 +368,7 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
                             enum dma_data_direction direction, dma_addr_t *addr)
 {
-        if (vq->premapped) {
+        if (sg_is_premapped(sg)) {
                 *addr = sg_dma_address(sg);
                 return 0;
         }
@@ -457,7 +459,7 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
                                  (flags & VRING_DESC_F_WRITE) ?
                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
         } else {
-                if (!vring_need_unmap_buffer(vq))
+                if (!vring_need_unmap_buffer(vq, extra))
                         goto out;
 
                 dma_unmap_page(vring_dma_dev(vq),
@@ -510,7 +512,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
                                                     dma_addr_t addr,
                                                     unsigned int len,
                                                     u16 flags,
-                                                    bool indirect)
+                                                    bool indirect, bool premapped)
 {
         u16 next;
 
@@ -518,7 +520,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
         desc[i].addr = cpu_to_virtio64(vq->vdev, addr);
         desc[i].len = cpu_to_virtio32(vq->vdev, len);
 
-        extra[i].addr = addr;
+        extra[i].addr = premapped ? DMA_MAPPING_ERROR : addr;
         extra[i].len = len;
         extra[i].flags = flags;
 
@@ -611,7 +613,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
                          */
                         i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, sg->length,
                                                      VRING_DESC_F_NEXT,
-                                                     indirect);
+                                                     indirect, sg_is_premapped(sg));
                 }
         }
         for (; n < (out_sgs + in_sgs); n++) {
@@ -629,12 +631,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
                                                      sg->length,
                                                      VRING_DESC_F_NEXT |
                                                      VRING_DESC_F_WRITE,
-                                                     indirect);
+                                                     indirect, sg_is_premapped(sg));
                 }
         }
         /* Last one doesn't continue. */
         desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
-        if (!indirect && vring_need_unmap_buffer(vq))
+        if (!indirect && vring_need_unmap_buffer(vq, &extra[prev]))
                 vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
                         ~VRING_DESC_F_NEXT;
 
@@ -643,19 +645,15 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
                 dma_addr_t addr = vring_map_single(
                         vq, desc, total_sg * sizeof(struct vring_desc),
                         DMA_TO_DEVICE);
-                if (vring_mapping_error(vq, addr)) {
-                        if (vq->premapped)
-                                goto free_indirect;
-
+                if (vring_mapping_error(vq, addr))
                         goto unmap_release;
-                }
 
                 virtqueue_add_desc_split(_vq, vq->split.vring.desc,
                                          vq->split.desc_extra,
                                          head, addr,
                                          total_sg * sizeof(struct vring_desc),
                                          VRING_DESC_F_INDIRECT,
-                                         false);
+                                         false, false);
         }
 
         /* We're using some buffers from the free list. */
@@ -712,7 +710,6 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
                 i = vring_unmap_one_split(vq, &extra[i]);
         }
 
-free_indirect:
         if (indirect)
                 kfree(desc);
 
@@ -794,7 +791,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
                                 VRING_DESC_F_INDIRECT));
                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-                if (vring_need_unmap_buffer(vq)) {
+                if (vq->use_dma_api) {
                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
                                 vring_unmap_one_split(vq, &extra[j]);
                 }
@@ -1228,7 +1225,7 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
                                  (flags & VRING_DESC_F_WRITE) ?
                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
         } else {
-                if (!vring_need_unmap_buffer(vq))
+                if (!vring_need_unmap_buffer(vq, extra))
                         return;
 
                 dma_unmap_page(vring_dma_dev(vq),
@@ -1309,7 +1306,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
                         i++;
 
                         if (unlikely(vq->use_dma_api)) {
-                                extra[i].addr = addr;
+                                extra[i].addr = sg_is_premapped(sg) ? DMA_MAPPING_ERROR : addr;
                                 extra[i].len = sg->length;
                                 extra[i].flags = n < out_sgs ? 0 : VRING_DESC_F_WRITE;
                         }
@@ -1320,12 +1317,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
         addr = vring_map_single(vq, desc,
                         total_sg * sizeof(struct vring_packed_desc),
                         DMA_TO_DEVICE);
-        if (vring_mapping_error(vq, addr)) {
-                if (vq->premapped)
-                        goto free_desc;
-
+        if (vring_mapping_error(vq, addr))
                 goto unmap_release;
-        }
 
         vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
         vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
@@ -1383,7 +1376,6 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
         for (i = 0; i < err_idx; i++)
                 vring_unmap_extra_packed(vq, &extra[i]);
 
-free_desc:
         kfree(desc);
 
         END_USE(vq);
@@ -1474,7 +1466,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
                         desc[i].id = cpu_to_le16(id);
 
                         if (unlikely(vq->use_dma_api)) {
-                                vq->packed.desc_extra[curr].addr = addr;
+                                vq->packed.desc_extra[curr].addr = sg_is_premapped(sg) ?
+                                        DMA_MAPPING_ERROR : addr;
                                 vq->packed.desc_extra[curr].len = sg->length;
                                 vq->packed.desc_extra[curr].flags =
                                         le16_to_cpu(flags);
@@ -1625,10 +1618,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
         if (!extra)
                 return;
 
-        if (vring_need_unmap_buffer(vq)) {
+        if (vq->use_dma_api) {
                 len = vq->packed.desc_extra[id].len;
-                for (i = 0; i < len / sizeof(struct vring_packed_desc);
-                                i++)
+                for (i = 0; i < len / sizeof(struct vring_packed_desc); i++)
                         vring_unmap_extra_packed(vq, &extra[i]);
         }
         kfree(desc);
@@ -2212,6 +2204,11 @@ static inline int virtqueue_add(struct virtqueue *_vq,
  * @data: the token identifying the buffer.
  * @gfp: how to do memory allocations (if necessary).
  *
+ * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
+ * mapping in advance, allowing the virtio core to directly utilize
+ * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
+ * DMA unmap operations for this buffer will be bypassed.
+ *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
@@ -2246,6 +2243,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
  * @data: the token identifying the buffer.
  * @gfp: how to do memory allocations (if necessary).
  *
+ * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
+ * mapping in advance, allowing the virtio core to directly utilize
+ * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
+ * DMA unmap operations for this buffer will be bypassed.
+ *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
@@ -2268,6 +2270,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
  * @data: the token identifying the buffer.
  * @gfp: how to do memory allocations (if necessary).
  *
+ * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
+ * mapping in advance, allowing the virtio core to directly utilize
+ * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
+ * DMA unmap operations for this buffer will be bypassed.
+ *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
@@ -2291,6 +2298,11 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
  * @ctx: extra context for the token
  * @gfp: how to do memory allocations (if necessary).
  *
+ * When sg_page(sg) is NULL, this indicates that the driver has performed DMA
+ * mapping in advance, allowing the virtio core to directly utilize
+ * sg_dma_address(sg) without conducting any internal DMA mapping. Additionally,
+ * DMA unmap operations for this buffer will be bypassed.
+ *
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
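One consequence of checking sg_page() per scatterlist entry (see
sg_is_premapped() above, and the per-descriptor extra->addr sentinel) is
that a single submission could, at least in principle, mix premapped and
unmapped entries. Whether virtio-net ends up relying on this is a
separate question; the following is a hypothetical sketch only, with
add_mixed_buf() and its parameters invented for illustration.

#include <linux/scatterlist.h>
#include <linux/virtio.h>
#include <linux/virtio_net.h>

static int add_mixed_buf(struct virtqueue *vq, struct virtio_net_hdr *hdr,
                         dma_addr_t payload_dma, u32 payload_len, void *token)
{
        struct scatterlist sgs[2];

        sg_init_table(sgs, 2);

        /* Entry 0: ordinary kernel memory. sg_page() != NULL, so the core
         * maps it and extra->addr holds a real DMA address to unmap on
         * detach. */
        sg_set_buf(&sgs[0], hdr, sizeof(*hdr));

        /* Entry 1: payload the driver mapped in advance. sg_page() stays
         * NULL, so the core records DMA_MAPPING_ERROR in extra->addr and
         * skips the unmap for this descriptor. */
        sgs[1].dma_address = payload_dma;
        sgs[1].length = payload_len;

        return virtqueue_add_outbuf(vq, sgs, 2, token, GFP_ATOMIC);
}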
The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of dma information for vq
size * 17, which can be substantial. However, if the premapped mode were
applied on a per-buffer basis, the complexity would be greatly reduced.
With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel
skb buffers could remain unmapped.

We can distinguish them by sg_page(sg). When sg_page(sg) is NULL, this
indicates that the driver has performed DMA mapping in advance, allowing
the virtio core to directly utilize sg_dma_address(sg) without
conducting any internal DMA mapping. Additionally, DMA unmap operations
for this buffer will be bypassed.

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 70 +++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 29 deletions(-)
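Since the thread above notes that rx could reuse the same convention, a
hypothetical RX-side sketch follows. refill_premapped_rx() and its
parameters are invented for illustration; virtqueue_add_inbuf_ctx() is
the existing API. The buffer is mapped once by the driver, saving a
dma_map/dma_unmap pair on every refill/receive cycle.

#include <linux/scatterlist.h>
#include <linux/virtio.h>

static int refill_premapped_rx(struct virtqueue *vq, dma_addr_t buf_dma,
                               u32 buf_len, void *data, void *ctx)
{
        struct scatterlist sg;

        sg_init_table(&sg, 1);  /* sg_page(&sg) == NULL => premapped */
        sg.dma_address = buf_dma;
        sg.length = buf_len;

        return virtqueue_add_inbuf_ctx(vq, &sg, 1, data, ctx, GFP_ATOMIC);
}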