diff mbox series

[vhost,v13,05/12] virtio_ring: introduce virtqueue_dma_dev()

Message ID 20230810123057.43407-6-xuanzhuo@linux.alibaba.com (mailing list archive)
State Not Applicable
Series virtio core prepares for AF_XDP

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Xuan Zhuo Aug. 10, 2023, 12:30 p.m. UTC
Add virtqueue_dma_dev() to get the DMA device for virtio, so that the
caller can perform DMA operations in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 19 insertions(+)

Comments

Jason Wang Aug. 14, 2023, 3:05 a.m. UTC | #1
On Thu, Aug 10, 2023 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> caller can do dma operation in advance. The purpose is to keep memory
> mapped across multiple add/get buf operations.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Acked-by: Jason Wang <jasowang@redhat.com>

So I think we don't have actual users for this in this series? Can we
simply have another independent patch for this?

> ---
>  drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
>  include/linux/virtio.h       |  2 ++
>  2 files changed, 19 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index f9f772e85a38..bb3d73d221cd 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2265,6 +2265,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
>
> +/**
> + * virtqueue_dma_dev - get the dma dev
> + * @_vq: the struct virtqueue we're talking about.
> + *
> + * Returns the dma dev. That can been used for dma api.
> + */
> +struct device *virtqueue_dma_dev(struct virtqueue *_vq)
> +{
> +       struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +       if (vq->use_dma_api)
> +               return vring_dma_dev(vq);
> +       else
> +               return NULL;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_dma_dev);

One possible concern is that returning NULL may force a switch in
the caller (driver). I wonder if it would be better to BUG_ON() in
the NULL path?

Thanks

> +
>  /**
>   * virtqueue_kick_prepare - first half of split virtqueue_kick call.
>   * @_vq: the struct virtqueue
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 8add38038877..bd55a05eec04 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
>                       void *data,
>                       gfp_t gfp);
>
> +struct device *virtqueue_dma_dev(struct virtqueue *vq);
> +
>  bool virtqueue_kick(struct virtqueue *vq);
>
>  bool virtqueue_kick_prepare(struct virtqueue *vq);
> --
> 2.32.0.3.g01195cf9f
>
Xuan Zhuo Aug. 14, 2023, 8:56 a.m. UTC | #2
On Mon, 14 Aug 2023 11:05:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Aug 10, 2023 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > caller can do dma operation in advance. The purpose is to keep memory
> > mapped across multiple add/get buf operations.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
>
> So I think we don't have actual users for this in this series? Can we
> simply have another independent patch for this?

I am OK with that. I will remove this from the next version.

But I also hope this can be merged for 6.6. Then virtio-net can support
AF_XDP in 6.7+.


>
> > ---
> >  drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
> >  include/linux/virtio.h       |  2 ++
> >  2 files changed, 19 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index f9f772e85a38..bb3d73d221cd 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2265,6 +2265,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
> >
> > +/**
> > + * virtqueue_dma_dev - get the dma dev
> > + * @_vq: the struct virtqueue we're talking about.
> > + *
> > + * Returns the dma dev. That can been used for dma api.
> > + */
> > +struct device *virtqueue_dma_dev(struct virtqueue *_vq)
> > +{
> > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +       if (vq->use_dma_api)
> > +               return vring_dma_dev(vq);
> > +       else
> > +               return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
>
> One possible concern is that exporting things like NULL may result in
> the switch in the caller (driver). I wonder if it's better to do
> BUG_ON() in the path of NULL?


I agree.

But we need a new helper to tell the driver (or AF_XDP) whether the device
supports ACCESS_PLATFORM.

We need a switch, but we can make that switch independent of the DMA API.
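
To make the two options concrete, here is a minimal user-space sketch, not
kernel code. The struct layouts, the function names and the separate
vq_uses_dma_api() helper are all hypothetical stand-ins, and assert() only
models BUG_ON(). It contrasts the NULL-returning getter from the patch with
a strict getter plus an independent capability check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real layouts. */
struct device {
	const char *name;
};

struct virtqueue_model {
	bool use_dma_api;
	struct device *dma_dev;
};

/* Mirrors the patch: NULL doubles as "the DMA API is not in use". */
struct device *dma_dev_or_null(struct virtqueue_model *vq)
{
	return vq->use_dma_api ? vq->dma_dev : NULL;
}

/* Hypothetical helper: the switch the driver needs, kept apart from DMA. */
bool vq_uses_dma_api(const struct virtqueue_model *vq)
{
	return vq->use_dma_api;
}

/*
 * Strict variant: asking for the DMA device when the DMA API is not in
 * use is treated as a caller bug. assert() stands in for BUG_ON().
 */
struct device *dma_dev_strict(struct virtqueue_model *vq)
{
	assert(vq->use_dma_api);
	return vq->dma_dev;
}
```

With the strict variant the caller would check vq_uses_dma_api() once at
setup time and never test the returned pointer for NULL.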

Thanks.



>
> Thanks
>
> > +
> >  /**
> >   * virtqueue_kick_prepare - first half of split virtqueue_kick call.
> >   * @_vq: the struct virtqueue
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 8add38038877..bd55a05eec04 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
> >                       void *data,
> >                       gfp_t gfp);
> >
> > +struct device *virtqueue_dma_dev(struct virtqueue *vq);
> > +
> >  bool virtqueue_kick(struct virtqueue *vq);
> >
> >  bool virtqueue_kick_prepare(struct virtqueue *vq);
> > --
> > 2.32.0.3.g01195cf9f
> >
>
Michael S. Tsirkin Aug. 14, 2023, 11:24 a.m. UTC | #3
On Mon, Aug 14, 2023 at 04:56:53PM +0800, Xuan Zhuo wrote:
> On Mon, 14 Aug 2023 11:05:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Aug 10, 2023 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > caller can do dma operation in advance. The purpose is to keep memory
> > > mapped across multiple add/get buf operations.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > Acked-by: Jason Wang <jasowang@redhat.com>
> >
> > So I think we don't have actual users for this in this series? Can we
> > simply have another independent patch for this?
> 
> I am ok. I will remove this from the next version.
> 
> But I also help merge this to 6.6. Then we can let the virtio-net to support
> AF_XDP in 6.7+.

Is there going to be a next version? Because if so, it will be too late for the next release.
If all you want to do is drop this patch then just say so, no need
for another version.

> 
> >
> > > ---
> > >  drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
> > >  include/linux/virtio.h       |  2 ++
> > >  2 files changed, 19 insertions(+)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index f9f772e85a38..bb3d73d221cd 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -2265,6 +2265,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
> > >  }
> > >  EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
> > >
> > > +/**
> > > + * virtqueue_dma_dev - get the dma dev
> > > + * @_vq: the struct virtqueue we're talking about.
> > > + *
> > > + * Returns the dma dev. That can been used for dma api.
> > > + */
> > > +struct device *virtqueue_dma_dev(struct virtqueue *_vq)
> > > +{
> > > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > > +
> > > +       if (vq->use_dma_api)
> > > +               return vring_dma_dev(vq);
> > > +       else
> > > +               return NULL;
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
> >
> > One possible concern is that exporting things like NULL may result in
> > the switch in the caller (driver). I wonder if it's better to do
> > BUG_ON() in the path of NULL?
> 
> 
> I agree.
> 
> But we need a new helper to tell the driver(or AF_XDP) that the device support
> ACCESS_PLATFORM or not.
> 
> We need a switch, but we can make the switch is irrelevant to the DMA.
> 
> Thanks.
> 
> 
> 
> >
> > Thanks
> >
> > > +
> > >  /**
> > >   * virtqueue_kick_prepare - first half of split virtqueue_kick call.
> > >   * @_vq: the struct virtqueue
> > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > index 8add38038877..bd55a05eec04 100644
> > > --- a/include/linux/virtio.h
> > > +++ b/include/linux/virtio.h
> > > @@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
> > >                       void *data,
> > >                       gfp_t gfp);
> > >
> > > +struct device *virtqueue_dma_dev(struct virtqueue *vq);
> > > +
> > >  bool virtqueue_kick(struct virtqueue *vq);
> > >
> > >  bool virtqueue_kick_prepare(struct virtqueue *vq);
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
Xuan Zhuo Aug. 14, 2023, 11:55 a.m. UTC | #4
On Mon, 14 Aug 2023 07:24:59 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Aug 14, 2023 at 04:56:53PM +0800, Xuan Zhuo wrote:
> > On Mon, 14 Aug 2023 11:05:49 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Aug 10, 2023 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > mapped across multiple add/get buf operations.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > >
> > > So I think we don't have actual users for this in this series? Can we
> > > simply have another independent patch for this?
> >
> > I am ok. I will remove this from the next version.
> >
> > But I also help merge this to 6.6. Then we can let the virtio-net to support
> > AF_XDP in 6.7+.
>
> Is there going to be a next version? Because if yes it will be too late for the next release.
> if all you want to do is drop this patch then just say so, no need
> for another version.


I would like this patch to be merged for 6.6, because AF_XDP
needs it.

Thanks.


>
> >
> > >
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
> > > >  include/linux/virtio.h       |  2 ++
> > > >  2 files changed, 19 insertions(+)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index f9f772e85a38..bb3d73d221cd 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -2265,6 +2265,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
> > > >
> > > > +/**
> > > > + * virtqueue_dma_dev - get the dma dev
> > > > + * @_vq: the struct virtqueue we're talking about.
> > > > + *
> > > > + * Returns the dma dev. That can been used for dma api.
> > > > + */
> > > > +struct device *virtqueue_dma_dev(struct virtqueue *_vq)
> > > > +{
> > > > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > > > +
> > > > +       if (vq->use_dma_api)
> > > > +               return vring_dma_dev(vq);
> > > > +       else
> > > > +               return NULL;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
> > >
> > > One possible concern is that exporting things like NULL may result in
> > > the switch in the caller (driver). I wonder if it's better to do
> > > BUG_ON() in the path of NULL?
> >
> >
> > I agree.
> >
> > But we need a new helper to tell the driver(or AF_XDP) that the device support
> > ACCESS_PLATFORM or not.
> >
> > We need a switch, but we can make the switch is irrelevant to the DMA.
> >
> > Thanks.
> >
> >
> >
> > >
> > > Thanks
> > >
> > > > +
> > > >  /**
> > > >   * virtqueue_kick_prepare - first half of split virtqueue_kick call.
> > > >   * @_vq: the struct virtqueue
> > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > index 8add38038877..bd55a05eec04 100644
> > > > --- a/include/linux/virtio.h
> > > > +++ b/include/linux/virtio.h
> > > > @@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
> > > >                       void *data,
> > > >                       gfp_t gfp);
> > > >
> > > > +struct device *virtqueue_dma_dev(struct virtqueue *vq);
> > > > +
> > > >  bool virtqueue_kick(struct virtqueue *vq);
> > > >
> > > >  bool virtqueue_kick_prepare(struct virtqueue *vq);
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
>
Xuan Zhuo Aug. 15, 2023, 6:30 a.m. UTC | #5
Hi, Jason

Could you skip this patch?

Could we review the other patches first?

Thanks.
Jason Wang Aug. 15, 2023, 7:50 a.m. UTC | #6
On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
>
> Hi, Jason
>
> Could you skip this patch?

I'm fine with either merging or dropping this.

>
> Let we review other patches firstly?

I will be on vacation soon, and won't have time to do this until next week.

But I spotted two possible "issues":

1) The DMA metadata is stored in the headroom of the page. This
breaks frag coalescing, so we need to benchmark its impact.
2) Pre-mapped DMA addresses are not reused in the case of XDP_TX/XDP_REDIRECT.

I see Michael has merged this series, so I'm fine with letting it go first.

Thanks

>
> Thanks.
>
Xuan Zhuo Aug. 15, 2023, 9:27 a.m. UTC | #7
On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> >
> > Hi, Jason
> >
> > Could you skip this patch?
>
> I'm fine with either merging or dropping this.
>
> >
> > Let we review other patches firstly?
>
> I will be on vacation soon, and won't have time to do this until next week.

Have a happy vacation.

>
> But I spot two possible "issues":
>
> 1) the DMA metadata were stored in the headroom of the page, this
> breaks frags coalescing, we need to benchmark it's impact

Not every page, just the first page of the COMP pages.

So I think there is no impact.


> 2) pre mapped DMA addresses were not reused in the case of XDP_TX/XDP_REDIRECT

That is because TX does not use the premapped mode.

Thanks.

>
> I see Michael has merge this series so I'm fine to let it go first.
>
> Thanks
>
> >
> > Thanks.
> >
>
Michael S. Tsirkin Aug. 15, 2023, 11:57 a.m. UTC | #8
On Tue, Aug 15, 2023 at 03:50:23PM +0800, Jason Wang wrote:
> On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> >
> > Hi, Jason
> >
> > Could you skip this patch?
> 
> I'm fine with either merging or dropping this.
> 
> >
> > Let we review other patches firstly?
> 
> I will be on vacation soon, and won't have time to do this until next week.
> 
> But I spot two possible "issues":
> 
> 1) the DMA metadata were stored in the headroom of the page, this
> breaks frags coalescing, we need to benchmark it's impact
> 2) pre mapped DMA addresses were not reused in the case of XDP_TX/XDP_REDIRECT
> 
> I see Michael has merge this series so I'm fine to let it go first.
> 
> Thanks

It's still queued for next. It's not too late to drop it or, better, add
a patch on top.


> >
> > Thanks.
> >
Jason Wang Aug. 16, 2023, 1:13 a.m. UTC | #9
On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > >
> > > Hi, Jason
> > >
> > > Could you skip this patch?
> >
> > I'm fine with either merging or dropping this.
> >
> > >
> > > Let we review other patches firstly?
> >
> > I will be on vacation soon, and won't have time to do this until next week.
>
> Have a happly vacation.
>
> >
> > But I spot two possible "issues":
> >
> > 1) the DMA metadata were stored in the headroom of the page, this
> > breaks frags coalescing, we need to benchmark it's impact
>
> Not every page, just the first page of the COMP pages.
>
> So I think there is no impact.

Nope, see this:

        if (SKB_FRAG_PAGE_ORDER &&
            !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
                /* Avoid direct reclaim but allow kswapd to wake */
                pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
                                          __GFP_COMP | __GFP_NOWARN |
                                          __GFP_NORETRY,
                                          SKB_FRAG_PAGE_ORDER);
                if (likely(pfrag->page)) {
                        pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
                        return true;
                }
        }

The compound page may be disabled, depending on SKB_FRAG_PAGE_ORDER and
net_high_order_alloc_disable_key.
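
As a rough illustration of the branch above, here is a small user-space
model of the buffer size a page_frag refill ends up with. It is a sketch
under stated assumptions: 4 KiB pages, an order of 3 (a 32 KiB compound
page), and a fallback to a single order-0 page when the high-order path is
skipped or fails. The MODEL_* names and model_frag_size() are hypothetical
and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE       4096u /* assumption: 4 KiB pages */
#define MODEL_FRAG_PAGE_ORDER 3u    /* assumption: 32 KiB compound page */

/*
 * Returns the size of the buffer a refill would get in this model.
 * high_order_disabled models net_high_order_alloc_disable_key being set;
 * high_order_alloc_ok models alloc_pages() succeeding for the compound page.
 */
size_t model_frag_size(bool high_order_disabled, bool high_order_alloc_ok)
{
	if (MODEL_FRAG_PAGE_ORDER && !high_order_disabled && high_order_alloc_ok)
		return (size_t)MODEL_PAGE_SIZE << MODEL_FRAG_PAGE_ORDER;

	/* Assumed fallback: one order-0 page, so no compound page at all. */
	return MODEL_PAGE_SIZE;
}
```

In this model, disabling high-order allocation means every refill yields a
standalone 4 KiB page, which is the case under discussion here.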

>
>
> > 2) pre mapped DMA addresses were not reused in the case of XDP_TX/XDP_REDIRECT
>
> Because that the tx is not the premapped mode.

Yes, we can optimize this on top.

Thanks

>
> Thanks.
>
> >
> > I see Michael has merge this series so I'm fine to let it go first.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> >
>
Xuan Zhuo Aug. 16, 2023, 2:08 a.m. UTC | #10
On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > >
> > > > Hi, Jason
> > > >
> > > > Could you skip this patch?
> > >
> > > I'm fine with either merging or dropping this.
> > >
> > > >
> > > > Let we review other patches firstly?
> > >
> > > I will be on vacation soon, and won't have time to do this until next week.
> >
> > Have a happly vacation.
> >
> > >
> > > But I spot two possible "issues":
> > >
> > > 1) the DMA metadata were stored in the headroom of the page, this
> > > breaks frags coalescing, we need to benchmark it's impact
> >
> > Not every page, just the first page of the COMP pages.
> >
> > So I think there is no impact.
>
> Nope, see this:
>
>         if (SKB_FRAG_PAGE_ORDER &&
>             !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
>                 /* Avoid direct reclaim but allow kswapd to wake */
>                 pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>                                           __GFP_COMP | __GFP_NOWARN |
>                                           __GFP_NORETRY,
>                                           SKB_FRAG_PAGE_ORDER);
>                 if (likely(pfrag->page)) {
>                         pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
>                         return true;
>                 }
>         }
>
> The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and
> net_high_order_alloc_disable_key.


Yes.

But if the compound page is disabled, we only get one page at a time. The
pages are not contiguous, so there is no frag coalescing.

If you mean that two pages obtained from alloc_page() may happen to be
contiguous, then coalescing may indeed be broken. It's a possibility, but I
think the impact will be small.

Thanks.


>
> >
> >
> > > 2) pre mapped DMA addresses were not reused in the case of XDP_TX/XDP_REDIRECT
> >
> > Because that the tx is not the premapped mode.
>
> Yes, we can optimize this on top.
>
> Thanks
>
> >
> > Thanks.
> >
> > >
> > > I see Michael has merge this series so I'm fine to let it go first.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > >
> >
>
Jason Wang Aug. 16, 2023, 2:19 a.m. UTC | #11
On Wed, Aug 16, 2023 at 10:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > >
> > > > > Hi, Jason
> > > > >
> > > > > Could you skip this patch?
> > > >
> > > > I'm fine with either merging or dropping this.
> > > >
> > > > >
> > > > > Let we review other patches firstly?
> > > >
> > > > I will be on vacation soon, and won't have time to do this until next week.
> > >
> > > Have a happly vacation.
> > >
> > > >
> > > > But I spot two possible "issues":
> > > >
> > > > 1) the DMA metadata were stored in the headroom of the page, this
> > > > breaks frags coalescing, we need to benchmark it's impact
> > >
> > > Not every page, just the first page of the COMP pages.
> > >
> > > So I think there is no impact.
> >
> > Nope, see this:
> >
> >         if (SKB_FRAG_PAGE_ORDER &&
> >             !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
> >                 /* Avoid direct reclaim but allow kswapd to wake */
> >                 pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> >                                           __GFP_COMP | __GFP_NOWARN |
> >                                           __GFP_NORETRY,
> >                                           SKB_FRAG_PAGE_ORDER);
> >                 if (likely(pfrag->page)) {
> >                         pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> >                         return true;
> >                 }
> >         }
> >
> > The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and
> > net_high_order_alloc_disable_key.
>
>
> YES.
>
> But if comp page is disabled. Then we only get one page each time. The pages are
> not contiguous, so we don't have frags coalescing.
>
> If you mean the two pages got from alloc_page may be contiguous. The coalescing
> may then be broken. It's a possibility, but I think the impact will be small.

Let's have a simple benchmark and see?

Thanks

>
> Thanks.
>
>
> >
> > >
> > >
> > > > 2) pre mapped DMA addresses were not reused in the case of XDP_TX/XDP_REDIRECT
> > >
> > > Because that the tx is not the premapped mode.
> >
> > Yes, we can optimize this on top.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > > >
> > > > I see Michael has merge this series so I'm fine to let it go first.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > >
> >
>
Xuan Zhuo Aug. 16, 2023, 2:21 a.m. UTC | #12
On Wed, 16 Aug 2023 10:19:34 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Aug 16, 2023 at 10:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > >
> > > > > > Hi, Jason
> > > > > >
> > > > > > Could you skip this patch?
> > > > >
> > > > > I'm fine with either merging or dropping this.
> > > > >
> > > > > >
> > > > > > Let we review other patches firstly?
> > > > >
> > > > > I will be on vacation soon, and won't have time to do this until next week.
> > > >
> > > > Have a happly vacation.
> > > >
> > > > >
> > > > > But I spot two possible "issues":
> > > > >
> > > > > 1) the DMA metadata were stored in the headroom of the page, this
> > > > > breaks frags coalescing, we need to benchmark it's impact
> > > >
> > > > Not every page, just the first page of the COMP pages.
> > > >
> > > > So I think there is no impact.
> > >
> > > Nope, see this:
> > >
> > >         if (SKB_FRAG_PAGE_ORDER &&
> > >             !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
> > >                 /* Avoid direct reclaim but allow kswapd to wake */
> > >                 pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> > >                                           __GFP_COMP | __GFP_NOWARN |
> > >                                           __GFP_NORETRY,
> > >                                           SKB_FRAG_PAGE_ORDER);
> > >                 if (likely(pfrag->page)) {
> > >                         pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> > >                         return true;
> > >                 }
> > >         }
> > >
> > > The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and
> > > net_high_order_alloc_disable_key.
> >
> >
> > YES.
> >
> > But if comp page is disabled. Then we only get one page each time. The pages are
> > not contiguous, so we don't have frags coalescing.
> >
> > If you mean the two pages got from alloc_page may be contiguous. The coalescing
> > may then be broken. It's a possibility, but I think the impact will be small.
>
> Let's have a simple benchmark and see?


OK.

I think you want to see the performance numbers under heavy traffic with
the compound page disabled.

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > >
> > > >
> > > > > 2) pre mapped DMA addresses were not reused in the case of XDP_TX/XDP_REDIRECT
> > > >
> > > > Because that the tx is not the premapped mode.
> > >
> > > Yes, we can optimize this on top.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > I see Michael has merge this series so I'm fine to let it go first.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > >
> > >
> >
>
Jason Wang Aug. 16, 2023, 2:33 a.m. UTC | #13
On Wed, Aug 16, 2023 at 10:24 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 16 Aug 2023 10:19:34 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Aug 16, 2023 at 10:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Hi, Jason
> > > > > > >
> > > > > > > Could you skip this patch?
> > > > > >
> > > > > > I'm fine with either merging or dropping this.
> > > > > >
> > > > > > >
> > > > > > > Let we review other patches firstly?
> > > > > >
> > > > > > I will be on vacation soon, and won't have time to do this until next week.
> > > > >
> > > > > Have a happly vacation.
> > > > >
> > > > > >
> > > > > > But I spot two possible "issues":
> > > > > >
> > > > > > 1) the DMA metadata were stored in the headroom of the page, this
> > > > > > breaks frags coalescing, we need to benchmark it's impact
> > > > >
> > > > > Not every page, just the first page of the COMP pages.
> > > > >
> > > > > So I think there is no impact.
> > > >
> > > > Nope, see this:
> > > >
> > > >         if (SKB_FRAG_PAGE_ORDER &&
> > > >             !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
> > > >                 /* Avoid direct reclaim but allow kswapd to wake */
> > > >                 pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> > > >                                           __GFP_COMP | __GFP_NOWARN |
> > > >                                           __GFP_NORETRY,
> > > >                                           SKB_FRAG_PAGE_ORDER);
> > > >                 if (likely(pfrag->page)) {
> > > >                         pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> > > >                         return true;
> > > >                 }
> > > >         }
> > > >
> > > > The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and
> > > > net_high_order_alloc_disable_key.
> > >
> > >
> > > YES.
> > >
> > > But if comp page is disabled. Then we only get one page each time. The pages are
> > > not contiguous, so we don't have frags coalescing.
> > >
> > > If you mean the two pages got from alloc_page may be contiguous. The coalescing
> > > may then be broken. It's a possibility, but I think the impact will be small.
> >
> > Let's have a simple benchmark and see?
>
>
> That is ok.
>
> I think you want to know the perf num with big traffic and the comp page
> disabled.

Yes.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > > > 2) pre mapped DMA addresses were not reused in the case of XDP_TX/XDP_REDIRECT
> > > > >
> > > > > Because that the tx is not the premapped mode.
> > > >
> > > > Yes, we can optimize this on top.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > > >
> > > > > > I see Michael has merge this series so I'm fine to let it go first.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
Xuan Zhuo Aug. 16, 2023, 3:22 a.m. UTC | #14
On Wed, 16 Aug 2023 10:33:34 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Aug 16, 2023 at 10:24 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 16 Aug 2023 10:19:34 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Aug 16, 2023 at 10:16 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi, Jason
> > > > > > > >
> > > > > > > > Could you skip this patch?
> > > > > > >
> > > > > > > I'm fine with either merging or dropping this.
> > > > > > >
> > > > > > > >
> > > > > > > > Let we review other patches firstly?
> > > > > > >
> > > > > > > I will be on vacation soon, and won't have time to do this until next week.
> > > > > >
> > > > > > Have a happly vacation.
> > > > > >
> > > > > > >
> > > > > > > But I spot two possible "issues":
> > > > > > >
> > > > > > > 1) the DMA metadata were stored in the headroom of the page, this
> > > > > > > breaks frags coalescing, we need to benchmark it's impact
> > > > > >
> > > > > > Not every page, just the first page of the COMP pages.
> > > > > >
> > > > > > So I think there is no impact.
> > > > >
> > > > > Nope, see this:
> > > > >
> > > > >         if (SKB_FRAG_PAGE_ORDER &&
> > > > >             !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
> > > > >                 /* Avoid direct reclaim but allow kswapd to wake */
> > > > >                 pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> > > > >                                           __GFP_COMP | __GFP_NOWARN |
> > > > >                                           __GFP_NORETRY,
> > > > >                                           SKB_FRAG_PAGE_ORDER);
> > > > >                 if (likely(pfrag->page)) {
> > > > >                         pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> > > > >                         return true;
> > > > >                 }
> > > > >         }
> > > > >
> > > > > The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and
> > > > > net_high_order_alloc_disable_key.
> > > >
> > > >
> > > > YES.
> > > >
> > > > But if comp page is disabled. Then we only get one page each time. The pages are
> > > > not contiguous, so we don't have frags coalescing.
> > > >
> > > > If you mean the two pages got from alloc_page may be contiguous. The coalescing
> > > > may then be broken. It's a possibility, but I think the impact will be small.
> > >
> > > Let's have a simple benchmark and see?
> >
> >
> > That is ok.
> >
> > I think you want to know the perf num with big traffic and the comp page
> > disabled.
>
> Yes.


Hi,

Host:
	for ((i=0; i < 10; ++i)) do sockperf tp -i 192.168.122.100 -t 1000  -m 64000& done
Guest:
	03:23:12 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
	03:23:13 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
	03:23:13 AM      ens4  61848.00      1.00 3868036.73      0.58      0.00      0.00      0.00      0.00

	tcpdump:
		03:25:01.741563 IP 192.168.122.1.29693 > 192.168.122.100.11111: UDP, length 64000
		03:25:01.741580 IP 192.168.122.1.22239 > 192.168.122.100.11111: UDP, length 64000
		03:25:01.741623 IP 192.168.122.1.22396 > 192.168.122.100.11111: UDP, length 64000

The Guest CPU utilization is low and every packet is 64000 bytes, but the
Host vhost process is at 100%. So we cannot judge by the traffic or the CPU
usage of the Guest.

So I used the kernel without my patches (0635819decaf9d60e6cacfecfebfabe3cbdddafb).

I want to count the number of frag coalescing calls when the compound page is disabled.

	$ sh -x test.sh
	+ sysctl -w net.core.high_order_alloc_disable=1
	net.core.high_order_alloc_disable = 1
	+ sysctl net.core.high_order_alloc_disable
	net.core.high_order_alloc_disable = 1
	+ sleep 5
	+ timeout 5 bpftrace -e 'kprobe: skb_coalesce_rx_frag{@[nsecs/1000/1000/1000]=count()}'
	Attaching 1 probe...



	+ sysctl -w net.core.high_order_alloc_disable=0
	net.core.high_order_alloc_disable = 0
	+ sysctl net.core.high_order_alloc_disable
	net.core.high_order_alloc_disable = 0
	+ sleep 5
	+ timeout 5 bpftrace -e 'kprobe: skb_coalesce_rx_frag{@[nsecs/1000/1000/1000]=count()}'
	Attaching 1 probe...


	@[356]: 167020
	@[361]: 673653
	@[359]: 900844
	@[360]: 912657
	@[358]: 915853
	@[357]: 932245


We can see that skb_coalesce_rx_frag is not called when the compound page is
disabled. If the compound page is enabled, there is a lot of frag coalescing.

So I think that my change will not have an impact.
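
For reference, the per-second buckets from the run with
net.core.high_order_alloc_disable=0 can be totalled with a few lines of C.
This is only a sketch: the counts are copied from the bpftrace output above
in printed order, and the names sum_counts() and max_count() are
illustrative. The lowest and highest keys (356 and 361) are presumably
partial seconds, since the 5-second window spans six buckets.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts copied from the bpftrace output above, in printed order. */
const uint64_t coalesce_counts[] = {
	167020, 673653, 900844, 912657, 915853, 932245,
};
const size_t coalesce_buckets =
	sizeof(coalesce_counts) / sizeof(coalesce_counts[0]);

/* Total number of skb_coalesce_rx_frag calls over all buckets. */
uint64_t sum_counts(const uint64_t *counts, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += counts[i];
	return total;
}

/* Largest single bucket, i.e. the busiest one-second interval. */
uint64_t max_count(const uint64_t *counts, size_t n)
{
	uint64_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (counts[i] > max)
			max = counts[i];
	return max;
}
```

The run with high-order allocation disabled printed no buckets at all, so
its total is zero.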

Thanks.




>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > > 2) pre mapped DMA addresses were not reused in the case of XDP_TX/XDP_REDIRECT
> > > > > >
> > > > > > Because that the tx is not the premapped mode.
> > > > >
> > > > > Yes, we can optimize this on top.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > > I see Michael has merge this series so I'm fine to let it go first.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Patch

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f9f772e85a38..bb3d73d221cd 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2265,6 +2265,23 @@  int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the dma dev for use with the DMA API, or NULL if the vq does not use it.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (vq->use_dma_api)
+		return vring_dma_dev(vq);
+	else
+		return NULL;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 8add38038877..bd55a05eec04 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -61,6 +61,8 @@  int virtqueue_add_sgs(struct virtqueue *vq,
 		      void *data,
 		      gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);