
[v3,0/3] virtio support cache indirect desc

Message ID 20211029062814.76594-1-xuanzhuo@linux.alibaba.com

Message

Xuan Zhuo Oct. 29, 2021, 6:28 a.m. UTC
If VIRTIO_RING_F_INDIRECT_DESC negotiation succeeds and the number
of sgs used for sending packets is greater than 1, we must constantly
call __kmalloc/kfree to allocate and release indirect descriptors.

In the case of extremely fast packet transmission, this overhead cannot
be ignored:

  27.46%  [kernel]  [k] virtqueue_add
  16.66%  [kernel]  [k] detach_buf_split
  16.51%  [kernel]  [k] virtnet_xsk_xmit
  14.04%  [kernel]  [k] virtqueue_add_outbuf
   5.18%  [kernel]  [k] __kmalloc
   4.08%  [kernel]  [k] kfree
   2.80%  [kernel]  [k] virtqueue_get_buf_ctx
   2.22%  [kernel]  [k] xsk_tx_peek_desc
   2.08%  [kernel]  [k] memset_erms
   0.83%  [kernel]  [k] virtqueue_kick_prepare
   0.76%  [kernel]  [k] virtnet_xsk_run
   0.62%  [kernel]  [k] __free_old_xmit_ptr
   0.60%  [kernel]  [k] vring_map_one_sg
   0.53%  [kernel]  [k] native_apic_mem_write
   0.46%  [kernel]  [k] sg_next
   0.43%  [kernel]  [k] sg_init_table
   0.41%  [kernel]  [k] kmalloc_slab

This series adds a cache to virtio so that allocated indirect descriptors
are reused instead of being constantly allocated and freed.
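
A minimal sketch of the idea (not the actual patch, which in v3
pre-allocates a per-buffer indirect descriptor array): freed indirect
descriptor arrays are parked on a per-virtqueue free list and reused
by the next virtqueue_add(), falling back to kmalloc only when the
list is empty or the request is too large to cache. The desc_cache
list head on struct vring_virtqueue, DESC_CACHE_ARRAY_LEN and the
helper names below are made up for illustration.

#include <linux/list.h>
#include <linux/slab.h>
#include <uapi/linux/virtio_ring.h>

#define DESC_CACHE_ARRAY_LEN	16	/* illustrative cache threshold */

static struct vring_desc *desc_cache_get(struct vring_virtqueue *vq,
					 unsigned int n, gfp_t gfp)
{
	/* vq->desc_cache is INIT_LIST_HEAD()ed at vring creation time. */
	if (n <= DESC_CACHE_ARRAY_LEN && !list_empty(&vq->desc_cache)) {
		struct list_head *entry = vq->desc_cache.next;

		list_del(entry);
		/* A cached array is always DESC_CACHE_ARRAY_LEN long. */
		return (struct vring_desc *)entry;
	}

	/* Cache miss: allocate at least the cacheable size. */
	return kmalloc_array(n > DESC_CACHE_ARRAY_LEN ? n : DESC_CACHE_ARRAY_LEN,
			     sizeof(struct vring_desc), gfp);
}

static void desc_cache_put(struct vring_virtqueue *vq,
			   struct vring_desc *desc, unsigned int n)
{
	/* struct vring_desc (16 bytes) is big enough to hold the list node. */
	if (n <= DESC_CACHE_ARRAY_LEN)
		list_add((struct list_head *)desc, &vq->desc_cache);
	else
		kfree(desc);
}

With such a scheme the per-packet __kmalloc/kfree pair visible in the
profile above becomes a list_del/list_add in the common case.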

v3:
  pre-allocate per buffer indirect descriptors array

v2:
  use struct list_head to cache the desc

*** BLURB HERE ***

Xuan Zhuo (3):
  virtio: cache indirect desc for split
  virtio: cache indirect desc for packed
  virtio-net: enable virtio desc cache

 drivers/net/virtio_net.c     |  11 +++
 drivers/virtio/virtio.c      |   6 ++
 drivers/virtio/virtio_ring.c | 131 ++++++++++++++++++++++++++++++-----
 include/linux/virtio.h       |  14 ++++
 4 files changed, 145 insertions(+), 17 deletions(-)

--
2.31.0

Comments

Michael S. Tsirkin Jan. 6, 2022, 12:28 p.m. UTC | #1
On Fri, Oct 29, 2021 at 02:28:11PM +0800, Xuan Zhuo wrote:
> If VIRTIO_RING_F_INDIRECT_DESC negotiation succeeds and the number
> of sgs used for sending packets is greater than 1, we must constantly
> call __kmalloc/kfree to allocate and release indirect descriptors.


So where is this going? I really like the performance boost. My concern
is that if the guest spans NUMA nodes, then when the handler switches
from one node to another this will keep reusing the cache from
the old node. A bunch of ways were suggested to address this, but
even just making the cache per NUMA node would help.
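
(As a rough sketch of that suggestion, assuming a desc_cache[] array of
nr_node_ids list heads added to struct vring_virtqueue, the illustrative
DESC_CACHE_ARRAY_LEN threshold from above, and the usual rule that
virtqueue operations are serialized by the caller:)

#include <linux/topology.h>	/* numa_node_id() */
#include <linux/slab.h>

static struct vring_desc *desc_cache_get_node(struct vring_virtqueue *vq,
					      unsigned int n, gfp_t gfp)
{
	int node = numa_node_id();	/* node the handler runs on now */
	struct list_head *head = &vq->desc_cache[node];

	if (n <= DESC_CACHE_ARRAY_LEN && !list_empty(head)) {
		struct list_head *entry = head->next;

		list_del(entry);
		return (struct vring_desc *)entry;
	}

	/* Miss: allocate node-local memory so later reuse stays local. */
	return kmalloc_array_node(n > DESC_CACHE_ARRAY_LEN ?
				  n : DESC_CACHE_ARRAY_LEN,
				  sizeof(struct vring_desc), gfp, node);
}

Freed arrays would be returned to the list of whichever node the handler
is on at free time, so the cache migrates with the handler instead of
keeping memory pinned to the old node.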


Michael S. Tsirkin Jan. 10, 2022, 1:50 p.m. UTC | #2
On Thu, Jan 06, 2022 at 08:48:59PM +0800, Xuan Zhuo wrote:
> On Thu, 6 Jan 2022 07:28:31 -0500, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Oct 29, 2021 at 02:28:11PM +0800, Xuan Zhuo wrote:
> > > If VIRTIO_RING_F_INDIRECT_DESC negotiation succeeds and the number
> > > of sgs used for sending packets is greater than 1, we must constantly
> > > call __kmalloc/kfree to allocate and release indirect descriptors.
> >
> >
> > So where is this going? I really like the performance boost. My concern
> > is that if the guest spans NUMA nodes, then when the handler switches
> > from one node to another this will keep reusing the cache from
> > the old node. A bunch of ways were suggested to address this, but
> > even just making the cache per NUMA node would help.
> >
> 
> In fact, this is the problem I encountered while implementing AF_XDP (XDP
> socket) support in virtio-net. Now that virtqueue reset [0] has been merged
> into the virtio spec, I am completing this series of work. My plan is:
> 
> 1. virtio support advance dma
> 2. linux kernel/qemu support virtqueue reset
> 3. virtio-net support AF_XDP
> 4. virtio support cache indirect desc
> 
> [0]: https://github.com/oasis-tcs/virtio-spec/issues/124
> 
> Thanks.

OK it's up to you how to prioritize your work.
An idea though: isn't there a way to reduce the use of indirect?
Even with all the caching, it is surely not free.
We made it work better in the past with:

commit e7428e95a06fb516fac1308bd0e176e27c0b9287
    ("virtio-net: put virtio-net header inline with data")
and
commit 6ebbc1a6383fe78be3c0961d1475043ac6cc2542
    ("virtio-net: Set needed_headroom for virtio-net when VIRTIO_F_ANY_LAYOUT is true")

can't something similar be done for XDP?
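
(For context, a rough illustration of what those commits achieve for a
linear skb with enough headroom, using hypothetical names; the real
virtio-net code also handles nonlinear skbs and the no-headroom
fallback:)

#include <linux/skbuff.h>
#include <linux/scatterlist.h>
#include <linux/string.h>

static int hdr_inline_prepare(struct sk_buff *skb, unsigned int hdr_len,
			      struct scatterlist *sg)
{
	/* Without enough headroom the header needs its own sg entry. */
	if (skb_headroom(skb) < hdr_len)
		return -ENOSPC;

	/* Push the virtio-net header right in front of the payload... */
	memset(__skb_push(skb, hdr_len), 0, hdr_len);

	/* ...so header plus data map to a single contiguous sg entry. */
	sg_init_one(sg, skb->data, skb->len);
	return 0;
}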



Another idea is to skip indirect even with s/g, as long as the number of
outstanding entries is small. The difficulty with this approach is that it
has to be tested across a large number of configurations, including
storage, to make sure we don't cause regressions, unless we
are very conservative and only make a small % of entries direct.
Will doing that still help? It looks attractive on paper:
if the guest starts outpacing the host and the ring begins to fill
up to more than, say, 10%, then we switch to allocating indirect
entries, which slows the guest down.
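
(Sketched as code, with field names simplified from virtio_ring.c, the
check replacing today's "indirect whenever total_sg > 1" decision in
virtqueue_add might look roughly like this; the 10% figure is just the
example number used above:)

static bool vring_should_use_indirect(struct vring_virtqueue *vq,
				      unsigned int total_sg)
{
	unsigned int ring_size = virtqueue_get_vring_size(&vq->vq);
	unsigned int in_flight = ring_size - vq->vq.num_free;

	/* Single-sg requests never need an indirect table. */
	if (!vq->indirect || total_sg <= 1)
		return false;

	/* Below ~10% occupancy, spend ring slots instead of kmalloc... */
	if (in_flight * 10 < ring_size)
		return total_sg > vq->vq.num_free; /* ...if they all fit. */

	/* Ring filling up: fall back to indirect to slow the guest down. */
	return true;
}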


