
[0/1] vhost: parallel virtqueue handling

Message ID 20181102160710.3741-1-v.mayatskih@gmail.com

Message

Vitaly Mayatskih Nov. 2, 2018, 4:07 p.m. UTC
Hi,

I stumbled across poor performance of virtio-blk while working on a
high-performance network storage protocol. Moving virtio-blk's host
side to the kernel did increase single-queue IOPS, but a multiqueue
disk still was not scaling well. It turned out that vhost handles
events from all virtio queues in one helper thread, and that is
pretty much a big serialization point.

The following patch enables event handling in a per-queue thread and
increases I/O concurrency; see the IOPS numbers:

# num-queues  bare-metal  virtio-blk  vhost-blk  (IOPS)
     1       171k        148k       195k
     2       328k        249k       349k
     3       479k        179k       501k
     4       622k        143k       620k
     5       755k        136k       737k
     6       887k        131k       830k
     7      1004k        126k       926k
     8      1099k        117k      1001k
     9      1194k        115k      1055k
    10      1278k        109k      1130k
    11      1345k        110k      1119k
    12      1411k        104k      1201k
    13      1466k        106k      1260k
    14      1517k        103k      1296k
    15      1552k        102k      1322k
    16      1480k        101k      1346k

Vitaly Mayatskikh (1):
  vhost: add per-vq worker thread

 drivers/vhost/vhost.c | 123 +++++++++++++++++++++++++++++++-----------
 drivers/vhost/vhost.h |  11 +++-
 2 files changed, 100 insertions(+), 34 deletions(-)
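
For readers less familiar with the vhost threading model, here is a toy
userspace sketch of the idea behind the patch (illustration only, all
names invented; this is not the kernel code from the patch): each queue
owns a worker thread draining only that queue's work, instead of every
queue funnelling its events into one shared worker.

/*
 * Illustration only -- not the actual vhost patch.  A toy userspace model
 * of the change: rather than all queues funnelling their events into one
 * shared worker thread, each queue owns a worker that drains only that
 * queue's work.  Every name here is invented for the example.
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_QUEUES 4
#define EVENTS_PER_QUEUE 1000

struct queue_worker {
        pthread_t thread;
        pthread_mutex_t lock;
        pthread_cond_t kick;
        int pending;            /* events waiting to be handled */
        int handled;            /* events this worker has processed */
        int stop;
};

/* Per-queue worker loop: the analogue of one vhost worker per virtqueue. */
static void *worker_fn(void *arg)
{
        struct queue_worker *w = arg;

        pthread_mutex_lock(&w->lock);
        while (!(w->stop && !w->pending)) {
                if (!w->pending) {
                        pthread_cond_wait(&w->kick, &w->lock);
                        continue;
                }
                w->pending--;
                w->handled++;   /* stand-in for handling one vq event */
        }
        pthread_mutex_unlock(&w->lock);
        return NULL;
}

/* The "kick": hand one event to the worker that owns this queue. */
static void queue_event(struct queue_worker *w)
{
        pthread_mutex_lock(&w->lock);
        w->pending++;
        pthread_cond_signal(&w->kick);
        pthread_mutex_unlock(&w->lock);
}

int main(void)
{
        struct queue_worker workers[NUM_QUEUES];
        int i, j;

        for (i = 0; i < NUM_QUEUES; i++) {
                workers[i] = (struct queue_worker){ 0 };
                pthread_mutex_init(&workers[i].lock, NULL);
                pthread_cond_init(&workers[i].kick, NULL);
                pthread_create(&workers[i].thread, NULL, worker_fn,
                               &workers[i]);
        }

        /* Events from queue i only ever go to worker i, so queues never
         * contend with each other, unlike a single shared worker. */
        for (j = 0; j < EVENTS_PER_QUEUE; j++)
                for (i = 0; i < NUM_QUEUES; i++)
                        queue_event(&workers[i]);

        for (i = 0; i < NUM_QUEUES; i++) {
                pthread_mutex_lock(&workers[i].lock);
                workers[i].stop = 1;
                pthread_cond_signal(&workers[i].kick);
                pthread_mutex_unlock(&workers[i].lock);
                pthread_join(workers[i].thread, NULL);
                printf("queue %d handled %d events\n", i, workers[i].handled);
        }
        return 0;
}

In the actual patch the workers are kernel threads attached to each
virtqueue rather than pthreads, but the concurrency argument is the same.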

Comments

Jason Wang Nov. 5, 2018, 2:51 a.m. UTC | #1
On 2018/11/3 12:07 AM, Vitaly Mayatskikh wrote:
> Hi,
>
> I stumbled across poor performance of virtio-blk while working on a
> high-performance network storage protocol. Moving virtio-blk's host
> side to the kernel did increase single-queue IOPS, but a multiqueue
> disk still was not scaling well. It turned out that vhost handles
> events from all virtio queues in one helper thread, and that is
> pretty much a big serialization point.
>
> The following patch enables event handling in a per-queue thread and
> increases I/O concurrency; see the IOPS numbers:


Thanks a lot for the patches. Here are some thoughts:

- This is not the first attempt at parallelizing vhost workers, so we 
need a comparison among them:

1) Multiple vhost workers from Anthony, 
https://www.spinics.net/lists/netdev/msg189432.html

2) ELVIS from IBM, http://www.mulix.org/pubs/eli/elvis-h319.pdf

3) CMWQ from Bandan, 
http://www.linux-kvm.org/images/5/52/02x08-Aspen-Bandan_Das-vhost-sharing_is_better.pdf

- vhost-net uses a different multiqueue model: each vhost device on the 
host deals with only a specific queue pair instead of a whole device. 
This allows great flexibility, and multiqueue could be implemented 
without touching vhost code.

- The current vhost-net implementation depends heavily on the 
single-thread assumption, especially in its busy polling code, and it 
would be broken by this attempt. If we decide to go this way, this 
needs to be fixed. And we do need networking performance results.

- Having more threads is not necessarily a win; at the least we need a 
module parameter or some other mechanism to control the number of 
threads, I believe.
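
A purely illustrative sketch of such a knob (the parameter name below is
invented here, not taken from this series):

#include <linux/init.h>
#include <linux/module.h>

/* Illustrative only: cap the number of worker threads per vhost device,
 * defaulting to today's single-worker behaviour. */
static int workers_per_dev = 1;
module_param(workers_per_dev, int, 0444);
MODULE_PARM_DESC(workers_per_dev,
                 "Max worker threads per vhost device (illustrative)");

static int __init workers_demo_init(void)
{
        pr_info("workers_per_dev = %d\n", workers_per_dev);
        return 0;
}
module_init(workers_demo_init);

static void __exit workers_demo_exit(void)
{
}
module_exit(workers_demo_exit);

MODULE_LICENSE("GPL");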


Thanks


Vitaly Mayatskih Nov. 5, 2018, 3:40 a.m. UTC | #2
On Sun, Nov 4, 2018 at 9:52 PM Jason Wang <jasowang@redhat.com> wrote:

> Thanks a lot for the patches. Here are some thoughts:
>
> - This is not the first attempt at parallelizing vhost workers, so we
> need a comparison among them:
>
> 1) Multiple vhost workers from Anthony,
> https://www.spinics.net/lists/netdev/msg189432.html
>
> 2) ELVIS from IBM, http://www.mulix.org/pubs/eli/elvis-h319.pdf
>
> 3) CMWQ from Bandan,
> http://www.linux-kvm.org/images/5/52/02x08-Aspen-Bandan_Das-vhost-sharing_is_better.pdf
>
> - vhost-net uses a different multiqueue model: each vhost device on the
> host deals with only a specific queue pair instead of a whole device.
> This allows great flexibility, and multiqueue could be implemented
> without touching vhost code.

I'm in no way a network expert, but I think this is because it follows
the combined queue model of the NIC. Having a TX/RX queue pair looks
like a natural choice for this case.
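
For reference, this is roughly how that model is driven from QEMU today
(option spelling may differ slightly between QEMU versions): each of the
tap queues gets its own vhost-net instance, so no single worker thread
serializes all of them.

  qemu-system-x86_64 ... \
      -netdev tap,id=net0,vhost=on,queues=4 \
      -device virtio-net-pci,netdev=net0,mq=on,vectors=10
  # vectors = 2 * queues + 2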

> - The current vhost-net implementation depends heavily on the
> single-thread assumption, especially in its busy polling code, and it
> would be broken by this attempt. If we decide to go this way, this
> needs to be fixed. And we do need networking performance results.

Thanks for noting that; I am missing a lot of historical background.
Will look into it.

> - Having more threads is not necessarily a win; at the least we need a
> module parameter or some other mechanism to control the number of
> threads, I believe.

I agree that I didn't think fully about other cases, but for the disk it
is already under control: QEMU's num-queues disk parameter.

There is a certain saturation point beyond which adding more threads
does not yield much more performance. In my environment it is about 12
queues.
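
For example, something along these lines (shown with the stock
virtio-blk device and an example backing disk; a vhost-blk backend would
presumably reuse the same num-queues knob):

  qemu-system-x86_64 ... \
      -drive file=/dev/nvme0n1,if=none,id=d0,format=raw \
      -device virtio-blk-pci,drive=d0,num-queues=12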

So, how does this sound: the default behaviour stays at 1 worker per
vhost device, and if the user needs per-vq workers, they enable them
with a new VHOST_SET_ ioctl?
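
Purely as a strawman (neither the name nor the number below exists
anywhere; it only shows the shape such an opt-in could take in
include/uapi/linux/vhost.h, where VHOST_VIRTIO is the existing 0xAF
ioctl type):

#include <linux/ioctl.h>

#define VHOST_VIRTIO 0xAF

/* Hypothetical: 0 = one worker per device (current default),
 * 1 = one worker per virtqueue.  Name and number are invented. */
#define VHOST_SET_PER_VQ_WORKERS _IOW(VHOST_VIRTIO, 0x80, int)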