
[RFC,v2,00/10] vhost-blk: in-kernel accelerator for virtio-blk guests

Message ID 20221013151839.689700-1-andrey.zhadchenko@virtuozzo.com (mailing list archive)

Message

Andrey Zhadchenko Oct. 13, 2022, 3:18 p.m. UTC
As there is some interest from QEMU userspace, I am sending the second
version of this patchset.

The main addition is a few patches for vhost multithreading so that
vhost-blk can be scaled. The idea itself is not new: attach workers to the
virtqueues and do the work on them.

I have seen several previous attempts, such as cgroup-aware worker pools or
userspace threads, but they seem very complicated and involve a lot of
subsystems. Simply spawning a few more vhost threads may already do a good
job.

As this is an RFC, I did not convert any vhost users except vhost_blk. If
anyone is interested in this for other modules, please tell me: I can test
whether it is beneficial and maybe send the multithreading part separately.
The multithreading part may also eventually help vdpa-blk.
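
To make the idea more concrete, here is a rough sketch of what per-virtqueue
workers could look like. The identifiers (struct vhost_worker fields,
vhost_vq_assign_worker(), vhost_vq_work_queue(), vq->worker) are illustrative
assumptions, not necessarily what the actual patches introduce:

/*
 * Illustrative sketch only: these names are hypothetical and do not claim
 * to match the helpers added by this series.
 */
struct vhost_worker {
        struct task_struct      *task;
        struct llist_head       work_list;
};

/* Each virtqueue remembers the worker it has been assigned to. */
static void vhost_vq_assign_worker(struct vhost_virtqueue *vq,
                                   struct vhost_worker *worker)
{
        vq->worker = worker;
}

/* Queue work at the vq's own worker instead of the single per-device one. */
static void vhost_vq_work_queue(struct vhost_virtqueue *vq,
                                struct vhost_work *work)
{
        if (!test_and_set_bit(VHOST_WORK_QUEUED, &work->flags)) {
                llist_add(&work->node, &vq->worker->work_list);
                wake_up_process(vq->worker->task);
        }
}

vhost_blk would then queue its per-vq kick handling through such a helper
rather than through the single per-device worker.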

---

Although QEMU virtio-blk is quite fast, there is still some room for
improvement. Disk latency can be reduced if we handle virtio-blk requests
in the host kernel, avoiding a lot of syscalls and context switches.
The idea is quite simple: QEMU gives us a block device and we translate
any incoming virtio requests into bios and submit them to the bdev.
The biggest disadvantage of this vhost-blk flavor is that it only works with
a raw-format backend. Luckily, Kirill Thai proposed a device mapper driver
for the QCOW2 format that attaches files as block devices:
https://www.spinics.net/lists/kernel/msg4292965.html
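
For illustration, here is a minimal sketch of the translation step. It is
not the code from drivers/vhost/blk.c; the helper name, its parameters and
the recent bio_alloc() signature are assumptions:

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Minimal sketch, not the actual blk.c code: submit guest data pages as a
 * single bio to the backing block device. The pages, sector and completion
 * callback are assumed to come from parsing the virtio-blk request.
 */
static int vhost_blk_submit_sketch(struct block_device *bdev, bool write,
                                   sector_t sector, struct page **pages,
                                   unsigned int npages,
                                   bio_end_io_t *end_io, void *priv)
{
        struct bio *bio;
        unsigned int i;

        bio = bio_alloc(bdev, min_t(unsigned int, npages, BIO_MAX_VECS),
                        write ? REQ_OP_WRITE : REQ_OP_READ, GFP_KERNEL);
        if (!bio)
                return -ENOMEM;

        bio->bi_iter.bi_sector = sector;
        bio->bi_end_io = end_io;        /* completes the virtio request later */
        bio->bi_private = priv;

        for (i = 0; i < npages; i++) {
                if (!bio_add_page(bio, pages[i], PAGE_SIZE, 0)) {
                        /* bio full: a real driver would start another bio */
                        bio_put(bio);
                        return -EIO;
                }
        }

        submit_bio(bio);
        return 0;
}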

Also, by using a kernel module we can bypass the iothread limitation and
finally scale block requests with the number of CPUs for high-performance
devices.


There have already been several attempts to write vhost-blk:

Asias' version: https://lkml.org/lkml/2012/12/1/174
Badari's version: https://lwn.net/Articles/379864/
Vitaly's version: https://lwn.net/Articles/770965/

The main difference between them is the API used to access the backend
file. The fastest one is Asias's version with the bio flavor; it is also
the most reviewed and has the most features, so the vhost_blk module is
partially based on it. Multiple virtqueue support was added and some places
were reworked. Support for several vhost workers was also added.

test setup and results:
fio --direct=1 --rw=randread  --bs=4k  --ioengine=libaio --iodepth=128
QEMU drive options: cache=none
filesystem: xfs

SSD:
               | randread, IOPS | randwrite, IOPS |
Host           |      95.8k     |      85.3k      |
QEMU virtio    |      61.5k     |      79.9k      |
QEMU vhost-blk |      95.6k     |      84.3k      |

RAMDISK (vq == vcpu == numjobs):
                 | randread, IOPS | randwrite, IOPS |
virtio, 1vcpu    |      133k      |      133k       |
virtio, 2vcpu    |      305k      |      306k       |
virtio, 4vcpu    |      310k      |      298k       |
virtio, 8vcpu    |      271k      |      252k       |
vhost-blk, 1vcpu |      110k      |      113k       |
vhost-blk, 2vcpu |      247k      |      252k       |
vhost-blk, 4vcpu |      558k      |      556k       |
vhost-blk, 8vcpu |      576k      |      575k       | *single kernel thread
vhost-blk, 8vcpu |      803k      |      779k       | *two kernel threads

v2:
Re-measured virtio performance with aio=threads and an iothread on the latest QEMU

vhost-blk changes:
 - removed unused VHOST_BLK_VQ
 - reworked bio handling a bit: now add all pages from a single iov into a
bio until it is full instead of allocating one bio per page (see the sketch
after this list)
 - changed how the sector increment is calculated
 - check the return value of move_iovec() in vhost_blk_req_handle()
 - removed the snprintf check and better check the return value of
copy_to_iter() for VIRTIO_BLK_ID_BYTES requests
 - discard the vq request if vhost_blk_req_handle() returned a negative code
 - forbid changing a nonzero backend in vhost_blk_set_backend(). First of
all, QEMU sets the backend only once. Also, if we want to change the backend
while requests are already running, we need to be much more careful in
vhost_blk_handle_guest_kick() as it does not take any references. If
userspace wants to change the backend that badly, it can always reset the
device.
 - removed EXPERIMENTAL from Kconfig
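
Below is a rough, purely illustrative sketch of the reworked bio handling:
pages of a single iov are packed into a bio until bio_add_page() refuses,
and only then is the next bio allocated. Names and parameters are assumed,
not taken from the patch:

/* Sketch only: pack pages into bios, one new bio only when the current is full. */
static void vhost_blk_fill_bios_sketch(struct block_device *bdev,
                                       blk_opf_t opf, sector_t sector,
                                       struct page **pages,
                                       unsigned int npages)
{
        struct bio *bio = NULL;
        unsigned int i = 0;

        while (i < npages) {
                if (!bio) {
                        bio = bio_alloc(bdev,
                                        min_t(unsigned int, npages - i,
                                              BIO_MAX_VECS),
                                        opf, GFP_KERNEL);
                        bio->bi_iter.bi_sector = sector;
                }
                if (bio_add_page(bio, pages[i], PAGE_SIZE, 0)) {
                        sector += PAGE_SIZE >> SECTOR_SHIFT;
                        i++;
                        continue;
                }
                /* bio is full: submit it and retry this page with a new bio */
                submit_bio(bio);
                bio = NULL;
        }
        if (bio)
                submit_bio(bio);
}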

Andrey Zhadchenko (10):
  drivers/vhost: vhost-blk accelerator for virtio-blk guests
  drivers/vhost: use array to store workers
  drivers/vhost: adjust vhost to flush all workers
  drivers/vhost: rework cgroups attachment to be worker aware
  drivers/vhost: rework worker creation
  drivers/vhost: add ioctl to increase the number of workers
  drivers/vhost: assign workers to virtqueues
  drivers/vhost: add API to queue work at virtqueue's worker
  drivers/vhost: allow polls to be bound to workers via vqs
  drivers/vhost: queue vhost_blk works at vq workers

 drivers/vhost/Kconfig      |  12 +
 drivers/vhost/Makefile     |   3 +
 drivers/vhost/blk.c        | 819 +++++++++++++++++++++++++++++++++++++
 drivers/vhost/vhost.c      | 263 +++++++++---
 drivers/vhost/vhost.h      |  21 +-
 include/uapi/linux/vhost.h |  13 +
 6 files changed, 1064 insertions(+), 67 deletions(-)
 create mode 100644 drivers/vhost/blk.c

Comments

Andrey Zhadchenko Oct. 13, 2022, 5:33 p.m. UTC | #1
Sorry, I misspelled your email while sending the patchset.
Linking you back, as originally intended.

Kind regards, Andrey
