
[RFC,0/1] vhost-blk: in-kernel accelerator for virtio-blk guests

Message ID 20220725202753.298725-1-andrey.zhadchenko@virtuozzo.com

Message

Andrey Zhadchenko July 25, 2022, 8:27 p.m. UTC
Although QEMU virtio-blk is quite fast, there is still some room for
improvement. Disk latency can be reduced if we handle virtio-blk requests
in the host kernel, so we avoid a lot of syscalls and context switches.
The idea is quite simple: QEMU gives us a block device, and we translate
any incoming virtio requests into bios and push them into the bdev.
The biggest disadvantage of this vhost-blk flavor is that it only works
with raw images. Luckily, Kirill Thai proposed a device mapper driver for
the QCOW2 format to attach files as block devices:
https://www.spinics.net/lists/kernel/msg4292965.html
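
To make the idea concrete, here is a minimal sketch of the request-to-bio
translation. This is not code from the patch: the function name, parameters,
and the simplified READ/WRITE-only opcode handling are illustrative
assumptions (bio_alloc() signature as of ~v5.18).

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <uapi/linux/virtio_blk.h>

/*
 * Hypothetical helper: convert one virtio-blk READ/WRITE request,
 * already pulled from the vring and pinned into pages, into a bio
 * and submit it directly to the backing bdev - no syscall and no
 * context switch to userspace on the I/O path.
 */
static int vhost_blk_submit_sketch(struct block_device *bdev,
				   struct virtio_blk_outhdr *hdr,
				   struct page **pages, unsigned int npages,
				   bio_end_io_t *end_io, void *priv)
{
	/* Real code must also handle FLUSH, GET_ID, etc. */
	unsigned int op = (le32_to_cpu(hdr->type) == VIRTIO_BLK_T_OUT) ?
			  REQ_OP_WRITE : REQ_OP_READ;
	struct bio *bio;
	unsigned int i;

	bio = bio_alloc(bdev, npages, op, GFP_KERNEL);
	if (!bio)
		return -ENOMEM;

	bio->bi_iter.bi_sector = le64_to_cpu(hdr->sector);
	bio->bi_end_io = end_io;	/* completion fills the used ring */
	bio->bi_private = priv;

	for (i = 0; i < npages; i++) {
		if (!bio_add_page(bio, pages[i], PAGE_SIZE, 0)) {
			bio_put(bio);
			return -EIO;
		}
	}

	submit_bio(bio);
	return 0;
}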

Also, by using a kernel module we can bypass the iothread limitation and
finally scale block requests with the number of cpus for high-performance
devices.


There have already been several attempts to write vhost-blk:

Asias' version: https://lkml.org/lkml/2012/12/1/174
Badari's version: https://lwn.net/Articles/379864/
Vitaly's version: https://lwn.net/Articles/770965/

The main difference between them is the API used to access the backend file.
The fastest one is Asias' version, which uses the bio flavor. It is also the
most reviewed and has the most features, so this module is partially based
on it. Multiple virtqueue support was added and some places were reworked.

Test setup and results:
fio --direct=1 --rw=randread  --bs=4k  --ioengine=libaio --iodepth=128
QEMU drive options: cache=none
filesystem: xfs
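
Only the randread command line is shown above; presumably the randwrite
columns below were collected with the same invocation, changing only the
workload:

fio --direct=1 --rw=randwrite --bs=4k --ioengine=libaio --iodepth=128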

SSD:
               | randread, IOPS | randwrite, IOPS |
Host           |      95.8k     |      85.3k      |
QEMU virtio    |      57.5k     |      79.4k      |
QEMU vhost-blk |      95.6k     |      84.3k      |

RAMDISK (vq == vcpu):
                 | randread, IOPS | randwrite, IOPS |
virtio, 1vcpu    |      123k      |      129k       |
virtio, 2vcpu    |   253k (??)    |   250k (??)     |
virtio, 4vcpu    |      158k      |      154k       |
vhost-blk, 1vcpu |      110k      |      113k       |
vhost-blk, 2vcpu |      247k      |      252k       |
vhost-blk, 4vcpu |      576k      |      567k       |

Major features planned for the next versions:
 - DISCARD/WRITE_ZEROES support
 - multiple vhost workers

Andrey Zhadchenko (1):
  drivers/vhost: vhost-blk accelerator for virtio-blk guests

 drivers/vhost/Kconfig      |  12 +
 drivers/vhost/Makefile     |   3 +
 drivers/vhost/blk.c        | 831 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vhost.h |   5 +
 4 files changed, 851 insertions(+)
 create mode 100644 drivers/vhost/blk.c
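
The diffstat shows a 5-line addition to include/uapi/linux/vhost.h, which the
cover letter does not reproduce. By analogy with VHOST_NET_SET_BACKEND in the
same header, the new userspace-facing ioctl could plausibly look like the line
below; the name and ioctl number are illustrative assumptions, not taken from
the patch:

/* Hypothetical sketch modeled on VHOST_NET_SET_BACKEND; the real
 * definition (name and number) comes from the patch itself.
 */
#define VHOST_BLK_SET_BACKEND _IOW(VHOST_VIRTIO, 0x50, struct vhost_vring_file)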

Comments

Andrey Zhadchenko July 25, 2022, 9:22 p.m. UTC | #1
Corresponding userspace qemu vhost-blk code:
https://lists.nongnu.org/archive/html/qemu-block/2022-07/msg00629.html
