[V5,0/2] ublk: add io_uring based userspace block driver

Message ID: 20220713140711.97356-1-ming.lei@redhat.com

Ming Lei July 13, 2022, 2:07 p.m. UTC
Hello Guys,

The ublk driver is a kernel driver for implementing generic userspace block
devices/drivers. It delivers io requests from the ublk block device
(/dev/ublkbN) to the ublk server[1], the userspace part of ublk, which
communicates with the ublk driver and handles the specific io logic in its
target module.

The other thing the ublk driver does is copy data between the userspace
buffer and the request/bio's pages, or use zero copy if mm is ready to
support it in the future. The ublk driver doesn't handle any IO logic of the
specific driver, so it is small and simple; all io logic is done by the
target code in the ublk server.

These two are the main jobs of the ublk driver.

The ublk driver helps move IO logic into userspace, where development is
easier and more effective than in the kernel. For example, ublk-loop takes
< 200 lines of loop-specific code to get basically the same functionality as
the kernel loop block driver, while its performance is even better than
kernel loop with the same settings. ublksrv[1] provides a built-in test for
comparing the two by running "make test T=loop". For example, see the
test results from a VM running on my laptop (root disk is
nvme/device mapper/xfs):

	[root@ktest-36 ubdsrv]# make -s -C /root/git/ubdsrv/tests run T=loop/001 R=10
	running loop/001
		fio (ublk/loop(/root/git/ubdsrv/tests/tmp/ublk_loop_VqbMA), libaio, bs 4k, dio, hw queues:1)...
		randwrite: jobs 1, iops 32572
		randread: jobs 1, iops 143052
		rw: jobs 1, iops read 29919 write 29964
	
	[root@ktest-36 ubdsrv]# make test T=loop/003
	make -s -C /root/git/ubdsrv/tests run T=loop/003 R=10
	running loop/003
		fio (kernel_loop/kloop(/root/git/ubdsrv/tests/tmp/ublk_loop_ZIVnG), libaio, bs 4k, dio, hw queues:1)...
		randwrite: jobs 1, iops 27436
		randread: jobs 1, iops 95273
		rw: jobs 1, iops read 22542 write 22543 


Another example is high performance qcow2 support[2], which could be built
with the ublk framework more easily than inside the kernel.

Also, more people have expressed interest in userspace block drivers[3]:
Gabriel Krisman Bertazi proposed this topic at LSF/MM/eBPF 2022 and mentioned
a requirement from Google. Ziyang Zhang from Alibaba said they "plan to
replace TCMU by UBD as a new choice" because UBD can get better throughput than
TCMU even with a single queue[4], and UBD is simple. There are also userspace
storage services that provide storage to containers.

It is io_uring based: io requests are delivered to userspace via the newly
added io_uring command, which has proved very efficient for nvme passthrough
IO, giving better IOPS than io_uring(READ/WRITE). Meanwhile, one shared/mmap
buffer is used for sharing io descriptors with userspace; the buffer is
read-only for userspace, and each IO takes just 24 bytes so far.
Using io_uring in userspace (the target part of the ublk server) to handle
IO requests is suggested too. It is still easy for the ublk server to support
io handling without io_uring; this work isn't done yet, but it can be
supported easily with the help of eventfd.
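
To make the 24-byte figure concrete, here is a minimal sketch of what one
entry in the shared descriptor array could look like. The struct name, field
names and the op/flags bit split are assumptions made for illustration only;
the authoritative definition is in include/uapi/linux/ublk_cmd.h from this
series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative 24-byte io descriptor. Names and bit layout are
 * assumptions for this sketch, not copied from ublk_cmd.h.
 */
struct sketch_io_desc {
	uint32_t op_flags;     /* assumed: op in bits 0-7, flags in bits 8-31 */
	uint32_t nr_sectors;   /* request length in 512-byte sectors */
	uint64_t start_sector; /* first sector of the request */
	uint64_t addr;         /* userspace buffer address used for the copy */
};

static_assert(sizeof(struct sketch_io_desc) == 24,
	      "descriptor is expected to stay at 24 bytes");

/* Extract the operation code from the packed op_flags word. */
static inline uint8_t sketch_desc_op(const struct sketch_io_desc *d)
{
	return d->op_flags & 0xff;
}

/* Extract the flag bits from the packed op_flags word. */
static inline uint32_t sketch_desc_flags(const struct sketch_io_desc *d)
{
	return d->op_flags >> 8;
}

/* Byte offset of the descriptor for `tag` inside the mmap'ed array. */
static inline uint64_t sketch_desc_offset(uint16_t tag)
{
	return (uint64_t)tag * sizeof(struct sketch_io_desc);
}
```

With a layout like this, the server can locate the descriptor for a given
tag by plain indexing into the read-only mapping, so no per-IO copy of the
command is needed.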

This approach is efficient since no extra io command copy is required and no
sleep is needed when transferring io commands to userspace. Meanwhile, the
communication protocol is simple and efficient: one single command,
UBLK_IO_COMMIT_AND_FETCH_REQ, can handle both fetching the io request desc and
committing the command result in one trip. IO handling is often batched after
a single io_uring_enter() returns: both IO requests from the ublk server
target and IO commands can be handled as a whole batch.
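
As a rough illustration of why the combined command saves trips, the toy
model below counts the io_uring commands needed to serve a number of IOs on
one queue slot. It is only arithmetic, not driver code; the function names
are hypothetical, and the assumption that a slot is armed once by an initial
fetch command before the combined command takes over is mine.

```c
#include <assert.h>

/*
 * Toy model: number of io_uring commands needed to serve nr_ios
 * requests on one queue slot (tag).
 */

/* Split protocol: every IO needs one fetch command and one commit command. */
static unsigned int cmds_split(unsigned int nr_ios)
{
	return 2 * nr_ios;
}

/*
 * Combined protocol: one initial fetch arms the slot (assumption), then
 * each completed IO issues a single commit-and-fetch command that reports
 * the result and re-arms the slot at the same time.
 */
static unsigned int cmds_combined(unsigned int nr_ios)
{
	return nr_ios ? 1 + nr_ios : 0;
}
```

For example, serving 4 IOs takes 8 commands in the split model and 5 in the
combined one, so the steady-state cost approaches one command per IO.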

The patch-by-patch changes can be found in the following tree:

https://github.com/ming1/linux/tree/my_for-5.20-ubd-devel_v4

ublk server repo(master branch):

	https://github.com/ming1/ubdsrv

Any comments are welcome!

Since V4:
- drop the patch "ublk_drv: add UBLK_IO_REFETCH_REQ for supporting to build as module";
  instead, use io_uring_cmd_complete_in_task so the driver can be built as a module

- simplify aborting code


Since V3:
- address Gabriel Krisman Bertazi's comments on V3: add userspace data
  validation before handling command, remove warning, ...
- remove UBLK_IO_COMMIT_REQ command as suggested by Zixiang and Gabriel Krisman Bertazi
- fix one request double free when running abort
- rewrite/cleanup ublk_copy_pages(), so this handling becomes very
  clean
- add one command of UBLK_IO_REFETCH_REQ for allowing ublk_drv to build
  as module

Since V2:
- fix one big performance problem:
	https://github.com/ming1/linux/commit/3c9fd476951759858cc548dee4cedc074194d0b0
- rename as ublk, as suggested by Gabriel Krisman Bertazi 
- lots of cleanup & code improvement & bugfix, see details in git
  history


Since V1:

Remove RFC now because the ublk driver code has received lots of cleanup,
enhancements and bug fixes since V1:

- cleanup uapi: remove ublk specific error codes, switch to linux error codes,
  remove one command op, remove one field from cmd_desc

- add a monitor mechanism to handle ubq_daemon being killed; ublksrv[1]
  includes built-in tests covering heavy IO while deleting ublk / killing
  ubq_daemon at the same time, V2 passes both tests (make test T=generic),
  and the abort/stop mechanism is simple

- fix MQ command buffer mmap bug, and now 'xfstests -g auto' works well on
  MQ ublk-loop devices(test/scratch)

- improve batching submission as suggested by Jens

- improve handling for starting device, replace random wait/poll with
completion

- all kinds of cleanup, bug fix,..


Ming Lei (2):
  ublk_drv: add io_uring based userspace block driver
  ublk_drv: support to complete io command via task_work_add

 drivers/block/Kconfig         |    6 +
 drivers/block/Makefile        |    2 +
 drivers/block/ublk_drv.c      | 1589 +++++++++++++++++++++++++++++++++
 include/uapi/linux/ublk_cmd.h |  162 ++++
 4 files changed, 1759 insertions(+)
 create mode 100644 drivers/block/ublk_drv.c
 create mode 100644 include/uapi/linux/ublk_cmd.h

Comments

Jens Axboe July 13, 2022, 8:25 p.m. UTC | #1
On 7/13/22 8:07 AM, Ming Lei wrote:
> Hello Guys,
> 
> ublk driver is one kernel driver for implementing generic userspace block
> device/driver, which delivers io request from ublk block device(/dev/ublkbN) into
> ublk server[1] which is the userspace part of ublk for communicating
> with ublk driver and handling specific io logic by its target module.

Ming, is this ready to get merged in an experimental state?

Ming Lei July 14, 2022, 12:19 a.m. UTC | #2
On Wed, Jul 13, 2022 at 02:25:25PM -0600, Jens Axboe wrote:
> On 7/13/22 8:07 AM, Ming Lei wrote:
> > Hello Guys,
> > 
> > ublk driver is one kernel driver for implementing generic userspace block
> > device/driver, which delivers io request from ublk block device(/dev/ublkbN) into
> > ublk server[1] which is the userspace part of ublk for communicating
> > with ublk driver and handling specific io logic by its target module.
> 
> Ming, is this ready to get merged in an experimental state?

Hi Jens,

Yeah, I think so.

The IO path survives xfstests (-g auto), and the control path works
well in the ublksrv built-in hotplug & 'kill -9' daemon tests.

The UAPI data size should be good, but the definitions may change as
future requirements change, so I think it is ready to go as
experimental.

If you are fine with that, please add the following delta change into
patch 1, or let me know if a resend is needed.


diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 2ba77fd960c2..e19fcab016ba 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -409,10 +409,13 @@ config BLK_DEV_RBD
 	  If unsure, say N.
 
 config BLK_DEV_UBLK
-	tristate "Userspace block driver"
+	tristate "Userspace block driver (Experimental)"
 	select IO_URING
 	help
-          io uring based userspace block driver.
+	  io_uring based userspace block driver. Together with ublk server, ublk
+	  has been working well, but interface with userspace or command data
+	  definition isn't finalized yet, and might change according to future
+	  requirement, so mark is as experimental now.
 
 source "drivers/block/rnbd/Kconfig"
 


Thanks,
Ming

Jens Axboe July 14, 2022, 2:54 a.m. UTC | #3
On 7/13/22 6:19 PM, Ming Lei wrote:
> On Wed, Jul 13, 2022 at 02:25:25PM -0600, Jens Axboe wrote:
>> On 7/13/22 8:07 AM, Ming Lei wrote:
>>> Hello Guys,
>>>
>>> ublk driver is one kernel driver for implementing generic userspace block
>>> device/driver, which delivers io request from ublk block device(/dev/ublkbN) into
>>> ublk server[1] which is the userspace part of ublk for communicating
>>> with ublk driver and handling specific io logic by its target module.
>>
>> Ming, is this ready to get merged in an experimental state?
> 
> Hi Jens,
> 
> Yeah, I think so.
> 
> IO path can survive in xfstests(-g auto), and control path works
> well in ublksrv builtin hotplug & 'kill -9' daemon test.
> 
> The UAPI data size should be good, but definition may change per
> future requirement change, so I think it is ready to go as
> experimental.

OK let's give it a go then. I tried it out and it seems to work for me,
even if the shutdown-while-busy is something I'd like to look into a bit
more.

BTW, did notice a typo on the github page:

2) dependency
- liburing with IORING_SETUP_SQE128 support

- linux kernel 5.9(IORING_SETUP_SQE128 support)

that should be 5.19, typo.

Jens Axboe July 14, 2022, 2:54 a.m. UTC | #4
On Wed, 13 Jul 2022 22:07:09 +0800, Ming Lei wrote:
> ublk driver is one kernel driver for implementing generic userspace block
> device/driver, which delivers io request from ublk block device(/dev/ublkbN) into
> ublk server[1] which is the userspace part of ublk for communicating
> with ublk driver and handling specific io logic by its target module.
> 
> Another thing ublk driver handles is to copy data between user space buffer
> and request/bio's pages, or take zero copy if mm is ready for support it in
> future. ublk driver doesn't handle any IO logic of the specific driver, so
> it is small/simple, and all io logics are done by the target code in ublkserver.
> 
> [...]

Applied, thanks!

[1/2] ublk_drv: add io_uring based userspace block driver
      commit: 3fee8d7599e17fe17ef6c1b96e2237babe8b68ea
[2/2] ublk_drv: support to complete io command via task_work_add
      commit: 664ff52d6f338a9afcabee535e8dedf04659f0d6

Best regards,

Jens Axboe July 14, 2022, 2:59 a.m. UTC | #5
On 7/13/22 8:54 PM, Jens Axboe wrote:
> On 7/13/22 6:19 PM, Ming Lei wrote:
>> On Wed, Jul 13, 2022 at 02:25:25PM -0600, Jens Axboe wrote:
>>> On 7/13/22 8:07 AM, Ming Lei wrote:
>>>> Hello Guys,
>>>>
>>>> ublk driver is one kernel driver for implementing generic userspace block
>>>> device/driver, which delivers io request from ublk block device(/dev/ublkbN) into
>>>> ublk server[1] which is the userspace part of ublk for communicating
>>>> with ublk driver and handling specific io logic by its target module.
>>>
>>> Ming, is this ready to get merged in an experimental state?
>>
>> Hi Jens,
>>
>> Yeah, I think so.
>>
>> IO path can survive in xfstests(-g auto), and control path works
>> well in ublksrv builtin hotplug & 'kill -9' daemon test.
>>
>> The UAPI data size should be good, but definition may change per
>> future requirement change, so I think it is ready to go as
>> experimental.
> 
> OK let's give it a go then. I tried it out and it seems to work for me,
> even if the shutdown-while-busy is something I'd to look into a bit
> more.
> 
> BTW, did notice a typo on the github page:
> 
> 2) dependency
> - liburing with IORING_SETUP_SQE128 support
> 
> - linux kernel 5.9(IORING_SETUP_SQE128 support)
> 
> that should be 5.19, typo.

I tried this:

axboe@m1pro-kvm ~/g/ubdsrv (master)> sudo ./ublk add -t loop /dev/nvme0n1
axboe@m1pro-kvm ~/g/ubdsrv (master) [255]> 

and got this dump:

[   34.041647] WARNING: CPU: 3 PID: 60 at block/blk-mq.c:3880 blk_mq_release+0xa4/0xf0
[   34.043858] Modules linked in:
[   34.044911] CPU: 3 PID: 60 Comm: kworker/3:1 Not tainted 5.19.0-rc6-00320-g5c37a506da31 #1608
[   34.047689] Hardware name: linux,dummy-virt (DT)
[   34.049207] Workqueue: events blkg_free_workfn
[   34.050731] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   34.053026] pc : blk_mq_release+0xa4/0xf0
[   34.054360] lr : blk_mq_release+0x44/0xf0
[   34.055694] sp : ffff80000b16bcb0
[   34.056804] x29: ffff80000b16bcb0 x28: 0000000000000000 x27: 0000000000000000
[   34.059135] x26: 0000000000000000 x25: ffff00001fe9bb05 x24: 0000000000000000
[   34.061454] x23: ffff000005062eb8 x22: ffff000004608998 x21: 0000000000000000
[   34.063775] x20: ffff000004608a50 x19: ffff000004608950 x18: ffff80000b7b3c88
[   34.066085] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   34.068410] x14: 0000000000000002 x13: 0000000000013638 x12: 0000000000000000
[   34.070715] x11: ffff80000945b7e8 x10: 0000000000006f2e x9 : 00000000ffffffff
[   34.073037] x8 : ffff800008fb5000 x7 : ffff80000860cf28 x6 : 0000000000000000
[   34.075334] x5 : 0000000000000000 x4 : 0000000000000028 x3 : ffff80000b16bc14
[   34.077650] x2 : ffff0000086d66a8 x1 : ffff0000086d66a8 x0 : ffff0000086d6400
[   34.079966] Call trace:
[   34.080789]  blk_mq_release+0xa4/0xf0
[   34.081811]  blk_release_queue+0x58/0xa0
[   34.082758]  kobject_put+0x84/0xe0
[   34.083590]  blk_put_queue+0x10/0x18
[   34.084468]  blkg_free_workfn+0x58/0x84
[   34.085511]  process_one_work+0x2ac/0x438
[   34.086449]  worker_thread+0x1cc/0x264
[   34.087322]  kthread+0xd0/0xe0
[   34.088053]  ret_from_fork+0x10/0x20

Ming Lei July 14, 2022, 5:30 a.m. UTC | #6
On Wed, Jul 13, 2022 at 08:59:16PM -0600, Jens Axboe wrote:
> On 7/13/22 8:54 PM, Jens Axboe wrote:
> > On 7/13/22 6:19 PM, Ming Lei wrote:
> >> On Wed, Jul 13, 2022 at 02:25:25PM -0600, Jens Axboe wrote:
> >>> On 7/13/22 8:07 AM, Ming Lei wrote:
> >>>> Hello Guys,
> >>>>
> >>>> ublk driver is one kernel driver for implementing generic userspace block
> >>>> device/driver, which delivers io request from ublk block device(/dev/ublkbN) into
> >>>> ublk server[1] which is the userspace part of ublk for communicating
> >>>> with ublk driver and handling specific io logic by its target module.
> >>>
> >>> Ming, is this ready to get merged in an experimental state?
> >>
> >> Hi Jens,
> >>
> >> Yeah, I think so.
> >>
> >> IO path can survive in xfstests(-g auto), and control path works
> >> well in ublksrv builtin hotplug & 'kill -9' daemon test.
> >>
> >> The UAPI data size should be good, but definition may change per
> >> future requirement change, so I think it is ready to go as
> >> experimental.
> > 
> > OK let's give it a go then. I tried it out and it seems to work for me,
> > even if the shutdown-while-busy is something I'd to look into a bit
> > more.
> > 
> > BTW, did notice a typo on the github page:
> > 
> > 2) dependency
> > - liburing with IORING_SETUP_SQE128 support
> > 
> > - linux kernel 5.9(IORING_SETUP_SQE128 support)
> > 
> > that should be 5.19, typo.
> 
> I tried this:
> 
> axboe@m1pro-kvm ~/g/ubdsrv (master)> sudo ./ublk add -t loop /dev/nvme0n1
> axboe@m1pro-kvm ~/g/ubdsrv (master) [255]> 

That looks like an issue in ubdsrv: '-f /dev/nvme0n1' is needed, i.e.
'ublk add -t loop -f /dev/nvme0n1'.

> 
> and got this dump:
> 
> [   34.041647] WARNING: CPU: 3 PID: 60 at block/blk-mq.c:3880 blk_mq_release+0xa4/0xf0
> [   34.043858] Modules linked in:
> [   34.044911] CPU: 3 PID: 60 Comm: kworker/3:1 Not tainted 5.19.0-rc6-00320-g5c37a506da31 #1608
> [   34.047689] Hardware name: linux,dummy-virt (DT)
> [   34.049207] Workqueue: events blkg_free_workfn
> [   34.050731] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   34.053026] pc : blk_mq_release+0xa4/0xf0
> [   34.054360] lr : blk_mq_release+0x44/0xf0
> [   34.055694] sp : ffff80000b16bcb0
> [   34.056804] x29: ffff80000b16bcb0 x28: 0000000000000000 x27: 0000000000000000
> [   34.059135] x26: 0000000000000000 x25: ffff00001fe9bb05 x24: 0000000000000000
> [   34.061454] x23: ffff000005062eb8 x22: ffff000004608998 x21: 0000000000000000
> [   34.063775] x20: ffff000004608a50 x19: ffff000004608950 x18: ffff80000b7b3c88
> [   34.066085] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [   34.068410] x14: 0000000000000002 x13: 0000000000013638 x12: 0000000000000000
> [   34.070715] x11: ffff80000945b7e8 x10: 0000000000006f2e x9 : 00000000ffffffff
> [   34.073037] x8 : ffff800008fb5000 x7 : ffff80000860cf28 x6 : 0000000000000000
> [   34.075334] x5 : 0000000000000000 x4 : 0000000000000028 x3 : ffff80000b16bc14
> [   34.077650] x2 : ffff0000086d66a8 x1 : ffff0000086d66a8 x0 : ffff0000086d6400
> [   34.079966] Call trace:
> [   34.080789]  blk_mq_release+0xa4/0xf0
> [   34.081811]  blk_release_queue+0x58/0xa0
> [   34.082758]  kobject_put+0x84/0xe0
> [   34.083590]  blk_put_queue+0x10/0x18
> [   34.084468]  blkg_free_workfn+0x58/0x84
> [   34.085511]  process_one_work+0x2ac/0x438
> [   34.086449]  worker_thread+0x1cc/0x264
> [   34.087322]  kthread+0xd0/0xe0
> [   34.088053]  ret_from_fork+0x10/0x20

I guess some validation is missing on the driver side too; I will
look into it.


Thanks,
Ming

Gabriel Krisman Bertazi July 14, 2022, 2:41 p.m. UTC | #7
Ming Lei <ming.lei@redhat.com> writes:

> ublk driver is one kernel driver for implementing generic userspace block
> device/driver, which delivers io request from ublk block device(/dev/ublkbN) into
> ublk server[1] which is the userspace part of ublk for communicating
> with ublk driver and handling specific io logic by its target module.

Hey Ming,

I didn't get a chance to look deep into v5 as I was on a last minute
leave in the past few days.  Either way, I went through them now and the
patches look good to me.  I'm quite happy they are merged, thank you
very much for this work.

Just for ML archive purposes, the entire series is

Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>

:)

Geert Uytterhoeven July 19, 2022, 10:15 a.m. UTC | #8
Hi Ming,

Thanks for your patch!

On Thu, Jul 14, 2022 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
> --- a/drivers/block/Kconfig
> +++ b/drivers/block/Kconfig
> @@ -409,10 +409,13 @@ config BLK_DEV_RBD
>           If unsure, say N.
>
>  config BLK_DEV_UBLK
> -       tristate "Userspace block driver"
> +       tristate "Userspace block driver (Experimental)"
>         select IO_URING
>         help
> -          io uring based userspace block driver.
> +         io_uring based userspace block driver. Together with ublk server, ublk
> +         has been working well, but interface with userspace or command data
> +         definition isn't finalized yet, and might change according to future
> +         requirement, so mark is as experimental now.

it

>
>  source "drivers/block/rnbd/Kconfig"

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds