[RFC,0/2] nocopy bvec for direct IO

Message ID cover.1607477897.git.asml.silence@gmail.com (mailing list archive)

Message

Pavel Begunkov Dec. 9, 2020, 2:19 a.m. UTC
The idea is to avoid copying, merging, etc. the bvec from the iterator
into the bio in direct I/O and to use the one we've already got. Hook it
up for io_uring. I've had an eye on it for a long time, and it was also
brought up by Matthew just recently. Let me know if I forgot or
misplaced some tags.

A benchmark got me 430 KIOPS vs 540 KIOPS, or +25% on bare metal, and
perf shows that bio_iov_iter_get_pages() was taking ~20%. The test is
pretty silly, but the result is still impressive. I'll redo it closer to
reality for the next iteration; I need to double-check some cases anyway.
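
As a quick sanity check on the numbers above, here is a minimal sketch of
the arithmetic in C. It assumes the two figures are aggregate IOPS before
and after the change; the function names are made up for illustration.

```c
#include <assert.h>

/* Relative throughput gain in percent when IOPS go from `before` to `after`. */
double iops_gain_pct(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/*
 * Share of average per-I/O time that went away, in percent. Average time
 * per I/O is taken as 1/IOPS, so the saving is 1 - before/after.
 */
double time_saved_pct(double before, double after)
{
	assert(after > 0.0);
	return (1.0 - before / after) * 100.0;
}
```

With before = 430e3 and after = 540e3, iops_gain_pct() comes out at about
25.6 and time_saved_pct() at about 20.4, which is roughly in line with the
~20% that perf attributes to bio_iov_iter_get_pages().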

If the same is applied to iomap, the common chunk can be moved from
block_dev into bio_iov_iter_get_pages(), but if there is any benefit for
filesystems, they should explicitly opt in with ITER_BVEC_FLAG_FIXED.
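
To make the copy vs. no-copy distinction concrete, below is a userspace
toy model in C. It is not the kernel code from the patches: struct seg,
struct model_bio and the helper functions are hypothetical stand-ins, and
the real copy path also takes page references and merges segments, which
is left out here. The only assumption is that "fixed" means the caller
keeps its segment array valid and unchanged until the I/O completes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_SEGS 16

/* Hypothetical stand-in for a bvec entry; `page` is just an opaque pointer. */
struct seg {
	void *page;
	unsigned int len;
	unsigned int offset;
};

/* Toy "bio": either owns a private copy of the segments or borrows them. */
struct model_bio {
	const struct seg *vec;	/* segments the I/O will use */
	size_t nr_segs;
	int borrowed;		/* 1 if vec points at the caller's array */
	struct seg inline_vec[MODEL_MAX_SEGS];
};

/* Copy path: duplicate every segment into the bio's own storage. */
int model_bio_copy(struct model_bio *bio, const struct seg *src, size_t nr)
{
	if (nr > MODEL_MAX_SEGS)
		return -1;
	memcpy(bio->inline_vec, src, nr * sizeof(*src));
	bio->vec = bio->inline_vec;
	bio->nr_segs = nr;
	bio->borrowed = 0;
	return 0;
}

/*
 * No-copy path: point at the caller's array. Cost does not depend on nr,
 * but the caller must keep src alive and unmodified until completion.
 */
void model_bio_borrow(struct model_bio *bio, const struct seg *src, size_t nr)
{
	assert(src != NULL || nr == 0);
	bio->vec = src;
	bio->nr_segs = nr;
	bio->borrowed = 1;
}

/* Total payload length, identical for both paths. */
size_t model_bio_bytes(const struct model_bio *bio)
{
	size_t total = 0;

	for (size_t i = 0; i < bio->nr_segs; i++)
		total += bio->vec[i].len;
	return total;
}
```

Both paths describe the same data; the borrow path just skips the
per-segment work, which is where the saving would come from in this model.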

# how to apply
based on Jens' for-11/block
+ Ming's nr_vec patch,
+ io_uring fix, 9c3a205c5ffa36e96903c2 ("io_uring: fix ITER_BVEC check")

or get it here:
https://github.com/isilence/linux/commits/bvec_nocopy

# how to reproduce
null_blk queue_mode=2 completion_nsec=0 submit_queues=NUM_CPU
fio's t/io_uring against null_blk, no iopoll, BS=16*4096


Cc: Christoph Hellwig <hch@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>

Pavel Begunkov (2):
  iov: introduce ITER_BVEC_FLAG_FIXED
  block: no-copy bvec for direct IO

 fs/block_dev.c      | 30 +++++++++++++++++++++++++++++-
 fs/io_uring.c       |  1 +
 include/linux/uio.h | 14 +++++++++++---
 3 files changed, 41 insertions(+), 4 deletions(-)

Comments

Christoph Hellwig Dec. 9, 2020, 6:50 a.m. UTC | #1
On Wed, Dec 09, 2020 at 02:19:50AM +0000, Pavel Begunkov wrote:
> A benchmark got me 430 KIOPS vs 540 KIOPS, or +25% on bare metal, and
> perf shows that bio_iov_iter_get_pages() was taking ~20%. The test is
> pretty silly, but the result is still impressive. I'll redo it closer to
> reality for the next iteration; I need to double-check some cases anyway.

That is pretty impressive.  But I only got this cover letter, not the
actual patches..

Pavel Begunkov Dec. 9, 2020, 11:54 a.m. UTC | #2
On 09/12/2020 06:50, Christoph Hellwig wrote:
> On Wed, Dec 09, 2020 at 02:19:50AM +0000, Pavel Begunkov wrote:
>> A benchmark got me 430 KIOPS vs 540 KIOPS, or +25% on bare metal, and
>> perf shows that bio_iov_iter_get_pages() was taking ~20%. The test is
>> pretty silly, but the result is still impressive. I'll redo it closer to
>> reality for the next iteration; I need to double-check some cases anyway.
> 
> That is pretty impressive.  

The difference will go down with BS=~1-2 pages; I just need to find a
moment to test it properly.

> But I only got this cover letter, not the
> actual patches..

Apologies, that goes for everyone, as I lost the CCs in the patches.

Jens Axboe Dec. 9, 2020, 4:53 p.m. UTC | #3
On 12/8/20 7:19 PM, Pavel Begunkov wrote:
> The idea is to avoid copying, merging, etc. the bvec from the iterator
> into the bio in direct I/O and to use the one we've already got. Hook it
> up for io_uring. I've had an eye on it for a long time, and it was also
> brought up by Matthew just recently. Let me know if I forgot or
> misplaced some tags.
>
> A benchmark got me 430 KIOPS vs 540 KIOPS, or +25% on bare metal, and
> perf shows that bio_iov_iter_get_pages() was taking ~20%. The test is
> pretty silly, but the result is still impressive. I'll redo it closer to
> reality for the next iteration; I need to double-check some cases anyway.
>
> If the same is applied to iomap, the common chunk can be moved from
> block_dev into bio_iov_iter_get_pages(), but if there is any benefit for
> filesystems, they should explicitly opt in with ITER_BVEC_FLAG_FIXED.

Ran this on a real device, and I get a 10% bump in performance with it.
That's pretty amazing! So please do pursue this one and pull it to
completion.

Al Viro Dec. 9, 2020, 5:06 p.m. UTC | #4
On Wed, Dec 09, 2020 at 02:19:50AM +0000, Pavel Begunkov wrote:
> The idea is to avoid copying, merging, etc. the bvec from the iterator
> into the bio in direct I/O and to use the one we've already got. Hook it
> up for io_uring. I've had an eye on it for a long time, and it was also
> brought up by Matthew just recently. Let me know if I forgot or
> misplaced some tags.
>
> A benchmark got me 430 KIOPS vs 540 KIOPS, or +25% on bare metal, and
> perf shows that bio_iov_iter_get_pages() was taking ~20%. The test is
> pretty silly, but the result is still impressive. I'll redo it closer to
> reality for the next iteration; I need to double-check some cases anyway.
>
> If the same is applied to iomap, the common chunk can be moved from
> block_dev into bio_iov_iter_get_pages(), but if there is any benefit for
> filesystems, they should explicitly opt in with ITER_BVEC_FLAG_FIXED.

To reiterate what hch said - this "opt in" is wrong.  Out-of-tree
code that does async IO on a bvec-backed iov_iter, setting it up on
its own, will have to adapt, that's all.

iov_iter and its users are already in serious need of simplification
and cleanups; piling more on top of that would be a bloody bad idea.

Proposed semantics change for bvec-backed iov_iter makes a lot of sense,
so let's make sure that everything in tree can live with it, document
the change and switch to better semantics.

This thing should be unconditional.  Document it in
Documentation/filesystems/porting and if something out of tree
complains, it's their problem - not ours.

Pavel Begunkov Dec. 13, 2020, 10:03 p.m. UTC | #5
On 09/12/2020 16:53, Jens Axboe wrote:
> On 12/8/20 7:19 PM, Pavel Begunkov wrote:
>> The idea is to avoid copying, merging, etc. the bvec from the iterator
>> into the bio in direct I/O and to use the one we've already got. Hook it
>> up for io_uring. I've had an eye on it for a long time, and it was also
>> brought up by Matthew just recently. Let me know if I forgot or
>> misplaced some tags.
>>
>> A benchmark got me 430 KIOPS vs 540 KIOPS, or +25% on bare metal, and
>> perf shows that bio_iov_iter_get_pages() was taking ~20%. The test is
>> pretty silly, but the result is still impressive. I'll redo it closer to
>> reality for the next iteration; I need to double-check some cases anyway.
>>
>> If the same is applied to iomap, the common chunk can be moved from
>> block_dev into bio_iov_iter_get_pages(), but if there is any benefit for
>> filesystems, they should explicitly opt in with ITER_BVEC_FLAG_FIXED.
> 
> Ran this on a real device, and I get a 10% bump in performance with it.
> That's pretty amazing! So please do pursue this one and pull it to
> completion.

I'm curious, what block size did you use?