Message ID | cover.1607477897.git.asml.silence@gmail.com (mailing list archive) |
---|---|
Series | nocopy bvec for direct IO |
On Wed, Dec 09, 2020 at 02:19:50AM +0000, Pavel Begunkov wrote:
> A benchmark got me 430KIOPS vs 540KIOPS, or +25% on bare metal. And perf
> shows that bio_iov_iter_get_pages() was taking ~20%. The test is pretty
> silly, but still imposing. I'll redo it closer to reality for next
> iteration, anyway need to double check some cases.

That is pretty impressive. But I only got this cover letter, not the
actual patches..

On 09/12/2020 06:50, Christoph Hellwig wrote:
> On Wed, Dec 09, 2020 at 02:19:50AM +0000, Pavel Begunkov wrote:
>> A benchmark got me 430KIOPS vs 540KIOPS, or +25% on bare metal. And perf
>> shows that bio_iov_iter_get_pages() was taking ~20%. The test is pretty
>> silly, but still imposing. I'll redo it closer to reality for next
>> iteration, anyway need to double check some cases.
>
> That is pretty impressive.

The difference will go down with BS=~1-2 pages, just need to find a
moment to test properly.

> But I only got this cover letter, not the
> actual patches..

Apologies, that goes for everyone, as I lost the CCs in the patches.

On 12/8/20 7:19 PM, Pavel Begunkov wrote:
> The idea is to avoid copying, merging, etc. bvec from iterator to bio
> in direct I/O and use the one we've already got. Hook it up for io_uring.
> Had an eye on it for a long, and it also was brought up by Matthew
> just recently. Let me know if I forgot or misplaced some tags.
>
> A benchmark got me 430KIOPS vs 540KIOPS, or +25% on bare metal. And perf
> shows that bio_iov_iter_get_pages() was taking ~20%. The test is pretty
> silly, but still imposing. I'll redo it closer to reality for next
> iteration, anyway need to double check some cases.
>
> If same applied to iomap, common chunck can be moved from block_dev
> into bio_iov_iter_get_pages(), but if there any benefit for filesystems,
> they should explicitly opt in with ITER_BVEC_FLAG_FIXED.

Ran this on a real device, and I get a 10% bump in performance with it.
That's pretty amazing! So please do pursue this one and pull it to
completion.

On Wed, Dec 09, 2020 at 02:19:50AM +0000, Pavel Begunkov wrote:
> The idea is to avoid copying, merging, etc. bvec from iterator to bio
> in direct I/O and use the one we've already got. Hook it up for io_uring.
> Had an eye on it for a long, and it also was brought up by Matthew
> just recently. Let me know if I forgot or misplaced some tags.
>
> A benchmark got me 430KIOPS vs 540KIOPS, or +25% on bare metal. And perf
> shows that bio_iov_iter_get_pages() was taking ~20%. The test is pretty
> silly, but still imposing. I'll redo it closer to reality for next
> iteration, anyway need to double check some cases.
>
> If same applied to iomap, common chunck can be moved from block_dev
> into bio_iov_iter_get_pages(), but if there any benefit for filesystems,
> they should explicitly opt in with ITER_BVEC_FLAG_FIXED.

To reiterate what hch said - this "opt in" is wrong. Out-of-tree
code that does async IO on bvec-backed iov_iter, setting it up on its
own, will have to adapt, that's all.

iov_iter and its users are already in serious need of simplification
and cleanups; piling more on top of that would be a bloody bad idea.

Proposed semantics change for bvec-backed iov_iter makes a lot of
sense, so let's make sure that everything in tree can live with it,
document the change and switch to better semantics.

This thing should be unconditional. Document it in D/f/porting and
if something out of tree complains, it's their problem - not ours.

On 09/12/2020 16:53, Jens Axboe wrote:
> On 12/8/20 7:19 PM, Pavel Begunkov wrote:
>> The idea is to avoid copying, merging, etc. bvec from iterator to bio
>> in direct I/O and use the one we've already got. Hook it up for io_uring.
>> Had an eye on it for a long, and it also was brought up by Matthew
>> just recently. Let me know if I forgot or misplaced some tags.
>>
>> A benchmark got me 430KIOPS vs 540KIOPS, or +25% on bare metal. And perf
>> shows that bio_iov_iter_get_pages() was taking ~20%. The test is pretty
>> silly, but still imposing. I'll redo it closer to reality for next
>> iteration, anyway need to double check some cases.
>>
>> If same applied to iomap, common chunck can be moved from block_dev
>> into bio_iov_iter_get_pages(), but if there any benefit for filesystems,
>> they should explicitly opt in with ITER_BVEC_FLAG_FIXED.
>
> Ran this on a real device, and I get a 10% bump in performance with it.
> That's pretty amazing! So please do pursue this one and pull it to
> completion.

I'm curious, what block size did you use?
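
For readers without the patches at hand, the idea from the cover letter
(use the bvec the iterator already has instead of copying it into the
bio) can be modeled in user space roughly as below. This is only a
sketch under stated assumptions, not the code from the series: struct
seg, struct toy_bio and the toy_bio_*() helpers are hypothetical names
invented for illustration, with struct seg standing in loosely for the
kernel's struct bio_vec. The lifetime note on the borrow path is one
reading of why async users that set up a bvec-backed iov_iter on their
own would have to adapt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SEGS 16

/* Hypothetical stand-in for a bio_vec: one contiguous memory segment. */
struct seg {
	void   *base;
	size_t  len;
};

/* Hypothetical stand-in for a bio: uses either a private copy of the
 * segment table or the caller's table directly. */
struct toy_bio {
	const struct seg *segs;                  /* table used for the I/O */
	size_t            nr_segs;
	struct seg        inline_segs[TOY_MAX_SEGS]; /* copy path only */
	int               borrowed;              /* 1 if segs is the caller's */
};

/* Copy path: duplicate every entry, O(nr) work per request. */
static void toy_bio_copy_segs(struct toy_bio *bio,
			      const struct seg *src, size_t nr)
{
	assert(nr <= TOY_MAX_SEGS);
	memcpy(bio->inline_segs, src, nr * sizeof(*src));
	bio->segs = bio->inline_segs;
	bio->nr_segs = nr;
	bio->borrowed = 0;
}

/* No-copy path: point at the caller's table, O(1) work per request.
 * Assumption: the caller keeps src alive and unmodified until the
 * I/O has completed. */
static void toy_bio_borrow_segs(struct toy_bio *bio,
				const struct seg *src, size_t nr)
{
	bio->segs = src;
	bio->nr_segs = nr;
	bio->borrowed = 1;
}

/* Total payload size described by the bio's segment table. */
static size_t toy_bio_bytes(const struct toy_bio *bio)
{
	size_t i, total = 0;

	for (i = 0; i < bio->nr_segs; i++)
		total += bio->segs[i].len;
	return total;
}
```

Both paths describe the same payload; the difference is that the
borrow path skips the per-segment walk and leaves ownership of the
table with the submitter.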