Message ID | cover.1725621577.git.asml.silence@gmail.com (mailing list archive) |
---|---|
Series | implement async block discards and other ops via io_uring |
On 9/6/24 4:57 PM, Pavel Begunkov wrote:
> There is an interest in having asynchronous block operations like
> discard and write zeroes. The series implements that as io_uring commands,
> which is an io_uring request type allowing to implement custom file
> specific operations.
>
> First 4 are preparation patches. Patch 5 introduces the main chunk of
> cmd infrastructure and discard commands. Patches 6-8 implement
> write zeroes variants.
>
> Branch with tests and docs:
> https://github.com/isilence/liburing.git discard-cmd
>
> The man page specifically (need to shuffle it to some cmd section):
> https://github.com/isilence/liburing/commit/a6fa2bc2400bf7fcb80496e322b5db4c8b3191f0

This looks good to me now. Only minor nit is that I generally don't
like:

	while ((bio = blk_alloc_discard_bio(bdev, &sector, &nr_sects, gfp))) {

where assignment and test are in one line as they are harder to read,
prefer doing:

	do {
		bio = blk_alloc_discard_bio(bdev, &sector, &nr_sects, gfp);
		if (!bio)
			break;
		[...]
	} while (1);

instead. But nothing that should need a respin or anything.

I'll run some testing on this tomorrow!

Thanks,
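
For illustration only, a minimal userspace sketch of the two loop shapes from the nit above. Note that alloc_chunk() and struct chunk are hypothetical stand-ins for blk_alloc_discard_bio() and a bio; none of this is kernel code, and the chunk size of 8 sectors is arbitrary. It only shows that both forms walk the same range and stop at the same point.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio: one contiguous piece of the range. */
struct chunk {
	unsigned long long sector;
	unsigned long long len;
};

/*
 * Hypothetical stand-in for blk_alloc_discard_bio(): carves at most
 * max_chunk sectors off the front of the range, advances *sector,
 * shrinks *nr_sects, and returns NULL once nothing is left.
 */
struct chunk *alloc_chunk(unsigned long long *sector,
			  unsigned long long *nr_sects,
			  unsigned long long max_chunk)
{
	struct chunk *c;
	unsigned long long len;

	assert(max_chunk > 0);
	if (*nr_sects == 0)
		return NULL;

	len = *nr_sects < max_chunk ? *nr_sects : max_chunk;
	c = malloc(sizeof(*c));
	if (!c)
		return NULL;

	c->sector = *sector;
	c->len = len;
	*sector += len;
	*nr_sects -= len;
	return c;
}

/* Assignment and test fused into the loop condition. */
int count_fused(unsigned long long sector, unsigned long long nr_sects)
{
	struct chunk *c;
	int n = 0;

	while ((c = alloc_chunk(&sector, &nr_sects, 8))) {
		n++;
		free(c);
	}
	return n;
}

/* Same walk with the assignment and the test on separate lines. */
int count_split(unsigned long long sector, unsigned long long nr_sects)
{
	struct chunk *c;
	int n = 0;

	do {
		c = alloc_chunk(&sector, &nr_sects, 8);
		if (!c)
			break;
		n++;
		free(c);
	} while (1);
	return n;
}
```

Calling count_fused(0, 20) and count_split(0, 20) both walk three chunks (8, 8 and 4 sectors), so the choice between the two forms is purely about readability.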
On 9/6/24 4:57 PM, Pavel Begunkov wrote:
> There is an interest in having asynchronous block operations like
> discard and write zeroes. The series implements that as io_uring commands,
> which is an io_uring request type allowing to implement custom file
> specific operations.
>
> First 4 are preparation patches. Patch 5 introduces the main chunk of
> cmd infrastructure and discard commands. Patches 6-8 implement
> write zeroes variants.

Sitting in for-6.12/io_uring-discard for now, as there's a hidden
dependency with the end/len patch in for-6.12/block.

Ran a quick test - have 64 4k discards inflight. Here's the current
performance, with 64 threads with sync discard:

	qd64 sync discard: 21K IOPS, lat avg 3 msec (max 21 msec)

and using io_uring with async discard, otherwise same test case:

	qd64 async discard: 76K IOPS, lat avg 845 usec (max 2.2 msec)

If we switch to doing 1M discards, then we get:

	qd64 sync discard: 14K IOPS, lat avg 5 msec (max 25 msec)

and using io_uring with async discard, otherwise same test case:

	qd64 async discard: 56K IOPS, lat avg 1153 usec (max 3.6 msec)

This is on a:

	Samsung Electronics Co Ltd NVMe SSD Controller PM174X

nvme device. It doesn't have the fastest discard, but still nicely shows
the improvement over a purely sync discard.
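
As a sanity check on the numbers above (not part of the original test), Little's law says average latency should be roughly the number of requests in flight divided by throughput. The sketch below applies that to the four runs, assuming exactly 64 requests stay in flight and treating the rounded IOPS figures as exact. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Little's law: mean latency = requests in flight / throughput.
 * Returns the predicted average latency in microseconds.
 */
double littles_law_lat_usec(double queue_depth, double iops)
{
	assert(iops > 0);
	return queue_depth / iops * 1e6;
}

/* Prints reported vs. predicted average latency for the runs quoted above. */
void print_nvme_check(void)
{
	static const struct {
		const char *name;
		double iops;
		double reported_usec;
	} runs[] = {
		{ "4k sync",  21e3, 3000.0 },
		{ "4k async", 76e3,  845.0 },
		{ "1M sync",  14e3, 5000.0 },
		{ "1M async", 56e3, 1153.0 },
	};
	size_t i;

	for (i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%-8s reported %6.0f usec, predicted %6.0f usec\n",
		       runs[i].name, runs[i].reported_usec,
		       littles_law_lat_usec(64, runs[i].iops));
}
```

With qd=64 the predictions come out at about 3.05 msec, 842 usec, 4.57 msec and 1143 usec, which is close to the reported averages given that the IOPS values are rounded.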
On Fri, 06 Sep 2024 23:57:17 +0100, Pavel Begunkov wrote:
> There is an interest in having asynchronous block operations like
> discard and write zeroes. The series implements that as io_uring commands,
> which is an io_uring request type allowing to implement custom file
> specific operations.
>
> First 4 are preparation patches. Patch 5 introduces the main chunk of
> cmd infrastructure and discard commands. Patches 6-8 implement
> write zeroes variants.
>
> [...]

Applied, thanks!

[1/8] io_uring/cmd: expose iowq to cmds
      commit: c6472f5f9a0806b0598ba513344b5a30cfa53b97
[2/8] io_uring/cmd: give inline space in request to cmds
      commit: 1a7628d034f8328813163d07ce112e1198289aeb
[3/8] filemap: introduce filemap_invalidate_pages
      commit: 1f027ae3136dfb4bfe40d83f3e0f5019e63db883
[4/8] block: introduce blk_validate_byte_range()
      commit: da22f537db72c2520c48445840b7e371c58762a7
[5/8] block: implement async discard as io_uring cmd
      commit: 0d266c981982f0f54165f05dbcdf449bb87f5184
[6/8] block: implement async write zeroes command
      commit: b56d5132a78db21ca3b386056af38802aea0a274
[7/8] block: add nowait flag for __blkdev_issue_zero_pages
      commit: 4f8e422a0744f1294c784109cfbedafd97263c2f
[8/8] block: implement async write zero pages command
      commit: 4811c90cbf179b4c58fdbad54c5b05efc0d59159

Best regards,
On 9/9/24 8:51 AM, Jens Axboe wrote:
> On 9/6/24 4:57 PM, Pavel Begunkov wrote:
>> There is an interest in having asynchronous block operations like
>> discard and write zeroes. The series implements that as io_uring commands,
>> which is an io_uring request type allowing to implement custom file
>> specific operations.
>>
>> First 4 are preparation patches. Patch 5 introduces the main chunk of
>> cmd infrastructure and discard commands. Patches 6-8 implement
>> write zeroes variants.
>
> Sitting in for-6.12/io_uring-discard for now, as there's a hidden
> dependency with the end/len patch in for-6.12/block.
>
> Ran a quick test - have 64 4k discards inflight. Here's the current
> performance, with 64 threads with sync discard:
>
> qd64 sync discard: 21K IOPS, lat avg 3 msec (max 21 msec)
>
> and using io_uring with async discard, otherwise same test case:
>
> qd64 async discard: 76K IOPS, lat avg 845 usec (max 2.2 msec)
>
> If we switch to doing 1M discards, then we get:
>
> qd64 sync discard: 14K IOPS, lat avg 5 msec (max 25 msec)
>
> and using io_uring with async discard, otherwise same test case:
>
> qd64 async discard: 56K IOPS, lat avg 1153 usec (max 3.6 msec)
>
> This is on a:
>
> Samsung Electronics Co Ltd NVMe SSD Controller PM174X
>
> nvme device. It doesn't have the fastest discard, but still nicely shows
> the improvement over a purely sync discard.

Did some basic testing with null_blk just to get a better idea of what
it'd look like on faster devices. Same test cases as above (qd=64, 4k
and 1M random trims):

Type    Trim size    IOPS     Lat avg (usec)    Lat Max (usec)
==============================================================
sync    4k            144K               444             20314
async   4k           1353K                47               595
sync    1M             56K              1136             21031
async   1M             94K               680               760
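
To put the null_blk table in relative terms, the sketch below computes the async-over-sync IOPS gain and the reduction in worst-case latency per trim size. The struct layout and function names are made up for this example; the values are copied from the table, with IOPS kept in thousands as reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the null_blk table above (IOPS in thousands). */
struct trim_result {
	const char *type;
	const char *trim_size;
	double kiops;
	double lat_avg_usec;
	double lat_max_usec;
};

const struct trim_result null_blk_results[] = {
	{ "sync",  "4k",  144.0,  444.0, 20314.0 },
	{ "async", "4k", 1353.0,   47.0,   595.0 },
	{ "sync",  "1M",   56.0, 1136.0, 21031.0 },
	{ "async", "1M",   94.0,  680.0,   760.0 },
};

/* How many times more IOPS the async run (a) achieved than the sync run (s). */
double iops_gain(const struct trim_result *s, const struct trim_result *a)
{
	assert(s->kiops > 0);
	return a->kiops / s->kiops;
}

/* How many times lower the worst-case latency was in the async run. */
double max_lat_reduction(const struct trim_result *s,
			 const struct trim_result *a)
{
	assert(a->lat_max_usec > 0);
	return s->lat_max_usec / a->lat_max_usec;
}

/* Rows are stored as sync/async pairs, so step through them two at a time. */
void print_gains(void)
{
	size_t n = sizeof(null_blk_results) / sizeof(null_blk_results[0]);
	size_t i;

	for (i = 0; i + 1 < n; i += 2)
		printf("%s: %.1fx IOPS, %.1fx lower max latency\n",
		       null_blk_results[i].trim_size,
		       iops_gain(&null_blk_results[i], &null_blk_results[i + 1]),
		       max_lat_reduction(&null_blk_results[i],
					 &null_blk_results[i + 1]));
}
```

For the 4k case that works out to roughly a 9.4x IOPS gain, and for 1M roughly 1.7x; the max latency drops by well over an order of magnitude in both cases.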