Message ID | cover.1723601133.git.asml.silence@gmail.com (mailing list archive)
---|---
Series | implement asynchronous BLKDISCARD via io_uring
On 8/14/24 4:45 AM, Pavel Begunkov wrote:
> There is an interest in having asynchronous block operations like
> discard. The patch set implements that as io_uring commands, which is
> an io_uring request type allowing to implement custom file specific
> operations.
>
> First 4 patches are simple preps, and the main part is in Patch 5.
> Not tested with a real drive yet, hence sending as an RFC.
>
> I'm also going to add BLKDISCARDZEROES and BLKSECDISCARD, which should
> reuse structures and helpers from Patch 5.
>
> liburing tests for reference:
>
> https://github.com/isilence/liburing.git discard-cmd-test

FWIW, did a quick patch to wire it up for fio. Using this job file:

[trim]
filename=/dev/nvme31n1
ioengine=io_uring
iodepth=64
rw=randtrim
norandommap
bs=4k

the stock kernel gets:

trim: IOPS=21.6k, BW=84.4MiB/s (88.5MB/s)(847MiB/10036msec); 0 zone resets

using ~5% CPU, and with the process predictably stuck in D state all of
the time, waiting on a sync trim.

With the patches:

trim: IOPS=75.8k, BW=296MiB/s (310MB/s)(2653MiB/8961msec); 0 zone resets

using ~11% CPU. Didn't verify actual functionality, but did check trims
are going up and down. Drive used is:

Dell NVMe PM1743 RI E3.S 7.68TB

Outside of async trim being useful for actual workloads rather than the
sync trim we have now, it'll also help characterizing real world mixed
workloads that have trims with reads/writes.
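For reference, the "sync trim" baseline above is the plain BLKDISCARD ioctl,
which blocks the calling thread for the duration of each trim (hence the D
state). A minimal sketch of that path; the helper name is just for
illustration:

    /* Minimal sketch of the existing synchronous path: the BLKDISCARD ioctl
     * blocks the calling thread until the device finishes the trim, which is
     * what keeps the stock-kernel fio run in D state.
     */
    #include <fcntl.h>
    #include <linux/fs.h>      /* BLKDISCARD */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int sync_discard(const char *dev, uint64_t offset, uint64_t len)
    {
        int fd = open(dev, O_WRONLY);

        if (fd < 0) {
            perror("open");
            return -1;
        }

        /* BLKDISCARD takes a {start, length} pair, both in bytes */
        uint64_t range[2] = { offset, len };
        int ret = ioctl(fd, BLKDISCARD, range);

        if (ret < 0)
            perror("BLKDISCARD");
        close(fd);
        return ret;
    }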
Hi Pavel!

> I'm also going to add BLKDISCARDZEROES and BLKSECDISCARD, which should
> reuse structures and helpers from Patch 5.

Adding these is going to be very useful.

Just a nit: Please use either ZEROOUT (or WRITE_ZEROES) and SECURE_ERASE
terminology for the additional operations when you add them.

DISCARDZEROES is obsolete and secure discard has been replaced by secure
erase. We are stuck with the legacy ioctl names but we should avoid
perpetuating unfortunate naming choices from the past.
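For context, these are the legacy block ioctl names in question and the
current block-layer terminology they correspond to (definitions as in
include/uapi/linux/fs.h):

    /* Legacy block ioctl names vs. the terminology Martin refers to; values
     * as in include/uapi/linux/fs.h. BLKDISCARDZEROES is kept only for
     * compatibility and reports 0 on modern kernels.
     */
    #define BLKDISCARD        _IO(0x12, 119)  /* discard / trim */
    #define BLKDISCARDZEROES  _IO(0x12, 124)  /* obsolete */
    #define BLKSECDISCARD     _IO(0x12, 125)  /* now backed by secure erase */
    #define BLKZEROOUT        _IO(0x12, 127)  /* write zeroes ("zeroout") */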
On 8/15/24 17:15, Martin K. Petersen wrote:
>
> Hi Pavel!
>
>> I'm also going to add BLKDISCARDZEROES and BLKSECDISCARD, which should
>> reuse structures and helpers from Patch 5.
>
> Adding these is going to be very useful.
>
> Just a nit: Please use either ZEROOUT (or WRITE_ZEROES) and SECURE_ERASE
> terminology for the additional operations when you add them.
>
> DISCARDZEROES is obsolete and secure discard has been replaced by secure
> erase. We are stuck with the legacy ioctl names but we should avoid
> perpetuating unfortunate naming choices from the past.

Noted, thanks, the ioctl names are indeed confusing.
On 8/15/24 16:50, Jens Axboe wrote:
> On 8/14/24 4:45 AM, Pavel Begunkov wrote:
>> There is an interest in having asynchronous block operations like
>> discard. The patch set implements that as io_uring commands, which is
>> an io_uring request type allowing to implement custom file specific
>> operations.
>>
>> First 4 patches are simple preps, and the main part is in Patch 5.
>> Not tested with a real drive yet, hence sending as an RFC.
>>
>> I'm also going to add BLKDISCARDZEROES and BLKSECDISCARD, which should
>> reuse structures and helpers from Patch 5.
>>
>> liburing tests for reference:
>>
>> https://github.com/isilence/liburing.git discard-cmd-test
>
> FWIW, did a quick patch to wire it up for fio. Using this job file:
>
> [trim]
> filename=/dev/nvme31n1
> ioengine=io_uring
> iodepth=64
> rw=randtrim
> norandommap
> bs=4k
>
> the stock kernel gets:
>
> trim: IOPS=21.6k, BW=84.4MiB/s (88.5MB/s)(847MiB/10036msec); 0 zone resets
>
> using ~5% CPU, and with the process predictably stuck in D state all of
> the time, waiting on a sync trim.
>
> With the patches:
>
> trim: IOPS=75.8k, BW=296MiB/s (310MB/s)(2653MiB/8961msec); 0 zone resets
>
> using ~11% CPU.

Thanks for giving it a run. It can be further improved for particular use
cases, e.g. by adding a vectored version as an additional feature, i.e.
multiple discard ranges per request, and something like retrying short IO,
but for that we'd want to get -EAGAIN directly from submit_bio rather than
failing via the callback.

> Didn't verify actual functionality, but did check trims are going up and
> down. Drive used is:
>
> Dell NVMe PM1743 RI E3.S 7.68TB
>
> Outside of async trim being useful for actual workloads rather than the
> sync trim we have now, it'll also help characterizing real world mixed
> workloads that have trims with reads/writes.
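To make the submission path concrete, below is a rough userspace sketch of
queueing one discard as an io_uring command. The BLOCK_URING_CMD_DISCARD
opcode (and its value) and the placement of the byte range in sqe->addr /
sqe->addr3 are assumptions taken from this RFC and the liburing
discard-cmd-test branch, not a settled ABI:

    /* Rough sketch: queue one discard as an IORING_OP_URING_CMD request.
     * ASSUMED: the BLOCK_URING_CMD_DISCARD cmd_op value and the encoding of
     * the byte range in sqe->addr (start) and sqe->addr3 (length) follow the
     * RFC / discard-cmd-test branch and may change before the ABI settles.
     */
    #include <fcntl.h>
    #include <liburing.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>

    #ifndef BLOCK_URING_CMD_DISCARD
    #define BLOCK_URING_CMD_DISCARD _IO(0x12, 137)   /* assumed value */
    #endif

    static int queue_discard(struct io_uring *ring, int fd,
                             uint64_t offset, uint64_t len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;
        memset(sqe, 0, sizeof(*sqe));
        sqe->opcode = IORING_OP_URING_CMD;
        sqe->fd = fd;
        sqe->cmd_op = BLOCK_URING_CMD_DISCARD;
        sqe->addr = offset;     /* start of the range, bytes (assumed) */
        sqe->addr3 = len;       /* length of the range, bytes (assumed) */
        return 0;
    }

    int main(void)
    {
        struct io_uring ring;
        struct io_uring_cqe *cqe;
        int fd, ret;

        fd = open("/dev/nvme31n1", O_WRONLY);
        if (fd < 0 || io_uring_queue_init(64, &ring, 0))
            return 1;

        /* discard 4KiB at offset 0; the completion comes back as a CQE */
        if (queue_discard(&ring, fd, 0, 4096))
            return 1;
        io_uring_submit(&ring);
        ret = io_uring_wait_cqe(&ring, &cqe);
        if (!ret) {
            printf("discard result: %d\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
    }

Completions come back as ordinary CQEs, which is what lets a submitter keep
many trims in flight (iodepth=64 in the fio job above) instead of blocking
in the ioctl one range at a time.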