Message ID: cover.1672713341.git.asml.silence@gmail.com
Series: CQ waiting and wake up optimisations
On Tue, 03 Jan 2023 03:03:51 +0000, Pavel Begunkov wrote:
> The series replaces waitqueues for CQ waiting with a custom waiting
> loop and adds a couple more perf tweaks around it. Benchmarking is done
> for QD1 with simulated tw arrival right after we start waiting; it
> gets us from 7.5 MIOPS to 9.2, which is +22%, or double the number for
> the in-kernel io_uring overhead (i.e. without syscall and userspace).
> That matches profiles: wake_up() _without_ wake_up_state() was taking
> 12-14% and prepare_to_wait_exclusive() was around 4-6%.
>
> [...]

Applied, thanks!

[01/13] io_uring: rearrange defer list checks
        commit: 9617404e5d86e9cfb2da4ac2b17e99a72836bf69
[02/13] io_uring: don't iterate cq wait fast path
        commit: 1329dc7e79da3570f6591d9997bd2fe3a7d17ca6
[03/13] io_uring: kill io_run_task_work_ctx
        commit: 90b8457304e25a137c1b8c89f7cae276b79d3273
[04/13] io_uring: move defer tw task checks
        commit: 1345a6b381b4d39b15a1e34c0a78be2ee2e452c6
[05/13] io_uring: parse check_cq out of wq waiting
        commit: b5be9ebe91246b67d4b0dee37e3071d73ba69119
[06/13] io_uring: mimimise io_cqring_wait_schedule
        commit: de254b5029fa37c4e0a6a16743fa2271fa524fc7
[07/13] io_uring: simplify io_has_work
        commit: 26736d171ec54487de677f09d682d144489957fa
[08/13] io_uring: set TASK_RUNNING right after schedule
        commit: 8214ccccf64f1335b34b98ed7deb2c6c29969c49

Best regards,
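For context on the waitqueue cost named in the cover letter: waking a
waitqueue means taking the queue's lock and walking its list of waiters,
while a single-waiter scheme can publish one flag and issue one targeted
wake. Below is a rough userspace analogue of that idea using Linux
futexes; it is a sketch for illustration only, not the kernel code, and
every name in it (cq_ready, waiter, waker) is made up.

/* Rough analogue: the waiter re-checks a condition and sleeps on a futex;
 * the waker flips the flag and issues a single targeted wake. No waitqueue
 * lock, no list walk. Linux-only; build with: cc -pthread demo.c */
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int cq_ready;             /* stands in for "a CQE arrived" */

static void futex_wait(atomic_int *addr, int expected)
{
        syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static void futex_wake(atomic_int *addr)
{
        syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
        /* custom waiting loop: re-check the condition, then sleep */
        while (atomic_load(&cq_ready) == 0)
                futex_wait(&cq_ready, 0);
        puts("woken: completion available");
        return NULL;
}

static void *waker(void *arg)
{
        usleep(1000);                   /* completion arrives "later" */
        atomic_store(&cq_ready, 1);
        futex_wake(&cq_ready);          /* one targeted wakeup */
        return NULL;
}

int main(void)
{
        pthread_t w, k;

        pthread_create(&w, NULL, waiter, NULL);
        pthread_create(&k, NULL, waker, NULL);
        pthread_join(w, NULL);
        pthread_join(k, NULL);
        return 0;
}

The waiter's re-check-then-sleep loop is the analogue of the custom CQ
waiting loop; the waker's single futex_wake stands in for the cheaper
wake path the profile numbers above point at.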
On 1/3/23 03:03, Pavel Begunkov wrote:
> The series replaces waitqueues for CQ waiting with a custom waiting
> loop and adds a couple more perf tweaks around it. Benchmarking is done
> for QD1 with simulated tw arrival right after we start waiting; it
> gets us from 7.5 MIOPS to 9.2, which is +22%, or double the number for
> the in-kernel io_uring overhead (i.e. without syscall and userspace).
> That matches profiles: wake_up() _without_ wake_up_state() was taking
> 12-14% and prepare_to_wait_exclusive() was around 4-6%.

The numbers are gathered with an in-kernel trick. Tried to quickly
measure without it:

modprobe null_blk no_sched=1 irqmode=2 completion_nsec=0
taskset -c 0 fio/t/io_uring -d1 -s1 -c1 -p0 -B1 -F1 -X -b512 -n4 /dev/nullb0

The important part here is using timer-backed null_blk and pinning
multiple workers to a single CPU; -n4 was enough for me to keep the
CPU 100% busy.

old:
IOPS=539.51K, BW=2.11GiB/s, IOS/call=1/1
IOPS=542.26K, BW=2.12GiB/s, IOS/call=1/1
IOPS=540.73K, BW=2.11GiB/s, IOS/call=1/1
IOPS=541.28K, BW=2.11GiB/s, IOS/call=0/0

new:
IOPS=561.85K, BW=2.19GiB/s, IOS/call=1/1
IOPS=561.58K, BW=2.19GiB/s, IOS/call=1/1
IOPS=561.56K, BW=2.19GiB/s, IOS/call=1/1
IOPS=559.94K, BW=2.19GiB/s, IOS/call=1/1

The difference is only ~3.5% because of the huge additional overhead
from the null_blk timers, block QoS and other unnecessary bits.

P.S. Tested with an out-of-tree patch adding a flag to enable/disable
the feature, to remove variance between reboots.
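To make the t/io_uring invocation above concrete: -d1 -s1 -c1 is a
submit-one, wait-one loop at QD1, so every iteration goes through the CQ
waiting path this series touches. A minimal liburing sketch of that
pattern follows; it is an approximation rather than t/io_uring itself
(registered buffers/files and the other flags are left out), and it
assumes liburing is installed and /dev/nullb0 exists after the modprobe
above.

/* QD1 sketch: one 512B O_DIRECT read per iteration, blocking on the CQ
 * each time. Link with -luring. Error handling is trimmed for brevity. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>

int main(void)
{
        struct io_uring ring;
        struct io_uring_cqe *cqe;
        char buf[512] __attribute__((aligned(512)));
        int fd = open("/dev/nullb0", O_RDONLY | O_DIRECT);

        if (fd < 0 || io_uring_queue_init(8, &ring, 0) < 0)
                return 1;

        for (int i = 0; i < 1000000; i++) {
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

                io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
                /* submit and block for one CQE: the waiting path the
                 * series optimises */
                io_uring_submit_and_wait(&ring, 1);
                io_uring_wait_cqe(&ring, &cqe);
                io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
}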