[v3,00/11] CQ waiting / task_work optimisations

Message ID cover.1673274244.git.asml.silence@gmail.com (mailing list archive)

Message

Pavel Begunkov Jan. 9, 2023, 2:46 p.m. UTC
For DEFER_TASKRUN rings replace CQ waitqueues with a custom implementation
based on the fact that only one task may be waiting for completions. Also,
improve deferred task running by removing one atomic in patch 11.
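
To illustrate the single-waiter idea, here is a minimal userspace sketch in C11. It is not the code from this series: struct waiter, struct cq_state and the cq_* helpers are hypothetical names, and a plain flag stands in for a real task wakeup. It only shows why knowing there is at most one waiter lets the producer skip a waitqueue and do a single pointer check instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; a flag replaces a real wakeup. */
struct waiter {
	atomic_bool woken;
};

/* Hypothetical CQ state: at most one registered waiter, no waitqueue. */
struct cq_state {
	_Atomic(struct waiter *) waiting;	/* NULL when nobody waits */
	atomic_int pending;			/* queued work items */
};

static void cq_init(struct cq_state *cq)
{
	atomic_init(&cq->waiting, (struct waiter *)NULL);
	atomic_init(&cq->pending, 0);
}

/* Waiter side: register as the sole waiter before going to sleep. */
static void cq_wait_begin(struct cq_state *cq, struct waiter *w)
{
	struct waiter *prev;

	atomic_store(&w->woken, false);
	prev = atomic_exchange(&cq->waiting, w);
	/* Single-waiter invariant: nobody else may be registered. */
	assert(prev == NULL);
	(void)prev;
}

/* Waiter side: unregister once the wait is over. */
static void cq_wait_end(struct cq_state *cq)
{
	atomic_store(&cq->waiting, (struct waiter *)NULL);
}

/*
 * Producer side: queue one item and wake the waiter only if one is
 * registered. Returns true if a wakeup was issued.
 */
static bool cq_add_work(struct cq_state *cq)
{
	struct waiter *w;

	atomic_fetch_add(&cq->pending, 1);
	w = atomic_load(&cq->waiting);
	if (w == NULL)
		return false;	/* fast path: nobody to wake */
	atomic_store(&w->woken, true);
	return true;
}
```

In this sketch a waiter would still have to recheck pending after cq_wait_begin() and before actually sleeping, otherwise work queued just before registration could be missed. The default sequentially consistent atomics are used for simplicity; they say nothing about the barriers the kernel patches rely on.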

Benchmarking QD1 with simulated tw arrival right after we start waiting:
7.5 MIOPS -> 9.3 MIOPS (+23%), where half of the CPU cycles go to syscall overhead.

v2: remove merged cleanups and add new ones
    add 11/11 removing one extra atomic
    a small sync adjustment in 10/11
    add extra comments

Pavel Begunkov (11):
  io_uring: move submitter_task out of cold cacheline
  io_uring: refactor io_wake_function
  io_uring: don't set TASK_RUNNING in local tw runner
  io_uring: mark io_run_local_work static
  io_uring: move io_run_local_work_locked
  io_uring: separate wq for ring polling
  io_uring: add lazy poll_wq activation
  io_uring: wake up optimisations
  io_uring: waitqueue-less cq waiting
  io_uring: add io_req_local_work_add wake fast path
  io_uring: optimise deferred tw execution

 include/linux/io_uring_types.h |  15 +--
 io_uring/io_uring.c            | 161 ++++++++++++++++++++++++++-------
 io_uring/io_uring.h            |  28 ++----
 3 files changed, 144 insertions(+), 60 deletions(-)

Comments

Jens Axboe Jan. 11, 2023, 6 p.m. UTC | #1
On Mon, 09 Jan 2023 14:46:02 +0000, Pavel Begunkov wrote:
> For DEFER_TASKRUN rings replace CQ waitqueues with a custom implementation
> based on the fact that only one task may be waiting for completions. Also,
> improve deferred task running by removing one atomic in patch 11
> 
> Benchmarking QD1 with simulated tw arrival right after we start waiting:
> 7.5 MIOPS -> 9.3 (+23%), where half of CPU cycles goes to syscall overhead.
> 
> [...]

Applied, thanks!

[01/11] io_uring: move submitter_task out of cold cacheline
        commit: 8516c8b514839600b7e63090f2dce5b4d658fd68
[02/11] io_uring: refactor io_wake_function
        commit: 291f31bf963c0018a2b84a94388a0e7b535c3dae
[03/11] io_uring: don't set TASK_RUNNING in local tw runner
        commit: 5eb30c28823aed63946c9d2a222bf1158a3aecae
[04/11] io_uring: mark io_run_local_work static
        commit: 88d14c077c1a04555978c499acd12f5f55de51da
[05/11] io_uring: move io_run_local_work_locked
        commit: 78c37b460a63c866050d3e05d6d4bfddf654075e
[06/11] io_uring: separate wq for ring polling
        commit: 6b40f3c9a37b97e629a99a92d2c392d77ae20f60
[07/11] io_uring: add lazy poll_wq activation
        commit: e05f6f47bf8aed0e97d9ba1d52e2a10ea542609c
[08/11] io_uring: wake up optimisations
        commit: ef3ddc6ac629fc829ed6e08418e1c070332dde63
[09/11] io_uring: waitqueue-less cq waiting
        commit: 65ca9dd8ce5e3de42b100f0e7d2ae360e9b8d14e
[10/11] io_uring: add io_req_local_work_add wake fast path
        commit: 6cd16656e2ddc63ee7aae7c7f27edcab933a0e09
[11/11] io_uring: optimise deferred tw execution
        commit: 607947314b4c9f8c979f79c095da9156b41c82b8

Best regards,