Message ID | c09446bd-faca-13cc-97af-c06fa324e798@kernel.dk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | io_uring: kill io_cqring_ev_posted() and __io_cq_unlock_post() | expand |
On 11/21/22 14:52, Jens Axboe wrote: > __io_cq_unlock_post() is identical to io_cq_unlock_post(), and > io_cqring_ev_posted() has a single caller so migth as well just inline > it there. It was there for one purpose, to inline it in the hottest path, i.e. __io_submit_flush_completions(). I'll be reverting it back > Signed-off-by: Jens Axboe <axboe@kernel.dk> > > --- > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > index 762ecab801f2..2260fb7aa7f2 100644 > --- a/io_uring/io_uring.c > +++ b/io_uring/io_uring.c > @@ -581,23 +581,14 @@ void __io_commit_cqring_flush(struct io_ring_ctx *ctx) > io_eventfd_flush_signal(ctx); > } > > -static inline void io_cqring_ev_posted(struct io_ring_ctx *ctx) > -{ > - io_commit_cqring_flush(ctx); > - io_cqring_wake(ctx); > -} > - > -static inline void __io_cq_unlock_post(struct io_ring_ctx *ctx) > +void io_cq_unlock_post(struct io_ring_ctx *ctx) > __releases(ctx->completion_lock) > { > io_commit_cqring(ctx); > spin_unlock(&ctx->completion_lock); > - io_cqring_ev_posted(ctx); > -} > > -void io_cq_unlock_post(struct io_ring_ctx *ctx) > -{ > - __io_cq_unlock_post(ctx); > + io_commit_cqring_flush(ctx); > + io_cqring_wake(ctx); > } > > /* Returns true if there are no backlogged entries after the flush */ > @@ -1346,7 +1337,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx) > if (!(req->flags & REQ_F_CQE_SKIP)) > __io_fill_cqe_req(ctx, req); > } > - __io_cq_unlock_post(ctx); > + io_cq_unlock_post(ctx); > > io_free_batch_list(ctx, state->compl_reqs.first); > INIT_WQ_LIST(&state->compl_reqs); >
On 11/24/22 9:16?AM, Pavel Begunkov wrote: > On 11/21/22 14:52, Jens Axboe wrote: >> __io_cq_unlock_post() is identical to io_cq_unlock_post(), and >> io_cqring_ev_posted() has a single caller so migth as well just inline >> it there. > > It was there for one purpose, to inline it in the hottest path, > i.e. __io_submit_flush_completions(). I'll be reverting it back The compiler is most certainly already doing that, in fact even __io_submit_flush_completions() is inlined in io_submit_flush_completions() for me here.
On 11/24/22 18:46, Jens Axboe wrote: > On 11/24/22 9:16?AM, Pavel Begunkov wrote: >> On 11/21/22 14:52, Jens Axboe wrote: >>> __io_cq_unlock_post() is identical to io_cq_unlock_post(), and >>> io_cqring_ev_posted() has a single caller so migth as well just inline >>> it there. >> >> It was there for one purpose, to inline it in the hottest path, >> i.e. __io_submit_flush_completions(). I'll be reverting it back > > The compiler is most certainly already doing that, in fact even .L1493: # io_uring/io_uring.c:631: io_cq_unlock_post(ctx); movq %r15, %rdi # ctx, call io_cq_unlock_post # Even more, after IORING_SETUP_CQE32 was added I didn't see once __io_fill_cqe_req actually inlined even though it's marked so. > __io_submit_flush_completions() is inlined in > io_submit_flush_completions() for me here. And io_submit_flush_completions is inlined as well, right? That would be quite odd, __io_submit_flush_completions() is not small by any means and there are 3 call sites.
On 11/24/22 19:17, Pavel Begunkov wrote: > On 11/24/22 18:46, Jens Axboe wrote: >> On 11/24/22 9:16?AM, Pavel Begunkov wrote: >>> On 11/21/22 14:52, Jens Axboe wrote: >>>> __io_cq_unlock_post() is identical to io_cq_unlock_post(), and >>>> io_cqring_ev_posted() has a single caller so migth as well just inline >>>> it there. >>> >>> It was there for one purpose, to inline it in the hottest path, >>> i.e. __io_submit_flush_completions(). I'll be reverting it back >> >> The compiler is most certainly already doing that, in fact even > > .L1493: > # io_uring/io_uring.c:631: io_cq_unlock_post(ctx); > movq %r15, %rdi # ctx, > call io_cq_unlock_post # wrong one, __io_submit_flush_completions: pushq %rbp # ... .L1793: # io_uring/io_uring.c:1394: io_cq_unlock_post(ctx); movq %r12, %rdi # ctx, call io_cq_unlock_post # > Even more, after IORING_SETUP_CQE32 was added I didn't see > once __io_fill_cqe_req actually inlined even though it's marked > so. > >> __io_submit_flush_completions() is inlined in >> io_submit_flush_completions() for me here. > > And io_submit_flush_completions is inlined as well, right? > That would be quite odd, __io_submit_flush_completions() is not > small by any means and there are 3 call sites. >
On 11/24/22 12:17?PM, Pavel Begunkov wrote: > On 11/24/22 18:46, Jens Axboe wrote: >> On 11/24/22 9:16?AM, Pavel Begunkov wrote: >>> On 11/21/22 14:52, Jens Axboe wrote: >>>> __io_cq_unlock_post() is identical to io_cq_unlock_post(), and >>>> io_cqring_ev_posted() has a single caller so migth as well just inline >>>> it there. >>> >>> It was there for one purpose, to inline it in the hottest path, >>> i.e. __io_submit_flush_completions(). I'll be reverting it back >> >> The compiler is most certainly already doing that, in fact even > > .L1493: > # io_uring/io_uring.c:631:???? io_cq_unlock_post(ctx); > ????movq??? %r15, %rdi??? # ctx, > ????call??? io_cq_unlock_post??? # Doubled checked here, and you're actually right: 55bc: 94000000 bl 4760 <io_cq_unlock_post> Huh, that's very odd that it doesn't inline it. It doesn't even it I mark it inline, __always_inline gets it done. > Even more, after IORING_SETUP_CQE32 was added I didn't see > once __io_fill_cqe_req actually inlined even though it's marked > so. Doesn't seem to be inlined here either. Compiler: gcc (Debian 12.2.0-9) 12.2.0 >> __io_submit_flush_completions() is inlined in >> io_submit_flush_completions() for me here. > > And io_submit_flush_completions is inlined as well, right? > That would be quite odd, __io_submit_flush_completions() is not > small by any means and there are 3 call sites. io_submit_flush_completions() doesn't get inlined, __io_submit_flush_completions() gets inlined in io_submit_flush_completions().
On 11/24/22 19:29, Jens Axboe wrote: > On 11/24/22 12:17?PM, Pavel Begunkov wrote: >> On 11/24/22 18:46, Jens Axboe wrote: >>> On 11/24/22 9:16?AM, Pavel Begunkov wrote: >>>> On 11/21/22 14:52, Jens Axboe wrote: >>>>> __io_cq_unlock_post() is identical to io_cq_unlock_post(), and >>>>> io_cqring_ev_posted() has a single caller so migth as well just inline >>>>> it there. >>>> >>>> It was there for one purpose, to inline it in the hottest path, >>>> i.e. __io_submit_flush_completions(). I'll be reverting it back >>> >>> The compiler is most certainly already doing that, in fact even >> >> .L1493: >> # io_uring/io_uring.c:631:???? io_cq_unlock_post(ctx); >> ????movq??? %r15, %rdi??? # ctx, >> ????call??? io_cq_unlock_post??? # > > Doubled checked here, and you're actually right: > > 55bc: 94000000 bl 4760 <io_cq_unlock_post> > > Huh, that's very odd that it doesn't inline it. It doesn't even it I > mark it inline, __always_inline gets it done. That's odd as well for a function of this size >> Even more, after IORING_SETUP_CQE32 was added I didn't see >> once __io_fill_cqe_req actually inlined even though it's marked >> so. > > Doesn't seem to be inlined here either. Compiler: > > gcc (Debian 12.2.0-9) 12.2.0 > >>> __io_submit_flush_completions() is inlined in >>> io_submit_flush_completions() for me here. >> >> And io_submit_flush_completions is inlined as well, right? >> That would be quite odd, __io_submit_flush_completions() is not >> small by any means and there are 3 call sites. > > io_submit_flush_completions() doesn't get inlined, > __io_submit_flush_completions() gets inlined in > io_submit_flush_completions(). Then the compiler is drunk. It doesn't inline the function explicitly marked inline but does it for a non-inline one. Unless it's PGO'ed I can't think of a sane reason for it.
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 762ecab801f2..2260fb7aa7f2 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -581,23 +581,14 @@ void __io_commit_cqring_flush(struct io_ring_ctx *ctx) io_eventfd_flush_signal(ctx); } -static inline void io_cqring_ev_posted(struct io_ring_ctx *ctx) -{ - io_commit_cqring_flush(ctx); - io_cqring_wake(ctx); -} - -static inline void __io_cq_unlock_post(struct io_ring_ctx *ctx) +void io_cq_unlock_post(struct io_ring_ctx *ctx) __releases(ctx->completion_lock) { io_commit_cqring(ctx); spin_unlock(&ctx->completion_lock); - io_cqring_ev_posted(ctx); -} -void io_cq_unlock_post(struct io_ring_ctx *ctx) -{ - __io_cq_unlock_post(ctx); + io_commit_cqring_flush(ctx); + io_cqring_wake(ctx); } /* Returns true if there are no backlogged entries after the flush */ @@ -1346,7 +1337,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx) if (!(req->flags & REQ_F_CQE_SKIP)) __io_fill_cqe_req(ctx, req); } - __io_cq_unlock_post(ctx); + io_cq_unlock_post(ctx); io_free_batch_list(ctx, state->compl_reqs.first); INIT_WQ_LIST(&state->compl_reqs);
__io_cq_unlock_post() is identical to io_cq_unlock_post(), and io_cqring_ev_posted() has a single caller so migth as well just inline it there. Signed-off-by: Jens Axboe <axboe@kernel.dk> ---