Message ID | 382791dc97d208d88ee31e5ebb5b661a0453fb79.1722374371.git.olivier@trillion01.com (mailing list archive)
---|---
State | New
Series | io_uring: minor sqpoll code refactoring
On 7/30/24 22:10, Olivier Langlois wrote:
> there are many small reasons justifying this change.
>
> 1. busy poll must be performed even on rings that have no iopoll and no
>    new sqe. It is quite possible that a ring configured for inbound
>    traffic with multishot requests goes several hours without receiving
>    new request submissions.
> 2. NAPI busy poll does not perform any credential validation.
> 3. If the thread is awakened by task work, processing the task work
>    takes priority over the NAPI busy loop. This is why a second loop
>    has been created after the io_sq_tw() call instead of doing the busy
>    loop in __io_sq_thread() outside its credential acquisition block.

That patch should be first, as it's a fix we care to backport. It's also

Fixes: 8d0c12a80cdeb ("io-uring: add napi busy poll support")
Cc: stable@vger.kernel.org

And a comment below.

> Signed-off-by: Olivier Langlois <olivier@trillion01.com>
> ---
>  io_uring/napi.h   | 9 +++++++++
>  io_uring/sqpoll.c | 7 ++++---
>  2 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/io_uring/napi.h b/io_uring/napi.h
> index 88f1c21d5548..5506c6af1ff5 100644
> --- a/io_uring/napi.h
> +++ b/io_uring/napi.h
> @@ -101,4 +101,13 @@ static inline int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
>  }
>  #endif /* CONFIG_NET_RX_BUSY_POLL */
>
> +static inline int io_do_sqpoll_napi(struct io_ring_ctx *ctx)
> +{
> +	int ret = 0;
> +
> +	if (io_napi(ctx))
> +		ret = io_napi_sqpoll_busy_poll(ctx);
> +	return ret;
> +}
> +
>  #endif
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index cc4a25136030..ec558daa0331 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -195,9 +195,6 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
>  		ret = io_submit_sqes(ctx, to_submit);
>  		mutex_unlock(&ctx->uring_lock);
>
> -		if (io_napi(ctx))
> -			ret += io_napi_sqpoll_busy_poll(ctx);
> -
>  		if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
>  			wake_up(&ctx->sqo_sq_wait);
>  		if (creds)
> @@ -322,6 +319,10 @@ static int io_sq_thread(void *data)
>  		if (io_sq_tw(&retry_list, IORING_TW_CAP_ENTRIES_VALUE))
>  			sqt_spin = true;
>
> +		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> +			if (io_do_sqpoll_napi(ctx))
> +				sqt_spin = true;

io_do_sqpoll_napi() returns 1 as long as there are napis in the list,
IOW even if there is no activity it'll spin almost forever (60s is
forever), bypassing sq_thread_idle.

Let's not update sqt_spin here; if the user wants it to poll for
longer, it can pass a larger SQPOLL idle timeout value.

> +		}
>  		if (sqt_spin || !time_after(jiffies, timeout)) {
>  			if (sqt_spin) {
>  				io_sq_update_worktime(sqd, &start);
On Fri, 2024-08-02 at 12:14 +0100, Pavel Begunkov wrote:
>
> io_do_sqpoll_napi() returns 1 as long as there are napis in the list,
> IOW even if there is no activity it'll spin almost forever (60s is
> forever), bypassing sq_thread_idle.
>
> Let's not update sqt_spin here; if the user wants it to poll for
> longer, it can pass a larger SQPOLL idle timeout value.
>
fair enough...

in that case, maybe the man page SQPOLL idle timeout description should
mention that if the NAPI busy loop is used, the idle timeout should be
at least as large as gro_flush_timeout to meet NAPI's requirement of not
generating interrupts, as described in Documentation/networking/napi.rst,
section "Software IRQ coalescing".

I discovered this fact the hard way, having spent days figuring out how
to do busy polling the right way.

This simple mention could save many new users of the feature a lot of
trouble.

I'll rework the patch and send a new version in the next few days.

Greetings,
On 8/2/24 15:22, Olivier Langlois wrote:
> On Fri, 2024-08-02 at 12:14 +0100, Pavel Begunkov wrote:
>>
>> io_do_sqpoll_napi() returns 1 as long as there are napis in the list,
>> IOW even if there is no activity it'll spin almost forever (60s is
>> forever), bypassing sq_thread_idle.
>>
>> Let's not update sqt_spin here; if the user wants it to poll for
>> longer, it can pass a larger SQPOLL idle timeout value.
>>
> fair enough...
>
> in that case, maybe the man page SQPOLL idle timeout description should
> mention that if the NAPI busy loop is used, the idle timeout should be
> at least as large as gro_flush_timeout to meet NAPI's requirement of
> not generating interrupts, as described in
> Documentation/networking/napi.rst, section "Software IRQ coalescing".

Would be great to have, I agree. We might also need to start a tips and
tricks document; not that many people are looking at documentation.

> I discovered this fact the hard way, having spent days figuring out
> how to do busy polling the right way.
>
> This simple mention could save many new users of the feature a lot of
> trouble.
>
> I'll rework the patch and send a new version in the next few days.

Awesome, thanks
diff --git a/io_uring/napi.h b/io_uring/napi.h
index 88f1c21d5548..5506c6af1ff5 100644
--- a/io_uring/napi.h
+++ b/io_uring/napi.h
@@ -101,4 +101,13 @@ static inline int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
 }
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
+static inline int io_do_sqpoll_napi(struct io_ring_ctx *ctx)
+{
+	int ret = 0;
+
+	if (io_napi(ctx))
+		ret = io_napi_sqpoll_busy_poll(ctx);
+	return ret;
+}
+
 #endif
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index cc4a25136030..ec558daa0331 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -195,9 +195,6 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 		ret = io_submit_sqes(ctx, to_submit);
 		mutex_unlock(&ctx->uring_lock);
 
-		if (io_napi(ctx))
-			ret += io_napi_sqpoll_busy_poll(ctx);
-
 		if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
 			wake_up(&ctx->sqo_sq_wait);
 		if (creds)
@@ -322,6 +319,10 @@ static int io_sq_thread(void *data)
 		if (io_sq_tw(&retry_list, IORING_TW_CAP_ENTRIES_VALUE))
 			sqt_spin = true;
 
+		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
+			if (io_do_sqpoll_napi(ctx))
+				sqt_spin = true;
+		}
 		if (sqt_spin || !time_after(jiffies, timeout)) {
 			if (sqt_spin) {
 				io_sq_update_worktime(sqd, &start);
there are many small reasons justifying this change.

1. busy poll must be performed even on rings that have no iopoll and no
   new sqe. It is quite possible that a ring configured for inbound
   traffic with multishot requests goes several hours without receiving
   new request submissions.
2. NAPI busy poll does not perform any credential validation.
3. If the thread is awakened by task work, processing the task work
   takes priority over the NAPI busy loop. This is why a second loop has
   been created after the io_sq_tw() call instead of doing the busy loop
   in __io_sq_thread() outside its credential acquisition block.

Signed-off-by: Olivier Langlois <olivier@trillion01.com>
---
 io_uring/napi.h   | 9 +++++++++
 io_uring/sqpoll.c | 7 ++++---
 2 files changed, 13 insertions(+), 3 deletions(-)