Message ID | 20191015135929.30912-1-yangerkun@huawei.com (mailing list archive)
State      | New, archived
Series     | [V3] io_uring: consider the overflow of sequence for timeout req
On 10/15/19 7:59 AM, yangerkun wrote:
> Now we recalculate the sequence of a timeout request with
> 'req->sequence = ctx->cached_sq_head + count - 1', and find the right
> place to insert it into timeout_list by comparing how many requests
> each entry still expects to complete. But we have not considered
> overflow:
>
> 1. ctx->cached_sq_head + count - 1 may overflow, so a new timeout
> request with a bigger count can end up with a smaller req->sequence.
>
> 2. The current cached_sq_head may have overflowed compared with that
> of an earlier request, which also leaves the new timeout request with
> a smaller req->sequence.
>
> Either overflow misorders timeout_list, which leads to the wrong
> completion order of the timeout requests. Fix it by reusing
> req->submit.sequence to store the count, and change the insertion-sort
> logic in io_timeout.

Thanks, this looks great. Applied for 5.4.
On 2019/10/15 21:59, yangerkun wrote:
> Now we recalculate the sequence of a timeout request with
> 'req->sequence = ctx->cached_sq_head + count - 1', and find the right
> place to insert it into timeout_list by comparing how many requests
> each entry still expects to complete. But we have not considered
> overflow:
[...]
> +		/*
> +		 * cached_sq_head may overflow, and it will never overflow twice
> +		 * once there is some timeout req still be valid.
> +		 */
> +		if (ctx->cached_sq_head < nxt_sq_head)
> +			tmp_nxt += UINT_MAX;

Maybe there is a mistake here: it should be tmp. So sorry about this.
On 10/15/19 7:35 PM, yangerkun wrote:
> On 2019/10/15 21:59, yangerkun wrote:
[...]
>> +		/*
>> +		 * cached_sq_head may overflow, and it will never overflow twice
>> +		 * once there is some timeout req still be valid.
>> +		 */
>> +		if (ctx->cached_sq_head < nxt_sq_head)
>> +			tmp_nxt += UINT_MAX;
>
> Maybe there is a mistake here: it should be tmp. So sorry about this.

I ran it through the basic testing, but I guess it doesn't catch
overflow cases. Maybe we can come up with one? Should be pretty simple
to set up an io_uring, post UINT_MAX - 10 nops (or something like
that), then do some timeout testing.

Just send an incremental patch to fix it.
On 2019/10/16 9:45, Jens Axboe wrote:
> On 10/15/19 7:35 PM, yangerkun wrote:
[...]
>> Maybe there is a mistake here: it should be tmp. So sorry about this.
>
> I ran it through the basic testing, but I guess it doesn't catch
> overflow cases. Maybe we can come up with one? Should be pretty simple
> to set up an io_uring, post UINT_MAX - 10 nops (or something like
> that), then do some timeout testing.

Good idea! I will try to add a testcase for this in liburing.

> Just send an incremental patch to fix it.

OK, I will send the fix patch!
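Below is a rough sketch of the kind of test Jens suggests, against the
public liburing API. Everything specific here is an assumption rather
than the testcase that eventually landed: the ring size, the one-at-a-time
submission, and the timeout count are made up, and walking the 32-bit sq
head to within a few entries of UINT_MAX this way means submitting and
reaping about four billion NOPs, so a practical regression test would
need a faster way to reach the wrap point.

/*
 * Sketch of an overflow test: advance the 32-bit sq head close to
 * UINT_MAX with NOPs, then queue a timeout whose count straddles the
 * wrap. Illustrative only; not the actual liburing testcase.
 */
#include <limits.h>
#include <stdio.h>
#include <liburing.h>

int main(void)
{
	struct __kernel_timespec ts = { .tv_sec = 1, .tv_nsec = 0 };
	unsigned long long i, nr_nops = (unsigned long long)UINT_MAX - 10;
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct io_uring ring;

	if (io_uring_queue_init(64, &ring, 0) < 0)
		return 1;

	/*
	 * Submit one NOP at a time and reap its completion so the ring
	 * never fills; each submitted sqe advances ctx->cached_sq_head
	 * by one.
	 */
	for (i = 0; i < nr_nops; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_nop(sqe);
		io_uring_submit(&ring);
		if (io_uring_wait_cqe(&ring, &cqe))
			return 1;
		io_uring_cqe_seen(&ring, cqe);
	}

	/*
	 * cached_sq_head + count - 1 now overflows 32 bits: on an
	 * unfixed kernel this timeout is sorted into the wrong place
	 * in timeout_list.
	 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_timeout(sqe, &ts, 20, 0);
	io_uring_submit(&ring);
	if (io_uring_wait_cqe(&ring, &cqe))
		return 1;
	printf("timeout completed, res=%d\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);

	io_uring_queue_exit(&ring);
	return 0;
}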
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 76fdbe84aff5..c9512da06973 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1884,7 +1884,7 @@ static enum hrtimer_restart io_timeout_fn(struct hrtimer *timer)
 
 static int io_timeout(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
-	unsigned count, req_dist, tail_index;
+	unsigned count;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct list_head *entry;
 	struct timespec64 ts;
@@ -1907,21 +1907,36 @@ static int io_timeout(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		count = 1;
 
 	req->sequence = ctx->cached_sq_head + count - 1;
+	/* reuse it to store the count */
+	req->submit.sequence = count;
 	req->flags |= REQ_F_TIMEOUT;
 
 	/*
 	 * Insertion sort, ensuring the first entry in the list is always
 	 * the one we need first.
 	 */
-	tail_index = ctx->cached_cq_tail - ctx->rings->sq_dropped;
-	req_dist = req->sequence - tail_index;
 	spin_lock_irq(&ctx->completion_lock);
 	list_for_each_prev(entry, &ctx->timeout_list) {
 		struct io_kiocb *nxt = list_entry(entry, struct io_kiocb, list);
-		unsigned dist;
+		unsigned nxt_sq_head;
+		long long tmp, tmp_nxt;
 
-		dist = nxt->sequence - tail_index;
-		if (req_dist >= dist)
+		/*
+		 * Since cached_sq_head + count - 1 can overflow, use type long
+		 * long to store it.
+		 */
+		tmp = (long long)ctx->cached_sq_head + count - 1;
+		nxt_sq_head = nxt->sequence - nxt->submit.sequence + 1;
+		tmp_nxt = (long long)nxt_sq_head + nxt->submit.sequence - 1;
+
+		/*
+		 * cached_sq_head may overflow, and it will never overflow twice
+		 * once there is some timeout req still be valid.
+		 */
+		if (ctx->cached_sq_head < nxt_sq_head)
+			tmp_nxt += UINT_MAX;
+
+		if (tmp >= tmp_nxt)
 			break;
 	}
 	list_add(&req->list, entry);
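To see concretely why the UINT_MAX bump belongs on tmp rather than
tmp_nxt, here is a minimal userspace model of the comparison in the loop
above. It is a sketch: the struct and its field names are hypothetical,
mirroring only the two kernel fields involved, and it already applies
the tmp fix yangerkun points out in the thread.

#include <stdio.h>
#include <limits.h>

/*
 * Standalone model of the insertion-sort comparison, reduced to the
 * two per-request fields involved (hypothetical struct, not kernel
 * code).
 */
struct timeout {
	unsigned sequence;	/* sq head at submit + count - 1 (32-bit) */
	unsigned count;		/* the value stashed in req->submit.sequence */
};

/* Returns 1 if the new request should sort after 'nxt'. */
static int new_req_sorts_after(unsigned cached_sq_head, unsigned count,
			       const struct timeout *nxt)
{
	unsigned nxt_sq_head = nxt->sequence - nxt->count + 1;
	long long tmp = (long long)cached_sq_head + count - 1;
	long long tmp_nxt = (long long)nxt_sq_head + nxt->count - 1;

	/*
	 * If cached_sq_head wrapped since nxt was queued, the new
	 * request is logically UINT_MAX further along, so the bump
	 * goes on tmp -- the line V3 got backwards.
	 */
	if (cached_sq_head < nxt_sq_head)
		tmp += UINT_MAX;

	return tmp >= tmp_nxt;
}

int main(void)
{
	/* nxt was queued just before the 32-bit sq head wrapped ... */
	struct timeout nxt = { .sequence = UINT_MAX - 1, .count = 3 };

	/* ... and the new request arrives just after the wrap. */
	printf("sorts after nxt: %d (expect 1)\n",
	       new_req_sorts_after(5, 2, &nxt));
	return 0;
}

With the V3 line (tmp_nxt += UINT_MAX) the same inputs compare 6 against
roughly twice UINT_MAX, so the new timeout is wrongly sorted before nxt,
which is exactly the misordering being discussed.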
Now we recalculate the sequence of a timeout request with
'req->sequence = ctx->cached_sq_head + count - 1', and find the right
place to insert it into timeout_list by comparing how many requests
each entry still expects to complete. But we have not considered
overflow:

1. ctx->cached_sq_head + count - 1 may overflow, so a new timeout
request with a bigger count can end up with a smaller req->sequence.

2. The current cached_sq_head may have overflowed compared with that of
an earlier request, which also leaves the new timeout request with a
smaller req->sequence.

Either overflow misorders timeout_list, which leads to the wrong
completion order of the timeout requests. Fix it by reusing
req->submit.sequence to store the count, and change the insertion-sort
logic in io_timeout.

Signed-off-by: yangerkun <yangerkun@huawei.com>
---
 fs/io_uring.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)
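As a quick check of overflow case 1 above, the following standalone
sketch (assumed values, nothing kernel-specific) shows a timeout near
the wrap point getting a smaller 32-bit sequence despite its bigger
count, and how widening to 64 bits, as the patch does, preserves the
real order.

#include <stdio.h>
#include <limits.h>

int main(void)
{
	unsigned head = UINT_MAX - 4;	/* cached_sq_head about to wrap */
	unsigned count_a = 2, count_b = 10;

	/*
	 * Naive 32-bit arithmetic: the request expecting more
	 * completions wraps and gets the smaller sequence.
	 */
	unsigned seq_a = head + count_a - 1;	/* UINT_MAX - 3 */
	unsigned seq_b = head + count_b - 1;	/* wraps around to 4 */
	printf("32-bit: seq_a=%u seq_b=%u\n", seq_a, seq_b);

	/*
	 * Widening to 64 bits before comparing, as the patch does with
	 * its (long long) casts, keeps the true order.
	 */
	long long wide_a = (long long)head + count_a - 1;
	long long wide_b = (long long)head + count_b - 1;
	printf("64-bit: wide_a=%lld wide_b=%lld\n", wide_a, wide_b);
	return 0;
}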