Message ID | 64f28d0f-b2b9-4ff4-8e2f-efdf1c63d3d4@kernel.dk (mailing list archive)
---|---
State | New
Series | io_uring/fdinfo: park SQ thread while retrieving cpu/pid
On 10/23/23 9:17 AM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>
>> We could race with SQ thread exit, and if we do, we'll hit a NULL pointer
>> dereference. Park the SQPOLL thread while getting the task cpu and pid for
>> fdinfo, this ensures we have a stable view of it.
>>
>> Cc: stable@vger.kernel.org
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> [...]
>>
>> @@ -143,13 +143,18 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>>  	has_lock = mutex_trylock(&ctx->uring_lock);
>>
>>  	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
>> -		sq = ctx->sq_data;
>> -		if (!sq->thread)
>> -			sq = NULL;
>> +		struct io_sq_data *sq = ctx->sq_data;
>> +
>> +		io_sq_thread_park(sq);
>> +		if (sq->thread) {
>> +			sq_pid = task_pid_nr(sq->thread);
>> +			sq_cpu = task_cpu(sq->thread);
>> +		}
>> +		io_sq_thread_unpark(sq);
>
> Jens,
>
> io_sq_thread_park will try to wake the sqpoll, which is, at least,
> unnecessary. But I'm thinking we don't want to expose the ability to
> schedule the sqpoll from procfs, which can be done by any unrelated
> process.
>
> To solve the bug, it should be enough to synchronize directly on
> sqd->lock, preventing sq->thread from going away inside the if leg.
> Granted, it might take longer if the sqpoll is busy, but reading
> fdinfo is not supposed to be fast. Alternatively, don't call
> wake_process in this case?

I did think about that but just went with the exported API. But you are
right, it's a bit annoying that it'd also wake the thread, in case it
was idle. Probably mostly cosmetic, but we may as well just stick with
grabbing the sqd mutex. I'll send a v2.
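A minimal sketch of the v2 variant this lands on, holding the sqd mutex
directly instead of parking; it is reconstructed from the mutex_lock()
lines visible in the incremental diff later in the thread, with the
surrounding context (has_lock, sq_pid, sq_cpu) as in the patch above:

	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
		struct io_sq_data *sq = ctx->sq_data;

		/* pin sq->thread by holding sqd->lock instead of parking */
		mutex_lock(&sq->lock);
		if (sq->thread) {
			sq_pid = task_pid_nr(sq->thread);
			sq_cpu = task_cpu(sq->thread);
		}
		mutex_unlock(&sq->lock);
	}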
On 10/23/23 16:27, Jens Axboe wrote:
> On 10/23/23 9:17 AM, Gabriel Krisman Bertazi wrote:
>> [...]
>> To solve the bug, it should be enough to synchronize directly on
>> sqd->lock, preventing sq->thread from going away inside the if leg.
>> Granted, it might take longer if the sqpoll is busy, but reading
>> fdinfo is not supposed to be fast. Alternatively, don't call
>> wake_process in this case?
>
> I did think about that but just went with the exported API. But you are
> right, it's a bit annoying that it'd also wake the thread, in case it

Waking it up is not a problem, but without parking the sq thread won't drop
the lock until it's time to sleep, which might be pretty long, leaving
the /proc read stuck on the lock uninterruptibly.

Aside from parking vs lock, there is a lock inversion now:

proc read                 | SQPOLL
                          |
try_lock(ring) // success |
                          | woken up
                          | lock(sqd); // success
lock(sqd); // stuck       |
                          | try to submit requests
                          | -- lock(ring); // stuck

> was idle. Probably mostly cosmetic, but we may as well just stick with
> grabbing the sqd mutex. I'll send a v2.
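The inversion Pavel diagrams is the classic ABBA pattern. For
illustration only, here is a self-contained userspace sketch of the
same shape, with pthread mutexes standing in for ctx->uring_lock and
sqd->lock; all names, and the sleep() calls that widen the race window,
are assumptions of the sketch, not io_uring code:

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;	/* ctx->uring_lock */
static pthread_mutex_t sqd_lock  = PTHREAD_MUTEX_INITIALIZER;	/* sqd->lock */

static void *proc_read(void *arg)	/* fdinfo side: ring, then sqd */
{
	(void)arg;
	pthread_mutex_lock(&ring_lock);	/* the trylock succeeded in the real code */
	sleep(1);			/* widen the window */
	pthread_mutex_lock(&sqd_lock);	/* stuck: SQPOLL side holds sqd_lock */
	pthread_mutex_unlock(&sqd_lock);
	pthread_mutex_unlock(&ring_lock);
	return NULL;
}

static void *sqpoll(void *arg)		/* SQPOLL side: sqd, then ring */
{
	(void)arg;
	pthread_mutex_lock(&sqd_lock);
	sleep(1);
	pthread_mutex_lock(&ring_lock);	/* stuck: fdinfo side holds ring_lock */
	pthread_mutex_unlock(&ring_lock);
	pthread_mutex_unlock(&sqd_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, proc_read, NULL);
	pthread_create(&b, NULL, sqpoll, NULL);
	pthread_join(a, NULL);	/* never returns: the two threads deadlock */
	pthread_join(b, NULL);
	return 0;
}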
On 10/25/23 6:09 AM, Pavel Begunkov wrote:
> On 10/23/23 16:27, Jens Axboe wrote:
>> [...]
>> I did think about that but just went with the exported API. But you are
>> right, it's a bit annoying that it'd also wake the thread, in case it
>> was idle.
>
> Waking it up is not a problem, but without parking the sq thread won't drop
> the lock until it's time to sleep, which might be pretty long, leaving
> the /proc read stuck on the lock uninterruptibly.
>
> Aside from parking vs lock, there is a lock inversion now:
>
> proc read                 | SQPOLL
>                           |
> try_lock(ring) // success |
>                           | woken up
>                           | lock(sqd); // success
> lock(sqd); // stuck       |
>                           | try to submit requests
>                           | -- lock(ring); // stuck

Yeah good point, forgot we nest these opposite of what you'd expect.
Honestly I think the fix here is just to turn it into a trylock. Yes
that'll miss some cases where we could've gotten the pid/cpu, but
doesn't seem worth caring about.

IOW, fold in this incremental.
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index af1bdcc0703e..f04a43044d91 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -145,12 +145,13 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
 		struct io_sq_data *sq = ctx->sq_data;
 
-		mutex_lock(&sq->lock);
-		if (sq->thread) {
-			sq_pid = task_pid_nr(sq->thread);
-			sq_cpu = task_cpu(sq->thread);
+		if (mutex_trylock(&sq->lock)) {
+			if (sq->thread) {
+				sq_pid = task_pid_nr(sq->thread);
+				sq_cpu = task_cpu(sq->thread);
+			}
+			mutex_unlock(&sq->lock);
 		}
-		mutex_unlock(&sq->lock);
 	}
 
 	seq_printf(m, "SqThread:\t%d\n", sq_pid);
On 10/25/23 14:44, Jens Axboe wrote:
> On 10/25/23 6:09 AM, Pavel Begunkov wrote:
>> [...]
>> Aside from parking vs lock, there is a lock inversion now:
>
> Yeah good point, forgot we nest these opposite of what you'd expect.
> Honestly I think the fix here is just to turn it into a trylock. Yes
> that'll miss some cases where we could've gotten the pid/cpu, but
> doesn't seem worth caring about.
>
> IOW, fold in this incremental.

Should work, otherwise you probably can just park first.

Long term it'd be nice to make sqpoll not hold sqd->lock during
submission + polling as it currently does. Park callers sleep on the
lock, but if we replace it with some struct completion the rest
should be easy enough.
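What Pavel suggests might look roughly like the sketch below; every
name here is hypothetical, and details such as the unpark path and
re-initializing the completion between park cycles are elided, so this
is a shape, not a patch:

/*
 * Hypothetical sketch: park callers block on a completion that the
 * SQPOLL thread signals at its idle point, instead of sleeping on
 * sqd->lock for the whole submit+poll cycle.
 */
struct io_sq_data_sketch {
	struct task_struct	*thread;
	atomic_t		park_pending;
	struct completion	parked;
};

static void sq_thread_park_sketch(struct io_sq_data_sketch *sqd)
{
	atomic_inc(&sqd->park_pending);
	wake_up_process(sqd->thread);		/* nudge it toward the idle point */
	wait_for_completion(&sqd->parked);	/* sleep until it gets there */
}

/* called from the SQPOLL loop where it currently cycles sqd->lock */
static void sq_thread_idle_point_sketch(struct io_sq_data_sketch *sqd)
{
	if (atomic_read(&sqd->park_pending))
		complete_all(&sqd->parked);	/* release all waiting parkers */
}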
On 10/25/23 8:09 AM, Pavel Begunkov wrote:
> On 10/25/23 14:44, Jens Axboe wrote:
>> [...]
>> Honestly I think the fix here is just to turn it into a trylock. Yes
>> that'll miss some cases where we could've gotten the pid/cpu, but
>> doesn't seem worth caring about.
>>
>> IOW, fold in this incremental.
>
> Should work, otherwise you probably can just park first.

In general I think it's better to have the observational side be less of
an impact, which is why I liked not doing the parking. Sometimes apps do
stupid things and monitor fdinfo/ etc. continually. As it's possible to
get the task/cpu anyway via other means, it should be better to just skip
them if we don't get the lock.

> Long term it'd be nice to make sqpoll not hold sqd->lock during
> submission + polling as it currently does. Park callers sleep on the
> lock, but if we replace it with some struct completion the rest
> should be easy enough.

Yeah agree.
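The "other means" Jens mentions would presumably be that the SQPOLL
kernel thread is visible as an ordinary task; for instance, something
along these lines shows its pid and current CPU, assuming the usual
iou-sqp thread naming (pids and CPU below are made up):

$ ps -eLo pid,tid,psr,comm | grep iou-sqp
 4211  4217   3 iou-sqp-4211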
We could race with SQ thread exit, and if we do, we'll hit a NULL pointer
dereference. Park the SQPOLL thread while getting the task cpu and pid for
fdinfo, this ensures we have a stable view of it.

Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index c53678875416..cd2a0c6b97c4 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -53,7 +53,6 @@ static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
 __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 {
 	struct io_ring_ctx *ctx = f->private_data;
-	struct io_sq_data *sq = NULL;
 	struct io_overflow_cqe *ocqe;
 	struct io_rings *r = ctx->rings;
 	unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
@@ -64,6 +63,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	unsigned int cq_shift = 0;
 	unsigned int sq_shift = 0;
 	unsigned int sq_entries, cq_entries;
+	int sq_pid = -1, sq_cpu = -1;
 	bool has_lock;
 	unsigned int i;
 
@@ -143,13 +143,18 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	has_lock = mutex_trylock(&ctx->uring_lock);
 
 	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
-		sq = ctx->sq_data;
-		if (!sq->thread)
-			sq = NULL;
+		struct io_sq_data *sq = ctx->sq_data;
+
+		io_sq_thread_park(sq);
+		if (sq->thread) {
+			sq_pid = task_pid_nr(sq->thread);
+			sq_cpu = task_cpu(sq->thread);
+		}
+		io_sq_thread_unpark(sq);
 	}
 
-	seq_printf(m, "SqThread:\t%d\n", sq ? task_pid_nr(sq->thread) : -1);
-	seq_printf(m, "SqThreadCpu:\t%d\n", sq ? task_cpu(sq->thread) : -1);
+	seq_printf(m, "SqThread:\t%d\n", sq_pid);
+	seq_printf(m, "SqThreadCpu:\t%d\n", sq_cpu);
 	seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files);
 	for (i = 0; has_lock && i < ctx->nr_user_files; i++) {
 		struct file *f = io_file_from_index(&ctx->file_table, i);
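For reference, the fields this patch touches appear in the ring fd's
fdinfo output roughly as below; the pid, fd number, and values are made
up, and -1 is what a reader sees when the SQPOLL thread has already
exited (or, with the trylock incremental folded in, when the sqd lock
was contended at read time):

$ cat /proc/4211/fdinfo/5
...
SqThread:	4217
SqThreadCpu:	3
UserFiles:	2
...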