Message ID | 20250409134057.198671-2-axboe@kernel.dk (mailing list archive) |
---|---|
State | New |
Series | Cancel and wait for all requests on exit |
On Wed, Apr 09, 2025 at 07:35:19AM -0600, Jens Axboe wrote:
> fput currently gates whether or not a task can run task_work on the
> PF_KTHREAD flag, which excludes kernel threads as they don't usually run
> task_work as they never exit to userspace. This punts the final fput
> done from a kthread to a delayed work item instead of using task_work.
>
> It's perfectly viable to have the final fput done by the kthread itself,
> as long as it will actually run the task_work. Add a PF_NO_TASKWORK flag
> which is set by default by a kernel thread, and gate the task_work fput
> on that instead. This enables a kernel thread to clear this flag
> temporarily while putting files, as long as it runs its task_work
> manually.
>
> This enables users like io_uring to ensure that when the final fput of a
> file is done as part of ring teardown to run the local task_work and
> hence know that all files have been properly put, without needing to
> resort to workqueue flushing tricks which can deadlock.
>
> No functional changes in this patch.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---

Seems fine. Although it has some potential for abuse. So maybe a
VFS_WARN_ON_ONCE() that PF_NO_TASKWORK is only used with PF_KTHREAD
would make sense.

Acked-by: Christian Brauner <brauner@kernel.org>

>  fs/file_table.c       | 2 +-
>  include/linux/sched.h | 2 +-
>  kernel/fork.c         | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index c04ed94cdc4b..e3c3dd1b820d 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -521,7 +521,7 @@ static void __fput_deferred(struct file *file)
>  		return;
>  	}
>
> -	if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> +	if (likely(!in_interrupt() && !(task->flags & PF_NO_TASKWORK))) {
>  		init_task_work(&file->f_task_work, ____fput);
>  		if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
>  			return;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f96ac1982893..349c993fc32b 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1736,7 +1736,7 @@ extern struct pid *cad_pid;
>  						 * I am cleaning dirty pages from some other bdi. */
>  #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
>  #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
> -#define PF__HOLE__00800000	0x00800000
> +#define PF_NO_TASKWORK		0x00800000	/* task doesn't run task_work */
>  #define PF__HOLE__01000000	0x01000000
>  #define PF__HOLE__02000000	0x02000000
>  #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index c4b26cd8998b..8dd0b8a5348d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2261,7 +2261,7 @@ __latent_entropy struct task_struct *copy_process(
>  		goto fork_out;
>  	p->flags &= ~PF_KTHREAD;
>  	if (args->kthread)
> -		p->flags |= PF_KTHREAD;
> +		p->flags |= PF_KTHREAD | PF_NO_TASKWORK;
>  	if (args->user_worker) {
>  		/*
>  		 * Mark us a user worker, and block any signal that isn't
> --
> 2.49.0
>
On 4/11/25 7:48 AM, Christian Brauner wrote:
> Seems fine. Although it has some potential for abuse. So maybe a
> VFS_WARN_ON_ONCE() that PF_NO_TASKWORK is only used with PF_KTHREAD
> would make sense.

Can certainly add that. You'd want that before the check for
in_interrupt and PF_NO_TASKWORK? Something ala

    /* PF_NO_TASKWORK should only be used with PF_KTHREAD */
    VFS_WARN_ON_ONCE((task->flags & PF_NO_TASKWORK) && !(task->flags & PF_KTHREAD));

?

> Acked-by: Christian Brauner <brauner@kernel.org>

Thanks!
On Fri, Apr 11, 2025 at 08:37:51AM -0600, Jens Axboe wrote:
> On 4/11/25 7:48 AM, Christian Brauner wrote:
> > Seems fine. Although it has some potential for abuse. So maybe a
> > VFS_WARN_ON_ONCE() that PF_NO_TASKWORK is only used with PF_KTHREAD
> > would make sense.
>
> Can certainly add that. You'd want that before the check for
> in_interrupt and PF_NO_TASKWORK? Something ala
>
>     /* PF_NO_TASKWORK should only be used with PF_KTHREAD */
>     VFS_WARN_ON_ONCE((task->flags & PF_NO_TASKWORK) && !(task->flags & PF_KTHREAD));
>
> ?

Yeah, sounds good!
On 4/14/25 4:10 AM, Christian Brauner wrote:
> On Fri, Apr 11, 2025 at 08:37:51AM -0600, Jens Axboe wrote:
>> On 4/11/25 7:48 AM, Christian Brauner wrote:
>>> Seems fine. Although it has some potential for abuse. So maybe a
>>> VFS_WARN_ON_ONCE() that PF_NO_TASKWORK is only used with PF_KTHREAD
>>> would make sense.
>>
>> Can certainly add that. You'd want that before the check for
>> in_interrupt and PF_NO_TASKWORK? Something ala
>>
>>     /* PF_NO_TASKWORK should only be used with PF_KTHREAD */
>>     VFS_WARN_ON_ONCE((task->flags & PF_NO_TASKWORK) && !(task->flags & PF_KTHREAD));
>>
>> ?
>
> Yeah, sounds good!

I used the usual XOR trick for this kind of test, but placed in the same
spot:

https://git.kernel.dk/cgit/linux/commit/?h=io_uring-exit-cancel.2&id=d5ab108781ccc2f0f013fe009a010a1f29a4785d
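The linked commit is not reproduced in this thread, so the exact expression it uses is unknown here. As a rough userspace sketch, the check proposed earlier in the thread can be compared with one possible XOR-style formulation. The flag values come from the patch's `include/linux/sched.h` hunk; the helper names `warn_implication`, `warn_xor` and `check_default_tasks` are hypothetical, and the XOR form is an assumption about what "the usual XOR trick" might look like, not the committed code.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the patch's include/linux/sched.h hunk. */
#define PF_KTHREAD     0x00200000u
#define PF_NO_TASKWORK 0x00800000u

/* Check as proposed in the thread: PF_NO_TASKWORK set on a non-kthread. */
static bool warn_implication(unsigned int flags)
{
	return (flags & PF_NO_TASKWORK) && !(flags & PF_KTHREAD);
}

/*
 * Hypothetical XOR form (an assumption, not the linked commit):
 * fires whenever exactly one of the two flags is set.
 */
static bool warn_xor(unsigned int flags)
{
	return !(flags & PF_KTHREAD) ^ !(flags & PF_NO_TASKWORK);
}

/* Neither form fires for a plain user task or a default kernel thread. */
void check_default_tasks(void)
{
	assert(!warn_implication(0) && !warn_xor(0));
	assert(!warn_implication(PF_KTHREAD | PF_NO_TASKWORK));
	assert(!warn_xor(PF_KTHREAD | PF_NO_TASKWORK));
}
```

In this model the two forms differ only for a kthread that has cleared PF_NO_TASKWORK: the implication check stays quiet, while a plain XOR of the two flags would fire.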
On Wed, Apr 09, 2025 at 07:35:19AM -0600, Jens Axboe wrote:
> fput currently gates whether or not a task can run task_work on the
> PF_KTHREAD flag, which excludes kernel threads as they don't usually run
> task_work as they never exit to userspace. This punts the final fput
> done from a kthread to a delayed work item instead of using task_work.
>
> It's perfectly viable to have the final fput done by the kthread itself,
> as long as it will actually run the task_work. Add a PF_NO_TASKWORK flag
> which is set by default by a kernel thread, and gate the task_work fput
> on that instead. This enables a kernel thread to clear this flag
> temporarily while putting files, as long as it runs its task_work
> manually.
>
> This enables users like io_uring to ensure that when the final fput of a
> file is done as part of ring teardown to run the local task_work and
> hence know that all files have been properly put, without needing to
> resort to workqueue flushing tricks which can deadlock.
>
> No functional changes in this patch.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  fs/file_table.c       | 2 +-
>  include/linux/sched.h | 2 +-
>  kernel/fork.c         | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index c04ed94cdc4b..e3c3dd1b820d 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -521,7 +521,7 @@ static void __fput_deferred(struct file *file)
>  		return;
>  	}
>
> -	if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> +	if (likely(!in_interrupt() && !(task->flags & PF_NO_TASKWORK))) {
>  		init_task_work(&file->f_task_work, ____fput);
>  		if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
>  			return;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f96ac1982893..349c993fc32b 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1736,7 +1736,7 @@ extern struct pid *cad_pid;
>  						 * I am cleaning dirty pages from some other bdi. */
>  #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
>  #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
> -#define PF__HOLE__00800000	0x00800000
> +#define PF_NO_TASKWORK		0x00800000	/* task doesn't run task_work */
>  #define PF__HOLE__01000000	0x01000000
>  #define PF__HOLE__02000000	0x02000000
>  #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index c4b26cd8998b..8dd0b8a5348d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2261,7 +2261,7 @@ __latent_entropy struct task_struct *copy_process(
>  		goto fork_out;
>  	p->flags &= ~PF_KTHREAD;
>  	if (args->kthread)
> -		p->flags |= PF_KTHREAD;
> +		p->flags |= PF_KTHREAD | PF_NO_TASKWORK;
>  	if (args->user_worker) {
>  		/*
>  		 * Mark us a user worker, and block any signal that isn't

I don't have comments on the semantics here, I do have comments on some
future-proofing.

To my reading kthreads on the stock kernel never execute task_work.

This suggests it would be nice for task_work_add() to at least WARN_ON
when executing with a kthread. After all you don't want a task_work_add
consumer adding work which will never execute.

But then for your patch to not produce any splats there would have to be
a flag blessing select kthreads as legitimate task_work consumers.

So my suggestion would be to add the WARN_ON() in task_work_add() prior
to anything in this patchset, then this patch would be extended with a
flag (PF_KTHREAD_DOES_TASK_WORK?) and relevant io_uring threads would
get the flag.

Then the machinery which sets/unsets PF_NO_TASKWORK can assert that:
1. it operates on a kthread...
2. ...with the PF_KTHREAD_DOES_TASK_WORK flag

This is just a suggestion though.
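The idea of warning when work is queued to a task that will never run it can be illustrated with a small userspace model. This is only a sketch: `struct model_task` and the `model_*` functions are hypothetical stand-ins, not kernel APIs, and the model keys the warning on the patch's PF_NO_TASKWORK flag rather than on the separate PF_KTHREAD_DOES_TASK_WORK flag floated above.

```c
#include <assert.h>

/* Flag values as defined in the patch's include/linux/sched.h hunk. */
#define PF_KTHREAD     0x00200000u
#define PF_NO_TASKWORK 0x00800000u

/* Hypothetical stand-in for the parts of task_struct this model needs. */
struct model_task {
	unsigned int flags;
	int pending;   /* queued task_work items */
	int warnings;  /* times the modelled warning condition was hit */
};

/*
 * Models task_work_add() with a warning for tasks that do not run
 * task_work. Like WARN_ON, it only reports and still queues the work.
 */
static void model_task_work_add(struct model_task *t)
{
	if (t->flags & PF_NO_TASKWORK)
		t->warnings++;
	t->pending++;
}

/* Models the task running its queued work manually. */
static void model_task_work_run(struct model_task *t)
{
	t->pending = 0;
}

/*
 * A kthread that opts in, as the commit message describes: clear the
 * flag, queue the work, run it, restore the flag. Returns the number
 * of warnings recorded so far.
 */
int model_kthread_put_files(struct model_task *t)
{
	t->flags &= ~PF_NO_TASKWORK;
	model_task_work_add(t);
	model_task_work_run(t);
	t->flags |= PF_NO_TASKWORK;
	assert(t->pending == 0);
	return t->warnings;
}
```

In this model, a kthread that follows the clear/run/restore pattern never trips the warning, while queueing work to a default kthread does, and that work stays pending.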
On 4/14/25 11:11 AM, Mateusz Guzik wrote:
> On Wed, Apr 09, 2025 at 07:35:19AM -0600, Jens Axboe wrote:
>> fput currently gates whether or not a task can run task_work on the
>> PF_KTHREAD flag, which excludes kernel threads as they don't usually run
>> task_work as they never exit to userspace. This punts the final fput
>> done from a kthread to a delayed work item instead of using task_work.
>>
>> It's perfectly viable to have the final fput done by the kthread itself,
>> as long as it will actually run the task_work. Add a PF_NO_TASKWORK flag
>> which is set by default by a kernel thread, and gate the task_work fput
>> on that instead. This enables a kernel thread to clear this flag
>> temporarily while putting files, as long as it runs its task_work
>> manually.
>>
>> This enables users like io_uring to ensure that when the final fput of a
>> file is done as part of ring teardown to run the local task_work and
>> hence know that all files have been properly put, without needing to
>> resort to workqueue flushing tricks which can deadlock.
>>
>> No functional changes in this patch.
>>
>> Cc: Christian Brauner <brauner@kernel.org>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  fs/file_table.c       | 2 +-
>>  include/linux/sched.h | 2 +-
>>  kernel/fork.c         | 2 +-
>>  3 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/file_table.c b/fs/file_table.c
>> index c04ed94cdc4b..e3c3dd1b820d 100644
>> --- a/fs/file_table.c
>> +++ b/fs/file_table.c
>> @@ -521,7 +521,7 @@ static void __fput_deferred(struct file *file)
>>  		return;
>>  	}
>>
>> -	if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
>> +	if (likely(!in_interrupt() && !(task->flags & PF_NO_TASKWORK))) {
>>  		init_task_work(&file->f_task_work, ____fput);
>>  		if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
>>  			return;
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index f96ac1982893..349c993fc32b 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1736,7 +1736,7 @@ extern struct pid *cad_pid;
>>  						 * I am cleaning dirty pages from some other bdi. */
>>  #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
>>  #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
>> -#define PF__HOLE__00800000	0x00800000
>> +#define PF_NO_TASKWORK		0x00800000	/* task doesn't run task_work */
>>  #define PF__HOLE__01000000	0x01000000
>>  #define PF__HOLE__02000000	0x02000000
>>  #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index c4b26cd8998b..8dd0b8a5348d 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -2261,7 +2261,7 @@ __latent_entropy struct task_struct *copy_process(
>>  		goto fork_out;
>>  	p->flags &= ~PF_KTHREAD;
>>  	if (args->kthread)
>> -		p->flags |= PF_KTHREAD;
>> +		p->flags |= PF_KTHREAD | PF_NO_TASKWORK;
>>  	if (args->user_worker) {
>>  		/*
>>  		 * Mark us a user worker, and block any signal that isn't
>
> I don't have comments on the semantics here, I do have comments on some
> future-proofing.
>
> To my reading kthreads on the stock kernel never execute task_work.

Correct

> This suggests it would be nice for task_work_add() to at least WARN_ON
> when executing with a kthread. After all you don't want a task_work_add
> consumer adding work which will never execute.

I don't think there's much need for that, as I'm not aware of any kernel
usage that had a bug due to that. And if you did, you'd find it pretty
quick during testing as that work would just never execute.

> But then for your patch to not produce any splats there would have to be
> a flag blessing select kthreads as legitimate task_work consumers.

This patchset very much adds a specific flag for that, PF_NO_TASKWORK,
and kernel threads have it set by default. It just separates the "do I
run task_work" flag from PF_KTHREAD. So yes you could add:

    WARN_ON_ONCE(task->flags & PF_NO_TASKWORK);

to task_work_add(), but I'm not really convinced it'd be super useful.
diff --git a/fs/file_table.c b/fs/file_table.c
index c04ed94cdc4b..e3c3dd1b820d 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -521,7 +521,7 @@ static void __fput_deferred(struct file *file)
 		return;
 	}

-	if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+	if (likely(!in_interrupt() && !(task->flags & PF_NO_TASKWORK))) {
 		init_task_work(&file->f_task_work, ____fput);
 		if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
 			return;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac1982893..349c993fc32b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1736,7 +1736,7 @@ extern struct pid *cad_pid;
 						 * I am cleaning dirty pages from some other bdi. */
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF__HOLE__00800000	0x00800000
+#define PF_NO_TASKWORK		0x00800000	/* task doesn't run task_work */
 #define PF__HOLE__01000000	0x01000000
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
diff --git a/kernel/fork.c b/kernel/fork.c
index c4b26cd8998b..8dd0b8a5348d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2261,7 +2261,7 @@ __latent_entropy struct task_struct *copy_process(
 		goto fork_out;
 	p->flags &= ~PF_KTHREAD;
 	if (args->kthread)
-		p->flags |= PF_KTHREAD;
+		p->flags |= PF_KTHREAD | PF_NO_TASKWORK;
 	if (args->user_worker) {
 		/*
 		 * Mark us a user worker, and block any signal that isn't
fput currently gates whether or not a task can run task_work on the
PF_KTHREAD flag, which excludes kernel threads as they don't usually run
task_work as they never exit to userspace. This punts the final fput
done from a kthread to a delayed work item instead of using task_work.

It's perfectly viable to have the final fput done by the kthread itself,
as long as it will actually run the task_work. Add a PF_NO_TASKWORK flag
which is set by default by a kernel thread, and gate the task_work fput
on that instead. This enables a kernel thread to clear this flag
temporarily while putting files, as long as it runs its task_work
manually.

This enables users like io_uring to ensure that when the final fput of a
file is done as part of ring teardown to run the local task_work and
hence know that all files have been properly put, without needing to
resort to workqueue flushing tricks which can deadlock.

No functional changes in this patch.

Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/file_table.c       | 2 +-
 include/linux/sched.h | 2 +-
 kernel/fork.c         | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
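The "no functional changes" claim can be sanity-checked with a small userspace model of the gating condition in `__fput_deferred()`. This is a sketch under stated assumptions: the helper names are hypothetical, the flag values come from the patch, and the model assumes a non-kthread task starts with neither flag set while a kthread starts with both, as in the `copy_process()` hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the patch's include/linux/sched.h hunk. */
#define PF_KTHREAD     0x00200000u
#define PF_NO_TASKWORK 0x00800000u

/* Modelled initial flags: kthreads get both flags, other tasks neither. */
static unsigned int initial_flags(bool kthread)
{
	return kthread ? (PF_KTHREAD | PF_NO_TASKWORK) : 0;
}

/* Gating condition before the patch. */
static bool uses_task_work_old(bool in_irq, unsigned int flags)
{
	return !in_irq && !(flags & PF_KTHREAD);
}

/* Gating condition after the patch. */
static bool uses_task_work_new(bool in_irq, unsigned int flags)
{
	return !in_irq && !(flags & PF_NO_TASKWORK);
}

/* With default flags, both conditions agree in every combination. */
void check_no_functional_change(void)
{
	for (int kthread = 0; kthread <= 1; kthread++) {
		for (int in_irq = 0; in_irq <= 1; in_irq++) {
			unsigned int flags = initial_flags(kthread);

			assert(uses_task_work_old(in_irq, flags) ==
			       uses_task_work_new(in_irq, flags));
		}
	}
}
```

The two conditions diverge only once a kthread clears PF_NO_TASKWORK, which is the opt-in case the commit message describes: the new condition then takes the task_work path, while the old one would still have deferred the fput.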