Message ID | 20211108084142.4692-1-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | kthread: dynamically allocate memory to store kthread's full name | expand |
On Mon, Nov 08, 2021 at 08:41:42AM +0000, Yafang Shao wrote: > When I was implementing a new per-cpu kthread cfs_migration, I found the > comm of it "cfs_migration/%u" is truncated due to the limitation of > TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are > all with the same name "cfs_migration/1", which will confuse the user. This > issue is not critical, because we can get the corresponding CPU from the > task's Cpus_allowed. But for kthreads correspoinding to other hardware > devices, it is not easy to get the detailed device info from task comm, > for example, > > jbd2/nvme0n1p2- > xfs-reclaim/sdf > > Currently there are so many truncated kthreads: > > rcu_tasks_kthre > rcu_tasks_rude_ > rcu_tasks_trace > poll_mpt3sas0_s > ext4-rsv-conver > xfs-reclaim/sd{a, b, c, ...} > xfs-blockgc/sd{a, b, c, ...} > xfs-inodegc/sd{a, b, c, ...} > audit_send_repl > ecryptfs-kthrea > vfio-irqfd-clea > jbd2/nvme0n1p2- > ... > > We can shorten these names to work around this problem, but it may be > not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for > example, it is a nice name, and it is not a good idea to shorten it. > > One possible way to fix this issue is extending the task comm size, but > as task->comm is used in lots of places, that may cause some potential > buffer overflows. Another more conservative approach is introducing a new > pointer to store kthread's full name if it is truncated, which won't > introduce too much overhead as it is in the non-critical path. Finally we > make a dicision to use the second approach. See also the discussions in > this thread: > https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/ > > After this change, the full name of these truncated kthreads will be > displayed via /proc/[pid]/comm: > > rcu_tasks_kthread > rcu_tasks_rude_kthread > rcu_tasks_trace_kthread > poll_mpt3sas0_statu > ext4-rsv-conversion > xfs-reclaim/sdf1 > xfs-blockgc/sdf1 > xfs-inodegc/sdf1 > audit_send_reply > ecryptfs-kthread > vfio-irqfd-cleanup > jbd2/nvme0n1p2-8 > > Suggested-by: Petr Mladek <pmladek@suse.com> > Suggested-by: Steven Rostedt <rostedt@goodmis.org> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> > Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> > Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: David Hildenbrand <david@redhat.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Kees Cook <keescook@chromium.org> > Cc: Petr Mladek <pmladek@suse.com> > --- > TODO: will cleanup worker comm in the next step. > > --- > fs/proc/array.c | 3 +++ > include/linux/kthread.h | 1 + > kernel/kthread.c | 32 +++++++++++++++++++++++++++++++- > 3 files changed, 35 insertions(+), 1 deletion(-) > > diff --git a/fs/proc/array.c b/fs/proc/array.c > index 49be8c8ef555..860e4deafa65 100644 > --- a/fs/proc/array.c > +++ b/fs/proc/array.c > @@ -92,6 +92,7 @@ > #include <linux/string_helpers.h> > #include <linux/user_namespace.h> > #include <linux/fs_struct.h> > +#include <linux/kthread.h> > > #include <asm/processor.h> > #include "internal.h" > @@ -102,6 +103,8 @@ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape) > > if (p->flags & PF_WQ_WORKER) > wq_worker_comm(tcomm, sizeof(tcomm), p); > + else if (p->flags & PF_KTHREAD) > + get_kthread_comm(tcomm, sizeof(tcomm), p); > else > __get_task_comm(tcomm, sizeof(tcomm), p); > > diff --git a/include/linux/kthread.h b/include/linux/kthread.h > index 346b0f269161..2a5c04494663 100644 > --- a/include/linux/kthread.h > +++ b/include/linux/kthread.h > @@ -33,6 +33,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data), > unsigned int cpu, > const char *namefmt); > > +void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk); > void set_kthread_struct(struct task_struct *p); > > void kthread_set_per_cpu(struct task_struct *k, int cpu); > diff --git a/kernel/kthread.c b/kernel/kthread.c > index 5b37a8567168..ce8258231eea 100644 > --- a/kernel/kthread.c > +++ b/kernel/kthread.c > @@ -60,6 +60,8 @@ struct kthread { > #ifdef CONFIG_BLK_CGROUP > struct cgroup_subsys_state *blkcg_css; > #endif > + /* To store the full name if task comm is truncated. */ > + char *full_name; > }; > > enum KTHREAD_BITS { > @@ -93,6 +95,18 @@ static inline struct kthread *__to_kthread(struct task_struct *p) > return kthread; > } > > +void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk) > +{ > + struct kthread *kthread = to_kthread(tsk); > + > + if (!kthread || !kthread->full_name) { > + __get_task_comm(buf, buf_size, tsk); > + return; > + } > + > + strscpy_pad(buf, kthread->full_name, buf_size); > +} > + > void set_kthread_struct(struct task_struct *p) > { > struct kthread *kthread; > @@ -121,6 +135,7 @@ void free_kthread_struct(struct task_struct *k) > #ifdef CONFIG_BLK_CGROUP > WARN_ON_ONCE(kthread && kthread->blkcg_css); > #endif > + kfree(kthread->full_name); > kfree(kthread); > } > > @@ -399,12 +414,27 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data), > if (!IS_ERR(task)) { > static const struct sched_param param = { .sched_priority = 0 }; > char name[TASK_COMM_LEN]; > + va_list aq; > + int len; > > /* > * task is already visible to other tasks, so updating > * COMM must be protected. > */ > - vsnprintf(name, sizeof(name), namefmt, args); > + va_copy(aq, args); > + len = vsnprintf(name, sizeof(name), namefmt, aq); > + va_end(aq); > + if (len >= TASK_COMM_LEN) { > + struct kthread *kthread = to_kthread(task); > + char *full_name; > + > + full_name = kvasprintf(GFP_KERNEL, namefmt, args); > + if (!full_name) { > + kfree(create); > + return ERR_PTR(-ENOMEM); I'm not a fan of this out-of-line free/return. Why not just leave it truncated when out of memory? For example just do: struct kthread *kthread = to_kthread(task); kthread->full_name = kvasprintf(GFP_KERNEL, namefmt, args); > + } > + kthread->full_name = full_name; > + } > set_task_comm(task, name); > /* > * root may have changed our (kthreadd's) priority or CPU mask. > -- > 2.17.1 >
On Wed, Nov 10, 2021 at 1:34 AM Kees Cook <keescook@chromium.org> wrote: > > On Mon, Nov 08, 2021 at 08:41:42AM +0000, Yafang Shao wrote: > > When I was implementing a new per-cpu kthread cfs_migration, I found the > > comm of it "cfs_migration/%u" is truncated due to the limitation of > > TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are > > all with the same name "cfs_migration/1", which will confuse the user. This > > issue is not critical, because we can get the corresponding CPU from the > > task's Cpus_allowed. But for kthreads correspoinding to other hardware > > devices, it is not easy to get the detailed device info from task comm, > > for example, > > > > jbd2/nvme0n1p2- > > xfs-reclaim/sdf > > > > Currently there are so many truncated kthreads: > > > > rcu_tasks_kthre > > rcu_tasks_rude_ > > rcu_tasks_trace > > poll_mpt3sas0_s > > ext4-rsv-conver > > xfs-reclaim/sd{a, b, c, ...} > > xfs-blockgc/sd{a, b, c, ...} > > xfs-inodegc/sd{a, b, c, ...} > > audit_send_repl > > ecryptfs-kthrea > > vfio-irqfd-clea > > jbd2/nvme0n1p2- > > ... > > > > We can shorten these names to work around this problem, but it may be > > not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for > > example, it is a nice name, and it is not a good idea to shorten it. > > > > One possible way to fix this issue is extending the task comm size, but > > as task->comm is used in lots of places, that may cause some potential > > buffer overflows. Another more conservative approach is introducing a new > > pointer to store kthread's full name if it is truncated, which won't > > introduce too much overhead as it is in the non-critical path. Finally we > > make a dicision to use the second approach. See also the discussions in > > this thread: > > https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/ > > > > After this change, the full name of these truncated kthreads will be > > displayed via /proc/[pid]/comm: > > > > rcu_tasks_kthread > > rcu_tasks_rude_kthread > > rcu_tasks_trace_kthread > > poll_mpt3sas0_statu > > ext4-rsv-conversion > > xfs-reclaim/sdf1 > > xfs-blockgc/sdf1 > > xfs-inodegc/sdf1 > > audit_send_reply > > ecryptfs-kthread > > vfio-irqfd-cleanup > > jbd2/nvme0n1p2-8 > > > > Suggested-by: Petr Mladek <pmladek@suse.com> > > Suggested-by: Steven Rostedt <rostedt@goodmis.org> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> > > Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> > > Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> > > Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> > > Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Steven Rostedt <rostedt@goodmis.org> > > Cc: Matthew Wilcox <willy@infradead.org> > > Cc: David Hildenbrand <david@redhat.com> > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > Cc: Kees Cook <keescook@chromium.org> > > Cc: Petr Mladek <pmladek@suse.com> > > --- > > TODO: will cleanup worker comm in the next step. > > > > --- > > fs/proc/array.c | 3 +++ > > include/linux/kthread.h | 1 + > > kernel/kthread.c | 32 +++++++++++++++++++++++++++++++- > > 3 files changed, 35 insertions(+), 1 deletion(-) > > > > diff --git a/fs/proc/array.c b/fs/proc/array.c > > index 49be8c8ef555..860e4deafa65 100644 > > --- a/fs/proc/array.c > > +++ b/fs/proc/array.c > > @@ -92,6 +92,7 @@ > > #include <linux/string_helpers.h> > > #include <linux/user_namespace.h> > > #include <linux/fs_struct.h> > > +#include <linux/kthread.h> > > > > #include <asm/processor.h> > > #include "internal.h" > > @@ -102,6 +103,8 @@ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape) > > > > if (p->flags & PF_WQ_WORKER) > > wq_worker_comm(tcomm, sizeof(tcomm), p); > > + else if (p->flags & PF_KTHREAD) > > + get_kthread_comm(tcomm, sizeof(tcomm), p); > > else > > __get_task_comm(tcomm, sizeof(tcomm), p); > > > > diff --git a/include/linux/kthread.h b/include/linux/kthread.h > > index 346b0f269161..2a5c04494663 100644 > > --- a/include/linux/kthread.h > > +++ b/include/linux/kthread.h > > @@ -33,6 +33,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data), > > unsigned int cpu, > > const char *namefmt); > > > > +void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk); > > void set_kthread_struct(struct task_struct *p); > > > > void kthread_set_per_cpu(struct task_struct *k, int cpu); > > diff --git a/kernel/kthread.c b/kernel/kthread.c > > index 5b37a8567168..ce8258231eea 100644 > > --- a/kernel/kthread.c > > +++ b/kernel/kthread.c > > @@ -60,6 +60,8 @@ struct kthread { > > #ifdef CONFIG_BLK_CGROUP > > struct cgroup_subsys_state *blkcg_css; > > #endif > > + /* To store the full name if task comm is truncated. */ > > + char *full_name; > > }; > > > > enum KTHREAD_BITS { > > @@ -93,6 +95,18 @@ static inline struct kthread *__to_kthread(struct task_struct *p) > > return kthread; > > } > > > > +void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk) > > +{ > > + struct kthread *kthread = to_kthread(tsk); > > + > > + if (!kthread || !kthread->full_name) { > > + __get_task_comm(buf, buf_size, tsk); > > + return; > > + } > > + > > + strscpy_pad(buf, kthread->full_name, buf_size); > > +} > > + > > void set_kthread_struct(struct task_struct *p) > > { > > struct kthread *kthread; > > @@ -121,6 +135,7 @@ void free_kthread_struct(struct task_struct *k) > > #ifdef CONFIG_BLK_CGROUP > > WARN_ON_ONCE(kthread && kthread->blkcg_css); > > #endif > > + kfree(kthread->full_name); > > kfree(kthread); > > } > > > > @@ -399,12 +414,27 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data), > > if (!IS_ERR(task)) { > > static const struct sched_param param = { .sched_priority = 0 }; > > char name[TASK_COMM_LEN]; > > + va_list aq; > > + int len; > > > > /* > > * task is already visible to other tasks, so updating > > * COMM must be protected. > > */ > > - vsnprintf(name, sizeof(name), namefmt, args); > > + va_copy(aq, args); > > + len = vsnprintf(name, sizeof(name), namefmt, aq); > > + va_end(aq); > > + if (len >= TASK_COMM_LEN) { > > + struct kthread *kthread = to_kthread(task); > > + char *full_name; > > + > > + full_name = kvasprintf(GFP_KERNEL, namefmt, args); > > + if (!full_name) { > > + kfree(create); > > + return ERR_PTR(-ENOMEM); > > I'm not a fan of this out-of-line free/return. Why not just leave it > truncated when out of memory? For example just do: > It is OK for me. I will do it as you suggested and show a warning for this case. > struct kthread *kthread = to_kthread(task); > > kthread->full_name = kvasprintf(GFP_KERNEL, namefmt, args); > > > + } > > + kthread->full_name = full_name; > > + } > > set_task_comm(task, name); > > /* > > * root may have changed our (kthreadd's) priority or CPU mask. > > -- > > 2.17.1 > > > > -- > Kees Cook
On Mon 2021-11-08 08:41:42, Yafang Shao wrote: > When I was implementing a new per-cpu kthread cfs_migration, I found the > comm of it "cfs_migration/%u" is truncated due to the limitation of > TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are > all with the same name "cfs_migration/1", which will confuse the user. This > issue is not critical, because we can get the corresponding CPU from the > task's Cpus_allowed. But for kthreads correspoinding to other hardware > devices, it is not easy to get the detailed device info from task comm, > for example, > > After this change, the full name of these truncated kthreads will be > displayed via /proc/[pid]/comm: > > --- a/fs/proc/array.c > +++ b/fs/proc/array.c > @@ -102,6 +103,8 @@ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape) > > if (p->flags & PF_WQ_WORKER) > wq_worker_comm(tcomm, sizeof(tcomm), p); Just for record. I though that this patch obsoleted wq_worker_comm() but it did not. wq_worker_comm() returns different values depending on the last proceed work item and has to stay. > + else if (p->flags & PF_KTHREAD) > + get_kthread_comm(tcomm, sizeof(tcomm), p); > else > __get_task_comm(tcomm, sizeof(tcomm), p); > > --- a/kernel/kthread.c > +++ b/kernel/kthread.c > @@ -121,6 +135,7 @@ void free_kthread_struct(struct task_struct *k) Hmm, there is the following comment: /* * Can be NULL if this kthread was created by kernel_thread() * or if kmalloc() in kthread() failed. */ kthread = to_kthread(k); And indeed, set_kthread_struct() is called only by kthread() and init_idle(). For example, call_usermodehelper_exec_sync() calls kernel_thread() but given @fn does not call set_kthread_struct(). Also init_idle() continues even when the allocation failed. > #ifdef CONFIG_BLK_CGROUP > WARN_ON_ONCE(kthread && kthread->blkcg_css); > #endif > + kfree(kthread->full_name); Hence, we have to make sure that it is not NULL here. I suggest something like: void free_kthread_struct(struct task_struct *k) { struct kthread *kthread; /* * Can be NULL if this kthread was created by kernel_thread() * or if kmalloc() in kthread() failed. */ kthread = to_kthread(k); if (!kthread) return; #ifdef CONFIG_BLK_CGROUP WARN_ON_ONCE(kthread->blkcg_css); #endif kfree(kthread->full_name); kfree(kthread); } Side note: The possible NULL pointer looks dangerous to me. to_kthread() is dereferenced without any check on several locations. For example, kthread_create_on_cpu() looks safe. It is a kthread crated by kthread(). It will exists only when the allocation succeeded. kthread_stop() is probably safe only because it used only for the classic kthreads created by kthread(). But the API is not safe. kthread_use_mm() is probably used only by classic kthreads as well. But it is less clear to me. All this unsafe APIs looks like a ticking bomb to me. But it is beyond this patchset. > kfree(kthread); > } > Best Regards, Petr
On Wed 2021-11-10 10:12:17, Yafang Shao wrote: > On Wed, Nov 10, 2021 at 1:34 AM Kees Cook <keescook@chromium.org> wrote: > > > > On Mon, Nov 08, 2021 at 08:41:42AM +0000, Yafang Shao wrote: > > > When I was implementing a new per-cpu kthread cfs_migration, I found the > > > comm of it "cfs_migration/%u" is truncated due to the limitation of > > > TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are > > > all with the same name "cfs_migration/1", which will confuse the user. This > > > issue is not critical, because we can get the corresponding CPU from the > > > task's Cpus_allowed. But for kthreads correspoinding to other hardware > > > devices, it is not easy to get the detailed device info from task comm, > > > for example, > > > > > > After this change, the full name of these truncated kthreads will be > > > displayed via /proc/[pid]/comm: > > > > > > --- a/kernel/kthread.c > > > +++ b/kernel/kthread.c > > > @@ -399,12 +414,27 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data), > > > if (!IS_ERR(task)) { > > > static const struct sched_param param = { .sched_priority = 0 }; > > > char name[TASK_COMM_LEN]; > > > + va_list aq; > > > + int len; > > > > > > /* > > > * task is already visible to other tasks, so updating > > > * COMM must be protected. > > > */ > > > - vsnprintf(name, sizeof(name), namefmt, args); > > > + va_copy(aq, args); > > > + len = vsnprintf(name, sizeof(name), namefmt, aq); > > > + va_end(aq); > > > + if (len >= TASK_COMM_LEN) { > > > + struct kthread *kthread = to_kthread(task); > > > + char *full_name; > > > + > > > + full_name = kvasprintf(GFP_KERNEL, namefmt, args); > > > + if (!full_name) { > > > + kfree(create); > > > + return ERR_PTR(-ENOMEM); > > > > I'm not a fan of this out-of-line free/return. Why not just leave it > > truncated when out of memory? For example just do: > > > > struct kthread *kthread = to_kthread(task); > > > > kthread->full_name = kvasprintf(GFP_KERNEL, namefmt, args); > It is OK for me. I agree. It is perfectly fine to continue here. The truncated name is a reasonable fallback. > I will do it as you suggested and show a warning for this case. Yup. Just please, use only the truncated name in the warning. It is not important enough to add another va_copy() for this. > > > > > + } > > > + kthread->full_name = full_name; > > > + } > > > set_task_comm(task, name); > > > /* > > > * root may have changed our (kthreadd's) priority or CPU mask. Best Regards, Petr
On Fri, Nov 12, 2021 at 11:34 PM Petr Mladek <pmladek@suse.com> wrote: > > On Mon 2021-11-08 08:41:42, Yafang Shao wrote: > > When I was implementing a new per-cpu kthread cfs_migration, I found the > > comm of it "cfs_migration/%u" is truncated due to the limitation of > > TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are > > all with the same name "cfs_migration/1", which will confuse the user. This > > issue is not critical, because we can get the corresponding CPU from the > > task's Cpus_allowed. But for kthreads correspoinding to other hardware > > devices, it is not easy to get the detailed device info from task comm, > > for example, > > > > After this change, the full name of these truncated kthreads will be > > displayed via /proc/[pid]/comm: > > > > --- a/fs/proc/array.c > > +++ b/fs/proc/array.c > > @@ -102,6 +103,8 @@ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape) > > > > if (p->flags & PF_WQ_WORKER) > > wq_worker_comm(tcomm, sizeof(tcomm), p); > > Just for record. I though that this patch obsoleted wq_worker_comm() > but it did not. wq_worker_comm() returns different values > depending on the last proceed work item and has to stay. > Right. worker comm is changed dynamically, which is combined by (task_comm+worker_desc) or (task_comm-worker_desc). I planned to remove the whole worker->desc and set it dynamically to the new kthread full_name but I found it may not be a good idea. > > + else if (p->flags & PF_KTHREAD) > > + get_kthread_comm(tcomm, sizeof(tcomm), p); > > else > > __get_task_comm(tcomm, sizeof(tcomm), p); > > > > --- a/kernel/kthread.c > > +++ b/kernel/kthread.c > > @@ -121,6 +135,7 @@ void free_kthread_struct(struct task_struct *k) > > Hmm, there is the following comment: > > /* > * Can be NULL if this kthread was created by kernel_thread() > * or if kmalloc() in kthread() failed. > */ > kthread = to_kthread(k); > > And indeed, set_kthread_struct() is called only by kthread() > and init_idle(). > > For example, call_usermodehelper_exec_sync() calls kernel_thread() > but given @fn does not call set_kthread_struct(). Also init_idle() > continues even when the allocation failed. > Yes, it really can be NULL. > > > #ifdef CONFIG_BLK_CGROUP > > WARN_ON_ONCE(kthread && kthread->blkcg_css); > > #endif > > + kfree(kthread->full_name); > > Hence, we have to make sure that it is not NULL here. I suggest > something like: > Agreed. I will do it. > void free_kthread_struct(struct task_struct *k) > { > struct kthread *kthread; > > /* > * Can be NULL if this kthread was created by kernel_thread() > * or if kmalloc() in kthread() failed. > */ > kthread = to_kthread(k); > if (!kthread) > return; > > #ifdef CONFIG_BLK_CGROUP > WARN_ON_ONCE(kthread->blkcg_css); > #endif > kfree(kthread->full_name); > kfree(kthread); > } > > > Side note: The possible NULL pointer looks dangerous to > me. to_kthread() is dereferenced without any check on > several locations. > > For example, kthread_create_on_cpu() looks safe. It is a kthread > crated by kthread(). It will exists only when the allocation > succeeded. > > kthread_stop() is probably safe only because it used only for > the classic kthreads created by kthread(). But the API > is not safe. > > kthread_use_mm() is probably used only by classic kthreads as > well. But it is less clear to me. > > All this unsafe APIs looks like a ticking bomb to me. But > it is beyond this patchset. > I will analyze it in depth and try to dismantle this ticking bomb.
diff --git a/fs/proc/array.c b/fs/proc/array.c index 49be8c8ef555..860e4deafa65 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -92,6 +92,7 @@ #include <linux/string_helpers.h> #include <linux/user_namespace.h> #include <linux/fs_struct.h> +#include <linux/kthread.h> #include <asm/processor.h> #include "internal.h" @@ -102,6 +103,8 @@ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape) if (p->flags & PF_WQ_WORKER) wq_worker_comm(tcomm, sizeof(tcomm), p); + else if (p->flags & PF_KTHREAD) + get_kthread_comm(tcomm, sizeof(tcomm), p); else __get_task_comm(tcomm, sizeof(tcomm), p); diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 346b0f269161..2a5c04494663 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -33,6 +33,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data), unsigned int cpu, const char *namefmt); +void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk); void set_kthread_struct(struct task_struct *p); void kthread_set_per_cpu(struct task_struct *k, int cpu); diff --git a/kernel/kthread.c b/kernel/kthread.c index 5b37a8567168..ce8258231eea 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -60,6 +60,8 @@ struct kthread { #ifdef CONFIG_BLK_CGROUP struct cgroup_subsys_state *blkcg_css; #endif + /* To store the full name if task comm is truncated. */ + char *full_name; }; enum KTHREAD_BITS { @@ -93,6 +95,18 @@ static inline struct kthread *__to_kthread(struct task_struct *p) return kthread; } +void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk) +{ + struct kthread *kthread = to_kthread(tsk); + + if (!kthread || !kthread->full_name) { + __get_task_comm(buf, buf_size, tsk); + return; + } + + strscpy_pad(buf, kthread->full_name, buf_size); +} + void set_kthread_struct(struct task_struct *p) { struct kthread *kthread; @@ -121,6 +135,7 @@ void free_kthread_struct(struct task_struct *k) #ifdef CONFIG_BLK_CGROUP WARN_ON_ONCE(kthread && kthread->blkcg_css); #endif + kfree(kthread->full_name); kfree(kthread); } @@ -399,12 +414,27 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data), if (!IS_ERR(task)) { static const struct sched_param param = { .sched_priority = 0 }; char name[TASK_COMM_LEN]; + va_list aq; + int len; /* * task is already visible to other tasks, so updating * COMM must be protected. */ - vsnprintf(name, sizeof(name), namefmt, args); + va_copy(aq, args); + len = vsnprintf(name, sizeof(name), namefmt, aq); + va_end(aq); + if (len >= TASK_COMM_LEN) { + struct kthread *kthread = to_kthread(task); + char *full_name; + + full_name = kvasprintf(GFP_KERNEL, namefmt, args); + if (!full_name) { + kfree(create); + return ERR_PTR(-ENOMEM); + } + kthread->full_name = full_name; + } set_task_comm(task, name); /* * root may have changed our (kthreadd's) priority or CPU mask.
When I was implementing a new per-cpu kthread cfs_migration, I found the comm of it "cfs_migration/%u" is truncated due to the limitation of TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19 are all with the same name "cfs_migration/1", which will confuse the user. This issue is not critical, because we can get the corresponding CPU from the task's Cpus_allowed. But for kthreads correspoinding to other hardware devices, it is not easy to get the detailed device info from task comm, for example, jbd2/nvme0n1p2- xfs-reclaim/sdf Currently there are so many truncated kthreads: rcu_tasks_kthre rcu_tasks_rude_ rcu_tasks_trace poll_mpt3sas0_s ext4-rsv-conver xfs-reclaim/sd{a, b, c, ...} xfs-blockgc/sd{a, b, c, ...} xfs-inodegc/sd{a, b, c, ...} audit_send_repl ecryptfs-kthrea vfio-irqfd-clea jbd2/nvme0n1p2- ... We can shorten these names to work around this problem, but it may be not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-' for example, it is a nice name, and it is not a good idea to shorten it. One possible way to fix this issue is extending the task comm size, but as task->comm is used in lots of places, that may cause some potential buffer overflows. Another more conservative approach is introducing a new pointer to store kthread's full name if it is truncated, which won't introduce too much overhead as it is in the non-critical path. Finally we make a dicision to use the second approach. See also the discussions in this thread: https://lore.kernel.org/lkml/20211101060419.4682-1-laoar.shao@gmail.com/ After this change, the full name of these truncated kthreads will be displayed via /proc/[pid]/comm: rcu_tasks_kthread rcu_tasks_rude_kthread rcu_tasks_trace_kthread poll_mpt3sas0_statu ext4-rsv-conversion xfs-reclaim/sdf1 xfs-blockgc/sdf1 xfs-inodegc/sdf1 audit_send_reply ecryptfs-kthread vfio-irqfd-cleanup jbd2/nvme0n1p2-8 Suggested-by: Petr Mladek <pmladek@suse.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Petr Mladek <pmladek@suse.com> --- TODO: will cleanup worker comm in the next step. --- fs/proc/array.c | 3 +++ include/linux/kthread.h | 1 + kernel/kthread.c | 32 +++++++++++++++++++++++++++++++- 3 files changed, 35 insertions(+), 1 deletion(-)