mbox series

[v5,0/9] Improve the copy of task comm

Message ID 20240804075619.20804-1-laoar.shao@gmail.com (mailing list archive)
Headers show
Series Improve the copy of task comm | expand

Message

Yafang Shao Aug. 4, 2024, 7:56 a.m. UTC
Using {memcpy,strncpy,strcpy,kstrdup} to copy the task comm relies on the
length of task comm. Changes in the task comm could result in a destination
string that is overflow. Therefore, we should explicitly ensure the destination
string is always NUL-terminated, regardless of the task comm. This approach
will facilitate future extensions to the task comm.

As suggested by Linus [0], we can identify all relevant code with the
following git grep command:

  git grep 'memcpy.*->comm\>'
  git grep 'kstrdup.*->comm\>'
  git grep 'strncpy.*->comm\>'
  git grep 'strcpy.*->comm\>'

PATCH #2~#4:   memcpy
PATCH #5~#6:   kstrdup
PATCH #7:      strncpy
PATCH #8~#9:   strcpy

There is a BUILD_BUG_ON() inside get_task_comm(), so when you use
get_task_comm(), it implies that the BUILD_BUG_ON() is necessary. However,
we don't want to impose this restriction on code where the length can be
changed, so we use __get_task_comm(), rather than the get_task_comm().

One use case of get_task_comm() is in code that has already exposed the
length to userspace. In such cases, we specifically add the BUILD_BUG_ON()
to prevent developers from changing it. For more information, see
commit 95af469c4f60 ("fs/binfmt_elf: replace open-coded string copy with
get_task_comm").

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/ [0]

Changes:
v4->v5:
- Drop changes in the mm/kmemleak.c as it was fixed by
  commit 0b84780134fb ("mm/kmemleak: replace strncpy() with strscpy()")
- Drop changes in kernel/tsacct.c as it was fixed by
  commmit 0fe2356434e ("tsacct: replace strncpy() with strscpy()")

v3->v4: https://lore.kernel.org/linux-mm/20240729023719.1933-1-laoar.shao@gmail.com/
- Rename __kstrndup() to __kmemdup_nul() and define it inside mm/util.c
  (Matthew)
- Remove unused local varaible (Simon)

v2->v3: https://lore.kernel.org/all/20240621022959.9124-1-laoar.shao@gmail.com/
- Deduplicate code around kstrdup (Andrew)
- Add commit log for dropping task_lock (Catalin)

v1->v2: https://lore.kernel.org/bpf/20240613023044.45873-1-laoar.shao@gmail.com/
- Add comment for dropping task_lock() in __get_task_comm() (Alexei)
- Drop changes in trace event (Steven)
- Fix comment on task comm (Matus)

v1: https://lore.kernel.org/all/20240602023754.25443-1-laoar.shao@gmail.com/

Yafang Shao (9):
  fs/exec: Drop task_lock() inside __get_task_comm()
  auditsc: Replace memcpy() with __get_task_comm()
  security: Replace memcpy() with __get_task_comm()
  bpftool: Ensure task comm is always NUL-terminated
  mm/util: Fix possible race condition in kstrdup()
  mm/util: Deduplicate code in {kstrdup,kstrndup,kmemdup_nul}
  tracing: Replace strncpy() with __get_task_comm()
  net: Replace strcpy() with __get_task_comm()
  drm: Replace strcpy() with __get_task_comm()

 drivers/gpu/drm/drm_framebuffer.c     |  2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c |  2 +-
 fs/exec.c                             | 10 ++++-
 include/linux/sched.h                 |  4 +-
 kernel/auditsc.c                      |  6 +--
 kernel/trace/trace.c                  |  2 +-
 kernel/trace/trace_events_hist.c      |  2 +-
 mm/util.c                             | 61 ++++++++++++---------------
 net/ipv6/ndisc.c                      |  2 +-
 security/lsm_audit.c                  |  4 +-
 security/selinux/selinuxfs.c          |  2 +-
 tools/bpf/bpftool/pids.c              |  2 +
 12 files changed, 49 insertions(+), 50 deletions(-)

Comments

Linus Torvalds Aug. 5, 2024, 9:28 p.m. UTC | #1
On Sun, 4 Aug 2024 at 00:56, Yafang Shao <laoar.shao@gmail.com> wrote:
>
> There is a BUILD_BUG_ON() inside get_task_comm(), so when you use
> get_task_comm(), it implies that the BUILD_BUG_ON() is necessary.

Let's just remove that silly BUILD_BUG_ON(). I don't think it adds any
value, and honestly, it really only makes this patch-series uglier
when reasonable uses suddenly pointlessly need that double-underscore
version..

So let's aim at

 (a) documenting that the last byte in 'tsk->comm{}' is always
guaranteed to be NUL, so that the thing can always just be treated as
a string. Yes, it may change under us, but as long as we know there is
always a stable NUL there *somewhere*, we really really don't care.

 (b) removing __get_task_comm() entirely, and replacing it with a
plain 'str*cpy*()' functions

The whole (a) thing is a requirement anyway, since the *bulk* of
tsk->comm really just seems to be various '%s' things in printk
strings etc.

And once we just admit that we can use the string functions, all the
get_task_comm() stuff is just unnecessary.

And yes, some people may want to use the strscpy_pad() function
because they want to fill the whole destination buffer. But that's
entirely about the *destination* use, not the tsk->comm[] source, so
it has nothing to do with any kind of "get_task_comm()" logic, and it
was always wrong to care about the buffer sizes magically matching.

Hmm?

                Linus
Yafang Shao Aug. 6, 2024, 3 a.m. UTC | #2
On Tue, Aug 6, 2024 at 5:28 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, 4 Aug 2024 at 00:56, Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > There is a BUILD_BUG_ON() inside get_task_comm(), so when you use
> > get_task_comm(), it implies that the BUILD_BUG_ON() is necessary.
>
> Let's just remove that silly BUILD_BUG_ON(). I don't think it adds any
> value, and honestly, it really only makes this patch-series uglier
> when reasonable uses suddenly pointlessly need that double-underscore
> version..
>
> So let's aim at
>
>  (a) documenting that the last byte in 'tsk->comm{}' is always
> guaranteed to be NUL, so that the thing can always just be treated as
> a string. Yes, it may change under us, but as long as we know there is
> always a stable NUL there *somewhere*, we really really don't care.
>
>  (b) removing __get_task_comm() entirely, and replacing it with a
> plain 'str*cpy*()' functions
>
> The whole (a) thing is a requirement anyway, since the *bulk* of
> tsk->comm really just seems to be various '%s' things in printk
> strings etc.
>
> And once we just admit that we can use the string functions, all the
> get_task_comm() stuff is just unnecessary.
>
> And yes, some people may want to use the strscpy_pad() function
> because they want to fill the whole destination buffer. But that's
> entirely about the *destination* use, not the tsk->comm[] source, so
> it has nothing to do with any kind of "get_task_comm()" logic, and it
> was always wrong to care about the buffer sizes magically matching.
>
> Hmm?

One concern about removing the BUILD_BUG_ON() is that if we extend
TASK_COMM_LEN to a larger size, such as 24, the caller with a
hardcoded 16-byte buffer may overflow. This could be an issue with
code in include/linux/elfcore.h and include/linux/elfcore-compat.h,
posing a risk. However, I believe it is the caller's responsibility to
explicitly add a null terminator if it uses a fixed buffer that cannot
be changed. Therefore, the following code change is necessary:

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 5ae8045f4df4..e4b0b7cf0c1f 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1579,6 +1579,7 @@ static int fill_psinfo(struct elf_prpsinfo
*psinfo, struct task_struct *p,
        SET_GID(psinfo->pr_gid, from_kgid_munged(cred->user_ns, cred->gid));
        rcu_read_unlock();
        get_task_comm(psinfo->pr_fname, p);
+       psinfo->pr_fname[15] = '\0';

        return 0;
 }

However, it is currently safe to remove the BUILD_BUG_ON() since
TASK_COMM_LEN is still 16.

--
Regards
Yafang
Linus Torvalds Aug. 6, 2024, 3:09 a.m. UTC | #3
On Mon, 5 Aug 2024 at 20:01, Yafang Shao <laoar.shao@gmail.com> wrote:
>
> One concern about removing the BUILD_BUG_ON() is that if we extend
> TASK_COMM_LEN to a larger size, such as 24, the caller with a
> hardcoded 16-byte buffer may overflow.

No, not at all. Because get_task_comm() - and the replacements - would
never use TASK_COMM_LEN.

They'd use the size of the *destination*. That's what the code already does:

  #define get_task_comm(buf, tsk) ({                      \
  ...
        __get_task_comm(buf, sizeof(buf), tsk);         \

note how it uses "sizeof(buf)".

Now, it might be a good idea to also verify that 'buf' is an actual
array, and that this code doesn't do some silly "sizeof(ptr)" thing.

We do have a helper for that, so we could do something like

   #define get_task_comm(buf, tsk) \
        strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm)

as a helper macro for this all.

(Although I'm not convinced we generally want the "_pad()" version,
but whatever).

                    Linus
Yafang Shao Aug. 6, 2024, 3:50 a.m. UTC | #4
On Tue, Aug 6, 2024 at 11:10 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, 5 Aug 2024 at 20:01, Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > One concern about removing the BUILD_BUG_ON() is that if we extend
> > TASK_COMM_LEN to a larger size, such as 24, the caller with a
> > hardcoded 16-byte buffer may overflow.
>
> No, not at all. Because get_task_comm() - and the replacements - would
> never use TASK_COMM_LEN.
>
> They'd use the size of the *destination*. That's what the code already does:
>
>   #define get_task_comm(buf, tsk) ({                      \
>   ...
>         __get_task_comm(buf, sizeof(buf), tsk);         \
>
> note how it uses "sizeof(buf)".
>
> Now, it might be a good idea to also verify that 'buf' is an actual
> array, and that this code doesn't do some silly "sizeof(ptr)" thing.
>
> We do have a helper for that, so we could do something like
>
>    #define get_task_comm(buf, tsk) \
>         strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm)
>
> as a helper macro for this all.
>
> (Although I'm not convinced we generally want the "_pad()" version,
> but whatever).
>

Will do it.
Thanks for your explanation.