[v8,1/8] Get rid of __get_task_comm()

Message ID	20240828030321.20688-2-laoar.shao@gmail.com (mailing list archive)
State	Superseded
Headers	show Received: from mail-pl1-f179.google.com (mail-pl1-f179.google.com [209.85.214.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A011C8BFC; Wed, 28 Aug 2024 03:03:45 +0000 (UTC) From: Yafang Shao <laoar.shao@gmail.com> To: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org, alx@kernel.org, justinstitt@google.com, ebiederm@xmission.com, alexei.starovoitov@gmail.com, rostedt@goodmis.org, catalin.marinas@arm.com, penguin-kernel@i-love.sakura.ne.jp, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, audit@vger.kernel.org, linux-security-module@vger.kernel.org, selinux@vger.kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org, dri-devel@lists.freedesktop.org, Yafang Shao <laoar.shao@gmail.com>, Alexander Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>, Kees Cook <keescook@chromium.org>, Matus Jokay <matus.jokay@stuba.sk>, "Serge E. Hallyn" <serge@hallyn.com> Subject: [PATCH v8 1/8] Get rid of __get_task_comm() Date: Wed, 28 Aug 2024 11:03:14 +0800 Message-Id: <20240828030321.20688-2-laoar.shao@gmail.com> In-Reply-To: <20240828030321.20688-1-laoar.shao@gmail.com> References: <20240828030321.20688-1-laoar.shao@gmail.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
Series	Improve the copy of task comm \| expand [v8,0/8] Improve the copy of task comm [v8,1/8] Get rid of __get_task_comm() [v8,2/8] auditsc: Replace memcpy() with strscpy() [v8,3/8] security: Replace memcpy() with get_task_comm() [v8,4/8] bpftool: Ensure task comm is always NUL-terminated [v8,5/8] mm/util: Fix possible race condition in kstrdup() [v8,6/8] mm/util: Deduplicate code in {kstrdup,kstrndup,kmemdup_nul} [v8,7/8] net: Replace strcpy() with strscpy() [v8,8/8] drm: Replace strcpy() with strscpy()

Yafang Shao Aug. 28, 2024, 3:03 a.m. UTC

We want to eliminate the use of __get_task_comm() for the following
reasons:

- The task_lock() is unnecessary
  Quoted from Linus [0]:
  : Since user space can randomly change their names anyway, using locking
  : was always wrong for readers (for writers it probably does make sense
  : to have some lock - although practically speaking nobody cares there
  : either, but at least for a writer some kind of race could have
  : long-term mixed results

- The BUILD_BUG_ON() doesn't add any value
  The only requirement is to ensure that the destination buffer is a valid
  array.

- Zeroing is not necessary in current use cases
  To avoid confusion, we should remove it. Moreover, not zeroing could
  potentially make it easier to uncover bugs. If the caller needs a
  zero-padded task name, it should be explicitly handled at the call site.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
Suggested-by: Alejandro Colomar <alx@kernel.org>
Link: https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Matus Jokay <matus.jokay@stuba.sk>
Cc: Alejandro Colomar <alx@kernel.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
---
 fs/exec.c             | 10 ----------
 fs/proc/array.c       |  2 +-
 include/linux/sched.h | 32 ++++++++++++++++++++++++++------
 kernel/kthread.c      |  2 +-
 4 files changed, 28 insertions(+), 18 deletions(-)

Alejandro Colomar Aug. 28, 2024, 10:15 a.m. UTC | #1

Hi Yafang,

On Wed, Aug 28, 2024 at 11:03:14AM GMT, Yafang Shao wrote:
> We want to eliminate the use of __get_task_comm() for the following
> reasons:
> 
> - The task_lock() is unnecessary
>   Quoted from Linus [0]:
>   : Since user space can randomly change their names anyway, using locking
>   : was always wrong for readers (for writers it probably does make sense
>   : to have some lock - although practically speaking nobody cares there
>   : either, but at least for a writer some kind of race could have
>   : long-term mixed results
> 
> - The BUILD_BUG_ON() doesn't add any value
>   The only requirement is to ensure that the destination buffer is a valid
>   array.
> 
> - Zeroing is not necessary in current use cases
>   To avoid confusion, we should remove it. Moreover, not zeroing could
>   potentially make it easier to uncover bugs. If the caller needs a
>   zero-padded task name, it should be explicitly handled at the call site.
> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
> Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
> Suggested-by: Alejandro Colomar <alx@kernel.org>
> Link: https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: Matus Jokay <matus.jokay@stuba.sk>
> Cc: Alejandro Colomar <alx@kernel.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> ---
>  fs/exec.c             | 10 ----------
>  fs/proc/array.c       |  2 +-
>  include/linux/sched.h | 32 ++++++++++++++++++++++++++------
>  kernel/kthread.c      |  2 +-
>  4 files changed, 28 insertions(+), 18 deletions(-)
> 

[...]

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f8d150343d42..c40b95a79d80 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h

[...]

> @@ -1914,10 +1917,27 @@ static inline void set_task_comm(struct task_struct *tsk, const char *from)
>  	__set_task_comm(tsk, from, false);
>  }
>  
> -extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk);
> +/*

[...]

> + * - ARRAY_SIZE() can help ensure that @buf is indeed an array.
> + */
>  #define get_task_comm(buf, tsk) ({			\
> -	BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN);	\
> -	__get_task_comm(buf, sizeof(buf), tsk);		\
> +	strscpy(buf, (tsk)->comm, ARRAY_SIZE(buf));	\

I see that there's a two-argument macro

	#define strscpy(dst, src)	sized_strscpy(dst, src, sizeof(dst))

which is used in patch 2/8

	diff --git a/kernel/auditsc.c b/kernel/auditsc.c
	index 6f0d6fb6523f..e4ef5e57dde9 100644
	--- a/kernel/auditsc.c
	+++ b/kernel/auditsc.c
	@@ -2730,7 +2730,7 @@ void __audit_ptrace(struct task_struct *t)
		context->target_uid = task_uid(t);
		context->target_sessionid = audit_get_sessionid(t);
		security_task_getsecid_obj(t, &context->target_sid);
	-       memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
	+       strscpy(context->target_comm, t->comm);
	 }

	 /**

I propose modifying that macro to use ARRAY_SIZE() instead of sizeof(),
and then calling that macro here too.  That would not only make sure
that this is an array, but make sure that every call to that macro is an
array.  An if there are macros for similar string functions that reduce
the argument with a usual sizeof(), the same thing could be done to
those too.

Have a lovley day!
Alex

> +	buf;						\
>  })
>  
>  #ifdef CONFIG_SMP
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index f7be976ff88a..7d001d033cf9 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -101,7 +101,7 @@ void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk)
>  	struct kthread *kthread = to_kthread(tsk);
>  
>  	if (!kthread || !kthread->full_name) {
> -		__get_task_comm(buf, buf_size, tsk);
> +		strscpy(buf, tsk->comm, buf_size);
>  		return;
>  	}
>  
> -- 
> 2.43.5
>

Yafang Shao Aug. 28, 2024, 12:57 p.m. UTC | #2

On Wed, Aug 28, 2024 at 6:15 PM Alejandro Colomar <alx@kernel.org> wrote:
>
> Hi Yafang,
>
> On Wed, Aug 28, 2024 at 11:03:14AM GMT, Yafang Shao wrote:
> > We want to eliminate the use of __get_task_comm() for the following
> > reasons:
> >
> > - The task_lock() is unnecessary
> >   Quoted from Linus [0]:
> >   : Since user space can randomly change their names anyway, using locking
> >   : was always wrong for readers (for writers it probably does make sense
> >   : to have some lock - although practically speaking nobody cares there
> >   : either, but at least for a writer some kind of race could have
> >   : long-term mixed results
> >
> > - The BUILD_BUG_ON() doesn't add any value
> >   The only requirement is to ensure that the destination buffer is a valid
> >   array.
> >
> > - Zeroing is not necessary in current use cases
> >   To avoid confusion, we should remove it. Moreover, not zeroing could
> >   potentially make it easier to uncover bugs. If the caller needs a
> >   zero-padded task name, it should be explicitly handled at the call site.
> >
> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
> > Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
> > Suggested-by: Alejandro Colomar <alx@kernel.org>
> > Link: https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Eric Biederman <ebiederm@xmission.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Cc: Matus Jokay <matus.jokay@stuba.sk>
> > Cc: Alejandro Colomar <alx@kernel.org>
> > Cc: "Serge E. Hallyn" <serge@hallyn.com>
> > ---
> >  fs/exec.c             | 10 ----------
> >  fs/proc/array.c       |  2 +-
> >  include/linux/sched.h | 32 ++++++++++++++++++++++++++------
> >  kernel/kthread.c      |  2 +-
> >  4 files changed, 28 insertions(+), 18 deletions(-)
> >
>
> [...]
>
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index f8d150343d42..c40b95a79d80 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
>
> [...]
>
> > @@ -1914,10 +1917,27 @@ static inline void set_task_comm(struct task_struct *tsk, const char *from)
> >       __set_task_comm(tsk, from, false);
> >  }
> >
> > -extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk);
> > +/*
>
> [...]
>
> > + * - ARRAY_SIZE() can help ensure that @buf is indeed an array.
> > + */
> >  #define get_task_comm(buf, tsk) ({                   \
> > -     BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN);     \
> > -     __get_task_comm(buf, sizeof(buf), tsk);         \
> > +     strscpy(buf, (tsk)->comm, ARRAY_SIZE(buf));     \
>
> I see that there's a two-argument macro
>
>         #define strscpy(dst, src)       sized_strscpy(dst, src, sizeof(dst))

This macro is defined in arch/um/include/shared/user.h, which is not
used outside
the arch/um/ directory.
This marco should be addressed.

>
> which is used in patch 2/8

The strscpy() function used in this series is defined in
include/linux/string.h, which already checks whether the input is an
array:

#define __strscpy0(dst, src, ...)       \
        sized_strscpy(dst, src, sizeof(dst) + __must_be_array(dst))
#define __strscpy1(dst, src, size)      sized_strscpy(dst, src, size)

#define __strscpy_pad0(dst, src, ...)   \
        sized_strscpy_pad(dst, src, sizeof(dst) + __must_be_array(dst))
#define __strscpy_pad1(dst, src, size)  sized_strscpy_pad(dst, src, size)


>
>         diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>         index 6f0d6fb6523f..e4ef5e57dde9 100644
>         --- a/kernel/auditsc.c
>         +++ b/kernel/auditsc.c
>         @@ -2730,7 +2730,7 @@ void __audit_ptrace(struct task_struct *t)
>                 context->target_uid = task_uid(t);
>                 context->target_sessionid = audit_get_sessionid(t);
>                 security_task_getsecid_obj(t, &context->target_sid);
>         -       memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
>         +       strscpy(context->target_comm, t->comm);
>          }
>
>          /**
>
> I propose modifying that macro to use ARRAY_SIZE() instead of sizeof(),
> and then calling that macro here too.  That would not only make sure
> that this is an array, but make sure that every call to that macro is an
> array.  An if there are macros for similar string functions that reduce
> the argument with a usual sizeof(), the same thing could be done to
> those too.

I have no preference between using ARRAY_SIZE() or sizeof(dst) +
__must_be_array(dst). However, for consistency, it might be better to
use ARRAY_SIZE().


--
Regards

Yafang

Alejandro Colomar Aug. 28, 2024, 12:58 p.m. UTC | #3

On Wed, Aug 28, 2024 at 12:15:40PM GMT, Alejandro Colomar wrote:
> Hi Yafang,
> 
> On Wed, Aug 28, 2024 at 11:03:14AM GMT, Yafang Shao wrote:
> > We want to eliminate the use of __get_task_comm() for the following
> > reasons:
> > 
> > - The task_lock() is unnecessary
> >   Quoted from Linus [0]:
> >   : Since user space can randomly change their names anyway, using locking
> >   : was always wrong for readers (for writers it probably does make sense
> >   : to have some lock - although practically speaking nobody cares there
> >   : either, but at least for a writer some kind of race could have
> >   : long-term mixed results
> > 
> > - The BUILD_BUG_ON() doesn't add any value
> >   The only requirement is to ensure that the destination buffer is a valid
> >   array.
> > 
> > - Zeroing is not necessary in current use cases
> >   To avoid confusion, we should remove it. Moreover, not zeroing could
> >   potentially make it easier to uncover bugs. If the caller needs a
> >   zero-padded task name, it should be explicitly handled at the call site.
> > 
> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
> > Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
> > Suggested-by: Alejandro Colomar <alx@kernel.org>
> > Link: https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Eric Biederman <ebiederm@xmission.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Cc: Matus Jokay <matus.jokay@stuba.sk>
> > Cc: Alejandro Colomar <alx@kernel.org>
> > Cc: "Serge E. Hallyn" <serge@hallyn.com>
> > ---
> >  fs/exec.c             | 10 ----------
> >  fs/proc/array.c       |  2 +-
> >  include/linux/sched.h | 32 ++++++++++++++++++++++++++------
> >  kernel/kthread.c      |  2 +-
> >  4 files changed, 28 insertions(+), 18 deletions(-)
> > 
> 
> [...]
> 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index f8d150343d42..c40b95a79d80 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> 
> [...]
> 
> > @@ -1914,10 +1917,27 @@ static inline void set_task_comm(struct task_struct *tsk, const char *from)
> >  	__set_task_comm(tsk, from, false);
> >  }
> >  
> > -extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk);
> > +/*
> 
> [...]
> 
> > + * - ARRAY_SIZE() can help ensure that @buf is indeed an array.
> > + */
> >  #define get_task_comm(buf, tsk) ({			\
> > -	BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN);	\
> > -	__get_task_comm(buf, sizeof(buf), tsk);		\
> > +	strscpy(buf, (tsk)->comm, ARRAY_SIZE(buf));	\
> 
> I see that there's a two-argument macro
> 
> 	#define strscpy(dst, src)	sized_strscpy(dst, src, sizeof(dst))
> 
> which is used in patch 2/8
> 
> 	diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> 	index 6f0d6fb6523f..e4ef5e57dde9 100644
> 	--- a/kernel/auditsc.c
> 	+++ b/kernel/auditsc.c
> 	@@ -2730,7 +2730,7 @@ void __audit_ptrace(struct task_struct *t)
> 		context->target_uid = task_uid(t);
> 		context->target_sessionid = audit_get_sessionid(t);
> 		security_task_getsecid_obj(t, &context->target_sid);
> 	-       memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
> 	+       strscpy(context->target_comm, t->comm);
> 	 }
> 
> 	 /**

Ahh, the actual generic definition is in <include/linux/string.h>.
You could do

	diff --git i/include/linux/string.h w/include/linux/string.h
	index 9edace076ddb..060504719904 100644
	--- i/include/linux/string.h
	+++ w/include/linux/string.h
	@@ -76,11 +76,11 @@ ssize_t sized_strscpy(char *, const char *, size_t);
	  * known size.
	  */
	 #define __strscpy0(dst, src, ...)      \
	-       sized_strscpy(dst, src, sizeof(dst) + __must_be_array(dst))
	+       sized_strscpy(dst, src, ARRAY_SIZE(dst))
	 #define __strscpy1(dst, src, size)     sized_strscpy(dst, src, size)
	 
	 #define __strscpy_pad0(dst, src, ...)  \
	-       sized_strscpy_pad(dst, src, sizeof(dst) + __must_be_array(dst))
	+       sized_strscpy_pad(dst, src, ARRAY_SIZE(dst))
	 #define __strscpy_pad1(dst, src, size) sized_strscpy_pad(dst, src, size)
	 
	 /**

> 
> I propose modifying that macro to use ARRAY_SIZE() instead of sizeof(),
> and then calling that macro here too.  That would not only make sure
> that this is an array, but make sure that every call to that macro is an
> array.  An if there are macros for similar string functions that reduce
> the argument with a usual sizeof(), the same thing could be done to
> those too.
> 
> Have a lovley day!
> Alex
> 
> > +	buf;						\
> >  })
> >  
> >  #ifdef CONFIG_SMP
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index f7be976ff88a..7d001d033cf9 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -101,7 +101,7 @@ void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk)
> >  	struct kthread *kthread = to_kthread(tsk);
> >  
> >  	if (!kthread || !kthread->full_name) {
> > -		__get_task_comm(buf, buf_size, tsk);
> > +		strscpy(buf, tsk->comm, buf_size);
> >  		return;
> >  	}
> >  
> > -- 
> > 2.43.5
> > 
> 
> -- 
> <https://www.alejandro-colomar.es/>

Yafang Shao Aug. 28, 2024, 1:40 p.m. UTC | #4

On Wed, Aug 28, 2024 at 8:58 PM Alejandro Colomar <alx@kernel.org> wrote:
>
> On Wed, Aug 28, 2024 at 12:15:40PM GMT, Alejandro Colomar wrote:
> > Hi Yafang,
> >
> > On Wed, Aug 28, 2024 at 11:03:14AM GMT, Yafang Shao wrote:
> > > We want to eliminate the use of __get_task_comm() for the following
> > > reasons:
> > >
> > > - The task_lock() is unnecessary
> > >   Quoted from Linus [0]:
> > >   : Since user space can randomly change their names anyway, using locking
> > >   : was always wrong for readers (for writers it probably does make sense
> > >   : to have some lock - although practically speaking nobody cares there
> > >   : either, but at least for a writer some kind of race could have
> > >   : long-term mixed results
> > >
> > > - The BUILD_BUG_ON() doesn't add any value
> > >   The only requirement is to ensure that the destination buffer is a valid
> > >   array.
> > >
> > > - Zeroing is not necessary in current use cases
> > >   To avoid confusion, we should remove it. Moreover, not zeroing could
> > >   potentially make it easier to uncover bugs. If the caller needs a
> > >   zero-padded task name, it should be explicitly handled at the call site.
> > >
> > > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
> > > Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
> > > Suggested-by: Alejandro Colomar <alx@kernel.org>
> > > Link: https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Eric Biederman <ebiederm@xmission.com>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > Cc: Matus Jokay <matus.jokay@stuba.sk>
> > > Cc: Alejandro Colomar <alx@kernel.org>
> > > Cc: "Serge E. Hallyn" <serge@hallyn.com>
> > > ---
> > >  fs/exec.c             | 10 ----------
> > >  fs/proc/array.c       |  2 +-
> > >  include/linux/sched.h | 32 ++++++++++++++++++++++++++------
> > >  kernel/kthread.c      |  2 +-
> > >  4 files changed, 28 insertions(+), 18 deletions(-)
> > >
> >
> > [...]
> >
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index f8d150343d42..c40b95a79d80 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> >
> > [...]
> >
> > > @@ -1914,10 +1917,27 @@ static inline void set_task_comm(struct task_struct *tsk, const char *from)
> > >     __set_task_comm(tsk, from, false);
> > >  }
> > >
> > > -extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk);
> > > +/*
> >
> > [...]
> >
> > > + * - ARRAY_SIZE() can help ensure that @buf is indeed an array.
> > > + */
> > >  #define get_task_comm(buf, tsk) ({                 \
> > > -   BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN);     \
> > > -   __get_task_comm(buf, sizeof(buf), tsk);         \
> > > +   strscpy(buf, (tsk)->comm, ARRAY_SIZE(buf));     \
> >
> > I see that there's a two-argument macro
> >
> >       #define strscpy(dst, src)       sized_strscpy(dst, src, sizeof(dst))
> >
> > which is used in patch 2/8
> >
> >       diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> >       index 6f0d6fb6523f..e4ef5e57dde9 100644
> >       --- a/kernel/auditsc.c
> >       +++ b/kernel/auditsc.c
> >       @@ -2730,7 +2730,7 @@ void __audit_ptrace(struct task_struct *t)
> >               context->target_uid = task_uid(t);
> >               context->target_sessionid = audit_get_sessionid(t);
> >               security_task_getsecid_obj(t, &context->target_sid);
> >       -       memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
> >       +       strscpy(context->target_comm, t->comm);
> >        }
> >
> >        /**
>
> Ahh, the actual generic definition is in <include/linux/string.h>.
> You could do
>
>         diff --git i/include/linux/string.h w/include/linux/string.h
>         index 9edace076ddb..060504719904 100644
>         --- i/include/linux/string.h
>         +++ w/include/linux/string.h
>         @@ -76,11 +76,11 @@ ssize_t sized_strscpy(char *, const char *, size_t);
>           * known size.
>           */
>          #define __strscpy0(dst, src, ...)      \
>         -       sized_strscpy(dst, src, sizeof(dst) + __must_be_array(dst))
>         +       sized_strscpy(dst, src, ARRAY_SIZE(dst))
>          #define __strscpy1(dst, src, size)     sized_strscpy(dst, src, size)
>
>          #define __strscpy_pad0(dst, src, ...)  \
>         -       sized_strscpy_pad(dst, src, sizeof(dst) + __must_be_array(dst))
>         +       sized_strscpy_pad(dst, src, ARRAY_SIZE(dst))
>          #define __strscpy_pad1(dst, src, size) sized_strscpy_pad(dst, src, size)
>
>          /**

Thank you for your suggestion. How does the following commit log look
to you? Does it meet your expectations?

    string: Use ARRAY_SIZE() in strscpy()

    We can use ARRAY_SIZE() instead to clarify that they are regular characters.

    Co-developed-by: Alejandro Colomar <alx@kernel.org>
    Signed-off-by: Alejandro Colomar <alx@kernel.org>
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

diff --git a/arch/um/include/shared/user.h b/arch/um/include/shared/user.h
index bbab79c0c074..07216996e3a9 100644
--- a/arch/um/include/shared/user.h
+++ b/arch/um/include/shared/user.h
@@ -14,7 +14,7 @@
  * copying too much infrastructure for my taste, so userspace files
  * get less checking than kernel files.
  */
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + __must_be_array(x))

 /* This is to get size_t and NULL */
 #ifndef __UM_HOST__
@@ -60,7 +60,7 @@ static inline void print_hex_dump(const char *level,
const char *prefix_str,
 extern int in_aton(char *str);
 extern size_t strlcat(char *, const char *, size_t);
 extern size_t sized_strscpy(char *, const char *, size_t);
-#define strscpy(dst, src)      sized_strscpy(dst, src, sizeof(dst))
+#define strscpy(dst, src)      sized_strscpy(dst, src, ARRAY_SIZE(dst))

 /* Copied from linux/compiler-gcc.h since we can't include it directly */
 #define barrier() __asm__ __volatile__("": : :"memory")
diff --git a/include/linux/string.h b/include/linux/string.h
index 9edace076ddb..060504719904 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h

@@ -76,11 +76,11 @@ ssize_t sized_strscpy(char *, const char *, size_t);
  * known size.
  */
 #define __strscpy0(dst, src, ...)      \
-       sized_strscpy(dst, src, sizeof(dst) + __must_be_array(dst))
+       sized_strscpy(dst, src, ARRAY_SIZE(dst))
 #define __strscpy1(dst, src, size)     sized_strscpy(dst, src, size)

 #define __strscpy_pad0(dst, src, ...)  \
-       sized_strscpy_pad(dst, src, sizeof(dst) + __must_be_array(dst))
+       sized_strscpy_pad(dst, src, ARRAY_SIZE(dst))
 #define __strscpy_pad1(dst, src, size) sized_strscpy_pad(dst, src, size)

 /**

--
Regards

Yafang

Kees Cook Aug. 28, 2024, 1:48 p.m. UTC | #5

On August 28, 2024 6:40:35 AM PDT, Yafang Shao <laoar.shao@gmail.com> wrote:
>On Wed, Aug 28, 2024 at 8:58 PM Alejandro Colomar <alx@kernel.org> wrote:
>>
>> On Wed, Aug 28, 2024 at 12:15:40PM GMT, Alejandro Colomar wrote:
>> > Hi Yafang,
>> >
>> > On Wed, Aug 28, 2024 at 11:03:14AM GMT, Yafang Shao wrote:
>> > > We want to eliminate the use of __get_task_comm() for the following
>> > > reasons:
>> > >
>> > > - The task_lock() is unnecessary
>> > >   Quoted from Linus [0]:
>> > >   : Since user space can randomly change their names anyway, using locking
>> > >   : was always wrong for readers (for writers it probably does make sense
>> > >   : to have some lock - although practically speaking nobody cares there
>> > >   : either, but at least for a writer some kind of race could have
>> > >   : long-term mixed results
>> > >
>> > > - The BUILD_BUG_ON() doesn't add any value
>> > >   The only requirement is to ensure that the destination buffer is a valid
>> > >   array.
>> > >
>> > > - Zeroing is not necessary in current use cases
>> > >   To avoid confusion, we should remove it. Moreover, not zeroing could
>> > >   potentially make it easier to uncover bugs. If the caller needs a
>> > >   zero-padded task name, it should be explicitly handled at the call site.
>> > >
>> > > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> > > Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
>> > > Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
>> > > Suggested-by: Alejandro Colomar <alx@kernel.org>
>> > > Link: https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq
>> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>> > > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>> > > Cc: Christian Brauner <brauner@kernel.org>
>> > > Cc: Jan Kara <jack@suse.cz>
>> > > Cc: Eric Biederman <ebiederm@xmission.com>
>> > > Cc: Kees Cook <keescook@chromium.org>
>> > > Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> > > Cc: Matus Jokay <matus.jokay@stuba.sk>
>> > > Cc: Alejandro Colomar <alx@kernel.org>
>> > > Cc: "Serge E. Hallyn" <serge@hallyn.com>
>> > > ---
>> > >  fs/exec.c             | 10 ----------
>> > >  fs/proc/array.c       |  2 +-
>> > >  include/linux/sched.h | 32 ++++++++++++++++++++++++++------
>> > >  kernel/kthread.c      |  2 +-
>> > >  4 files changed, 28 insertions(+), 18 deletions(-)
>> > >
>> >
>> > [...]
>> >
>> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
>> > > index f8d150343d42..c40b95a79d80 100644
>> > > --- a/include/linux/sched.h
>> > > +++ b/include/linux/sched.h
>> >
>> > [...]
>> >
>> > > @@ -1914,10 +1917,27 @@ static inline void set_task_comm(struct task_struct *tsk, const char *from)
>> > >     __set_task_comm(tsk, from, false);
>> > >  }
>> > >
>> > > -extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk);
>> > > +/*
>> >
>> > [...]
>> >
>> > > + * - ARRAY_SIZE() can help ensure that @buf is indeed an array.
>> > > + */
>> > >  #define get_task_comm(buf, tsk) ({                 \
>> > > -   BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN);     \
>> > > -   __get_task_comm(buf, sizeof(buf), tsk);         \
>> > > +   strscpy(buf, (tsk)->comm, ARRAY_SIZE(buf));     \
>> >
>> > I see that there's a two-argument macro
>> >
>> >       #define strscpy(dst, src)       sized_strscpy(dst, src, sizeof(dst))
>> >
>> > which is used in patch 2/8
>> >
>> >       diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> >       index 6f0d6fb6523f..e4ef5e57dde9 100644
>> >       --- a/kernel/auditsc.c
>> >       +++ b/kernel/auditsc.c
>> >       @@ -2730,7 +2730,7 @@ void __audit_ptrace(struct task_struct *t)
>> >               context->target_uid = task_uid(t);
>> >               context->target_sessionid = audit_get_sessionid(t);
>> >               security_task_getsecid_obj(t, &context->target_sid);
>> >       -       memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
>> >       +       strscpy(context->target_comm, t->comm);
>> >        }
>> >
>> >        /**
>>
>> Ahh, the actual generic definition is in <include/linux/string.h>.
>> You could do
>>
>>         diff --git i/include/linux/string.h w/include/linux/string.h
>>         index 9edace076ddb..060504719904 100644
>>         --- i/include/linux/string.h
>>         +++ w/include/linux/string.h
>>         @@ -76,11 +76,11 @@ ssize_t sized_strscpy(char *, const char *, size_t);
>>           * known size.
>>           */
>>          #define __strscpy0(dst, src, ...)      \
>>         -       sized_strscpy(dst, src, sizeof(dst) + __must_be_array(dst))
>>         +       sized_strscpy(dst, src, ARRAY_SIZE(dst))
>>          #define __strscpy1(dst, src, size)     sized_strscpy(dst, src, size)
>>
>>          #define __strscpy_pad0(dst, src, ...)  \
>>         -       sized_strscpy_pad(dst, src, sizeof(dst) + __must_be_array(dst))
>>         +       sized_strscpy_pad(dst, src, ARRAY_SIZE(dst))
>>          #define __strscpy_pad1(dst, src, size) sized_strscpy_pad(dst, src, size)
>>
>>          /**
>
>Thank you for your suggestion. How does the following commit log look
>to you? Does it meet your expectations?
>
>    string: Use ARRAY_SIZE() in strscpy()
>
>    We can use ARRAY_SIZE() instead to clarify that they are regular characters.
>
>    Co-developed-by: Alejandro Colomar <alx@kernel.org>
>    Signed-off-by: Alejandro Colomar <alx@kernel.org>
>    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>
>diff --git a/arch/um/include/shared/user.h b/arch/um/include/shared/user.h
>index bbab79c0c074..07216996e3a9 100644
>--- a/arch/um/include/shared/user.h
>+++ b/arch/um/include/shared/user.h
>@@ -14,7 +14,7 @@
>  * copying too much infrastructure for my taste, so userspace files
>  * get less checking than kernel files.
>  */
>-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
>+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + __must_be_array(x))
>
> /* This is to get size_t and NULL */
> #ifndef __UM_HOST__
>@@ -60,7 +60,7 @@ static inline void print_hex_dump(const char *level,
>const char *prefix_str,
> extern int in_aton(char *str);
> extern size_t strlcat(char *, const char *, size_t);
> extern size_t sized_strscpy(char *, const char *, size_t);
>-#define strscpy(dst, src)      sized_strscpy(dst, src, sizeof(dst))
>+#define strscpy(dst, src)      sized_strscpy(dst, src, ARRAY_SIZE(dst))

Uh, but why? strscpy() copies bytes, not array elements. Using sizeof() is already correct and using ARRAY_SIZE() could lead to unexpectedly small counts (in admittedly odd situations).

What is the problem you're trying to solve here?

-Kees

Kees Cook Aug. 28, 2024, 2:03 p.m. UTC | #6

On August 27, 2024 8:03:14 PM PDT, Yafang Shao <laoar.shao@gmail.com> wrote:
>We want to eliminate the use of __get_task_comm() for the following
>reasons:
>
>- The task_lock() is unnecessary
>  Quoted from Linus [0]:
>  : Since user space can randomly change their names anyway, using locking
>  : was always wrong for readers (for writers it probably does make sense
>  : to have some lock - although practically speaking nobody cares there
>  : either, but at least for a writer some kind of race could have
>  : long-term mixed results
>
>- The BUILD_BUG_ON() doesn't add any value
>  The only requirement is to ensure that the destination buffer is a valid
>  array.

Sorry, that's not a correct evaluation. See below.

>
>- Zeroing is not necessary in current use cases
>  To avoid confusion, we should remove it. Moreover, not zeroing could
>  potentially make it easier to uncover bugs. If the caller needs a
>  zero-padded task name, it should be explicitly handled at the call site.

This is also not an appropriate rationale. We don't make the kernel "more buggy" not purpose. ;) See below.

>
>Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
>Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
>Suggested-by: Alejandro Colomar <alx@kernel.org>
>Link: https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq
>Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>Cc: Christian Brauner <brauner@kernel.org>
>Cc: Jan Kara <jack@suse.cz>
>Cc: Eric Biederman <ebiederm@xmission.com>
>Cc: Kees Cook <keescook@chromium.org>
>Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>Cc: Matus Jokay <matus.jokay@stuba.sk>
>Cc: Alejandro Colomar <alx@kernel.org>
>Cc: "Serge E. Hallyn" <serge@hallyn.com>
>---
> fs/exec.c             | 10 ----------
> fs/proc/array.c       |  2 +-
> include/linux/sched.h | 32 ++++++++++++++++++++++++++------
> kernel/kthread.c      |  2 +-
> 4 files changed, 28 insertions(+), 18 deletions(-)
>
>diff --git a/fs/exec.c b/fs/exec.c
>index 50e76cc633c4..8a23171bc3c3 100644
>--- a/fs/exec.c
>+++ b/fs/exec.c
>@@ -1264,16 +1264,6 @@ static int unshare_sighand(struct task_struct *me)
> 	return 0;
> }
> 
>-char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
>-{
>-	task_lock(tsk);
>-	/* Always NUL terminated and zero-padded */
>-	strscpy_pad(buf, tsk->comm, buf_size);
>-	task_unlock(tsk);
>-	return buf;
>-}
>-EXPORT_SYMBOL_GPL(__get_task_comm);
>-
> /*
>  * These functions flushes out all traces of the currently running executable
>  * so that a new one can be started
>diff --git a/fs/proc/array.c b/fs/proc/array.c
>index 34a47fb0c57f..55ed3510d2bb 100644
>--- a/fs/proc/array.c
>+++ b/fs/proc/array.c
>@@ -109,7 +109,7 @@ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape)
> 	else if (p->flags & PF_KTHREAD)
> 		get_kthread_comm(tcomm, sizeof(tcomm), p);
> 	else
>-		__get_task_comm(tcomm, sizeof(tcomm), p);
>+		get_task_comm(tcomm, p);
> 
> 	if (escape)
> 		seq_escape_str(m, tcomm, ESCAPE_SPACE | ESCAPE_SPECIAL, "\n\\");
>diff --git a/include/linux/sched.h b/include/linux/sched.h
>index f8d150343d42..c40b95a79d80 100644
>--- a/include/linux/sched.h
>+++ b/include/linux/sched.h
>@@ -1096,9 +1096,12 @@ struct task_struct {
> 	/*
> 	 * executable name, excluding path.
> 	 *
>-	 * - normally initialized setup_new_exec()
>-	 * - access it with [gs]et_task_comm()
>-	 * - lock it with task_lock()
>+	 * - normally initialized begin_new_exec()
>+	 * - set it with set_task_comm()
>+	 *   - strscpy_pad() to ensure it is always NUL-terminated and
>+	 *     zero-padded
>+	 *   - task_lock() to ensure the operation is atomic and the name is
>+	 *     fully updated.
> 	 */
> 	char				comm[TASK_COMM_LEN];
> 
>@@ -1914,10 +1917,27 @@ static inline void set_task_comm(struct task_struct *tsk, const char *from)
> 	__set_task_comm(tsk, from, false);
> }
> 
>-extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk);
>+/*
>+ * - Why not use task_lock()?
>+ *   User space can randomly change their names anyway, so locking for readers
>+ *   doesn't make sense. For writers, locking is probably necessary, as a race
>+ *   condition could lead to long-term mixed results.
>+ *   The strscpy_pad() in __set_task_comm() can ensure that the task comm is
>+ *   always NUL-terminated and zero-padded. Therefore the race condition between
>+ *   reader and writer is not an issue.
>+ *
>+ * - Why not use strscpy_pad()?
>+ *   While strscpy_pad() prevents writing garbage past the NUL terminator, which
>+ *   is useful when using the task name as a key in a hash map, most use cases
>+ *   don't require this. Zero-padding might confuse users if it’s unnecessary,
>+ *   and not zeroing might even make it easier to expose bugs. If you need a
>+ *   zero-padded task name, please handle that explicitly at the call site.

I really don't like this part of the change. You don't know that existing callers don't depend on the padding. Please invert this logic: get_task_comm() must use strscpy_pad(). Calls NOT wanting padding can call strscpy() themselves.

>+ *
>+ * - ARRAY_SIZE() can help ensure that @buf is indeed an array.

This doesn't need checking here; strscpy() will already do that. 

>+ */
> #define get_task_comm(buf, tsk) ({			\
>-	BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN);	\

Also, please leave the TASK_COMM_LEN test so that destination buffers continue to be the correct size: current callers do not perform any return value analysis, so they cannot accidentally start having situations where the destination string might be truncated. Again, anyone wanting to avoid that restriction can use strscpy() directly and check the return value.

>-	__get_task_comm(buf, sizeof(buf), tsk);		\
>+	strscpy(buf, (tsk)->comm, ARRAY_SIZE(buf));	\
>+	buf;						\
> })
> 
> #ifdef CONFIG_SMP
>diff --git a/kernel/kthread.c b/kernel/kthread.c
>index f7be976ff88a..7d001d033cf9 100644
>--- a/kernel/kthread.c
>+++ b/kernel/kthread.c
>@@ -101,7 +101,7 @@ void get_kthread_comm(char *buf, size_t buf_size, struct task_struct *tsk)
> 	struct kthread *kthread = to_kthread(tsk);
> 
> 	if (!kthread || !kthread->full_name) {
>-		__get_task_comm(buf, buf_size, tsk);
>+		strscpy(buf, tsk->comm, buf_size);

Why is this safe to not use strscpy_pad? Also, is buf_size ever not TASK_COMM_LEN?

> 		return;
> 	}
>

Alejandro Colomar Aug. 28, 2024, 2:10 p.m. UTC | #7

Hi Yafang,

On Wed, Aug 28, 2024 at 09:40:35PM GMT, Yafang Shao wrote:
> > Ahh, the actual generic definition is in <include/linux/string.h>.
> > You could do
> >
> >         diff --git i/include/linux/string.h w/include/linux/string.h
> >         index 9edace076ddb..060504719904 100644
> >         --- i/include/linux/string.h
> >         +++ w/include/linux/string.h
> >         @@ -76,11 +76,11 @@ ssize_t sized_strscpy(char *, const char *, size_t);
> >           * known size.
> >           */
> >          #define __strscpy0(dst, src, ...)      \
> >         -       sized_strscpy(dst, src, sizeof(dst) + __must_be_array(dst))
> >         +       sized_strscpy(dst, src, ARRAY_SIZE(dst))
> >          #define __strscpy1(dst, src, size)     sized_strscpy(dst, src, size)
> >
> >          #define __strscpy_pad0(dst, src, ...)  \
> >         -       sized_strscpy_pad(dst, src, sizeof(dst) + __must_be_array(dst))
> >         +       sized_strscpy_pad(dst, src, ARRAY_SIZE(dst))
> >          #define __strscpy_pad1(dst, src, size) sized_strscpy_pad(dst, src, size)
> >
> >          /**
> 
> Thank you for your suggestion. How does the following commit log look
> to you? Does it meet your expectations?
> 
>     string: Use ARRAY_SIZE() in strscpy()
> 
>     We can use ARRAY_SIZE() instead to clarify that they are regular characters.

I would write the following:

For symmetry with wide-character string functions, ARRAY_SIZE() is more
appropriate than sizeof().

For example, one would call wcs*cpy(dst, src, ARRAY_SIZE(dst)).
The call wcs*cpy(dst, src, sizeof(dst)) would be bogus, since the
argument is the number of wide characters, not the number of bytes.

When translating that to normal characters, one wants conceptually the
same operation, but on (normal) characters.  That is, one wants
strscpy(dst, src, ARRAY_SIZE(dst)).  While strscpy() with sizeof() works
just fine because sizeof(char)==1 by definition, it is conceptually
wrong to use it.

By using ARRAY_SIZE(), we get the __must_be_array() check for free.

> 
>     Co-developed-by: Alejandro Colomar <alx@kernel.org>
>     Signed-off-by: Alejandro Colomar <alx@kernel.org>
>     Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> 
> diff --git a/arch/um/include/shared/user.h b/arch/um/include/shared/user.h
> index bbab79c0c074..07216996e3a9 100644
> --- a/arch/um/include/shared/user.h
> +++ b/arch/um/include/shared/user.h
> @@ -14,7 +14,7 @@
>   * copying too much infrastructure for my taste, so userspace files
>   * get less checking than kernel files.
>   */
> -#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + __must_be_array(x))
> 
>  /* This is to get size_t and NULL */
>  #ifndef __UM_HOST__
> @@ -60,7 +60,7 @@ static inline void print_hex_dump(const char *level,
> const char *prefix_str,
>  extern int in_aton(char *str);
>  extern size_t strlcat(char *, const char *, size_t);
>  extern size_t sized_strscpy(char *, const char *, size_t);
> -#define strscpy(dst, src)      sized_strscpy(dst, src, sizeof(dst))
> +#define strscpy(dst, src)      sized_strscpy(dst, src, ARRAY_SIZE(dst))
> 
>  /* Copied from linux/compiler-gcc.h since we can't include it directly */
>  #define barrier() __asm__ __volatile__("": : :"memory")
> diff --git a/include/linux/string.h b/include/linux/string.h
> index 9edace076ddb..060504719904 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> 
> @@ -76,11 +76,11 @@ ssize_t sized_strscpy(char *, const char *, size_t);
>   * known size.
>   */
>  #define __strscpy0(dst, src, ...)      \
> -       sized_strscpy(dst, src, sizeof(dst) + __must_be_array(dst))
> +       sized_strscpy(dst, src, ARRAY_SIZE(dst))
>  #define __strscpy1(dst, src, size)     sized_strscpy(dst, src, size)
> 
>  #define __strscpy_pad0(dst, src, ...)  \
> -       sized_strscpy_pad(dst, src, sizeof(dst) + __must_be_array(dst))
> +       sized_strscpy_pad(dst, src, ARRAY_SIZE(dst))
>  #define __strscpy_pad1(dst, src, size) sized_strscpy_pad(dst, src, size)

The diff looks good to me.  Thanks!

Cheers,
Alex

> 
>  /**
> 
> --
> Regards
> 
> Yafang

Alejandro Colomar Aug. 28, 2024, 3:09 p.m. UTC | #8

Hi Kees,

On Wed, Aug 28, 2024 at 06:48:39AM GMT, Kees Cook wrote:

[...]

> >Thank you for your suggestion. How does the following commit log look
> >to you? Does it meet your expectations?
> >
> >    string: Use ARRAY_SIZE() in strscpy()
> >
> >    We can use ARRAY_SIZE() instead to clarify that they are regular characters.
> >
> >    Co-developed-by: Alejandro Colomar <alx@kernel.org>
> >    Signed-off-by: Alejandro Colomar <alx@kernel.org>
> >    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >
> >diff --git a/arch/um/include/shared/user.h b/arch/um/include/shared/user.h
> >index bbab79c0c074..07216996e3a9 100644
> >--- a/arch/um/include/shared/user.h
> >+++ b/arch/um/include/shared/user.h
> >@@ -14,7 +14,7 @@
> >  * copying too much infrastructure for my taste, so userspace files
> >  * get less checking than kernel files.
> >  */
> >-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> >+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + __must_be_array(x))
> >
> > /* This is to get size_t and NULL */
> > #ifndef __UM_HOST__
> >@@ -60,7 +60,7 @@ static inline void print_hex_dump(const char *level,
> >const char *prefix_str,
> > extern int in_aton(char *str);
> > extern size_t strlcat(char *, const char *, size_t);
> > extern size_t sized_strscpy(char *, const char *, size_t);
> >-#define strscpy(dst, src)      sized_strscpy(dst, src, sizeof(dst))
> >+#define strscpy(dst, src)      sized_strscpy(dst, src, ARRAY_SIZE(dst))
> 
> Uh, but why? strscpy() copies bytes, not array elements. Using sizeof() is already correct and using ARRAY_SIZE() could lead to unexpectedly small counts (in admittedly odd situations).
> 
> What is the problem you're trying to solve here?

I suggested that here:
<https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq/>

There, you'll find the rationale (and also for avoiding the _pad calls
where not necessary --I ignore if it's necessary here--).

Have a lovely day!
Alex

> 
> -Kees
> 
> -- 
> Kees Cook

Kees Cook Aug. 29, 2024, 12:17 a.m. UTC | #9

On Wed, Aug 28, 2024 at 05:09:08PM +0200, Alejandro Colomar wrote:
> Hi Kees,
> 
> On Wed, Aug 28, 2024 at 06:48:39AM GMT, Kees Cook wrote:
> 
> [...]
> 
> > >Thank you for your suggestion. How does the following commit log look
> > >to you? Does it meet your expectations?
> > >
> > >    string: Use ARRAY_SIZE() in strscpy()
> > >
> > >    We can use ARRAY_SIZE() instead to clarify that they are regular characters.
> > >
> > >    Co-developed-by: Alejandro Colomar <alx@kernel.org>
> > >    Signed-off-by: Alejandro Colomar <alx@kernel.org>
> > >    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > >
> > >diff --git a/arch/um/include/shared/user.h b/arch/um/include/shared/user.h
> > >index bbab79c0c074..07216996e3a9 100644
> > >--- a/arch/um/include/shared/user.h
> > >+++ b/arch/um/include/shared/user.h
> > >@@ -14,7 +14,7 @@
> > >  * copying too much infrastructure for my taste, so userspace files
> > >  * get less checking than kernel files.
> > >  */
> > >-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> > >+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + __must_be_array(x))
> > >
> > > /* This is to get size_t and NULL */
> > > #ifndef __UM_HOST__
> > >@@ -60,7 +60,7 @@ static inline void print_hex_dump(const char *level,
> > >const char *prefix_str,
> > > extern int in_aton(char *str);
> > > extern size_t strlcat(char *, const char *, size_t);
> > > extern size_t sized_strscpy(char *, const char *, size_t);
> > >-#define strscpy(dst, src)      sized_strscpy(dst, src, sizeof(dst))
> > >+#define strscpy(dst, src)      sized_strscpy(dst, src, ARRAY_SIZE(dst))
> > 
> > Uh, but why? strscpy() copies bytes, not array elements. Using sizeof() is already correct and using ARRAY_SIZE() could lead to unexpectedly small counts (in admittedly odd situations).
> > 
> > What is the problem you're trying to solve here?
> 
> I suggested that here:
> <https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq/>
> 
> There, you'll find the rationale (and also for avoiding the _pad calls
> where not necessary --I ignore if it's necessary here--).

Right, so we only use byte strings for strscpy(), so sizeof() is
sufficient. There's no technical need to switch to ARRAY_SIZE(), and I'd
like to minimize any changes to such core APIs without a good reason.

And for the _pad change, we are also doing strncpy() replacement via
case-by-case analysis, but with a common function like get_task_comm(),
I don't want to change the behavior without a complete audit of the
padding needs of every caller. Since that's rather a lot for this series,
I'd rather we just leave the existing behavior as-is, and if padding
removal is wanted after that, we can do it on a case-by-case basis then.

-Kees

Alejandro Colomar Aug. 29, 2024, 12:25 a.m. UTC | #10

Hi Kees,

On Wed, Aug 28, 2024 at 05:17:55PM GMT, Kees Cook wrote:
> On Wed, Aug 28, 2024 at 05:09:08PM +0200, Alejandro Colomar wrote:
> > Hi Kees,
> > 
> > On Wed, Aug 28, 2024 at 06:48:39AM GMT, Kees Cook wrote:
> > 
> > [...]
> > 
> > > >Thank you for your suggestion. How does the following commit log look
> > > >to you? Does it meet your expectations?
> > > >
> > > >    string: Use ARRAY_SIZE() in strscpy()
> > > >
> > > >    We can use ARRAY_SIZE() instead to clarify that they are regular characters.
> > > >
> > > >    Co-developed-by: Alejandro Colomar <alx@kernel.org>
> > > >    Signed-off-by: Alejandro Colomar <alx@kernel.org>
> > > >    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > >
> > > >diff --git a/arch/um/include/shared/user.h b/arch/um/include/shared/user.h
> > > >index bbab79c0c074..07216996e3a9 100644
> > > >--- a/arch/um/include/shared/user.h
> > > >+++ b/arch/um/include/shared/user.h
> > > >@@ -14,7 +14,7 @@
> > > >  * copying too much infrastructure for my taste, so userspace files
> > > >  * get less checking than kernel files.
> > > >  */
> > > >-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> > > >+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + __must_be_array(x))
> > > >
> > > > /* This is to get size_t and NULL */
> > > > #ifndef __UM_HOST__
> > > >@@ -60,7 +60,7 @@ static inline void print_hex_dump(const char *level,
> > > >const char *prefix_str,
> > > > extern int in_aton(char *str);
> > > > extern size_t strlcat(char *, const char *, size_t);
> > > > extern size_t sized_strscpy(char *, const char *, size_t);
> > > >-#define strscpy(dst, src)      sized_strscpy(dst, src, sizeof(dst))
> > > >+#define strscpy(dst, src)      sized_strscpy(dst, src, ARRAY_SIZE(dst))
> > > 
> > > Uh, but why? strscpy() copies bytes, not array elements. Using sizeof() is already correct and using ARRAY_SIZE() could lead to unexpectedly small counts (in admittedly odd situations).
> > > 
> > > What is the problem you're trying to solve here?
> > 
> > I suggested that here:
> > <https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq/>
> > 
> > There, you'll find the rationale (and also for avoiding the _pad calls
> > where not necessary --I ignore if it's necessary here--).
> 
> Right, so we only use byte strings for strscpy(), so sizeof() is
> sufficient. There's no technical need to switch to ARRAY_SIZE(), and I'd
> like to minimize any changes to such core APIs without a good reason.

Makes sense.  My original proposal was ignoring that the wrapper was
already using __must_be_array().  Having already sizeof() +
__must_be_array(), I'd leave it like that, since both do effectively the
same.

> And for the _pad change, we are also doing strncpy() replacement via
> case-by-case analysis, but with a common function like get_task_comm(),
> I don't want to change the behavior without a complete audit of the
> padding needs of every caller.

Agree.  I had the same problem with shadow.  Removing padding was the
worst part, because it was hard to justify that nothing was relying on
the padding.

> Since that's rather a lot for this series,
> I'd rather we just leave the existing behavior as-is, and if padding
> removal is wanted after that, we can do it on a case-by-case basis then.
> 
> -Kees

Have a lovely night!
Alex

> 
> -- 
> Kees Cook

Yafang Shao Aug. 29, 2024, 6:30 a.m. UTC | #11

On Wed, Aug 28, 2024 at 10:04 PM Kees Cook <kees@kernel.org> wrote:
>
>
>
> On August 27, 2024 8:03:14 PM PDT, Yafang Shao <laoar.shao@gmail.com> wrote:
> >We want to eliminate the use of __get_task_comm() for the following
> >reasons:
> >
> >- The task_lock() is unnecessary
> >  Quoted from Linus [0]:
> >  : Since user space can randomly change their names anyway, using locking
> >  : was always wrong for readers (for writers it probably does make sense
> >  : to have some lock - although practically speaking nobody cares there
> >  : either, but at least for a writer some kind of race could have
> >  : long-term mixed results
> >
> >- The BUILD_BUG_ON() doesn't add any value
> >  The only requirement is to ensure that the destination buffer is a valid
> >  array.
>
> Sorry, that's not a correct evaluation. See below.
>
> >
> >- Zeroing is not necessary in current use cases
> >  To avoid confusion, we should remove it. Moreover, not zeroing could
> >  potentially make it easier to uncover bugs. If the caller needs a
> >  zero-padded task name, it should be explicitly handled at the call site.
>
> This is also not an appropriate rationale. We don't make the kernel "more buggy" not purpose. ;) See below.
>
> >
> >Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> >Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
> >Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
> >Suggested-by: Alejandro Colomar <alx@kernel.org>
> >Link: https://lore.kernel.org/all/2jxak5v6dfxlpbxhpm3ey7oup4g2lnr3ueurfbosf5wdo65dk4@srb3hsk72zwq
> >Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> >Cc: Christian Brauner <brauner@kernel.org>
> >Cc: Jan Kara <jack@suse.cz>
> >Cc: Eric Biederman <ebiederm@xmission.com>
> >Cc: Kees Cook <keescook@chromium.org>
> >Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> >Cc: Matus Jokay <matus.jokay@stuba.sk>
> >Cc: Alejandro Colomar <alx@kernel.org>
> >Cc: "Serge E. Hallyn" <serge@hallyn.com>
> >---
> > fs/exec.c             | 10 ----------
> > fs/proc/array.c       |  2 +-
> > include/linux/sched.h | 32 ++++++++++++++++++++++++++------
> > kernel/kthread.c      |  2 +-
> > 4 files changed, 28 insertions(+), 18 deletions(-)
> >
> >diff --git a/fs/exec.c b/fs/exec.c
> >index 50e76cc633c4..8a23171bc3c3 100644
> >--- a/fs/exec.c
> >+++ b/fs/exec.c
> >@@ -1264,16 +1264,6 @@ static int unshare_sighand(struct task_struct *me)
> >       return 0;
> > }
> >
> >-char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
> >-{
> >-      task_lock(tsk);
> >-      /* Always NUL terminated and zero-padded */
> >-      strscpy_pad(buf, tsk->comm, buf_size);
> >-      task_unlock(tsk);
> >-      return buf;
> >-}
> >-EXPORT_SYMBOL_GPL(__get_task_comm);
> >-
> > /*
> >  * These functions flushes out all traces of the currently running executable
> >  * so that a new one can be started
> >diff --git a/fs/proc/array.c b/fs/proc/array.c
> >index 34a47fb0c57f..55ed3510d2bb 100644
> >--- a/fs/proc/array.c
> >+++ b/fs/proc/array.c
> >@@ -109,7 +109,7 @@ void proc_task_name(struct seq_file *m, struct task_struct *p, bool escape)
> >       else if (p->flags & PF_KTHREAD)
> >               get_kthread_comm(tcomm, sizeof(tcomm), p);
> >       else
> >-              __get_task_comm(tcomm, sizeof(tcomm), p);
> >+              get_task_comm(tcomm, p);
> >
> >       if (escape)
> >               seq_escape_str(m, tcomm, ESCAPE_SPACE | ESCAPE_SPECIAL, "\n\\");
> >diff --git a/include/linux/sched.h b/include/linux/sched.h
> >index f8d150343d42..c40b95a79d80 100644
> >--- a/include/linux/sched.h
> >+++ b/include/linux/sched.h
> >@@ -1096,9 +1096,12 @@ struct task_struct {
> >       /*
> >        * executable name, excluding path.
> >        *
> >-       * - normally initialized setup_new_exec()
> >-       * - access it with [gs]et_task_comm()
> >-       * - lock it with task_lock()
> >+       * - normally initialized begin_new_exec()
> >+       * - set it with set_task_comm()
> >+       *   - strscpy_pad() to ensure it is always NUL-terminated and
> >+       *     zero-padded
> >+       *   - task_lock() to ensure the operation is atomic and the name is
> >+       *     fully updated.
> >        */
> >       char                            comm[TASK_COMM_LEN];
> >
> >@@ -1914,10 +1917,27 @@ static inline void set_task_comm(struct task_struct *tsk, const char *from)
> >       __set_task_comm(tsk, from, false);
> > }
> >
> >-extern char *__get_task_comm(char *to, size_t len, struct task_struct *tsk);
> >+/*
> >+ * - Why not use task_lock()?
> >+ *   User space can randomly change their names anyway, so locking for readers
> >+ *   doesn't make sense. For writers, locking is probably necessary, as a race
> >+ *   condition could lead to long-term mixed results.
> >+ *   The strscpy_pad() in __set_task_comm() can ensure that the task comm is
> >+ *   always NUL-terminated and zero-padded. Therefore the race condition between
> >+ *   reader and writer is not an issue.
> >+ *
> >+ * - Why not use strscpy_pad()?
> >+ *   While strscpy_pad() prevents writing garbage past the NUL terminator, which
> >+ *   is useful when using the task name as a key in a hash map, most use cases
> >+ *   don't require this. Zero-padding might confuse users if it’s unnecessary,
> >+ *   and not zeroing might even make it easier to expose bugs. If you need a
> >+ *   zero-padded task name, please handle that explicitly at the call site.
>
> I really don't like this part of the change. You don't know that existing callers don't depend on the padding. Please invert this logic: get_task_comm() must use strscpy_pad(). Calls NOT wanting padding can call strscpy() themselves.
>
> >+ *
> >+ * - ARRAY_SIZE() can help ensure that @buf is indeed an array.
>
> This doesn't need checking here; strscpy() will already do that.
>
> >+ */
> > #define get_task_comm(buf, tsk) ({                    \
> >-      BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN);     \
>
> Also, please leave the TASK_COMM_LEN test so that destination buffers continue to be the correct size: current callers do not perform any return value analysis, so they cannot accidentally start having situations where the destination string might be truncated. Again, anyone wanting to avoid that restriction can use strscpy() directly and check the return value.

Hello Kees,

Thanks for your input.

Alejandro has addressed all the other changes except for the removal
of BUILD_BUG_ON(). I have a question regarding this: if we're using it
to avoid truncation, why not write it like this?

    BUILD_BUG_ON(sizeof(buf) < TASK_COMM_LEN);

This way, it ensures that the size is at least as large as TASK_COMM_LEN.

--
Regards
Yafang

[v8,1/8] Get rid of __get_task_comm()

Commit Message

Comments

Patch