diff mbox series

[v6] checkpatch: add check for snprintf to scnprintf

Message ID 20240429-snprintf-checkpatch-v6-1-354c62c88290@google.com (mailing list archive)
State New
Headers show
Series [v6] checkpatch: add check for snprintf to scnprintf | expand

Commit Message

Justin Stitt April 29, 2024, 6:39 p.m. UTC
I am going to quote Lee Jones who has been doing some snprintf ->
scnprintf refactorings:

"There is a general misunderstanding amongst engineers that
{v}snprintf() returns the length of the data *actually* encoded into the
destination array.  However, as per the C99 standard {v}snprintf()
really returns the length of the data that *would have been* written if
there were enough space for it.  This misunderstanding has led to
buffer-overruns in the past.  It's generally considered safer to use the
{v}scnprintf() variants in their place (or even sprintf() in simple
cases).  So let's do that."

To help prevent new instances of snprintf() from popping up, let's add a
check to checkpatch.pl.

Suggested-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
---
Changes in v6:
- move capture group to only include symbol name (not spaces or paren)
- Link to v5: https://lore.kernel.org/r/20240422-snprintf-checkpatch-v5-1-f1e90bf7164e@google.com

Changes in v5:
- use capture groups to let the user know which variation they used
- Link to v4: https://lore.kernel.org/r/20240408-snprintf-checkpatch-v4-1-8697c96ac94b@google.com

Changes in v4:
- also check for vsnprintf variant (thanks Bill)
- Link to v3: https://lore.kernel.org/r/20240315-snprintf-checkpatch-v3-1-a451e7664306@google.com

Changes in v3:
- fix indentation
- add reference link (https://github.com/KSPP/linux/issues/105) (thanks Joe)
- Link to v2: https://lore.kernel.org/r/20240221-snprintf-checkpatch-v2-1-9baeb59dae30@google.com

Changes in v2:
- Had a vim moment and deleted a character before sending the patch.
- Replaced the character :)
- Link to v1: https://lore.kernel.org/r/20240221-snprintf-checkpatch-v1-1-3ac5025b5961@google.com
---
From a discussion here [1].

[1]: https://lore.kernel.org/all/0f9c95f9-2c14-eee6-7faf-635880edcea4@linux-m68k.org/
---
 scripts/checkpatch.pl | 6 ++++++
 1 file changed, 6 insertions(+)


---
base-commit: b401b621758e46812da61fa58a67c3fd8d91de0d
change-id: 20240221-snprintf-checkpatch-a864ed67ebd0

Best regards,
--
Justin Stitt <justinstitt@google.com>

Comments

Kees Cook April 29, 2024, 7:49 p.m. UTC | #1
On Mon, Apr 29, 2024 at 06:39:28PM +0000, Justin Stitt wrote:
> I am going to quote Lee Jones who has been doing some snprintf ->
> scnprintf refactorings:
> 
> "There is a general misunderstanding amongst engineers that
> {v}snprintf() returns the length of the data *actually* encoded into the
> destination array.  However, as per the C99 standard {v}snprintf()
> really returns the length of the data that *would have been* written if
> there were enough space for it.  This misunderstanding has led to
> buffer-overruns in the past.  It's generally considered safer to use the
> {v}scnprintf() variants in their place (or even sprintf() in simple
> cases).  So let's do that."
> 
> To help prevent new instances of snprintf() from popping up, let's add a
> check to checkpatch.pl.
> 
> Suggested-by: Finn Thain <fthain@linux-m68k.org>
> Signed-off-by: Justin Stitt <justinstitt@google.com>

Thanks!

Reviewed-by: Kees Cook <keescook@chromium.org>
Joe Perches April 30, 2024, 3:21 a.m. UTC | #2
On Mon, 2024-04-29 at 12:49 -0700, Kees Cook wrote:
> On Mon, Apr 29, 2024 at 06:39:28PM +0000, Justin Stitt wrote:
> > I am going to quote Lee Jones who has been doing some snprintf ->
> > scnprintf refactorings:
> > 
> > "There is a general misunderstanding amongst engineers that
> > {v}snprintf() returns the length of the data *actually* encoded into the
> > destination array.  However, as per the C99 standard {v}snprintf()
> > really returns the length of the data that *would have been* written if
> > there were enough space for it.  This misunderstanding has led to
> > buffer-overruns in the past.  It's generally considered safer to use the
> > {v}scnprintf() variants in their place (or even sprintf() in simple
> > cases).  So let's do that."
> > 
> > To help prevent new instances of snprintf() from popping up, let's add a
> > check to checkpatch.pl.
> > 
> > Suggested-by: Finn Thain <fthain@linux-m68k.org>
> > Signed-off-by: Justin Stitt <justinstitt@google.com>
> 
> Thanks!
> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> 

$ git grep -P '\b((v|)snprintf)\s*\(' | wc -l
7745
$ git grep -P '(?:return\s+|=\s*)\b((v|)snprintf)\s*\(' | wc -l
1626

Given there are ~5000 uses of these that don't care
whether or not it's snprintf or scnprintf, I think this
is not great.

I'd much rather make sure the return value of the call
is used before suggesting an alternative.

$ git grep  -P '\b((v|)snprintf)\s*\(.*PAGE_SIZE' | wc -l
515

And about 1/3 of these snprintf calls are for sysfs style
output that ideally would be converted to sysfs_emit or
sysfs_emit_at instead.
Lee Jones May 2, 2024, 9:29 a.m. UTC | #3
On Mon, 29 Apr 2024, Joe Perches wrote:

> On Mon, 2024-04-29 at 12:49 -0700, Kees Cook wrote:
> > On Mon, Apr 29, 2024 at 06:39:28PM +0000, Justin Stitt wrote:
> > > I am going to quote Lee Jones who has been doing some snprintf ->
> > > scnprintf refactorings:
> > > 
> > > "There is a general misunderstanding amongst engineers that
> > > {v}snprintf() returns the length of the data *actually* encoded into the
> > > destination array.  However, as per the C99 standard {v}snprintf()
> > > really returns the length of the data that *would have been* written if
> > > there were enough space for it.  This misunderstanding has led to
> > > buffer-overruns in the past.  It's generally considered safer to use the
> > > {v}scnprintf() variants in their place (or even sprintf() in simple
> > > cases).  So let's do that."
> > > 
> > > To help prevent new instances of snprintf() from popping up, let's add a
> > > check to checkpatch.pl.
> > > 
> > > Suggested-by: Finn Thain <fthain@linux-m68k.org>
> > > Signed-off-by: Justin Stitt <justinstitt@google.com>
> > 
> > Thanks!
> > 
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > 
> 
> $ git grep -P '\b((v|)snprintf)\s*\(' | wc -l
> 7745
> $ git grep -P '(?:return\s+|=\s*)\b((v|)snprintf)\s*\(' | wc -l
> 1626
> 
> Given there are ~5000 uses of these that don't care
> whether or not it's snprintf or scnprintf, I think this
> is not great.
> 
> I'd much rather make sure the return value of the call
> is used before suggesting an alternative.
> 
> $ git grep  -P '\b((v|)snprintf)\s*\(.*PAGE_SIZE' | wc -l
> 515
> 
> And about 1/3 of these snprintf calls are for sysfs style
> output that ideally would be converted to sysfs_emit or
> sysfs_emit_at instead.

I am working on the migration of these (this patch was spun off from
that project in fact).  Some subsystems are currently prioritising the
status quo (a.k.a. "no churn"), but most have been accepting of the
changes.

Planning to get back to it once the CVE project has calmed a little.

Those numbers should diminish over time.
Lee Jones May 2, 2024, 9:30 a.m. UTC | #4
On Mon, 29 Apr 2024, Justin Stitt wrote:

> I am going to quote Lee Jones who has been doing some snprintf ->
> scnprintf refactorings:
> 
> "There is a general misunderstanding amongst engineers that
> {v}snprintf() returns the length of the data *actually* encoded into the
> destination array.  However, as per the C99 standard {v}snprintf()
> really returns the length of the data that *would have been* written if
> there were enough space for it.  This misunderstanding has led to
> buffer-overruns in the past.  It's generally considered safer to use the
> {v}scnprintf() variants in their place (or even sprintf() in simple
> cases).  So let's do that."
> 
> To help prevent new instances of snprintf() from popping up, let's add a
> check to checkpatch.pl.
> 
> Suggested-by: Finn Thain <fthain@linux-m68k.org>
> Signed-off-by: Justin Stitt <justinstitt@google.com>
> ---
> Changes in v6:
> - move capture group to only include symbol name (not spaces or paren)
> - Link to v5: https://lore.kernel.org/r/20240422-snprintf-checkpatch-v5-1-f1e90bf7164e@google.com
> 
> Changes in v5:
> - use capture groups to let the user know which variation they used
> - Link to v4: https://lore.kernel.org/r/20240408-snprintf-checkpatch-v4-1-8697c96ac94b@google.com
> 
> Changes in v4:
> - also check for vsnprintf variant (thanks Bill)
> - Link to v3: https://lore.kernel.org/r/20240315-snprintf-checkpatch-v3-1-a451e7664306@google.com
> 
> Changes in v3:
> - fix indentation
> - add reference link (https://github.com/KSPP/linux/issues/105) (thanks Joe)
> - Link to v2: https://lore.kernel.org/r/20240221-snprintf-checkpatch-v2-1-9baeb59dae30@google.com
> 
> Changes in v2:
> - Had a vim moment and deleted a character before sending the patch.
> - Replaced the character :)
> - Link to v1: https://lore.kernel.org/r/20240221-snprintf-checkpatch-v1-1-3ac5025b5961@google.com
> ---
> From a discussion here [1].
> 
> [1]: https://lore.kernel.org/all/0f9c95f9-2c14-eee6-7faf-635880edcea4@linux-m68k.org/
> ---
>  scripts/checkpatch.pl | 6 ++++++
>  1 file changed, 6 insertions(+)

Reviewed-by: Lee Jones <lee@kernel.org>
Kees Cook May 2, 2024, 10:57 p.m. UTC | #5
On Mon, Apr 29, 2024 at 08:21:49PM -0700, Joe Perches wrote:
> On Mon, 2024-04-29 at 12:49 -0700, Kees Cook wrote:
> > On Mon, Apr 29, 2024 at 06:39:28PM +0000, Justin Stitt wrote:
> > > I am going to quote Lee Jones who has been doing some snprintf ->
> > > scnprintf refactorings:
> > > 
> > > "There is a general misunderstanding amongst engineers that
> > > {v}snprintf() returns the length of the data *actually* encoded into the
> > > destination array.  However, as per the C99 standard {v}snprintf()
> > > really returns the length of the data that *would have been* written if
> > > there were enough space for it.  This misunderstanding has led to
> > > buffer-overruns in the past.  It's generally considered safer to use the
> > > {v}scnprintf() variants in their place (or even sprintf() in simple
> > > cases).  So let's do that."
> > > 
> > > To help prevent new instances of snprintf() from popping up, let's add a
> > > check to checkpatch.pl.
> > > 
> > > Suggested-by: Finn Thain <fthain@linux-m68k.org>
> > > Signed-off-by: Justin Stitt <justinstitt@google.com>
> > 
> > Thanks!
> > 
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > 
> 
> $ git grep -P '\b((v|)snprintf)\s*\(' | wc -l
> 7745
> $ git grep -P '(?:return\s+|=\s*)\b((v|)snprintf)\s*\(' | wc -l
> 1626
> 
> Given there are ~5000 uses of these that don't care
> whether or not it's snprintf or scnprintf, I think this
> is not great.

But let's not add more of either case. :)

> I'd much rather make sure the return value of the call
> is used before suggesting an alternative.
> 
> $ git grep  -P '\b((v|)snprintf)\s*\(.*PAGE_SIZE' | wc -l
> 515
> 
> And about 1/3 of these snprintf calls are for sysfs style
> output that ideally would be converted to sysfs_emit or
> sysfs_emit_at instead.

Detecting that we're in the right place for sysfs_emit seems out of
scope for here, but maybe it should be more clearly called out by the
contents at the reported URL?
diff mbox series

Patch

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 9c4c4a61bc83..bdae8a7ae5ed 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -7012,6 +7012,12 @@  sub process {
 			     "Prefer strscpy, strscpy_pad, or __nonstring over strncpy - see: https://github.com/KSPP/linux/issues/90\n" . $herecurr);
 		}
 
+# {v}snprintf uses that should likely be {v}scnprintf
+		if ($line =~ /\b((v|)snprintf)\s*\(/) {
+			WARN("SNPRINTF",
+			     "Prefer ${2}scnprintf over $1 - see: https://github.com/KSPP/linux/issues/105\n" . $herecurr);
+		}
+
 # ethtool_sprintf uses that should likely be ethtool_puts
 		if ($line =~ /\bethtool_sprintf\s*\(\s*$FuncArg\s*,\s*$FuncArg\s*\)/) {
 			if (WARN("PREFER_ETHTOOL_PUTS",