diff mbox series

[1/1] lib/vsprintf: Implement ssprintf() to catch truncated strings

Message ID 20240125083921.1312709-1-lee@kernel.org (mailing list archive)
State Superseded
Headers show
Series [1/1] lib/vsprintf: Implement ssprintf() to catch truncated strings | expand

Commit Message

Lee Jones Jan. 25, 2024, 8:39 a.m. UTC
There is an ongoing effort to replace the use of {v}snprintf() variants
with safer alternatives - for a more in depth view, see Jon's write-up
on LWN [0] and/or Alex's on the Kernel Self Protection Project [1].

Whist executing the task, it quickly became apparent that the initial
thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for
a number of cases.  Specifically ones where the caller needs to know
whether the given string ends up being truncated.  This is where
ssprintf() [based on similar semantics of strscpy()] comes in, since it
takes the best parts of both of the aforementioned variants.  It has the
testability of truncation of snprintf() and returns the number of Bytes
*actually* written, similar to scnprintf(), making it a very programmer
friendly alternative.

Here's some examples to show the differences:

  Success: No truncation - all 9 Bytes successfully written to the buffer

    ret = snprintf (buf, 10, "%s", "123456789");  // ret = 9
    ret = scnprintf(buf, 10, "%s", "123456789");  // ret = 9
    ret = ssprintf (buf, 10, "%s", "123456789");  // ret = 9

  Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated

    ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10

      Reports: "10 Bytes would have been written if buf was large enough"
      Issue: Programmers need to know/remember to check ret against "10"

    ret = scnprintf(buf, 10, "%s", "123456789-"); // ret = 9

      Reports: "9 Bytes actually written"
      Issue: Returns 9 on success AND failure (see above)

    ret = ssprintf (buf, 10, "%s", "123456789-"); // ret = -E2BIG

      Reports: "Data provided is too large to fit in the buffer"
      Issue: No tangible impact: No way to tell how much data was lost

[0] https://lwn.net/Articles/69419/
[1] https://github.com/KSPP/linux/issues/105
Signed-off-by: Lee Jones <lee@kernel.org>
---
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Crutcher Dunnavant <crutcher+kernel@datastacks.com>
Cc: Juergen Quade <quade@hsnr.de>

 include/linux/sprintf.h |  2 ++
 lib/vsprintf.c          | 58 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

Comments

Rasmus Villemoes Jan. 25, 2024, 9:04 a.m. UTC | #1
On 25/01/2024 09.39, Lee Jones wrote:
> There is an ongoing effort to replace the use of {v}snprintf() variants
> with safer alternatives - for a more in depth view, see Jon's write-up
> on LWN [0] and/or Alex's on the Kernel Self Protection Project [1].
> 
> Whist executing the task, it quickly became apparent that the initial
> thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for
> a number of cases.  Specifically ones where the caller needs to know
> whether the given string ends up being truncated.  This is where
> ssprintf() [based on similar semantics of strscpy()] comes in, since it
> takes the best parts of both of the aforementioned variants.  It has the
> testability of truncation of snprintf() and returns the number of Bytes
> *actually* written, similar to scnprintf(), making it a very programmer
> friendly alternative.
> 
> Here's some examples to show the differences:
> 
>   Success: No truncation - all 9 Bytes successfully written to the buffer
> 
>     ret = snprintf (buf, 10, "%s", "123456789");  // ret = 9
>     ret = scnprintf(buf, 10, "%s", "123456789");  // ret = 9
>     ret = ssprintf (buf, 10, "%s", "123456789");  // ret = 9
> 
>   Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated
> 
>     ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10
> 
>       Reports: "10 Bytes would have been written if buf was large enough"
>       Issue: Programmers need to know/remember to check ret against "10"

Yeah, so I'm not at all sure we need yet-another-wrapper with
yet-another-hard-to-read-prefix when people can just RTFM and learn how
to check for truncation or whatnot. But if you do this:

> +/**
> + * vssprintf - Format a string and place it in a buffer
> + * @buf: The buffer to place the result into
> + * @size: The size of the buffer, including the trailing null space
> + * @fmt: The format string to use
> + * @args: Arguments for the format string
> + *
> + * The return value is the number of characters which have been written into
> + * the @buf not including the trailing '\0' or -E2BIG if the string was
> + * truncated. If @size is == 0 the function returns 0.
> + *
> + * If you're not already dealing with a va_list consider using ssprintf().
> + *
> + * See the vsnprintf() documentation for format string extensions over C99.
> + */
> +int vssprintf(char *buf, size_t size, const char *fmt, va_list args)
> +{
> +	int i;
> +
> +	if (unlikely(!size))
> +		return 0;

No, don't special-case size 0 here. Passing size==0 should just
guarantee -E2BIG because that's essentially a programmer error, and the
calling code is then at least much more likely to not believe that buf
now contains a nul-terminated (empty) string.

And since it's essentially a bug, there's no need to special-case size 0
to avoid calling vsnprintf(), just let it be caught by the i >= size check.

> +	i = vsnprintf(buf, size, fmt, args);
> +
> +	if (unlikely(i >= size))
> +		return -E2BIG;
> +
> +	if (likely(i < size))
> +		return i;

Those two ifs are mutually exclusive, so why the second if() and not
just a direct "return i"? That final "return size-1" is unreachable, and
confusing.

Rasmus
Lee Jones Jan. 25, 2024, 10:36 a.m. UTC | #2
On Thu, 25 Jan 2024, Rasmus Villemoes wrote:

> On 25/01/2024 09.39, Lee Jones wrote:
> > There is an ongoing effort to replace the use of {v}snprintf() variants
> > with safer alternatives - for a more in depth view, see Jon's write-up
> > on LWN [0] and/or Alex's on the Kernel Self Protection Project [1].
> > 
> > Whist executing the task, it quickly became apparent that the initial
> > thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for
> > a number of cases.  Specifically ones where the caller needs to know
> > whether the given string ends up being truncated.  This is where
> > ssprintf() [based on similar semantics of strscpy()] comes in, since it
> > takes the best parts of both of the aforementioned variants.  It has the
> > testability of truncation of snprintf() and returns the number of Bytes
> > *actually* written, similar to scnprintf(), making it a very programmer
> > friendly alternative.
> > 
> > Here's some examples to show the differences:
> > 
> >   Success: No truncation - all 9 Bytes successfully written to the buffer
> > 
> >     ret = snprintf (buf, 10, "%s", "123456789");  // ret = 9
> >     ret = scnprintf(buf, 10, "%s", "123456789");  // ret = 9
> >     ret = ssprintf (buf, 10, "%s", "123456789");  // ret = 9
> > 
> >   Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated
> > 
> >     ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10
> > 
> >       Reports: "10 Bytes would have been written if buf was large enough"
> >       Issue: Programmers need to know/remember to check ret against "10"
> 
> Yeah, so I'm not at all sure we need yet-another-wrapper with
> yet-another-hard-to-read-prefix when people can just RTFM and learn how
> to check for truncation or whatnot. But if you do this:

As wonderful as it would be for people to "just RTFM", we're seeing a
large number of cases where this isn't happening.  Providing a more
programmer friendly way is thought, by people way smarter than me, to be
a solid means to solve this issue.  Please also see Kees Cook's related
work to remove strlcpy() use.

> > +/**
> > + * vssprintf - Format a string and place it in a buffer
> > + * @buf: The buffer to place the result into
> > + * @size: The size of the buffer, including the trailing null space
> > + * @fmt: The format string to use
> > + * @args: Arguments for the format string
> > + *
> > + * The return value is the number of characters which have been written into
> > + * the @buf not including the trailing '\0' or -E2BIG if the string was
> > + * truncated. If @size is == 0 the function returns 0.
> > + *
> > + * If you're not already dealing with a va_list consider using ssprintf().
> > + *
> > + * See the vsnprintf() documentation for format string extensions over C99.
> > + */
> > +int vssprintf(char *buf, size_t size, const char *fmt, va_list args)
> > +{
> > +	int i;
> > +
> > +	if (unlikely(!size))
> > +		return 0;
> 
> No, don't special-case size 0 here. Passing size==0 should just
> guarantee -E2BIG because that's essentially a programmer error, and the
> calling code is then at least much more likely to not believe that buf
> now contains a nul-terminated (empty) string.
> 
> And since it's essentially a bug, there's no need to special-case size 0
> to avoid calling vsnprintf(), just let it be caught by the i >= size check.

Agree.  Thanks for the feedback.  I will change this.

> > +	i = vsnprintf(buf, size, fmt, args);
> > +
> > +	if (unlikely(i >= size))
> > +		return -E2BIG;
> > +
> > +	if (likely(i < size))
> > +		return i;
> 
> Those two ifs are mutually exclusive, so why the second if() and not
> just a direct "return i"? That final "return size-1" is unreachable, and
> confusing.

That's true.  The last line of vscnprintf() essentially means that the
data was truncated, which is caught by the new check.  So it should be
reworked to look like this:

```
  if (likely(i < size))
          return i;

  return -E2BIG;
```

Thanks again.  That's very helpful.
David Laight Jan. 27, 2024, 2:32 p.m. UTC | #3
From: Lee Jones
> Sent: 25 January 2024 10:36
> On Thu, 25 Jan 2024, Rasmus Villemoes wrote:
> 
> > On 25/01/2024 09.39, Lee Jones wrote:
> > > There is an ongoing effort to replace the use of {v}snprintf() variants
> > > with safer alternatives - for a more in depth view, see Jon's write-up
> > > on LWN [0] and/or Alex's on the Kernel Self Protection Project [1].
> > >
> > > Whist executing the task, it quickly became apparent that the initial
> > > thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for
> > > a number of cases.  Specifically ones where the caller needs to know
> > > whether the given string ends up being truncated.  This is where
> > > ssprintf() [based on similar semantics of strscpy()] comes in, since it
> > > takes the best parts of both of the aforementioned variants.  It has the
> > > testability of truncation of snprintf() and returns the number of Bytes
> > > *actually* written, similar to scnprintf(), making it a very programmer
> > > friendly alternative.
> > >
> > > Here's some examples to show the differences:
> > >
> > >   Success: No truncation - all 9 Bytes successfully written to the buffer
> > >
> > >     ret = snprintf (buf, 10, "%s", "123456789");  // ret = 9
> > >     ret = scnprintf(buf, 10, "%s", "123456789");  // ret = 9
> > >     ret = ssprintf (buf, 10, "%s", "123456789");  // ret = 9
> > >
> > >   Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated
> > >
> > >     ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10
> > >
> > >       Reports: "10 Bytes would have been written if buf was large enough"
> > >       Issue: Programmers need to know/remember to check ret against "10"
> >
> > Yeah, so I'm not at all sure we need yet-another-wrapper with
> > yet-another-hard-to-read-prefix when people can just RTFM and learn how
> > to check for truncation or whatnot. But if you do this:
> 
> As wonderful as it would be for people to "just RTFM", we're seeing a
> large number of cases where this isn't happening.  Providing a more
> programmer friendly way is thought, by people way smarter than me, to be
> a solid means to solve this issue.  Please also see Kees Cook's related
> work to remove strlcpy() use.

My worry is that people will believe the length and forget that
it might be an error code.

So you replace one set of errors (truncated data), with another
worse set (eg write before start of buffer).

I'm sure that the safest return for 'truncated' is the buffer length.
The a series of statements like:
	buf += xxx(buf, buf_end - buf, .....);
can all be called with a single overflow check at the end.

Forget the check, and the length just contains a trailing '\0'
which might cause confusion but isn't going to immediately
break the world.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Lee Jones Jan. 29, 2024, 9:24 a.m. UTC | #4
NB: I was _just_ about to send out v2 with Rasmus's suggestions before I
saw your reply.  I'm going to submit it anyway and Cc both you and
Rasmus.  If you still disagree with my suggested approach, we can either
continue discussion here or on the new version.

More below:

> From: Lee Jones
> > Sent: 25 January 2024 10:36
> > On Thu, 25 Jan 2024, Rasmus Villemoes wrote:
> > 
> > > On 25/01/2024 09.39, Lee Jones wrote:
> > > > There is an ongoing effort to replace the use of {v}snprintf() variants
> > > > with safer alternatives - for a more in depth view, see Jon's write-up
> > > > on LWN [0] and/or Alex's on the Kernel Self Protection Project [1].
> > > >
> > > > Whist executing the task, it quickly became apparent that the initial
> > > > thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for
> > > > a number of cases.  Specifically ones where the caller needs to know
> > > > whether the given string ends up being truncated.  This is where
> > > > ssprintf() [based on similar semantics of strscpy()] comes in, since it
> > > > takes the best parts of both of the aforementioned variants.  It has the
> > > > testability of truncation of snprintf() and returns the number of Bytes
> > > > *actually* written, similar to scnprintf(), making it a very programmer
> > > > friendly alternative.
> > > >
> > > > Here's some examples to show the differences:
> > > >
> > > >   Success: No truncation - all 9 Bytes successfully written to the buffer
> > > >
> > > >     ret = snprintf (buf, 10, "%s", "123456789");  // ret = 9
> > > >     ret = scnprintf(buf, 10, "%s", "123456789");  // ret = 9
> > > >     ret = ssprintf (buf, 10, "%s", "123456789");  // ret = 9
> > > >
> > > >   Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated
> > > >
> > > >     ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10
> > > >
> > > >       Reports: "10 Bytes would have been written if buf was large enough"
> > > >       Issue: Programmers need to know/remember to check ret against "10"
> > >
> > > Yeah, so I'm not at all sure we need yet-another-wrapper with
> > > yet-another-hard-to-read-prefix when people can just RTFM and learn how
> > > to check for truncation or whatnot. But if you do this:
> > 
> > As wonderful as it would be for people to "just RTFM", we're seeing a
> > large number of cases where this isn't happening.  Providing a more
> > programmer friendly way is thought, by people way smarter than me, to be
> > a solid means to solve this issue.  Please also see Kees Cook's related
> > work to remove strlcpy() use.
> 
> My worry is that people will believe the length and forget that
> it might be an error code.

My plan is to go around and convert these myself.  All of the examples
in the kernel will check the return value for error.  We can go one
further and author a Coccinelle rule to enforce the semantics.

> So you replace one set of errors (truncated data), with another
> worse set (eg write before start of buffer).

Under-running the buffer is no worse over-running.  However, as I say,
we're going to make a concerted effort to prevent that via various
proactive and passive measures.

> I'm sure that the safest return for 'truncated' is the buffer length.
> The a series of statements like:
> 	buf += xxx(buf, buf_end - buf, .....);
> can all be called with a single overflow check at the end.
>
> Forget the check, and the length just contains a trailing '\0'
> which might cause confusion but isn't going to immediately
> break the world.

snprintf() does this and has been proven to cause buffer-overflows.
There have been multiple articles authored describing why using
snprintf() is not generally a good idea for the masses including the 2
linked in the commit message:

LWN: snprintf() confusion
  https://lwn.net/Articles/69419/

KSPP: Replace uses of snprintf() and vsnprintf()
  https://github.com/KSPP/linux/issues/105

Yes, you should check ssprintf() for error.  This is no different to the
many hundreds of APIs where this is also a stipulation.  Not checking
(m)any of the memory allocation APIs for error will also lead to similar
results which is why we enforce the check.
David Laight Jan. 29, 2024, 9:39 a.m. UTC | #5
...
> > I'm sure that the safest return for 'truncated' is the buffer length.
> > The a series of statements like:
> > 	buf += xxx(buf, buf_end - buf, .....);
> > can all be called with a single overflow check at the end.
> >
> > Forget the check, and the length just contains a trailing '\0'
> > which might cause confusion but isn't going to immediately
> > break the world.
> 
> snprintf() does this and has been proven to cause buffer-overflows.
> There have been multiple articles authored describing why using
> snprintf() is not generally a good idea for the masses including the 2
> linked in the commit message:

snprintf() returns the number of bytes that would have been output [1].
I'm not suggesting that, or not terminating the buffer.
Just returning the length including the '\0' (unless length was zero).
This still lets the code check for overflow but isn't going to
generate a pointer outside the buffer if used to update a pointer.

[1] I'm pretty certain this is because the original libc version
of sprintf() allocated a FILE structure on stack (fully buffered)
and called fprintf().
snprintf() would have been done the same way but with something
to stop the buffer being flushed.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Lee Jones Jan. 29, 2024, 9:52 a.m. UTC | #6
On Mon, 29 Jan 2024, David Laight wrote:

> ...
> > > I'm sure that the safest return for 'truncated' is the buffer length.
> > > The a series of statements like:
> > > 	buf += xxx(buf, buf_end - buf, .....);
> > > can all be called with a single overflow check at the end.
> > >
> > > Forget the check, and the length just contains a trailing '\0'
> > > which might cause confusion but isn't going to immediately
> > > break the world.
> > 
> > snprintf() does this and has been proven to cause buffer-overflows.
> > There have been multiple articles authored describing why using
> > snprintf() is not generally a good idea for the masses including the 2
> > linked in the commit message:
> 
> snprintf() returns the number of bytes that would have been output [1].
> I'm not suggesting that, or not terminating the buffer.
> Just returning the length including the '\0' (unless length was zero).
> This still lets the code check for overflow but isn't going to
> generate a pointer outside the buffer if used to update a pointer.

I see.  Well I'm not married to my solution.  However, I am convinced
that the 2 solutions currently offered can be improved upon.  If you or
anyone else has a better solution, I'd be more than happy to implement
and switch to it.

Let me have a think about the solution you suggest and get back to you.

> [1] I'm pretty certain this is because the original libc version
> of sprintf() allocated a FILE structure on stack (fully buffered)
> and called fprintf().
> snprintf() would have been done the same way but with something
> to stop the buffer being flushed.

Interesting.  Thanks for the background.
Lee Jones Jan. 30, 2024, 3:07 p.m. UTC | #7
On Mon, 29 Jan 2024, Lee Jones wrote:

> On Mon, 29 Jan 2024, David Laight wrote:
> 
> > ...
> > > > I'm sure that the safest return for 'truncated' is the buffer length.
> > > > The a series of statements like:
> > > > 	buf += xxx(buf, buf_end - buf, .....);
> > > > can all be called with a single overflow check at the end.
> > > >
> > > > Forget the check, and the length just contains a trailing '\0'
> > > > which might cause confusion but isn't going to immediately
> > > > break the world.
> > > 
> > > snprintf() does this and has been proven to cause buffer-overflows.
> > > There have been multiple articles authored describing why using
> > > snprintf() is not generally a good idea for the masses including the 2
> > > linked in the commit message:
> > 
> > snprintf() returns the number of bytes that would have been output [1].
> > I'm not suggesting that, or not terminating the buffer.
> > Just returning the length including the '\0' (unless length was zero).
> > This still lets the code check for overflow but isn't going to
> > generate a pointer outside the buffer if used to update a pointer.
> 
> I see.  Well I'm not married to my solution.  However, I am convinced
> that the 2 solutions currently offered can be improved upon.  If you or
> anyone else has a better solution, I'd be more than happy to implement
> and switch to it.
> 
> Let me have a think about the solution you suggest and get back to you.

Okay, I've written a bunch of simple test cases and results are
positive.  It seems to achieve my aim whilst minimising any potential
pitfalls.

 - Success returns Bytes actually written - no functional change
 - Overflow returns the size of the buffer - which makes the result
   a) testable for overflow
   b) non-catastrophic if accidentally used to manipulate later sizes

    int size = 10;
    char buf[size];
    char *b = buf;

    ret = spprintf(b, size, "1234");
    size -= ret;
    b += ret;
    // ret = 4  size = 6  buf = "1234\0"

    ret = spprintf(b, size, "5678");
    size -= ret;
    b += ret;
    // ret = 4  size = 2  buf = "12345678\0"

    ret = spprintf(b, size, "9***");
    size -= ret;
    b += ret;
    // ret = 2  size = 0  buf = "123456789\0"

Since size is now 0, further calls result in no changes of state.

    ret = spprintf(b, size, "----");
    size -= ret;
    b += ret;
    // ret = 0  size = 0  buf = "123456789\0"

I'll knock this up and submit a patch.
Rasmus Villemoes Jan. 30, 2024, 3:18 p.m. UTC | #8
On 30/01/2024 16.07, Lee Jones wrote:
> On Mon, 29 Jan 2024, Lee Jones wrote:
> 
>> On Mon, 29 Jan 2024, David Laight wrote:
>>
>>> ...
>>>>> I'm sure that the safest return for 'truncated' is the buffer length.
>>>>> The a series of statements like:
>>>>> 	buf += xxx(buf, buf_end - buf, .....);
>>>>> can all be called with a single overflow check at the end.
>>>>>
>>>>> Forget the check, and the length just contains a trailing '\0'
>>>>> which might cause confusion but isn't going to immediately
>>>>> break the world.
>>>>
>>>> snprintf() does this and has been proven to cause buffer-overflows.
>>>> There have been multiple articles authored describing why using
>>>> snprintf() is not generally a good idea for the masses including the 2
>>>> linked in the commit message:
>>>
>>> snprintf() returns the number of bytes that would have been output [1].
>>> I'm not suggesting that, or not terminating the buffer.
>>> Just returning the length including the '\0' (unless length was zero).
>>> This still lets the code check for overflow but isn't going to
>>> generate a pointer outside the buffer if used to update a pointer.
>>
>> I see.  Well I'm not married to my solution.  However, I am convinced
>> that the 2 solutions currently offered can be improved upon.  If you or
>> anyone else has a better solution, I'd be more than happy to implement
>> and switch to it.
>>
>> Let me have a think about the solution you suggest and get back to you.
> 
> Okay, I've written a bunch of simple test cases and results are
> positive.  It seems to achieve my aim whilst minimising any potential
> pitfalls.
> 
>  - Success returns Bytes actually written - no functional change
>  - Overflow returns the size of the buffer - which makes the result
>    a) testable for overflow
>    b) non-catastrophic if accidentally used to manipulate later sizes

You are describing scnprintf(), which almost does exactly that. The last
thing we need is another interface with almost identical semantics.

>     int size = 10;
>     char buf[size];
>     char *b = buf;
> 
>     ret = spprintf(b, size, "1234");
>     size -= ret;
>     b += ret;
>     // ret = 4  size = 6  buf = "1234\0"
> 
>     ret = spprintf(b, size, "5678");
>     size -= ret;
>     b += ret;
>     // ret = 4  size = 2  buf = "12345678\0"
> 
>     ret = spprintf(b, size, "9***");
>     size -= ret;
>     b += ret;
>     // ret = 2  size = 0  buf = "123456789\0"

So here scnprint() would have returned 1, leaving size at 1. scnprintf()
has the invariant that, for non-zero size, the return value is strictly
less than that size, so when passed a size of 1, all subsequent calls
return 0 (corresponding to the fact that all it could do was to write
the '\0' terminator).

This pattern already exists, and is really the reason scnprint exists.
Yes, scnprintf() cannot distinguish overflow from
it-just-exactly-fitted. Maybe it would have been better to make it work
like this, but I don't think there's a real use - and we do have
seq_buf() if one really wants an interface that can build a string
piece-meal while keeping track of whether it ever caused overflow.

Rasmus
Lee Jones Jan. 30, 2024, 3:53 p.m. UTC | #9
On Tue, 30 Jan 2024, Rasmus Villemoes wrote:

> On 30/01/2024 16.07, Lee Jones wrote:
> > On Mon, 29 Jan 2024, Lee Jones wrote:
> > 
> >> On Mon, 29 Jan 2024, David Laight wrote:
> >>
> >>> ...
> >>>>> I'm sure that the safest return for 'truncated' is the buffer length.
> >>>>> The a series of statements like:
> >>>>> 	buf += xxx(buf, buf_end - buf, .....);
> >>>>> can all be called with a single overflow check at the end.
> >>>>>
> >>>>> Forget the check, and the length just contains a trailing '\0'
> >>>>> which might cause confusion but isn't going to immediately
> >>>>> break the world.
> >>>>
> >>>> snprintf() does this and has been proven to cause buffer-overflows.
> >>>> There have been multiple articles authored describing why using
> >>>> snprintf() is not generally a good idea for the masses including the 2
> >>>> linked in the commit message:
> >>>
> >>> snprintf() returns the number of bytes that would have been output [1].
> >>> I'm not suggesting that, or not terminating the buffer.
> >>> Just returning the length including the '\0' (unless length was zero).
> >>> This still lets the code check for overflow but isn't going to
> >>> generate a pointer outside the buffer if used to update a pointer.
> >>
> >> I see.  Well I'm not married to my solution.  However, I am convinced
> >> that the 2 solutions currently offered can be improved upon.  If you or
> >> anyone else has a better solution, I'd be more than happy to implement
> >> and switch to it.
> >>
> >> Let me have a think about the solution you suggest and get back to you.
> > 
> > Okay, I've written a bunch of simple test cases and results are
> > positive.  It seems to achieve my aim whilst minimising any potential
> > pitfalls.
> > 
> >  - Success returns Bytes actually written - no functional change
> >  - Overflow returns the size of the buffer - which makes the result
> >    a) testable for overflow
> >    b) non-catastrophic if accidentally used to manipulate later sizes
> 
> You are describing scnprintf(), which almost does exactly that. The last
> thing we need is another interface with almost identical semantics.

It does, which is why when I first centred my efforts on this task the
plan was to simply switch to it.  However, as I described in the commit
message:

  "Whist executing the task, it quickly became apparent that the initial
  thought of simply s/snprintf/scnprintf/ wasn't going to be adequate
  for a number of cases.  Specifically ones where the caller needs to
  know whether the given string ends up being truncated."

A great deal of callers want to know if the string they attempted to
form was successful.  A malformed string would lead to oddities in the
best cases and various device/probing/matching failures in the worst.

> >     int size = 10;
> >     char buf[size];
> >     char *b = buf;
> > 
> >     ret = spprintf(b, size, "1234");
> >     size -= ret;
> >     b += ret;
> >     // ret = 4  size = 6  buf = "1234\0"
> > 
> >     ret = spprintf(b, size, "5678");
> >     size -= ret;
> >     b += ret;
> >     // ret = 4  size = 2  buf = "12345678\0"
> > 
> >     ret = spprintf(b, size, "9***");
> >     size -= ret;
> >     b += ret;
> >     // ret = 2  size = 0  buf = "123456789\0"
> 
> So here scnprint() would have returned 1, leaving size at 1. scnprintf()
> has the invariant that, for non-zero size, the return value is strictly
> less than that size, so when passed a size of 1, all subsequent calls
> return 0 (corresponding to the fact that all it could do was to write
> the '\0' terminator).
> 
> This pattern already exists, and is really the reason scnprint exists.
> Yes, scnprintf() cannot distinguish overflow from
> it-just-exactly-fitted. Maybe it would have been better to make it work
> like this, but I don't think there's a real use

There are real use-cases.  They are what brought me here.

> and we do have
> seq_buf() if one really wants an interface that can build a string
> piece-meal while keeping track of whether it ever caused overflow.

seq_buf_*() looks okay, but it's petty heavy requiring what looks like
the buffers to be initialised with an API call before use.  We're
looking for something more light weight.

scnprint() had clear safety centric improvements over snprintf() and
spprintf() adds an additional layer of return value checking on top of
that.

I'm not sure I understand the resistance to something which is needed
and has clear benefits over what presently exists just for the sake of a
few lines of code.  I'd be on board if it were change for the sake of
change, but the added flexibility and ease of use is evident.
Kees Cook Jan. 30, 2024, 9:55 p.m. UTC | #10
On Tue, Jan 30, 2024 at 04:18:42PM +0100, Rasmus Villemoes wrote:
> So here scnprint() would have returned 1, leaving size at 1. scnprintf()
> has the invariant that, for non-zero size, the return value is strictly
> less than that size, so when passed a size of 1, all subsequent calls
> return 0 (corresponding to the fact that all it could do was to write
> the '\0' terminator).
> 
> This pattern already exists, and is really the reason scnprint exists.
> Yes, scnprintf() cannot distinguish overflow from
> it-just-exactly-fitted. Maybe it would have been better to make it work
> like this, but I don't think there's a real use - and we do have
> seq_buf() if one really wants an interface that can build a string
> piece-meal while keeping track of whether it ever caused overflow.

Yeah, I think we can take the handful of places that really need to know
about the overflow and can't reliably use scnprintf() and migrate them
to the seq_buf API. It should be much easier to use now[1] too.

That way we won't add a new string API, and we can continue to remove
snprintf.

-Kees

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/seq_buf.h?id=dcc4e5728eeaeda84878ca0018758cff1abfca21
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/seq_buf.h?id=7a8e9cdf9405819105ae7405cd91e482bf574b01
Lee Jones Jan. 31, 2024, 8:36 a.m. UTC | #11
On Tue, 30 Jan 2024, Kees Cook wrote:

> On Tue, Jan 30, 2024 at 04:18:42PM +0100, Rasmus Villemoes wrote:
> > So here scnprint() would have returned 1, leaving size at 1. scnprintf()
> > has the invariant that, for non-zero size, the return value is strictly
> > less than that size, so when passed a size of 1, all subsequent calls
> > return 0 (corresponding to the fact that all it could do was to write
> > the '\0' terminator).
> > 
> > This pattern already exists, and is really the reason scnprint exists.
> > Yes, scnprintf() cannot distinguish overflow from
> > it-just-exactly-fitted. Maybe it would have been better to make it work
> > like this, but I don't think there's a real use - and we do have
> > seq_buf() if one really wants an interface that can build a string
> > piece-meal while keeping track of whether it ever caused overflow.
> 
> Yeah, I think we can take the handful of places that really need to know
> about the overflow and can't reliably use scnprintf() and migrate them
> to the seq_buf API. It should be much easier to use now[1] too.
> 
> That way we won't add a new string API, and we can continue to remove
> snprintf.

This looks promising.  I'll have a look and get back to you.

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/seq_buf.h?id=dcc4e5728eeaeda84878ca0018758cff1abfca21
>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/seq_buf.h?id=7a8e9cdf9405819105ae7405cd91e482bf574b01
Petr Mladek Feb. 8, 2024, 4:24 p.m. UTC | #12
On Tue 2024-01-30 15:53:36, Lee Jones wrote:
> On Tue, 30 Jan 2024, Rasmus Villemoes wrote:
> > On 30/01/2024 16.07, Lee Jones wrote:
> > > On Mon, 29 Jan 2024, Lee Jones wrote:
> > >> On Mon, 29 Jan 2024, David Laight wrote:
> > >>>> snprintf() does this and has been proven to cause buffer-overflows.
> > >>>> There have been multiple articles authored describing why using
> > >>>> snprintf() is not generally a good idea for the masses including the 2
> > >>>> linked in the commit message:
> > >>>
> > >>> snprintf() returns the number of bytes that would have been output [1].
>
> > > Okay, I've written a bunch of simple test cases and results are
> > > positive.  It seems to achieve my aim whilst minimising any potential
> > > pitfalls.
> > > 
> > >  - Success returns Bytes actually written - no functional change
> > >  - Overflow returns the size of the buffer - which makes the result
> > >    a) testable for overflow
> > >    b) non-catastrophic if accidentally used to manipulate later sizes
> > 
> > You are describing scnprintf(), which almost does exactly that. The last
> > thing we need is another interface with almost identical semantics.
> 
> It does, which is why when I first centred my efforts on this task the
> plan was to simply switch to it.  However, as I described in the commit
> message:
> 
>   "Whist executing the task, it quickly became apparent that the initial
>   thought of simply s/snprintf/scnprintf/ wasn't going to be adequate
>   for a number of cases.  Specifically ones where the caller needs to
>   know whether the given string ends up being truncated."
> 
> A great deal of callers want to know if the string they attempted to
> form was successful.  A malformed string would lead to oddities in the
> best cases and various device/probing/matching failures in the worst.
> 
> > >     int size = 10;
> > >     char buf[size];
> > >     char *b = buf;
> > > 
> > >     ret = spprintf(b, size, "1234");
> > >     size -= ret;
> > >     b += ret;
> > >     // ret = 4  size = 6  buf = "1234\0"
> > > 
> > >     ret = spprintf(b, size, "5678");
> > >     size -= ret;
> > >     b += ret;
> > >     // ret = 4  size = 2  buf = "12345678\0"
> > > 
> > >     ret = spprintf(b, size, "9***");
> > >     size -= ret;
> > >     b += ret;
> > >     // ret = 2  size = 0  buf = "123456789\0"
> > 
> > So here scnprint() would have returned 1, leaving size at 1. scnprintf()
> > has the invariant that, for non-zero size, the return value is strictly
> > less than that size, so when passed a size of 1, all subsequent calls
> > return 0 (corresponding to the fact that all it could do was to write
> > the '\0' terminator).
> > 
> > This pattern already exists, and is really the reason scnprint exists.
> > Yes, scnprintf() cannot distinguish overflow from
> > it-just-exactly-fitted. Maybe it would have been better to make it work
> > like this, but I don't think there's a real use
> 
> There are real use-cases.  They are what brought me here.
>
> > and we do have
> > seq_buf() if one really wants an interface that can build a string
> > piece-meal while keeping track of whether it ever caused overflow.
> 
> seq_buf_*() looks okay, but it's petty heavy requiring what looks like
> the buffers to be initialised with an API call before use.  We're
> looking for something more light weight.
> 
> scnprint() had clear safety centric improvements over snprintf() and
> spprintf() adds an additional layer of return value checking on top of
> that.
> 
> I'm not sure I understand the resistance to something which is needed
> and has clear benefits over what presently exists just for the sake of a
> few lines of code.  I'd be on board if it were change for the sake of
> change, but the added flexibility and ease of use is evident.

My view is the following:

First, I agree that snprintf() is dangerous and should be replaced.

I think that the resistance is because:

  + ssprintf() has its danger as well. You wrote [1] that
    "Under-running the buffer is no worse over-running". But it is
    no better either.

  + More APIs might create more confusion and increase the risk of
    a misuse.

  + spprintf() does not allow to detect truncated string. It means
    that it does not provide any advantage over the existing scnprintf().

So, if we could solve it another way then it might be better.


That said, people tend to look how an API is used instead of RTFM.
They copy good or bad patterns. There is even a term for this
but I can't remember it new.

So, if we introduce a new API and provide some good examples
then there is a good chance that people will continue using
it correctly. And it even might be checked more easily.

There is a similarity with alloc() APIs. Good (most) programmers
know that they need to check kalloc() return value. They might
also learn that they need to check also return value from
ssnprintf() of how we call it. Especially, when we provide
good examples from the very beginning. Also it might
be checked by Coccinelle.


Finally, if you eventually send a new version of a string API
then please add Linus into Cc. He has huge experience and very
good feeling for what is good and what is wrong. And he
is interested into exactly these types of APIs.


[1] https://lore.kernel.org/r/20240129092440.GA1708181@google.com

Best Regards,
Petr
Lee Jones Feb. 8, 2024, 5:05 p.m. UTC | #13
On Thu, 08 Feb 2024, Petr Mladek wrote:

> On Tue 2024-01-30 15:53:36, Lee Jones wrote:
> > On Tue, 30 Jan 2024, Rasmus Villemoes wrote:
> > > On 30/01/2024 16.07, Lee Jones wrote:
> > > > On Mon, 29 Jan 2024, Lee Jones wrote:
> > > >> On Mon, 29 Jan 2024, David Laight wrote:
> > > >>>> snprintf() does this and has been proven to cause buffer-overflows.
> > > >>>> There have been multiple articles authored describing why using
> > > >>>> snprintf() is not generally a good idea for the masses including the 2
> > > >>>> linked in the commit message:
> > > >>>
> > > >>> snprintf() returns the number of bytes that would have been output [1].
> >
> > > > Okay, I've written a bunch of simple test cases and results are
> > > > positive.  It seems to achieve my aim whilst minimising any potential
> > > > pitfalls.
> > > > 
> > > >  - Success returns Bytes actually written - no functional change
> > > >  - Overflow returns the size of the buffer - which makes the result
> > > >    a) testable for overflow
> > > >    b) non-catastrophic if accidentally used to manipulate later sizes
> > > 
> > > You are describing scnprintf(), which almost does exactly that. The last
> > > thing we need is another interface with almost identical semantics.
> > 
> > It does, which is why when I first centred my efforts on this task the
> > plan was to simply switch to it.  However, as I described in the commit
> > message:
> > 
> >   "Whist executing the task, it quickly became apparent that the initial
> >   thought of simply s/snprintf/scnprintf/ wasn't going to be adequate
> >   for a number of cases.  Specifically ones where the caller needs to
> >   know whether the given string ends up being truncated."
> > 
> > A great deal of callers want to know if the string they attempted to
> > form was successful.  A malformed string would lead to oddities in the
> > best cases and various device/probing/matching failures in the worst.
> > 
> > > >     int size = 10;
> > > >     char buf[size];
> > > >     char *b = buf;
> > > > 
> > > >     ret = spprintf(b, size, "1234");
> > > >     size -= ret;
> > > >     b += ret;
> > > >     // ret = 4  size = 6  buf = "1234\0"
> > > > 
> > > >     ret = spprintf(b, size, "5678");
> > > >     size -= ret;
> > > >     b += ret;
> > > >     // ret = 4  size = 2  buf = "12345678\0"
> > > > 
> > > >     ret = spprintf(b, size, "9***");
> > > >     size -= ret;
> > > >     b += ret;
> > > >     // ret = 2  size = 0  buf = "123456789\0"
> > > 
> > > So here scnprint() would have returned 1, leaving size at 1. scnprintf()
> > > has the invariant that, for non-zero size, the return value is strictly
> > > less than that size, so when passed a size of 1, all subsequent calls
> > > return 0 (corresponding to the fact that all it could do was to write
> > > the '\0' terminator).
> > > 
> > > This pattern already exists, and is really the reason scnprint exists.
> > > Yes, scnprintf() cannot distinguish overflow from
> > > it-just-exactly-fitted. Maybe it would have been better to make it work
> > > like this, but I don't think there's a real use
> > 
> > There are real use-cases.  They are what brought me here.
> >
> > > and we do have
> > > seq_buf() if one really wants an interface that can build a string
> > > piece-meal while keeping track of whether it ever caused overflow.
> > 
> > seq_buf_*() looks okay, but it's petty heavy requiring what looks like
> > the buffers to be initialised with an API call before use.  We're
> > looking for something more light weight.
> > 
> > scnprint() had clear safety centric improvements over snprintf() and
> > spprintf() adds an additional layer of return value checking on top of
> > that.
> > 
> > I'm not sure I understand the resistance to something which is needed
> > and has clear benefits over what presently exists just for the sake of a
> > few lines of code.  I'd be on board if it were change for the sake of
> > change, but the added flexibility and ease of use is evident.
> 
> My view is the following:
> 
> First, I agree that snprintf() is dangerous and should be replaced.

Thanks for your input Petr.

> I think that the resistance is because:
> 
>   + ssprintf() has its danger as well. You wrote [1] that
>     "Under-running the buffer is no worse over-running". But it is
>     no better either.
> 
>   + More APIs might create more confusion and increase the risk of
>     a misuse.
> 
>   + spprintf() does not allow to detect truncated string. It means
>     that it does not provide any advantage over the existing scnprintf().

spprintf() can detect a truncated string:

  ret = spprintf(buf, 10, "1234567890");
  if (ret == 10) {
    /* The buffer truncated */
  }

And if the return value is used directly on a subsequent call:

  spprintf(buf + ret, 10 - ret, "1234567890"); /* No underflow/overflow */

Caveat: I just knocked this example up off the top of my head.  It's
        completely untested and probably contains lots of errors.

> So, if we could solve it another way then it might be better.

I'm still quite fond of ssprintf() since it seems to solve the present
issues and it's very light (compared with the alternative), but I think
the preferred approach is to use seq_buf in cases where the caller needs
to know if truncation occurred.

> That said, people tend to look how an API is used instead of RTFM.
> They copy good or bad patterns. There is even a term for this
> but I can't remember it new.
> 
> So, if we introduce a new API and provide some good examples
> then there is a good chance that people will continue using
> it correctly. And it even might be checked more easily.
> 
> There is a similarity with alloc() APIs. Good (most) programmers
> know that they need to check kalloc() return value. They might
> also learn that they need to check also return value from
> ssnprintf() of how we call it. Especially, when we provide
> good examples from the very beginning. Also it might
> be checked by Coccinelle.
> 
> 
> Finally, if you eventually send a new version of a string API
> then please add Linus into Cc. He has huge experience and very
> good feeling for what is good and what is wrong. And he
> is interested into exactly these types of APIs.
> 
> 
> [1] https://lore.kernel.org/r/20240129092440.GA1708181@google.com
> 
> Best Regards,
> Petr
diff mbox series

Patch

diff --git a/include/linux/sprintf.h b/include/linux/sprintf.h
index 33dcbec719254..2a3db6285492a 100644
--- a/include/linux/sprintf.h
+++ b/include/linux/sprintf.h
@@ -13,6 +13,8 @@  __printf(3, 4) int snprintf(char *buf, size_t size, const char *fmt, ...);
 __printf(3, 0) int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
 __printf(3, 4) int scnprintf(char *buf, size_t size, const char *fmt, ...);
 __printf(3, 0) int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
+__printf(3, 4) int ssprintf(char *buf, size_t size, const char *fmt, ...);
+__printf(3, 0) int vssprintf(char *buf, size_t size, const char *fmt, va_list args);
 __printf(2, 3) __malloc char *kasprintf(gfp_t gfp, const char *fmt, ...);
 __printf(2, 0) __malloc char *kvasprintf(gfp_t gfp, const char *fmt, va_list args);
 __printf(2, 0) const char *kvasprintf_const(gfp_t gfp, const char *fmt, va_list args);
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 552738f14275a..01a1060ca0b0d 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2936,6 +2936,40 @@  int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
 }
 EXPORT_SYMBOL(vscnprintf);
 
+/**
+ * vssprintf - Format a string and place it in a buffer
+ * @buf: The buffer to place the result into
+ * @size: The size of the buffer, including the trailing null space
+ * @fmt: The format string to use
+ * @args: Arguments for the format string
+ *
+ * The return value is the number of characters which have been written into
+ * the @buf not including the trailing '\0' or -E2BIG if the string was
+ * truncated. If @size is == 0 the function returns 0.
+ *
+ * If you're not already dealing with a va_list consider using ssprintf().
+ *
+ * See the vsnprintf() documentation for format string extensions over C99.
+ */
+int vssprintf(char *buf, size_t size, const char *fmt, va_list args)
+{
+	int i;
+
+	if (unlikely(!size))
+		return 0;
+
+	i = vsnprintf(buf, size, fmt, args);
+
+	if (unlikely(i >= size))
+		return -E2BIG;
+
+	if (likely(i < size))
+		return i;
+
+	return size - 1;
+}
+EXPORT_SYMBOL(vssprintf);
+
 /**
  * snprintf - Format a string and place it in a buffer
  * @buf: The buffer to place the result into
@@ -2987,6 +3021,30 @@  int scnprintf(char *buf, size_t size, const char *fmt, ...)
 }
 EXPORT_SYMBOL(scnprintf);
 
+/**
+ * ssprintf - Format a string and place it in a buffer
+ * @buf: The buffer to place the result into
+ * @size: The size of the buffer, including the trailing null space
+ * @fmt: The format string to use
+ * @...: Arguments for the format string
+ *
+ * The return value is the number of characters written into @buf not including
+ * the trailing '\0' or -E2BIG if the string was truncated. If @size is == 0
+ * the function returns 0.
+ */
+int ssprintf(char *buf, size_t size, const char *fmt, ...)
+{
+	va_list args;
+	int i;
+
+	va_start(args, fmt);
+	i = vssprintf(buf, size, fmt, args);
+	va_end(args);
+
+	return i;
+}
+EXPORT_SYMBOL(ssprintf);
+
 /**
  * vsprintf - Format a string and place it in a buffer
  * @buf: The buffer to place the result into