Message ID | 20240125083921.1312709-1-lee@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [1/1] lib/vsprintf: Implement ssprintf() to catch truncated strings | expand |
On 25/01/2024 09.39, Lee Jones wrote: > There is an ongoing effort to replace the use of {v}snprintf() variants > with safer alternatives - for a more in depth view, see Jon's write-up > on LWN [0] and/or Alex's on the Kernel Self Protection Project [1]. > > Whist executing the task, it quickly became apparent that the initial > thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for > a number of cases. Specifically ones where the caller needs to know > whether the given string ends up being truncated. This is where > ssprintf() [based on similar semantics of strscpy()] comes in, since it > takes the best parts of both of the aforementioned variants. It has the > testability of truncation of snprintf() and returns the number of Bytes > *actually* written, similar to scnprintf(), making it a very programmer > friendly alternative. > > Here's some examples to show the differences: > > Success: No truncation - all 9 Bytes successfully written to the buffer > > ret = snprintf (buf, 10, "%s", "123456789"); // ret = 9 > ret = scnprintf(buf, 10, "%s", "123456789"); // ret = 9 > ret = ssprintf (buf, 10, "%s", "123456789"); // ret = 9 > > Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated > > ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10 > > Reports: "10 Bytes would have been written if buf was large enough" > Issue: Programmers need to know/remember to check ret against "10" Yeah, so I'm not at all sure we need yet-another-wrapper with yet-another-hard-to-read-prefix when people can just RTFM and learn how to check for truncation or whatnot. But if you do this: > +/** > + * vssprintf - Format a string and place it in a buffer > + * @buf: The buffer to place the result into > + * @size: The size of the buffer, including the trailing null space > + * @fmt: The format string to use > + * @args: Arguments for the format string > + * > + * The return value is the number of characters which have been written into > + * the @buf not including the trailing '\0' or -E2BIG if the string was > + * truncated. If @size is == 0 the function returns 0. > + * > + * If you're not already dealing with a va_list consider using ssprintf(). > + * > + * See the vsnprintf() documentation for format string extensions over C99. > + */ > +int vssprintf(char *buf, size_t size, const char *fmt, va_list args) > +{ > + int i; > + > + if (unlikely(!size)) > + return 0; No, don't special-case size 0 here. Passing size==0 should just guarantee -E2BIG because that's essentially a programmer error, and the calling code is then at least much more likely to not believe that buf now contains a nul-terminated (empty) string. And since it's essentially a bug, there's no need to special-case size 0 to avoid calling vsnprintf(), just let it be caught by the i >= size check. > + i = vsnprintf(buf, size, fmt, args); > + > + if (unlikely(i >= size)) > + return -E2BIG; > + > + if (likely(i < size)) > + return i; Those two ifs are mutually exclusive, so why the second if() and not just a direct "return i"? That final "return size-1" is unreachable, and confusing. Rasmus
On Thu, 25 Jan 2024, Rasmus Villemoes wrote: > On 25/01/2024 09.39, Lee Jones wrote: > > There is an ongoing effort to replace the use of {v}snprintf() variants > > with safer alternatives - for a more in depth view, see Jon's write-up > > on LWN [0] and/or Alex's on the Kernel Self Protection Project [1]. > > > > Whist executing the task, it quickly became apparent that the initial > > thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for > > a number of cases. Specifically ones where the caller needs to know > > whether the given string ends up being truncated. This is where > > ssprintf() [based on similar semantics of strscpy()] comes in, since it > > takes the best parts of both of the aforementioned variants. It has the > > testability of truncation of snprintf() and returns the number of Bytes > > *actually* written, similar to scnprintf(), making it a very programmer > > friendly alternative. > > > > Here's some examples to show the differences: > > > > Success: No truncation - all 9 Bytes successfully written to the buffer > > > > ret = snprintf (buf, 10, "%s", "123456789"); // ret = 9 > > ret = scnprintf(buf, 10, "%s", "123456789"); // ret = 9 > > ret = ssprintf (buf, 10, "%s", "123456789"); // ret = 9 > > > > Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated > > > > ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10 > > > > Reports: "10 Bytes would have been written if buf was large enough" > > Issue: Programmers need to know/remember to check ret against "10" > > Yeah, so I'm not at all sure we need yet-another-wrapper with > yet-another-hard-to-read-prefix when people can just RTFM and learn how > to check for truncation or whatnot. But if you do this: As wonderful as it would be for people to "just RTFM", we're seeing a large number of cases where this isn't happening. Providing a more programmer friendly way is thought, by people way smarter than me, to be a solid means to solve this issue. Please also see Kees Cook's related work to remove strlcpy() use. > > +/** > > + * vssprintf - Format a string and place it in a buffer > > + * @buf: The buffer to place the result into > > + * @size: The size of the buffer, including the trailing null space > > + * @fmt: The format string to use > > + * @args: Arguments for the format string > > + * > > + * The return value is the number of characters which have been written into > > + * the @buf not including the trailing '\0' or -E2BIG if the string was > > + * truncated. If @size is == 0 the function returns 0. > > + * > > + * If you're not already dealing with a va_list consider using ssprintf(). > > + * > > + * See the vsnprintf() documentation for format string extensions over C99. > > + */ > > +int vssprintf(char *buf, size_t size, const char *fmt, va_list args) > > +{ > > + int i; > > + > > + if (unlikely(!size)) > > + return 0; > > No, don't special-case size 0 here. Passing size==0 should just > guarantee -E2BIG because that's essentially a programmer error, and the > calling code is then at least much more likely to not believe that buf > now contains a nul-terminated (empty) string. > > And since it's essentially a bug, there's no need to special-case size 0 > to avoid calling vsnprintf(), just let it be caught by the i >= size check. Agree. Thanks for the feedback. I will change this. > > + i = vsnprintf(buf, size, fmt, args); > > + > > + if (unlikely(i >= size)) > > + return -E2BIG; > > + > > + if (likely(i < size)) > > + return i; > > Those two ifs are mutually exclusive, so why the second if() and not > just a direct "return i"? That final "return size-1" is unreachable, and > confusing. That's true. The last line of vscnprintf() essentially means that the data was truncated, which is caught by the new check. So it should be reworked to look like this: ``` if (likely(i < size)) return i; return -E2BIG; ``` Thanks again. That's very helpful.
From: Lee Jones > Sent: 25 January 2024 10:36 > On Thu, 25 Jan 2024, Rasmus Villemoes wrote: > > > On 25/01/2024 09.39, Lee Jones wrote: > > > There is an ongoing effort to replace the use of {v}snprintf() variants > > > with safer alternatives - for a more in depth view, see Jon's write-up > > > on LWN [0] and/or Alex's on the Kernel Self Protection Project [1]. > > > > > > Whist executing the task, it quickly became apparent that the initial > > > thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for > > > a number of cases. Specifically ones where the caller needs to know > > > whether the given string ends up being truncated. This is where > > > ssprintf() [based on similar semantics of strscpy()] comes in, since it > > > takes the best parts of both of the aforementioned variants. It has the > > > testability of truncation of snprintf() and returns the number of Bytes > > > *actually* written, similar to scnprintf(), making it a very programmer > > > friendly alternative. > > > > > > Here's some examples to show the differences: > > > > > > Success: No truncation - all 9 Bytes successfully written to the buffer > > > > > > ret = snprintf (buf, 10, "%s", "123456789"); // ret = 9 > > > ret = scnprintf(buf, 10, "%s", "123456789"); // ret = 9 > > > ret = ssprintf (buf, 10, "%s", "123456789"); // ret = 9 > > > > > > Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated > > > > > > ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10 > > > > > > Reports: "10 Bytes would have been written if buf was large enough" > > > Issue: Programmers need to know/remember to check ret against "10" > > > > Yeah, so I'm not at all sure we need yet-another-wrapper with > > yet-another-hard-to-read-prefix when people can just RTFM and learn how > > to check for truncation or whatnot. But if you do this: > > As wonderful as it would be for people to "just RTFM", we're seeing a > large number of cases where this isn't happening. Providing a more > programmer friendly way is thought, by people way smarter than me, to be > a solid means to solve this issue. Please also see Kees Cook's related > work to remove strlcpy() use. My worry is that people will believe the length and forget that it might be an error code. So you replace one set of errors (truncated data), with another worse set (eg write before start of buffer). I'm sure that the safest return for 'truncated' is the buffer length. The a series of statements like: buf += xxx(buf, buf_end - buf, .....); can all be called with a single overflow check at the end. Forget the check, and the length just contains a trailing '\0' which might cause confusion but isn't going to immediately break the world. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
NB: I was _just_ about to send out v2 with Rasmus's suggestions before I saw your reply. I'm going to submit it anyway and Cc both you and Rasmus. If you still disagree with my suggested approach, we can either continue discussion here or on the new version. More below: > From: Lee Jones > > Sent: 25 January 2024 10:36 > > On Thu, 25 Jan 2024, Rasmus Villemoes wrote: > > > > > On 25/01/2024 09.39, Lee Jones wrote: > > > > There is an ongoing effort to replace the use of {v}snprintf() variants > > > > with safer alternatives - for a more in depth view, see Jon's write-up > > > > on LWN [0] and/or Alex's on the Kernel Self Protection Project [1]. > > > > > > > > Whist executing the task, it quickly became apparent that the initial > > > > thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for > > > > a number of cases. Specifically ones where the caller needs to know > > > > whether the given string ends up being truncated. This is where > > > > ssprintf() [based on similar semantics of strscpy()] comes in, since it > > > > takes the best parts of both of the aforementioned variants. It has the > > > > testability of truncation of snprintf() and returns the number of Bytes > > > > *actually* written, similar to scnprintf(), making it a very programmer > > > > friendly alternative. > > > > > > > > Here's some examples to show the differences: > > > > > > > > Success: No truncation - all 9 Bytes successfully written to the buffer > > > > > > > > ret = snprintf (buf, 10, "%s", "123456789"); // ret = 9 > > > > ret = scnprintf(buf, 10, "%s", "123456789"); // ret = 9 > > > > ret = ssprintf (buf, 10, "%s", "123456789"); // ret = 9 > > > > > > > > Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated > > > > > > > > ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10 > > > > > > > > Reports: "10 Bytes would have been written if buf was large enough" > > > > Issue: Programmers need to know/remember to check ret against "10" > > > > > > Yeah, so I'm not at all sure we need yet-another-wrapper with > > > yet-another-hard-to-read-prefix when people can just RTFM and learn how > > > to check for truncation or whatnot. But if you do this: > > > > As wonderful as it would be for people to "just RTFM", we're seeing a > > large number of cases where this isn't happening. Providing a more > > programmer friendly way is thought, by people way smarter than me, to be > > a solid means to solve this issue. Please also see Kees Cook's related > > work to remove strlcpy() use. > > My worry is that people will believe the length and forget that > it might be an error code. My plan is to go around and convert these myself. All of the examples in the kernel will check the return value for error. We can go one further and author a Coccinelle rule to enforce the semantics. > So you replace one set of errors (truncated data), with another > worse set (eg write before start of buffer). Under-running the buffer is no worse over-running. However, as I say, we're going to make a concerted effort to prevent that via various proactive and passive measures. > I'm sure that the safest return for 'truncated' is the buffer length. > The a series of statements like: > buf += xxx(buf, buf_end - buf, .....); > can all be called with a single overflow check at the end. > > Forget the check, and the length just contains a trailing '\0' > which might cause confusion but isn't going to immediately > break the world. snprintf() does this and has been proven to cause buffer-overflows. There have been multiple articles authored describing why using snprintf() is not generally a good idea for the masses including the 2 linked in the commit message: LWN: snprintf() confusion https://lwn.net/Articles/69419/ KSPP: Replace uses of snprintf() and vsnprintf() https://github.com/KSPP/linux/issues/105 Yes, you should check ssprintf() for error. This is no different to the many hundreds of APIs where this is also a stipulation. Not checking (m)any of the memory allocation APIs for error will also lead to similar results which is why we enforce the check.
... > > I'm sure that the safest return for 'truncated' is the buffer length. > > The a series of statements like: > > buf += xxx(buf, buf_end - buf, .....); > > can all be called with a single overflow check at the end. > > > > Forget the check, and the length just contains a trailing '\0' > > which might cause confusion but isn't going to immediately > > break the world. > > snprintf() does this and has been proven to cause buffer-overflows. > There have been multiple articles authored describing why using > snprintf() is not generally a good idea for the masses including the 2 > linked in the commit message: snprintf() returns the number of bytes that would have been output [1]. I'm not suggesting that, or not terminating the buffer. Just returning the length including the '\0' (unless length was zero). This still lets the code check for overflow but isn't going to generate a pointer outside the buffer if used to update a pointer. [1] I'm pretty certain this is because the original libc version of sprintf() allocated a FILE structure on stack (fully buffered) and called fprintf(). snprintf() would have been done the same way but with something to stop the buffer being flushed. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Mon, 29 Jan 2024, David Laight wrote: > ... > > > I'm sure that the safest return for 'truncated' is the buffer length. > > > The a series of statements like: > > > buf += xxx(buf, buf_end - buf, .....); > > > can all be called with a single overflow check at the end. > > > > > > Forget the check, and the length just contains a trailing '\0' > > > which might cause confusion but isn't going to immediately > > > break the world. > > > > snprintf() does this and has been proven to cause buffer-overflows. > > There have been multiple articles authored describing why using > > snprintf() is not generally a good idea for the masses including the 2 > > linked in the commit message: > > snprintf() returns the number of bytes that would have been output [1]. > I'm not suggesting that, or not terminating the buffer. > Just returning the length including the '\0' (unless length was zero). > This still lets the code check for overflow but isn't going to > generate a pointer outside the buffer if used to update a pointer. I see. Well I'm not married to my solution. However, I am convinced that the 2 solutions currently offered can be improved upon. If you or anyone else has a better solution, I'd be more than happy to implement and switch to it. Let me have a think about the solution you suggest and get back to you. > [1] I'm pretty certain this is because the original libc version > of sprintf() allocated a FILE structure on stack (fully buffered) > and called fprintf(). > snprintf() would have been done the same way but with something > to stop the buffer being flushed. Interesting. Thanks for the background.
On Mon, 29 Jan 2024, Lee Jones wrote: > On Mon, 29 Jan 2024, David Laight wrote: > > > ... > > > > I'm sure that the safest return for 'truncated' is the buffer length. > > > > The a series of statements like: > > > > buf += xxx(buf, buf_end - buf, .....); > > > > can all be called with a single overflow check at the end. > > > > > > > > Forget the check, and the length just contains a trailing '\0' > > > > which might cause confusion but isn't going to immediately > > > > break the world. > > > > > > snprintf() does this and has been proven to cause buffer-overflows. > > > There have been multiple articles authored describing why using > > > snprintf() is not generally a good idea for the masses including the 2 > > > linked in the commit message: > > > > snprintf() returns the number of bytes that would have been output [1]. > > I'm not suggesting that, or not terminating the buffer. > > Just returning the length including the '\0' (unless length was zero). > > This still lets the code check for overflow but isn't going to > > generate a pointer outside the buffer if used to update a pointer. > > I see. Well I'm not married to my solution. However, I am convinced > that the 2 solutions currently offered can be improved upon. If you or > anyone else has a better solution, I'd be more than happy to implement > and switch to it. > > Let me have a think about the solution you suggest and get back to you. Okay, I've written a bunch of simple test cases and results are positive. It seems to achieve my aim whilst minimising any potential pitfalls. - Success returns Bytes actually written - no functional change - Overflow returns the size of the buffer - which makes the result a) testable for overflow b) non-catastrophic if accidentally used to manipulate later sizes int size = 10; char buf[size]; char *b = buf; ret = spprintf(b, size, "1234"); size -= ret; b += ret; // ret = 4 size = 6 buf = "1234\0" ret = spprintf(b, size, "5678"); size -= ret; b += ret; // ret = 4 size = 2 buf = "12345678\0" ret = spprintf(b, size, "9***"); size -= ret; b += ret; // ret = 2 size = 0 buf = "123456789\0" Since size is now 0, further calls result in no changes of state. ret = spprintf(b, size, "----"); size -= ret; b += ret; // ret = 0 size = 0 buf = "123456789\0" I'll knock this up and submit a patch.
On 30/01/2024 16.07, Lee Jones wrote: > On Mon, 29 Jan 2024, Lee Jones wrote: > >> On Mon, 29 Jan 2024, David Laight wrote: >> >>> ... >>>>> I'm sure that the safest return for 'truncated' is the buffer length. >>>>> The a series of statements like: >>>>> buf += xxx(buf, buf_end - buf, .....); >>>>> can all be called with a single overflow check at the end. >>>>> >>>>> Forget the check, and the length just contains a trailing '\0' >>>>> which might cause confusion but isn't going to immediately >>>>> break the world. >>>> >>>> snprintf() does this and has been proven to cause buffer-overflows. >>>> There have been multiple articles authored describing why using >>>> snprintf() is not generally a good idea for the masses including the 2 >>>> linked in the commit message: >>> >>> snprintf() returns the number of bytes that would have been output [1]. >>> I'm not suggesting that, or not terminating the buffer. >>> Just returning the length including the '\0' (unless length was zero). >>> This still lets the code check for overflow but isn't going to >>> generate a pointer outside the buffer if used to update a pointer. >> >> I see. Well I'm not married to my solution. However, I am convinced >> that the 2 solutions currently offered can be improved upon. If you or >> anyone else has a better solution, I'd be more than happy to implement >> and switch to it. >> >> Let me have a think about the solution you suggest and get back to you. > > Okay, I've written a bunch of simple test cases and results are > positive. It seems to achieve my aim whilst minimising any potential > pitfalls. > > - Success returns Bytes actually written - no functional change > - Overflow returns the size of the buffer - which makes the result > a) testable for overflow > b) non-catastrophic if accidentally used to manipulate later sizes You are describing scnprintf(), which almost does exactly that. The last thing we need is another interface with almost identical semantics. > int size = 10; > char buf[size]; > char *b = buf; > > ret = spprintf(b, size, "1234"); > size -= ret; > b += ret; > // ret = 4 size = 6 buf = "1234\0" > > ret = spprintf(b, size, "5678"); > size -= ret; > b += ret; > // ret = 4 size = 2 buf = "12345678\0" > > ret = spprintf(b, size, "9***"); > size -= ret; > b += ret; > // ret = 2 size = 0 buf = "123456789\0" So here scnprint() would have returned 1, leaving size at 1. scnprintf() has the invariant that, for non-zero size, the return value is strictly less than that size, so when passed a size of 1, all subsequent calls return 0 (corresponding to the fact that all it could do was to write the '\0' terminator). This pattern already exists, and is really the reason scnprint exists. Yes, scnprintf() cannot distinguish overflow from it-just-exactly-fitted. Maybe it would have been better to make it work like this, but I don't think there's a real use - and we do have seq_buf() if one really wants an interface that can build a string piece-meal while keeping track of whether it ever caused overflow. Rasmus
On Tue, 30 Jan 2024, Rasmus Villemoes wrote: > On 30/01/2024 16.07, Lee Jones wrote: > > On Mon, 29 Jan 2024, Lee Jones wrote: > > > >> On Mon, 29 Jan 2024, David Laight wrote: > >> > >>> ... > >>>>> I'm sure that the safest return for 'truncated' is the buffer length. > >>>>> The a series of statements like: > >>>>> buf += xxx(buf, buf_end - buf, .....); > >>>>> can all be called with a single overflow check at the end. > >>>>> > >>>>> Forget the check, and the length just contains a trailing '\0' > >>>>> which might cause confusion but isn't going to immediately > >>>>> break the world. > >>>> > >>>> snprintf() does this and has been proven to cause buffer-overflows. > >>>> There have been multiple articles authored describing why using > >>>> snprintf() is not generally a good idea for the masses including the 2 > >>>> linked in the commit message: > >>> > >>> snprintf() returns the number of bytes that would have been output [1]. > >>> I'm not suggesting that, or not terminating the buffer. > >>> Just returning the length including the '\0' (unless length was zero). > >>> This still lets the code check for overflow but isn't going to > >>> generate a pointer outside the buffer if used to update a pointer. > >> > >> I see. Well I'm not married to my solution. However, I am convinced > >> that the 2 solutions currently offered can be improved upon. If you or > >> anyone else has a better solution, I'd be more than happy to implement > >> and switch to it. > >> > >> Let me have a think about the solution you suggest and get back to you. > > > > Okay, I've written a bunch of simple test cases and results are > > positive. It seems to achieve my aim whilst minimising any potential > > pitfalls. > > > > - Success returns Bytes actually written - no functional change > > - Overflow returns the size of the buffer - which makes the result > > a) testable for overflow > > b) non-catastrophic if accidentally used to manipulate later sizes > > You are describing scnprintf(), which almost does exactly that. The last > thing we need is another interface with almost identical semantics. It does, which is why when I first centred my efforts on this task the plan was to simply switch to it. However, as I described in the commit message: "Whist executing the task, it quickly became apparent that the initial thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for a number of cases. Specifically ones where the caller needs to know whether the given string ends up being truncated." A great deal of callers want to know if the string they attempted to form was successful. A malformed string would lead to oddities in the best cases and various device/probing/matching failures in the worst. > > int size = 10; > > char buf[size]; > > char *b = buf; > > > > ret = spprintf(b, size, "1234"); > > size -= ret; > > b += ret; > > // ret = 4 size = 6 buf = "1234\0" > > > > ret = spprintf(b, size, "5678"); > > size -= ret; > > b += ret; > > // ret = 4 size = 2 buf = "12345678\0" > > > > ret = spprintf(b, size, "9***"); > > size -= ret; > > b += ret; > > // ret = 2 size = 0 buf = "123456789\0" > > So here scnprint() would have returned 1, leaving size at 1. scnprintf() > has the invariant that, for non-zero size, the return value is strictly > less than that size, so when passed a size of 1, all subsequent calls > return 0 (corresponding to the fact that all it could do was to write > the '\0' terminator). > > This pattern already exists, and is really the reason scnprint exists. > Yes, scnprintf() cannot distinguish overflow from > it-just-exactly-fitted. Maybe it would have been better to make it work > like this, but I don't think there's a real use There are real use-cases. They are what brought me here. > and we do have > seq_buf() if one really wants an interface that can build a string > piece-meal while keeping track of whether it ever caused overflow. seq_buf_*() looks okay, but it's petty heavy requiring what looks like the buffers to be initialised with an API call before use. We're looking for something more light weight. scnprint() had clear safety centric improvements over snprintf() and spprintf() adds an additional layer of return value checking on top of that. I'm not sure I understand the resistance to something which is needed and has clear benefits over what presently exists just for the sake of a few lines of code. I'd be on board if it were change for the sake of change, but the added flexibility and ease of use is evident.
On Tue, Jan 30, 2024 at 04:18:42PM +0100, Rasmus Villemoes wrote: > So here scnprint() would have returned 1, leaving size at 1. scnprintf() > has the invariant that, for non-zero size, the return value is strictly > less than that size, so when passed a size of 1, all subsequent calls > return 0 (corresponding to the fact that all it could do was to write > the '\0' terminator). > > This pattern already exists, and is really the reason scnprint exists. > Yes, scnprintf() cannot distinguish overflow from > it-just-exactly-fitted. Maybe it would have been better to make it work > like this, but I don't think there's a real use - and we do have > seq_buf() if one really wants an interface that can build a string > piece-meal while keeping track of whether it ever caused overflow. Yeah, I think we can take the handful of places that really need to know about the overflow and can't reliably use scnprintf() and migrate them to the seq_buf API. It should be much easier to use now[1] too. That way we won't add a new string API, and we can continue to remove snprintf. -Kees [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/seq_buf.h?id=dcc4e5728eeaeda84878ca0018758cff1abfca21 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/seq_buf.h?id=7a8e9cdf9405819105ae7405cd91e482bf574b01
On Tue, 30 Jan 2024, Kees Cook wrote: > On Tue, Jan 30, 2024 at 04:18:42PM +0100, Rasmus Villemoes wrote: > > So here scnprint() would have returned 1, leaving size at 1. scnprintf() > > has the invariant that, for non-zero size, the return value is strictly > > less than that size, so when passed a size of 1, all subsequent calls > > return 0 (corresponding to the fact that all it could do was to write > > the '\0' terminator). > > > > This pattern already exists, and is really the reason scnprint exists. > > Yes, scnprintf() cannot distinguish overflow from > > it-just-exactly-fitted. Maybe it would have been better to make it work > > like this, but I don't think there's a real use - and we do have > > seq_buf() if one really wants an interface that can build a string > > piece-meal while keeping track of whether it ever caused overflow. > > Yeah, I think we can take the handful of places that really need to know > about the overflow and can't reliably use scnprintf() and migrate them > to the seq_buf API. It should be much easier to use now[1] too. > > That way we won't add a new string API, and we can continue to remove > snprintf. This looks promising. I'll have a look and get back to you. > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/seq_buf.h?id=dcc4e5728eeaeda84878ca0018758cff1abfca21 > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/linux/seq_buf.h?id=7a8e9cdf9405819105ae7405cd91e482bf574b01
On Tue 2024-01-30 15:53:36, Lee Jones wrote: > On Tue, 30 Jan 2024, Rasmus Villemoes wrote: > > On 30/01/2024 16.07, Lee Jones wrote: > > > On Mon, 29 Jan 2024, Lee Jones wrote: > > >> On Mon, 29 Jan 2024, David Laight wrote: > > >>>> snprintf() does this and has been proven to cause buffer-overflows. > > >>>> There have been multiple articles authored describing why using > > >>>> snprintf() is not generally a good idea for the masses including the 2 > > >>>> linked in the commit message: > > >>> > > >>> snprintf() returns the number of bytes that would have been output [1]. > > > > Okay, I've written a bunch of simple test cases and results are > > > positive. It seems to achieve my aim whilst minimising any potential > > > pitfalls. > > > > > > - Success returns Bytes actually written - no functional change > > > - Overflow returns the size of the buffer - which makes the result > > > a) testable for overflow > > > b) non-catastrophic if accidentally used to manipulate later sizes > > > > You are describing scnprintf(), which almost does exactly that. The last > > thing we need is another interface with almost identical semantics. > > It does, which is why when I first centred my efforts on this task the > plan was to simply switch to it. However, as I described in the commit > message: > > "Whist executing the task, it quickly became apparent that the initial > thought of simply s/snprintf/scnprintf/ wasn't going to be adequate > for a number of cases. Specifically ones where the caller needs to > know whether the given string ends up being truncated." > > A great deal of callers want to know if the string they attempted to > form was successful. A malformed string would lead to oddities in the > best cases and various device/probing/matching failures in the worst. > > > > int size = 10; > > > char buf[size]; > > > char *b = buf; > > > > > > ret = spprintf(b, size, "1234"); > > > size -= ret; > > > b += ret; > > > // ret = 4 size = 6 buf = "1234\0" > > > > > > ret = spprintf(b, size, "5678"); > > > size -= ret; > > > b += ret; > > > // ret = 4 size = 2 buf = "12345678\0" > > > > > > ret = spprintf(b, size, "9***"); > > > size -= ret; > > > b += ret; > > > // ret = 2 size = 0 buf = "123456789\0" > > > > So here scnprint() would have returned 1, leaving size at 1. scnprintf() > > has the invariant that, for non-zero size, the return value is strictly > > less than that size, so when passed a size of 1, all subsequent calls > > return 0 (corresponding to the fact that all it could do was to write > > the '\0' terminator). > > > > This pattern already exists, and is really the reason scnprint exists. > > Yes, scnprintf() cannot distinguish overflow from > > it-just-exactly-fitted. Maybe it would have been better to make it work > > like this, but I don't think there's a real use > > There are real use-cases. They are what brought me here. > > > and we do have > > seq_buf() if one really wants an interface that can build a string > > piece-meal while keeping track of whether it ever caused overflow. > > seq_buf_*() looks okay, but it's petty heavy requiring what looks like > the buffers to be initialised with an API call before use. We're > looking for something more light weight. > > scnprint() had clear safety centric improvements over snprintf() and > spprintf() adds an additional layer of return value checking on top of > that. > > I'm not sure I understand the resistance to something which is needed > and has clear benefits over what presently exists just for the sake of a > few lines of code. I'd be on board if it were change for the sake of > change, but the added flexibility and ease of use is evident. My view is the following: First, I agree that snprintf() is dangerous and should be replaced. I think that the resistance is because: + ssprintf() has its danger as well. You wrote [1] that "Under-running the buffer is no worse over-running". But it is no better either. + More APIs might create more confusion and increase the risk of a misuse. + spprintf() does not allow to detect truncated string. It means that it does not provide any advantage over the existing scnprintf(). So, if we could solve it another way then it might be better. That said, people tend to look how an API is used instead of RTFM. They copy good or bad patterns. There is even a term for this but I can't remember it new. So, if we introduce a new API and provide some good examples then there is a good chance that people will continue using it correctly. And it even might be checked more easily. There is a similarity with alloc() APIs. Good (most) programmers know that they need to check kalloc() return value. They might also learn that they need to check also return value from ssnprintf() of how we call it. Especially, when we provide good examples from the very beginning. Also it might be checked by Coccinelle. Finally, if you eventually send a new version of a string API then please add Linus into Cc. He has huge experience and very good feeling for what is good and what is wrong. And he is interested into exactly these types of APIs. [1] https://lore.kernel.org/r/20240129092440.GA1708181@google.com Best Regards, Petr
On Thu, 08 Feb 2024, Petr Mladek wrote: > On Tue 2024-01-30 15:53:36, Lee Jones wrote: > > On Tue, 30 Jan 2024, Rasmus Villemoes wrote: > > > On 30/01/2024 16.07, Lee Jones wrote: > > > > On Mon, 29 Jan 2024, Lee Jones wrote: > > > >> On Mon, 29 Jan 2024, David Laight wrote: > > > >>>> snprintf() does this and has been proven to cause buffer-overflows. > > > >>>> There have been multiple articles authored describing why using > > > >>>> snprintf() is not generally a good idea for the masses including the 2 > > > >>>> linked in the commit message: > > > >>> > > > >>> snprintf() returns the number of bytes that would have been output [1]. > > > > > > Okay, I've written a bunch of simple test cases and results are > > > > positive. It seems to achieve my aim whilst minimising any potential > > > > pitfalls. > > > > > > > > - Success returns Bytes actually written - no functional change > > > > - Overflow returns the size of the buffer - which makes the result > > > > a) testable for overflow > > > > b) non-catastrophic if accidentally used to manipulate later sizes > > > > > > You are describing scnprintf(), which almost does exactly that. The last > > > thing we need is another interface with almost identical semantics. > > > > It does, which is why when I first centred my efforts on this task the > > plan was to simply switch to it. However, as I described in the commit > > message: > > > > "Whist executing the task, it quickly became apparent that the initial > > thought of simply s/snprintf/scnprintf/ wasn't going to be adequate > > for a number of cases. Specifically ones where the caller needs to > > know whether the given string ends up being truncated." > > > > A great deal of callers want to know if the string they attempted to > > form was successful. A malformed string would lead to oddities in the > > best cases and various device/probing/matching failures in the worst. > > > > > > int size = 10; > > > > char buf[size]; > > > > char *b = buf; > > > > > > > > ret = spprintf(b, size, "1234"); > > > > size -= ret; > > > > b += ret; > > > > // ret = 4 size = 6 buf = "1234\0" > > > > > > > > ret = spprintf(b, size, "5678"); > > > > size -= ret; > > > > b += ret; > > > > // ret = 4 size = 2 buf = "12345678\0" > > > > > > > > ret = spprintf(b, size, "9***"); > > > > size -= ret; > > > > b += ret; > > > > // ret = 2 size = 0 buf = "123456789\0" > > > > > > So here scnprint() would have returned 1, leaving size at 1. scnprintf() > > > has the invariant that, for non-zero size, the return value is strictly > > > less than that size, so when passed a size of 1, all subsequent calls > > > return 0 (corresponding to the fact that all it could do was to write > > > the '\0' terminator). > > > > > > This pattern already exists, and is really the reason scnprint exists. > > > Yes, scnprintf() cannot distinguish overflow from > > > it-just-exactly-fitted. Maybe it would have been better to make it work > > > like this, but I don't think there's a real use > > > > There are real use-cases. They are what brought me here. > > > > > and we do have > > > seq_buf() if one really wants an interface that can build a string > > > piece-meal while keeping track of whether it ever caused overflow. > > > > seq_buf_*() looks okay, but it's petty heavy requiring what looks like > > the buffers to be initialised with an API call before use. We're > > looking for something more light weight. > > > > scnprint() had clear safety centric improvements over snprintf() and > > spprintf() adds an additional layer of return value checking on top of > > that. > > > > I'm not sure I understand the resistance to something which is needed > > and has clear benefits over what presently exists just for the sake of a > > few lines of code. I'd be on board if it were change for the sake of > > change, but the added flexibility and ease of use is evident. > > My view is the following: > > First, I agree that snprintf() is dangerous and should be replaced. Thanks for your input Petr. > I think that the resistance is because: > > + ssprintf() has its danger as well. You wrote [1] that > "Under-running the buffer is no worse over-running". But it is > no better either. > > + More APIs might create more confusion and increase the risk of > a misuse. > > + spprintf() does not allow to detect truncated string. It means > that it does not provide any advantage over the existing scnprintf(). spprintf() can detect a truncated string: ret = spprintf(buf, 10, "1234567890"); if (ret == 10) { /* The buffer truncated */ } And if the return value is used directly on a subsequent call: spprintf(buf + ret, 10 - ret, "1234567890"); /* No underflow/overflow */ Caveat: I just knocked this example up off the top of my head. It's completely untested and probably contains lots of errors. > So, if we could solve it another way then it might be better. I'm still quite fond of ssprintf() since it seems to solve the present issues and it's very light (compared with the alternative), but I think the preferred approach is to use seq_buf in cases where the caller needs to know if truncation occurred. > That said, people tend to look how an API is used instead of RTFM. > They copy good or bad patterns. There is even a term for this > but I can't remember it new. > > So, if we introduce a new API and provide some good examples > then there is a good chance that people will continue using > it correctly. And it even might be checked more easily. > > There is a similarity with alloc() APIs. Good (most) programmers > know that they need to check kalloc() return value. They might > also learn that they need to check also return value from > ssnprintf() of how we call it. Especially, when we provide > good examples from the very beginning. Also it might > be checked by Coccinelle. > > > Finally, if you eventually send a new version of a string API > then please add Linus into Cc. He has huge experience and very > good feeling for what is good and what is wrong. And he > is interested into exactly these types of APIs. > > > [1] https://lore.kernel.org/r/20240129092440.GA1708181@google.com > > Best Regards, > Petr
diff --git a/include/linux/sprintf.h b/include/linux/sprintf.h index 33dcbec719254..2a3db6285492a 100644 --- a/include/linux/sprintf.h +++ b/include/linux/sprintf.h @@ -13,6 +13,8 @@ __printf(3, 4) int snprintf(char *buf, size_t size, const char *fmt, ...); __printf(3, 0) int vsnprintf(char *buf, size_t size, const char *fmt, va_list args); __printf(3, 4) int scnprintf(char *buf, size_t size, const char *fmt, ...); __printf(3, 0) int vscnprintf(char *buf, size_t size, const char *fmt, va_list args); +__printf(3, 4) int ssprintf(char *buf, size_t size, const char *fmt, ...); +__printf(3, 0) int vssprintf(char *buf, size_t size, const char *fmt, va_list args); __printf(2, 3) __malloc char *kasprintf(gfp_t gfp, const char *fmt, ...); __printf(2, 0) __malloc char *kvasprintf(gfp_t gfp, const char *fmt, va_list args); __printf(2, 0) const char *kvasprintf_const(gfp_t gfp, const char *fmt, va_list args); diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 552738f14275a..01a1060ca0b0d 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -2936,6 +2936,40 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args) } EXPORT_SYMBOL(vscnprintf); +/** + * vssprintf - Format a string and place it in a buffer + * @buf: The buffer to place the result into + * @size: The size of the buffer, including the trailing null space + * @fmt: The format string to use + * @args: Arguments for the format string + * + * The return value is the number of characters which have been written into + * the @buf not including the trailing '\0' or -E2BIG if the string was + * truncated. If @size is == 0 the function returns 0. + * + * If you're not already dealing with a va_list consider using ssprintf(). + * + * See the vsnprintf() documentation for format string extensions over C99. + */ +int vssprintf(char *buf, size_t size, const char *fmt, va_list args) +{ + int i; + + if (unlikely(!size)) + return 0; + + i = vsnprintf(buf, size, fmt, args); + + if (unlikely(i >= size)) + return -E2BIG; + + if (likely(i < size)) + return i; + + return size - 1; +} +EXPORT_SYMBOL(vssprintf); + /** * snprintf - Format a string and place it in a buffer * @buf: The buffer to place the result into @@ -2987,6 +3021,30 @@ int scnprintf(char *buf, size_t size, const char *fmt, ...) } EXPORT_SYMBOL(scnprintf); +/** + * ssprintf - Format a string and place it in a buffer + * @buf: The buffer to place the result into + * @size: The size of the buffer, including the trailing null space + * @fmt: The format string to use + * @...: Arguments for the format string + * + * The return value is the number of characters written into @buf not including + * the trailing '\0' or -E2BIG if the string was truncated. If @size is == 0 + * the function returns 0. + */ +int ssprintf(char *buf, size_t size, const char *fmt, ...) +{ + va_list args; + int i; + + va_start(args, fmt); + i = vssprintf(buf, size, fmt, args); + va_end(args); + + return i; +} +EXPORT_SYMBOL(ssprintf); + /** * vsprintf - Format a string and place it in a buffer * @buf: The buffer to place the result into
There is an ongoing effort to replace the use of {v}snprintf() variants with safer alternatives - for a more in depth view, see Jon's write-up on LWN [0] and/or Alex's on the Kernel Self Protection Project [1]. Whist executing the task, it quickly became apparent that the initial thought of simply s/snprintf/scnprintf/ wasn't going to be adequate for a number of cases. Specifically ones where the caller needs to know whether the given string ends up being truncated. This is where ssprintf() [based on similar semantics of strscpy()] comes in, since it takes the best parts of both of the aforementioned variants. It has the testability of truncation of snprintf() and returns the number of Bytes *actually* written, similar to scnprintf(), making it a very programmer friendly alternative. Here's some examples to show the differences: Success: No truncation - all 9 Bytes successfully written to the buffer ret = snprintf (buf, 10, "%s", "123456789"); // ret = 9 ret = scnprintf(buf, 10, "%s", "123456789"); // ret = 9 ret = ssprintf (buf, 10, "%s", "123456789"); // ret = 9 Failure: Truncation - only 9 of 10 Bytes written; '-' is truncated ret = snprintf (buf, 10, "%s", "123456789-"); // ret = 10 Reports: "10 Bytes would have been written if buf was large enough" Issue: Programmers need to know/remember to check ret against "10" ret = scnprintf(buf, 10, "%s", "123456789-"); // ret = 9 Reports: "9 Bytes actually written" Issue: Returns 9 on success AND failure (see above) ret = ssprintf (buf, 10, "%s", "123456789-"); // ret = -E2BIG Reports: "Data provided is too large to fit in the buffer" Issue: No tangible impact: No way to tell how much data was lost [0] https://lwn.net/Articles/69419/ [1] https://github.com/KSPP/linux/issues/105 Signed-off-by: Lee Jones <lee@kernel.org> --- Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Crutcher Dunnavant <crutcher+kernel@datastacks.com> Cc: Juergen Quade <quade@hsnr.de> include/linux/sprintf.h | 2 ++ lib/vsprintf.c | 58 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+)