diff mbox series

[5/6] lib: Fix function documentation for strncpy_from_user

Message ID 20190218232308.11241-6-tobin@kernel.org (mailing list archive)
State New, archived
Headers show
Series lib: Add safe string funtions | expand

Commit Message

Tobin C. Harding Feb. 18, 2019, 11:23 p.m. UTC
Current function documentation for strncpy_from_user() is incorrect.  If
@count (size of destination buffer) is less than the length of the user
string the function does _not_ return @count but rather returns -EFAULT.

Document correctly the function return value, also add note that string
may be left non-null terminated.

Signed-off-by: Tobin C. Harding <tobin@kernel.org>
---
 lib/strncpy_from_user.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

Comments

Jann Horn Feb. 19, 2019, 12:51 a.m. UTC | #1
+cc Andy because he's also preparing a patch for this function

On Tue, Feb 19, 2019 at 12:25 AM Tobin C. Harding <tobin@kernel.org> wrote:
> Current function documentation for strncpy_from_user() is incorrect.  If
> @count (size of destination buffer) is less than the length of the user
> string the function does _not_ return @count but rather returns -EFAULT.
>
> Document correctly the function return value, also add note that string
> may be left non-null terminated.
>
> Signed-off-by: Tobin C. Harding <tobin@kernel.org>
> ---
>  lib/strncpy_from_user.c | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
> index 58eacd41526c..11fe9a4a00fd 100644
> --- a/lib/strncpy_from_user.c
> +++ b/lib/strncpy_from_user.c
> @@ -82,22 +82,19 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src, long
>  }
>
>  /**
> - * strncpy_from_user: - Copy a NUL terminated string from userspace.
> + * strncpy_from_user() - Copy a NUL terminated string from userspace.
>   * @dst:   Destination address, in kernel space.  This buffer must be at
>   *         least @count bytes long.
>   * @src:   Source address, in user space.
> - * @count: Maximum number of bytes to copy, including the trailing NUL.
> + * @count: Maximum number of bytes to copy, including the trailing %NUL.
>   *
>   * Copies a NUL-terminated string from userspace to kernel space.
>   *
> - * On success, returns the length of the string (not including the trailing
> - * NUL).
> - *
> - * If access to userspace fails, returns -EFAULT (some data may have been
> - * copied).
> - *
> - * If @count is smaller than the length of the string, copies @count bytes
> - * and returns @count.
> + * Return: If access to userspace fails, returns -EFAULT.  Otherwise,
> + *         return the number of characters copied excluding the trailing
> + *         %NUL, if the length of the user string exceeds @count return
> + *         -EFAULT (in which case @dst will be left without a %NUL
> + *         terminator).
>   */

AFAICS the byte_at_a_time loop exits when max==0 is reached, and then
if `res >= count` (in other words, if we've copied as many bytes as
requested, haven't encountered a null byte so far, and haven't reached
the end of the address space), we return `res`, which is the same as
`count`. Are you sure?
Tobin Harding Feb. 19, 2019, 9:52 p.m. UTC | #2
On Tue, Feb 19, 2019 at 01:51:45AM +0100, Jann Horn wrote:
> +cc Andy because he's also preparing a patch for this function
> 
> On Tue, Feb 19, 2019 at 12:25 AM Tobin C. Harding <tobin@kernel.org> wrote:
> > Current function documentation for strncpy_from_user() is incorrect.  If
> > @count (size of destination buffer) is less than the length of the user
> > string the function does _not_ return @count but rather returns -EFAULT.
> >
> > Document correctly the function return value, also add note that string
> > may be left non-null terminated.
> >
> > Signed-off-by: Tobin C. Harding <tobin@kernel.org>
> > ---
> >  lib/strncpy_from_user.c | 17 +++++++----------
> >  1 file changed, 7 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
> > index 58eacd41526c..11fe9a4a00fd 100644
> > --- a/lib/strncpy_from_user.c
> > +++ b/lib/strncpy_from_user.c
> > @@ -82,22 +82,19 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src, long
> >  }
> >
> >  /**
> > - * strncpy_from_user: - Copy a NUL terminated string from userspace.
> > + * strncpy_from_user() - Copy a NUL terminated string from userspace.
> >   * @dst:   Destination address, in kernel space.  This buffer must be at
> >   *         least @count bytes long.
> >   * @src:   Source address, in user space.
> > - * @count: Maximum number of bytes to copy, including the trailing NUL.
> > + * @count: Maximum number of bytes to copy, including the trailing %NUL.
> >   *
> >   * Copies a NUL-terminated string from userspace to kernel space.
> >   *
> > - * On success, returns the length of the string (not including the trailing
> > - * NUL).
> > - *
> > - * If access to userspace fails, returns -EFAULT (some data may have been
> > - * copied).
> > - *
> > - * If @count is smaller than the length of the string, copies @count bytes
> > - * and returns @count.
> > + * Return: If access to userspace fails, returns -EFAULT.  Otherwise,
> > + *         return the number of characters copied excluding the trailing
> > + *         %NUL, if the length of the user string exceeds @count return
> > + *         -EFAULT (in which case @dst will be left without a %NUL
> > + *         terminator).
> >   */
> 
> AFAICS the byte_at_a_time loop exits when max==0 is reached, and then
> if `res >= count` (in other words, if we've copied as many bytes as
> requested, haven't encountered a null byte so far, and haven't reached
> the end of the address space), we return `res`, which is the same as
> `count`. Are you sure?

Thanks for the review Jaan.  Seems I was in a world of confused, I
misread count to be the size of the destination buffer not the size of
the source string.

Also, re-reading strscpy() and why its better I think I may not be the
correct person to implement strscpy_from_user() since I'm not across all
the subtleties.

If v2 seems wanted for the first patches in this set I'll drop the
*_from_user() stuff.

thanks,
Tobin.
Kees Cook Feb. 21, 2019, 1:05 a.m. UTC | #3
On Mon, Feb 18, 2019 at 4:52 PM Jann Horn <jannh@google.com> wrote:
> AFAICS the byte_at_a_time loop exits when max==0 is reached, and then
> if `res >= count` (in other words, if we've copied as many bytes as
> requested, haven't encountered a null byte so far, and haven't reached
> the end of the address space), we return `res`, which is the same as
> `count`. Are you sure?

Oh, whew, there is only 1 arch-specific implementation of this. I
thought you meant there was multiple implementations.

So, generally speaking, I'd love to split all strncpy* uses into
strscpy_zero() (when expecting to do str->str copies), and some new
function, named like mempadstr() or str2mem() that copies a str to a
__nonstring char array, with trailing padding, if there is space. Then
there is no more mixing the two cases and botching things.
Tobin Harding Feb. 21, 2019, 5:24 a.m. UTC | #4
On Wed, Feb 20, 2019 at 05:05:10PM -0800, Kees Cook wrote:
> On Mon, Feb 18, 2019 at 4:52 PM Jann Horn <jannh@google.com> wrote:
> > AFAICS the byte_at_a_time loop exits when max==0 is reached, and then
> > if `res >= count` (in other words, if we've copied as many bytes as
> > requested, haven't encountered a null byte so far, and haven't reached
> > the end of the address space), we return `res`, which is the same as
> > `count`. Are you sure?
> 
> Oh, whew, there is only 1 arch-specific implementation of this. I
> thought you meant there was multiple implementations.
> 
> So, generally speaking, I'd love to split all strncpy* uses into
> strscpy_zero() (when expecting to do str->str copies), and some new
> function, named like mempadstr() or str2mem() that copies a str to a
> __nonstring char array, with trailing padding, if there is space. Then
> there is no more mixing the two cases and botching things.

Oh cool, treewide changes, I'm down with that.  So to v2 I'll add
str2mem() and then attack the tree as suggested.  What could possibly go
wrong :)?

	Tobin
Kees Cook Feb. 21, 2019, 6:02 a.m. UTC | #5
On Wed, Feb 20, 2019 at 9:25 PM Tobin C. Harding <me@tobin.cc> wrote:
>
> On Wed, Feb 20, 2019 at 05:05:10PM -0800, Kees Cook wrote:
> > So, generally speaking, I'd love to split all strncpy* uses into
> > strscpy_zero() (when expecting to do str->str copies), and some new
> > function, named like mempadstr() or str2mem() that copies a str to a
> > __nonstring char array, with trailing padding, if there is space. Then
> > there is no more mixing the two cases and botching things.

I should use "converts" instead of "copies" above, just to drive the
point home. :)

>
> Oh cool, treewide changes, I'm down with that.  So to v2 I'll add
> str2mem() and then attack the tree as suggested.  What could possibly go
> wrong :)?

Some clear documentation needs to be written for str2mem() to really
help people understand what a "non string" character array is
(especially given that it LOOKS like it has NUL termination -- when in
fact it's just "padding").

The tree-wide changes will likely take a while (and don't need to be
part of this series unless you want to find a couple good examples)
since we have to do them case-by-case: it's not always obvious when
it's actually a non-string, so getting help from maintainers here will
be needed. (And maybe some kind of flow chart added to
Documentation/process/deprecated.rst for how to stop using strncpy()
and strlcpy().)

What I can't quite figure out yet is how to find a way for sfr to flag
newly added users of strcpy, strncpy, and strlcpy. We might need to
bring back __deprecated, but hide it behind a W=linux-next flag or
something crazy. Stephen, in your builds you're already injecting
-Wimplicit-fallthrough: do you do W=1 or anything like that? If not, I
think we need some W= setting for your linux-next builds that generate
the maintainer-nag warnings...

-Kees

P.S. Here's C string API Rant (I just had to get this out, please feel
free to ignore):

strcpy returns dest ... which is already known, so it's a meaningless
return value.

strncpy returns dest (still meaningless)

strlcpy returns strlen(src) ... the length we WOULD have copied. Why
would we care about that? I'm operating on dest. Were there callers
that needed to both copy part of src and learn how long it was at the
same time?

strscpy returns -E2BIG or non-NUL bytes copied: yay, a return about
what actually happened from the operation!

... snprintf returns what it WOULD have written, much like strlcpy
above. At least snprintf has an excuse: it can be used to calculate an
allocation size (called with a NULL dest and 0 size) ... but shouldn't
"how big is this format string going to be?" be a separate function? I
wonder if we can kill all kernel uses of snprintf too (after
introducing a "how big would it be?" function and switching all other
callers over to scnprintf)...

So scnprintf() does the right thing (count of non-NUL bytes copied out).

So now our safe(r?) string API versions use different letters to show
they're safe: "s" in strcpy and "cn" in sprintf. /me cries forever.
Jann Horn Feb. 21, 2019, 2:28 p.m. UTC | #6
On Thu, Feb 21, 2019 at 2:05 AM Kees Cook <keescook@chromium.org> wrote:
> On Mon, Feb 18, 2019 at 4:52 PM Jann Horn <jannh@google.com> wrote:
> > AFAICS the byte_at_a_time loop exits when max==0 is reached, and then
> > if `res >= count` (in other words, if we've copied as many bytes as
> > requested, haven't encountered a null byte so far, and haven't reached
> > the end of the address space), we return `res`, which is the same as
> > `count`. Are you sure?
>
> Oh, whew, there is only 1 arch-specific implementation of this. I
> thought you meant there was multiple implementations.

Not sure what you're talking about. Are you talking about
implementations of strncpy_from_user()?


csky: custom strncpy_from_user() calls custom __do_strncpy_from_user,
which is a macro containing a big asm() block.


h8300:
;;; long strncpy_from_user(void *to, void *from, size_t n)
strncpy_from_user:
mov.l er2,er2
bne 1f
sub.l er0,er0
rts
[...]


mips:
static inline long
strncpy_from_user(char *__to, const char __user *__from, long __len)
{
long res;

if (eva_kernel_access()) {
__asm__ __volatile__(
"move\t$4, %1\n\t"
"move\t$5, %2\n\t"
"move\t$6, %3\n\t"
[...]


unicore32:
ENTRY(__strncpy_from_user)
mov ip, r1
1: sub.a r2, r2, #1
ldrusr r3, r1, 1, ns
bfs 2f
stb.w r3, [r0]+, #1
cxor.a r3, #0
[...]


and so on.

> So, generally speaking, I'd love to split all strncpy* uses into
> strscpy_zero() (when expecting to do str->str copies), and some new
> function, named like mempadstr() or str2mem() that copies a str to a
> __nonstring char array, with trailing padding, if there is space. Then
> there is no more mixing the two cases and botching things.
Rasmus Villemoes Feb. 21, 2019, 2:58 p.m. UTC | #7
On 21/02/2019 07.02, Kees Cook wrote:

> P.S. Here's C string API Rant (I just had to get this out, please feel
> free to ignore):

I'll bite. First, it's "linux kernel string API", only some of string.h
interfaces are in std C. Sure, none of those satisfy all use cases, but
adding Yet Another One also has its costs.

> strcpy returns dest ... which is already known, so it's a meaningless
> return value.
> 
> strncpy returns dest (still meaningless)

Yeah, same for memcpy, memset. Those are classic C interfaces. It does
allow some micro-optimizations in some cases - since one is likely to
continue doing other stuff with dest, the compiler might use the fact
that it has dest in the return register instead of spilling and
reloading it. But I do wonder what the real rationale was.

> strlcpy returns strlen(src) ... the length we WOULD have copied. Why
> would we care about that? I'm operating on dest. Were there callers
> that needed to both copy part of src and learn how long it was at the
> same time?

The rationale is exactly the same as for snprintf - to allow you to
optimistically format/copy to a buffer, and know exactly how much to
realloc() in case it turned out to be too small. Of course, nobody ever
uses strlcpy like that, since just doing strdup() is much much simpler
and less error-prone. So yes, strlcpy() is often a silly interface to use.

> strscpy returns -E2BIG or non-NUL bytes copied: yay, a return about
> what actually happened from the operation!

Well, strictly speaking, strlcpy()'s return value also tells you exactly
what happened, just not in the kernel "negative errno for error
condition" style.

> ... snprintf returns what it WOULD have written, much like strlcpy
> above. At least snprintf has an excuse: it can be used to calculate an
> allocation size (called with a NULL dest and 0 size) ... but shouldn't
> "how big is this format string going to be?" be a separate function?

Why? Sure, you can make a wrapper vhowmuch(const char *fmt, va_args ap),
but its implementation would have to be "return vsnprintf(NULL, 0, fmt,
ap);", since we really must reuse the exact same logic for computing the
length as for doing the actual printf'ing (otherwise they'd get out of
sync).

> I wonder if we can kill all kernel uses of snprintf too (after
> introducing a "how big would it be?" function and switching all other
> callers over to scnprintf)...

Please no. snprintf() is standardized, has sane semantics (though one
sometimes _want_ scnprintf semantics - in which case one can and should
use that...), and, importantly, gcc understands those semantics. So we
get optimizations and diagnostics that we'd miss if we kill off explicit
snprintf and replace with non-standard howmuch+scnprintf.

> So scnprintf() does the right thing (count of non-NUL bytes copied out).

The right thing, when that's the thing you want to know. Which it is in
the "build a string in a buffer by repeated printf calls, perhaps
intermixed with a little flow control logic". But not so in a "format
this stuff to this fixed-size char buffer inside that struct, and let me
know [i.e., 'give me a way of knowing'] if it all fit".


Hello, I'm you super-fantastic-one-size-fits-all string copying API.
Please carefully fill out the following form, and I'll get back to you ASAP.

A1: destination
A2: max bytes to write
B1: source
B2: max bytes to read
C1: do you want me to nul-terminate the destination?
C2: if C1, should that be done by truncating the result if necessary?
C3: how do you want me to handle leftover space in dest, if any?
C4: should the destination be left entirely untouched in case the result
does not fit?
C5: is it an error if I do not encounter a nul byte somewhere in [B1,
B1+B2-1] ?
C6: how do you want me to report on the results of my efforts and my
discoveries?
C7: are there any special conditions I should take into account?
(overlapping buffers, allergies, ...)

Put -1 under A2 to mean "assume there's room enough". Put -1 under B2 to
mean "assume there's a nul byte before walking into lala-land".

Answering yes to C1 and C2 and writing 0 under A2 causes your CapsLock
to stay permanently on. If there's any ambiguity in your answer to C6,
the machine reboots.


Rasmus
Jann Horn Feb. 21, 2019, 4:06 p.m. UTC | #8
On Thu, Feb 21, 2019 at 7:02 AM Kees Cook <keescook@chromium.org> wrote:
> On Wed, Feb 20, 2019 at 9:25 PM Tobin C. Harding <me@tobin.cc> wrote:
> >
> > On Wed, Feb 20, 2019 at 05:05:10PM -0800, Kees Cook wrote:
> > > So, generally speaking, I'd love to split all strncpy* uses into
> > > strscpy_zero() (when expecting to do str->str copies), and some new
> > > function, named like mempadstr() or str2mem() that copies a str to a
> > > __nonstring char array, with trailing padding, if there is space. Then
> > > there is no more mixing the two cases and botching things.
>
> I should use "converts" instead of "copies" above, just to drive the
> point home. :)
>
> >
> > Oh cool, treewide changes, I'm down with that.  So to v2 I'll add
> > str2mem() and then attack the tree as suggested.  What could possibly go
> > wrong :)?
>
> Some clear documentation needs to be written for str2mem() to really
> help people understand what a "non string" character array is
> (especially given that it LOOKS like it has NUL termination -- when in
> fact it's just "padding").
>
> The tree-wide changes will likely take a while (and don't need to be
> part of this series unless you want to find a couple good examples)
> since we have to do them case-by-case: it's not always obvious when
> it's actually a non-string, so getting help from maintainers here will
> be needed. (And maybe some kind of flow chart added to
> Documentation/process/deprecated.rst for how to stop using strncpy()
> and strlcpy().)

Wild brainstorming ahead...

Such non-string character arrays are usually fixed-size, right? We
could use struct types around such character arrays to hide the actual
characters (so that people don't accidentally use them with C string
APIs), and enforce that the length is passed around as part of the
type... something like:

#define char_array(len) struct { char __ca_data[len]; }
#define __CA_UNWRAP(captr) (captr)->__ca_data, sizeof((captr)->__ca_data)

void __str2ca(char *dst, size_t dst_len, char *src) {
  size_t src_len = strlen(src);
  if (WARN_ON(src_len > dst_len)) {
    // or whatever else we think is the safest way to deal with this
    // without crashing, if BUG() is not an option.
    memset(dst, 0, dst_len);
    return;
  }
  memcpy(dst, src, src_len);
  memset(dst + src_len, 0, dst_len - src_len);
}
#define str2ca(dst, src) __str2ca(__CA_UNWRAP(dst), (src))

static inline void __ca2ca(char *dst, size_t dst_len, char *src,
size_t src_len) {
  BUILD_BUG_ON(dst_len < src_len);
  memcpy(dst, src, src_len);
  if (src_len != dst_len) {
    memset(dst + src_len, 0, dst_len - src_len);
  }
}
#define ca2ca(dst, src) __ca2ca(__CA_UNWRAP(dst), __CA_UNWRAP(src))


And if you want to do direct assignments instead of using helpers, you
make a typedef like this (since just assigning between two equivalent
structs doesn't work):

typedef char_array(20) blah_name;
blah_name global_name;
int main(int argc, char **argv) {
  blah_name local_name;
  str2ca(&local_name, argv[1]);
  global_name = local_name;
}

You'd also have to use a typedef like this if you want to pass
references to other functions.


Something like this would also be neat for classic C strings in some
situations - if you can put bounds in the types, or (if the string is
dynamically-sized) in the struct, you don't need to messily pass
around bounds for things like snprint() and so on.

> What I can't quite figure out yet is how to find a way for sfr to flag
> newly added users of strcpy, strncpy, and strlcpy. We might need to
> bring back __deprecated, but hide it behind a W=linux-next flag or
> something crazy. Stephen, in your builds you're already injecting
> -Wimplicit-fallthrough: do you do W=1 or anything like that? If not, I
> think we need some W= setting for your linux-next builds that generate
> the maintainer-nag warnings...
>
> -Kees
>
> P.S. Here's C string API Rant (I just had to get this out, please feel
> free to ignore):
>
> strcpy returns dest ... which is already known, so it's a meaningless
> return value.
>
> strncpy returns dest (still meaningless)

Weeeell... it kinda makes sense if you're trying to micro-optimize the
amount of stack spills. If you write code like this:

char *a = malloc(10);
a = strcpy(a, other_string);
return a;

... then the compiler doesn't have to shove `a` in one of the
callee-saved registers or spill it to the stack around the strcpy().
Plus, it might allow tail-call optimizations in some cases.

> strlcpy returns strlen(src) ... the length we WOULD have copied. Why
> would we care about that? I'm operating on dest. Were there callers
> that needed to both copy part of src and learn how long it was at the
> same time?

You could use it to optimistically attempt a copy in a fastpath, and
then fall back to a reallocation if that failed.

> strscpy returns -E2BIG or non-NUL bytes copied: yay, a return about
> what actually happened from the operation!
>
> ... snprintf returns what it WOULD have written, much like strlcpy
> above. At least snprintf has an excuse: it can be used to calculate an
> allocation size (called with a NULL dest and 0 size) ... but shouldn't
> "how big is this format string going to be?" be a separate function? I
> wonder if we can kill all kernel uses of snprintf too (after
> introducing a "how big would it be?" function and switching all other
> callers over to scnprintf)...
>
> So scnprintf() does the right thing (count of non-NUL bytes copied out).

That seems kinda suboptimal. Wouldn't you ideally want to bail out
with an error, or at least do a WARN(), if you're trying to format a
string that doesn't fit into its destination? Like, for example, if
you're assembling a path, and the path doesn't fit into a buffer, and
so you just use half of it, you might end up in a very different place
from where you intended to go.

> So now our safe(r?) string API versions use different letters to show
> they're safe: "s" in strcpy and "cn" in sprintf. /me cries forever.
Stephen Rothwell Feb. 21, 2019, 8:26 p.m. UTC | #9
Hi Kees,

On Wed, 20 Feb 2019 22:02:32 -0800 Kees Cook <keescook@chromium.org> wrote:
>
> What I can't quite figure out yet is how to find a way for sfr to flag
> newly added users of strcpy, strncpy, and strlcpy. We might need to
> bring back __deprecated, but hide it behind a W=linux-next flag or
> something crazy. Stephen, in your builds you're already injecting
> -Wimplicit-fallthrough: do you do W=1 or anything like that? If not, I
> think we need some W= setting for your linux-next builds that generate
> the maintainer-nag warnings...

I just have a set of compiler flags that my build scripts explicitly
enable by setting KCFLAGS.
Kees Cook Feb. 21, 2019, 10:52 p.m. UTC | #10
On Thu, Feb 21, 2019 at 6:28 AM Jann Horn <jannh@google.com> wrote:
> On Thu, Feb 21, 2019 at 2:05 AM Kees Cook <keescook@chromium.org> wrote:
> > On Mon, Feb 18, 2019 at 4:52 PM Jann Horn <jannh@google.com> wrote:
> > > AFAICS the byte_at_a_time loop exits when max==0 is reached, and then
> > > if `res >= count` (in other words, if we've copied as many bytes as
> > > requested, haven't encountered a null byte so far, and haven't reached
> > > the end of the address space), we return `res`, which is the same as
> > > `count`. Are you sure?
> >
> > Oh, whew, there is only 1 arch-specific implementation of this. I
> > thought you meant there was multiple implementations.
>
> Not sure what you're talking about. Are you talking about
> implementations of strncpy_from_user()?

Ah, I used a bad match. But it seems it's only about half (and not
x86, arm, powerpc).
Kees Cook Feb. 21, 2019, 11:03 p.m. UTC | #11
On Thu, Feb 21, 2019 at 6:58 AM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> On 21/02/2019 07.02, Kees Cook wrote:
>
> > P.S. Here's C string API Rant (I just had to get this out, please feel
> > free to ignore):
>
> I'll bite. First, it's "linux kernel string API", only some of string.h
> interfaces are in std C. Sure, none of those satisfy all use cases, but
> adding Yet Another One also has its costs.

Well, yes, strscpy and scnprintf appear to be only in the kernel (did
the originate externally somewhere I'm not aware of?)

> > strcpy returns dest ... which is already known, so it's a meaningless
> > return value.
> >
> > strncpy returns dest (still meaningless)
>
> Yeah, same for memcpy, memset. Those are classic C interfaces. It does
> allow some micro-optimizations in some cases - since one is likely to

Right, yes, but this level of optimization is best left to the
compiler. There's much more interesting results a caller should care
about. :)

> > strlcpy returns strlen(src) ... the length we WOULD have copied. Why
> > would we care about that? I'm operating on dest. Were there callers
> > that needed to both copy part of src and learn how long it was at the
> > same time?
>
> The rationale is exactly the same as for snprintf - to allow you to

Okay, but there's as separate function for that: strlen().

> > strscpy returns -E2BIG or non-NUL bytes copied: yay, a return about
> > what actually happened from the operation!
>
> Well, strictly speaking, strlcpy()'s return value also tells you exactly
> what happened, just not in the kernel "negative errno for error
> condition" style.

True, yes, but it's unfriendly: it requires careful math, just like
snprintf, which depends on rechecking the args, making sure you're not
doing an off-by-one from the NUL, etc etc.

> > ... snprintf returns what it WOULD have written, much like strlcpy
> > above. At least snprintf has an excuse: it can be used to calculate an
> > allocation size (called with a NULL dest and 0 size) ... but shouldn't
> > "how big is this format string going to be?" be a separate function?
>
> Why? Sure, you can make a wrapper vhowmuch(const char *fmt, va_args ap),
> but its implementation would have to be "return vsnprintf(NULL, 0, fmt,
> ap);", since we really must reuse the exact same logic for computing the
> length as for doing the actual printf'ing (otherwise they'd get out of
> sync).

Well, I'd likely go the opposite directly: make snprintf() a wrapper
and call vhowmuch(), etc, convert until snprintf could be removed. But
really the best might be removal of snprintf first, then refactor it
with vhowmuch() etc. Lots of ways to solve it, I suppose. But dropping
the unfriendly API would be nice.

> Please no. snprintf() is standardized, has sane semantics (though one

No, it doesn't -- it has well-defined semantics, but using it
correctly is too hard. It's a regular source of bugs. (Not *nearly* as
bad as strncpy, so let's stick to dropping one bad API at a time...)

> sometimes _want_ scnprintf semantics - in which case one can and should
> use that...), and, importantly, gcc understands those semantics. So we
> get optimizations and diagnostics that we'd miss if we kill off explicit
> snprintf and replace with non-standard howmuch+scnprintf.

Those diagnostics can be had with the __printf() markings already...

> > So scnprintf() does the right thing (count of non-NUL bytes copied out).
>
> The right thing, when that's the thing you want to know. Which it is in
> the "build a string in a buffer by repeated printf calls, perhaps
> intermixed with a little flow control logic". But not so in a "format
> this stuff to this fixed-size char buffer inside that struct, and let me
> know [i.e., 'give me a way of knowing'] if it all fit".

Do we need ssprintf() to get the -E2BIG result like strscpy()?

> Hello, I'm you super-fantastic-one-size-fits-all string copying API.
> Please carefully fill out the following form, and I'll get back to you ASAP.

I mean, this is kinda what we have already. We just keep exposing too
many knobs. :)
Kees Cook Feb. 21, 2019, 11:14 p.m. UTC | #12
On Thu, Feb 21, 2019 at 8:07 AM Jann Horn <jannh@google.com> wrote:
>
> On Thu, Feb 21, 2019 at 7:02 AM Kees Cook <keescook@chromium.org> wrote:
> > The tree-wide changes will likely take a while (and don't need to be
> > part of this series unless you want to find a couple good examples)
> > since we have to do them case-by-case: it's not always obvious when
> > it's actually a non-string, so getting help from maintainers here will
> > be needed. (And maybe some kind of flow chart added to
> > Documentation/process/deprecated.rst for how to stop using strncpy()
> > and strlcpy().)
>
> Wild brainstorming ahead...
>
> Such non-string character arrays are usually fixed-size, right? We
> could use struct types around such character arrays to hide the actual
> characters (so that people don't accidentally use them with C string
> APIs), and enforce that the length is passed around as part of the
> type... something like:
>
> #define char_array(len) struct { char __ca_data[len]; }

Does this need __packed or will the compiler understand it's
byte-aligned? And it needs __nonstring :)

> #define __CA_UNWRAP(captr) (captr)->__ca_data, sizeof((captr)->__ca_data)
>
> void __str2ca(char *dst, size_t dst_len, char *src) {
>   size_t src_len = strlen(src);
>   if (WARN_ON(src_len > dst_len)) {
>     // or whatever else we think is the safest way to deal with this
>     // without crashing, if BUG() is not an option.
>     memset(dst, 0, dst_len);
>     return;
>   }
>   memcpy(dst, src, src_len);
>   memset(dst + src_len, 0, dst_len - src_len);
> }
> #define str2ca(dst, src) __str2ca(__CA_UNWRAP(dst), (src))
>
> static inline void __ca2ca(char *dst, size_t dst_len, char *src,
> size_t src_len) {
>   BUILD_BUG_ON(dst_len < src_len);
>   memcpy(dst, src, src_len);
>   if (src_len != dst_len) {
>     memset(dst + src_len, 0, dst_len - src_len);
>   }
> }
> #define ca2ca(dst, src) __ca2ca(__CA_UNWRAP(dst), __CA_UNWRAP(src))

Yeah, I was trying to think of ways to do this but I was thinking
mostly about noticing the __nonstring mark. #define-ing this away
might work, but it might also just annoy people more. At least GCC
will yell about __nonstring use in some cases.

> And if you want to do direct assignments instead of using helpers, you
> make a typedef like this (since just assigning between two equivalent
> structs doesn't work):
>
> typedef char_array(20) blah_name;
> blah_name global_name;
> int main(int argc, char **argv) {
>   blah_name local_name;
>   str2ca(&local_name, argv[1]);
>   global_name = local_name;
> }
>
> You'd also have to use a typedef like this if you want to pass
> references to other functions.

Yeah, it might work. I need to go look through ext4 -- that's one
place I know there were "legit" strncpy() uses...

Converting existing non-string cases to this and see if it's worse
would be educational.

> Something like this would also be neat for classic C strings in some
> situations - if you can put bounds in the types, or (if the string is
> dynamically-sized) in the struct, you don't need to messily pass
> around bounds for things like snprint() and so on.

Yeah, I remain annoyed about string pointers having lost their
allocation size meta data. The vast majority of str*() functions could
just be "str, sizeof(str)" like you have for the CA_UNWRAP.

> > So scnprintf() does the right thing (count of non-NUL bytes copied out).
>
> That seems kinda suboptimal. Wouldn't you ideally want to bail out
> with an error, or at least do a WARN(), if you're trying to format a
> string that doesn't fit into its destination? Like, for example, if
> you're assembling a path, and the path doesn't fit into a buffer, and
> so you just use half of it, you might end up in a very different place
> from where you intended to go.

ssprintf() with -E2BIG? Most correct users of snprintf()'s return
value do some version of trying to detect if there wasn't enough space
and then error out:

static inline int usb_make_path(struct usb_device *dev, char *buf, size_t size)
{
        int actual;
        actual = snprintf(buf, size, "usb-%s-%s", dev->bus->bus_name,
                          dev->devpath);
        return (actual >= (int)size) ? -1 : actual;
}

a) this could just be "return ssprintf(..."
b) as far as I see, there are literally 2 callers of usb_make_path()
that check the return value

Anyway, snprintf() is "next". I'd like to get through
strcpy/strncpy/strlcpy removal first... :)
Kees Cook Feb. 21, 2019, 11:16 p.m. UTC | #13
On Thu, Feb 21, 2019 at 12:27 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi Kees,
>
> On Wed, 20 Feb 2019 22:02:32 -0800 Kees Cook <keescook@chromium.org> wrote:
> >
> > What I can't quite figure out yet is how to find a way for sfr to flag
> > newly added users of strcpy, strncpy, and strlcpy. We might need to
> > bring back __deprecated, but hide it behind a W=linux-next flag or
> > something crazy. Stephen, in your builds you're already injecting
> > -Wimplicit-fallthrough: do you do W=1 or anything like that? If not, I
> > think we need some W= setting for your linux-next builds that generate
> > the maintainer-nag warnings...
>
> I just have a set of compiler flags that my build scripts explicitly
> enable by setting KCFLAGS.

Okay, so you could include some -D option to enable it. I'll see if I
can cook something up.
Rasmus Villemoes Feb. 25, 2019, 3:41 p.m. UTC | #14
On 22/02/2019 00.03, Kees Cook wrote:
> On Thu, Feb 21, 2019 at 6:58 AM Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
>>
>> On 21/02/2019 07.02, Kees Cook wrote:
>>
> 
>>> ... snprintf returns what it WOULD have written, much like strlcpy
>>> above. At least snprintf has an excuse: it can be used to calculate an
>>> allocation size (called with a NULL dest and 0 size) ... but shouldn't
>>> "how big is this format string going to be?" be a separate function?
>>
>> Why? Sure, you can make a wrapper vhowmuch(const char *fmt, va_args ap),
>> but its implementation would have to be "return vsnprintf(NULL, 0, fmt,
>> ap);", since we really must reuse the exact same logic for computing the
>> length as for doing the actual printf'ing (otherwise they'd get out of
>> sync).
> 
> Well, I'd likely go the opposite directly: make snprintf() a wrapper
> and call vhowmuch(), etc, convert until snprintf could be removed.But
> really the best might be removal of snprintf first, then refactor it
> with vhowmuch() etc. Lots of ways to solve it, I suppose.

I'd really like to see how vsprintf.c would look in a world where the
workhorse (which is vsnprintf() and its tree of helpers) would not
format and measure at the same time.

But dropping
> the unfriendly API would be nice.
> 
>> Please no. snprintf() is standardized, has sane semantics (though one
> 
> No, it doesn't -- it has well-defined semantics, 

We'll just have to disagree on what sane means.

> but using it
> correctly is too hard. It's a regular source of bugs. (Not *nearly* as
> bad as strncpy, so let's stick to dropping one bad API at a time...)
> 
>> sometimes _want_ scnprintf semantics - in which case one can and should
>> use that...), and, importantly, gcc understands those semantics. So we
>> get optimizations and diagnostics that we'd miss if we kill off explicit
>> snprintf and replace with non-standard howmuch+scnprintf.
> 
> Those diagnostics can be had with the __printf() markings already...

No. You're thinking of the type-checking things. Those are important, of
course, but have nothing at all to do with the s or n parts of snprintf
- what I'm thinking of is the fact that gcc knows that buf is a
char-buffer of length len into which snprintf() may/will write, so we
have the Wformat-truncation and Warray-bounds kind of warnings. And the
optimization part is where gcc can replace a snprintf() with a simpler
string op (e.g. a memcpy), or know that an overflow check can be elided
because "%d" does fit in the buf[16], or...

One thing I've had on my wishlist for a long time is a
buffer_size(ptrarg, expr, r/w/rw) attribute that would say that the
function treats the pointer argument ptrarg as a buffer of size given by
the expr (in bytes), and accesses it according to the third parameter.
Then the compiler could automatically check with its builtin_objsize
machinery for mismatched buffer size computations (e.g., using sizeof()
when the interface expects ARRAY_SIZE()) violations or uninitialized
reads, or any number of optional run-time instrumentations in caller or
callee... So

memcpy(void *dst, const void *src, size_t len) __buffer_size(dst, len,
"w") __buffer_size(src, len, "r")

or

int poll(struct pollfd *fds, nfds_t nfds, int timeout)
__buffer_size(fds, nfds*sizeof(struct pollfd), "rw")

the latter being an example of where an arbitrary expression in terms of
the formal parameters is needed - but I don't think gcc's attribute
machinery supports this syntax at all.

Bonus points if one could attach buffer_size to a struct definition as
well, saying that the ->foo member points to ->bar*4  of storage.


Rasmus
diff mbox series

Patch

diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index 58eacd41526c..11fe9a4a00fd 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -82,22 +82,19 @@  static inline long do_strncpy_from_user(char *dst, const char __user *src, long
 }
 
 /**
- * strncpy_from_user: - Copy a NUL terminated string from userspace.
+ * strncpy_from_user() - Copy a NUL terminated string from userspace.
  * @dst:   Destination address, in kernel space.  This buffer must be at
  *         least @count bytes long.
  * @src:   Source address, in user space.
- * @count: Maximum number of bytes to copy, including the trailing NUL.
+ * @count: Maximum number of bytes to copy, including the trailing %NUL.
  *
  * Copies a NUL-terminated string from userspace to kernel space.
  *
- * On success, returns the length of the string (not including the trailing
- * NUL).
- *
- * If access to userspace fails, returns -EFAULT (some data may have been
- * copied).
- *
- * If @count is smaller than the length of the string, copies @count bytes
- * and returns @count.
+ * Return: If access to userspace fails, returns -EFAULT.  Otherwise,
+ *         return the number of characters copied excluding the trailing
+ *         %NUL, if the length of the user string exceeds @count return
+ *         -EFAULT (in which case @dst will be left without a %NUL
+ *         terminator).
  */
 long strncpy_from_user(char *dst, const char __user *src, long count)
 {