Message ID | 20190218232308.11241-6-tobin@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | lib: Add safe string funtions | expand |
+cc Andy because he's also preparing a patch for this function On Tue, Feb 19, 2019 at 12:25 AM Tobin C. Harding <tobin@kernel.org> wrote: > Current function documentation for strncpy_from_user() is incorrect. If > @count (size of destination buffer) is less than the length of the user > string the function does _not_ return @count but rather returns -EFAULT. > > Document correctly the function return value, also add note that string > may be left non-null terminated. > > Signed-off-by: Tobin C. Harding <tobin@kernel.org> > --- > lib/strncpy_from_user.c | 17 +++++++---------- > 1 file changed, 7 insertions(+), 10 deletions(-) > > diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c > index 58eacd41526c..11fe9a4a00fd 100644 > --- a/lib/strncpy_from_user.c > +++ b/lib/strncpy_from_user.c > @@ -82,22 +82,19 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src, long > } > > /** > - * strncpy_from_user: - Copy a NUL terminated string from userspace. > + * strncpy_from_user() - Copy a NUL terminated string from userspace. > * @dst: Destination address, in kernel space. This buffer must be at > * least @count bytes long. > * @src: Source address, in user space. > - * @count: Maximum number of bytes to copy, including the trailing NUL. > + * @count: Maximum number of bytes to copy, including the trailing %NUL. > * > * Copies a NUL-terminated string from userspace to kernel space. > * > - * On success, returns the length of the string (not including the trailing > - * NUL). > - * > - * If access to userspace fails, returns -EFAULT (some data may have been > - * copied). > - * > - * If @count is smaller than the length of the string, copies @count bytes > - * and returns @count. > + * Return: If access to userspace fails, returns -EFAULT. Otherwise, > + * return the number of characters copied excluding the trailing > + * %NUL, if the length of the user string exceeds @count return > + * -EFAULT (in which case @dst will be left without a %NUL > + * terminator). > */ AFAICS the byte_at_a_time loop exits when max==0 is reached, and then if `res >= count` (in other words, if we've copied as many bytes as requested, haven't encountered a null byte so far, and haven't reached the end of the address space), we return `res`, which is the same as `count`. Are you sure?
On Tue, Feb 19, 2019 at 01:51:45AM +0100, Jann Horn wrote: > +cc Andy because he's also preparing a patch for this function > > On Tue, Feb 19, 2019 at 12:25 AM Tobin C. Harding <tobin@kernel.org> wrote: > > Current function documentation for strncpy_from_user() is incorrect. If > > @count (size of destination buffer) is less than the length of the user > > string the function does _not_ return @count but rather returns -EFAULT. > > > > Document correctly the function return value, also add note that string > > may be left non-null terminated. > > > > Signed-off-by: Tobin C. Harding <tobin@kernel.org> > > --- > > lib/strncpy_from_user.c | 17 +++++++---------- > > 1 file changed, 7 insertions(+), 10 deletions(-) > > > > diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c > > index 58eacd41526c..11fe9a4a00fd 100644 > > --- a/lib/strncpy_from_user.c > > +++ b/lib/strncpy_from_user.c > > @@ -82,22 +82,19 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src, long > > } > > > > /** > > - * strncpy_from_user: - Copy a NUL terminated string from userspace. > > + * strncpy_from_user() - Copy a NUL terminated string from userspace. > > * @dst: Destination address, in kernel space. This buffer must be at > > * least @count bytes long. > > * @src: Source address, in user space. > > - * @count: Maximum number of bytes to copy, including the trailing NUL. > > + * @count: Maximum number of bytes to copy, including the trailing %NUL. > > * > > * Copies a NUL-terminated string from userspace to kernel space. > > * > > - * On success, returns the length of the string (not including the trailing > > - * NUL). > > - * > > - * If access to userspace fails, returns -EFAULT (some data may have been > > - * copied). > > - * > > - * If @count is smaller than the length of the string, copies @count bytes > > - * and returns @count. > > + * Return: If access to userspace fails, returns -EFAULT. Otherwise, > > + * return the number of characters copied excluding the trailing > > + * %NUL, if the length of the user string exceeds @count return > > + * -EFAULT (in which case @dst will be left without a %NUL > > + * terminator). > > */ > > AFAICS the byte_at_a_time loop exits when max==0 is reached, and then > if `res >= count` (in other words, if we've copied as many bytes as > requested, haven't encountered a null byte so far, and haven't reached > the end of the address space), we return `res`, which is the same as > `count`. Are you sure? Thanks for the review Jaan. Seems I was in a world of confused, I misread count to be the size of the destination buffer not the size of the source string. Also, re-reading strscpy() and why its better I think I may not be the correct person to implement strscpy_from_user() since I'm not across all the subtleties. If v2 seems wanted for the first patches in this set I'll drop the *_from_user() stuff. thanks, Tobin.
On Mon, Feb 18, 2019 at 4:52 PM Jann Horn <jannh@google.com> wrote: > AFAICS the byte_at_a_time loop exits when max==0 is reached, and then > if `res >= count` (in other words, if we've copied as many bytes as > requested, haven't encountered a null byte so far, and haven't reached > the end of the address space), we return `res`, which is the same as > `count`. Are you sure? Oh, whew, there is only 1 arch-specific implementation of this. I thought you meant there was multiple implementations. So, generally speaking, I'd love to split all strncpy* uses into strscpy_zero() (when expecting to do str->str copies), and some new function, named like mempadstr() or str2mem() that copies a str to a __nonstring char array, with trailing padding, if there is space. Then there is no more mixing the two cases and botching things.
On Wed, Feb 20, 2019 at 05:05:10PM -0800, Kees Cook wrote: > On Mon, Feb 18, 2019 at 4:52 PM Jann Horn <jannh@google.com> wrote: > > AFAICS the byte_at_a_time loop exits when max==0 is reached, and then > > if `res >= count` (in other words, if we've copied as many bytes as > > requested, haven't encountered a null byte so far, and haven't reached > > the end of the address space), we return `res`, which is the same as > > `count`. Are you sure? > > Oh, whew, there is only 1 arch-specific implementation of this. I > thought you meant there was multiple implementations. > > So, generally speaking, I'd love to split all strncpy* uses into > strscpy_zero() (when expecting to do str->str copies), and some new > function, named like mempadstr() or str2mem() that copies a str to a > __nonstring char array, with trailing padding, if there is space. Then > there is no more mixing the two cases and botching things. Oh cool, treewide changes, I'm down with that. So to v2 I'll add str2mem() and then attack the tree as suggested. What could possibly go wrong :)? Tobin
On Wed, Feb 20, 2019 at 9:25 PM Tobin C. Harding <me@tobin.cc> wrote: > > On Wed, Feb 20, 2019 at 05:05:10PM -0800, Kees Cook wrote: > > So, generally speaking, I'd love to split all strncpy* uses into > > strscpy_zero() (when expecting to do str->str copies), and some new > > function, named like mempadstr() or str2mem() that copies a str to a > > __nonstring char array, with trailing padding, if there is space. Then > > there is no more mixing the two cases and botching things. I should use "converts" instead of "copies" above, just to drive the point home. :) > > Oh cool, treewide changes, I'm down with that. So to v2 I'll add > str2mem() and then attack the tree as suggested. What could possibly go > wrong :)? Some clear documentation needs to be written for str2mem() to really help people understand what a "non string" character array is (especially given that it LOOKS like it has NUL termination -- when in fact it's just "padding"). The tree-wide changes will likely take a while (and don't need to be part of this series unless you want to find a couple good examples) since we have to do them case-by-case: it's not always obvious when it's actually a non-string, so getting help from maintainers here will be needed. (And maybe some kind of flow chart added to Documentation/process/deprecated.rst for how to stop using strncpy() and strlcpy().) What I can't quite figure out yet is how to find a way for sfr to flag newly added users of strcpy, strncpy, and strlcpy. We might need to bring back __deprecated, but hide it behind a W=linux-next flag or something crazy. Stephen, in your builds you're already injecting -Wimplicit-fallthrough: do you do W=1 or anything like that? If not, I think we need some W= setting for your linux-next builds that generate the maintainer-nag warnings... -Kees P.S. Here's C string API Rant (I just had to get this out, please feel free to ignore): strcpy returns dest ... which is already known, so it's a meaningless return value. strncpy returns dest (still meaningless) strlcpy returns strlen(src) ... the length we WOULD have copied. Why would we care about that? I'm operating on dest. Were there callers that needed to both copy part of src and learn how long it was at the same time? strscpy returns -E2BIG or non-NUL bytes copied: yay, a return about what actually happened from the operation! ... snprintf returns what it WOULD have written, much like strlcpy above. At least snprintf has an excuse: it can be used to calculate an allocation size (called with a NULL dest and 0 size) ... but shouldn't "how big is this format string going to be?" be a separate function? I wonder if we can kill all kernel uses of snprintf too (after introducing a "how big would it be?" function and switching all other callers over to scnprintf)... So scnprintf() does the right thing (count of non-NUL bytes copied out). So now our safe(r?) string API versions use different letters to show they're safe: "s" in strcpy and "cn" in sprintf. /me cries forever.
On Thu, Feb 21, 2019 at 2:05 AM Kees Cook <keescook@chromium.org> wrote: > On Mon, Feb 18, 2019 at 4:52 PM Jann Horn <jannh@google.com> wrote: > > AFAICS the byte_at_a_time loop exits when max==0 is reached, and then > > if `res >= count` (in other words, if we've copied as many bytes as > > requested, haven't encountered a null byte so far, and haven't reached > > the end of the address space), we return `res`, which is the same as > > `count`. Are you sure? > > Oh, whew, there is only 1 arch-specific implementation of this. I > thought you meant there was multiple implementations. Not sure what you're talking about. Are you talking about implementations of strncpy_from_user()? csky: custom strncpy_from_user() calls custom __do_strncpy_from_user, which is a macro containing a big asm() block. h8300: ;;; long strncpy_from_user(void *to, void *from, size_t n) strncpy_from_user: mov.l er2,er2 bne 1f sub.l er0,er0 rts [...] mips: static inline long strncpy_from_user(char *__to, const char __user *__from, long __len) { long res; if (eva_kernel_access()) { __asm__ __volatile__( "move\t$4, %1\n\t" "move\t$5, %2\n\t" "move\t$6, %3\n\t" [...] unicore32: ENTRY(__strncpy_from_user) mov ip, r1 1: sub.a r2, r2, #1 ldrusr r3, r1, 1, ns bfs 2f stb.w r3, [r0]+, #1 cxor.a r3, #0 [...] and so on. > So, generally speaking, I'd love to split all strncpy* uses into > strscpy_zero() (when expecting to do str->str copies), and some new > function, named like mempadstr() or str2mem() that copies a str to a > __nonstring char array, with trailing padding, if there is space. Then > there is no more mixing the two cases and botching things.
On 21/02/2019 07.02, Kees Cook wrote: > P.S. Here's C string API Rant (I just had to get this out, please feel > free to ignore): I'll bite. First, it's "linux kernel string API", only some of string.h interfaces are in std C. Sure, none of those satisfy all use cases, but adding Yet Another One also has its costs. > strcpy returns dest ... which is already known, so it's a meaningless > return value. > > strncpy returns dest (still meaningless) Yeah, same for memcpy, memset. Those are classic C interfaces. It does allow some micro-optimizations in some cases - since one is likely to continue doing other stuff with dest, the compiler might use the fact that it has dest in the return register instead of spilling and reloading it. But I do wonder what the real rationale was. > strlcpy returns strlen(src) ... the length we WOULD have copied. Why > would we care about that? I'm operating on dest. Were there callers > that needed to both copy part of src and learn how long it was at the > same time? The rationale is exactly the same as for snprintf - to allow you to optimistically format/copy to a buffer, and know exactly how much to realloc() in case it turned out to be too small. Of course, nobody ever uses strlcpy like that, since just doing strdup() is much much simpler and less error-prone. So yes, strlcpy() is often a silly interface to use. > strscpy returns -E2BIG or non-NUL bytes copied: yay, a return about > what actually happened from the operation! Well, strictly speaking, strlcpy()'s return value also tells you exactly what happened, just not in the kernel "negative errno for error condition" style. > ... snprintf returns what it WOULD have written, much like strlcpy > above. At least snprintf has an excuse: it can be used to calculate an > allocation size (called with a NULL dest and 0 size) ... but shouldn't > "how big is this format string going to be?" be a separate function? Why? Sure, you can make a wrapper vhowmuch(const char *fmt, va_args ap), but its implementation would have to be "return vsnprintf(NULL, 0, fmt, ap);", since we really must reuse the exact same logic for computing the length as for doing the actual printf'ing (otherwise they'd get out of sync). > I wonder if we can kill all kernel uses of snprintf too (after > introducing a "how big would it be?" function and switching all other > callers over to scnprintf)... Please no. snprintf() is standardized, has sane semantics (though one sometimes _want_ scnprintf semantics - in which case one can and should use that...), and, importantly, gcc understands those semantics. So we get optimizations and diagnostics that we'd miss if we kill off explicit snprintf and replace with non-standard howmuch+scnprintf. > So scnprintf() does the right thing (count of non-NUL bytes copied out). The right thing, when that's the thing you want to know. Which it is in the "build a string in a buffer by repeated printf calls, perhaps intermixed with a little flow control logic". But not so in a "format this stuff to this fixed-size char buffer inside that struct, and let me know [i.e., 'give me a way of knowing'] if it all fit". Hello, I'm you super-fantastic-one-size-fits-all string copying API. Please carefully fill out the following form, and I'll get back to you ASAP. A1: destination A2: max bytes to write B1: source B2: max bytes to read C1: do you want me to nul-terminate the destination? C2: if C1, should that be done by truncating the result if necessary? C3: how do you want me to handle leftover space in dest, if any? C4: should the destination be left entirely untouched in case the result does not fit? C5: is it an error if I do not encounter a nul byte somewhere in [B1, B1+B2-1] ? C6: how do you want me to report on the results of my efforts and my discoveries? C7: are there any special conditions I should take into account? (overlapping buffers, allergies, ...) Put -1 under A2 to mean "assume there's room enough". Put -1 under B2 to mean "assume there's a nul byte before walking into lala-land". Answering yes to C1 and C2 and writing 0 under A2 causes your CapsLock to stay permanently on. If there's any ambiguity in your answer to C6, the machine reboots. Rasmus
On Thu, Feb 21, 2019 at 7:02 AM Kees Cook <keescook@chromium.org> wrote: > On Wed, Feb 20, 2019 at 9:25 PM Tobin C. Harding <me@tobin.cc> wrote: > > > > On Wed, Feb 20, 2019 at 05:05:10PM -0800, Kees Cook wrote: > > > So, generally speaking, I'd love to split all strncpy* uses into > > > strscpy_zero() (when expecting to do str->str copies), and some new > > > function, named like mempadstr() or str2mem() that copies a str to a > > > __nonstring char array, with trailing padding, if there is space. Then > > > there is no more mixing the two cases and botching things. > > I should use "converts" instead of "copies" above, just to drive the > point home. :) > > > > > Oh cool, treewide changes, I'm down with that. So to v2 I'll add > > str2mem() and then attack the tree as suggested. What could possibly go > > wrong :)? > > Some clear documentation needs to be written for str2mem() to really > help people understand what a "non string" character array is > (especially given that it LOOKS like it has NUL termination -- when in > fact it's just "padding"). > > The tree-wide changes will likely take a while (and don't need to be > part of this series unless you want to find a couple good examples) > since we have to do them case-by-case: it's not always obvious when > it's actually a non-string, so getting help from maintainers here will > be needed. (And maybe some kind of flow chart added to > Documentation/process/deprecated.rst for how to stop using strncpy() > and strlcpy().) Wild brainstorming ahead... Such non-string character arrays are usually fixed-size, right? We could use struct types around such character arrays to hide the actual characters (so that people don't accidentally use them with C string APIs), and enforce that the length is passed around as part of the type... something like: #define char_array(len) struct { char __ca_data[len]; } #define __CA_UNWRAP(captr) (captr)->__ca_data, sizeof((captr)->__ca_data) void __str2ca(char *dst, size_t dst_len, char *src) { size_t src_len = strlen(src); if (WARN_ON(src_len > dst_len)) { // or whatever else we think is the safest way to deal with this // without crashing, if BUG() is not an option. memset(dst, 0, dst_len); return; } memcpy(dst, src, src_len); memset(dst + src_len, 0, dst_len - src_len); } #define str2ca(dst, src) __str2ca(__CA_UNWRAP(dst), (src)) static inline void __ca2ca(char *dst, size_t dst_len, char *src, size_t src_len) { BUILD_BUG_ON(dst_len < src_len); memcpy(dst, src, src_len); if (src_len != dst_len) { memset(dst + src_len, 0, dst_len - src_len); } } #define ca2ca(dst, src) __ca2ca(__CA_UNWRAP(dst), __CA_UNWRAP(src)) And if you want to do direct assignments instead of using helpers, you make a typedef like this (since just assigning between two equivalent structs doesn't work): typedef char_array(20) blah_name; blah_name global_name; int main(int argc, char **argv) { blah_name local_name; str2ca(&local_name, argv[1]); global_name = local_name; } You'd also have to use a typedef like this if you want to pass references to other functions. Something like this would also be neat for classic C strings in some situations - if you can put bounds in the types, or (if the string is dynamically-sized) in the struct, you don't need to messily pass around bounds for things like snprint() and so on. > What I can't quite figure out yet is how to find a way for sfr to flag > newly added users of strcpy, strncpy, and strlcpy. We might need to > bring back __deprecated, but hide it behind a W=linux-next flag or > something crazy. Stephen, in your builds you're already injecting > -Wimplicit-fallthrough: do you do W=1 or anything like that? If not, I > think we need some W= setting for your linux-next builds that generate > the maintainer-nag warnings... > > -Kees > > P.S. Here's C string API Rant (I just had to get this out, please feel > free to ignore): > > strcpy returns dest ... which is already known, so it's a meaningless > return value. > > strncpy returns dest (still meaningless) Weeeell... it kinda makes sense if you're trying to micro-optimize the amount of stack spills. If you write code like this: char *a = malloc(10); a = strcpy(a, other_string); return a; ... then the compiler doesn't have to shove `a` in one of the callee-saved registers or spill it to the stack around the strcpy(). Plus, it might allow tail-call optimizations in some cases. > strlcpy returns strlen(src) ... the length we WOULD have copied. Why > would we care about that? I'm operating on dest. Were there callers > that needed to both copy part of src and learn how long it was at the > same time? You could use it to optimistically attempt a copy in a fastpath, and then fall back to a reallocation if that failed. > strscpy returns -E2BIG or non-NUL bytes copied: yay, a return about > what actually happened from the operation! > > ... snprintf returns what it WOULD have written, much like strlcpy > above. At least snprintf has an excuse: it can be used to calculate an > allocation size (called with a NULL dest and 0 size) ... but shouldn't > "how big is this format string going to be?" be a separate function? I > wonder if we can kill all kernel uses of snprintf too (after > introducing a "how big would it be?" function and switching all other > callers over to scnprintf)... > > So scnprintf() does the right thing (count of non-NUL bytes copied out). That seems kinda suboptimal. Wouldn't you ideally want to bail out with an error, or at least do a WARN(), if you're trying to format a string that doesn't fit into its destination? Like, for example, if you're assembling a path, and the path doesn't fit into a buffer, and so you just use half of it, you might end up in a very different place from where you intended to go. > So now our safe(r?) string API versions use different letters to show > they're safe: "s" in strcpy and "cn" in sprintf. /me cries forever.
Hi Kees, On Wed, 20 Feb 2019 22:02:32 -0800 Kees Cook <keescook@chromium.org> wrote: > > What I can't quite figure out yet is how to find a way for sfr to flag > newly added users of strcpy, strncpy, and strlcpy. We might need to > bring back __deprecated, but hide it behind a W=linux-next flag or > something crazy. Stephen, in your builds you're already injecting > -Wimplicit-fallthrough: do you do W=1 or anything like that? If not, I > think we need some W= setting for your linux-next builds that generate > the maintainer-nag warnings... I just have a set of compiler flags that my build scripts explicitly enable by setting KCFLAGS.
On Thu, Feb 21, 2019 at 6:28 AM Jann Horn <jannh@google.com> wrote: > On Thu, Feb 21, 2019 at 2:05 AM Kees Cook <keescook@chromium.org> wrote: > > On Mon, Feb 18, 2019 at 4:52 PM Jann Horn <jannh@google.com> wrote: > > > AFAICS the byte_at_a_time loop exits when max==0 is reached, and then > > > if `res >= count` (in other words, if we've copied as many bytes as > > > requested, haven't encountered a null byte so far, and haven't reached > > > the end of the address space), we return `res`, which is the same as > > > `count`. Are you sure? > > > > Oh, whew, there is only 1 arch-specific implementation of this. I > > thought you meant there was multiple implementations. > > Not sure what you're talking about. Are you talking about > implementations of strncpy_from_user()? Ah, I used a bad match. But it seems it's only about half (and not x86, arm, powerpc).
On Thu, Feb 21, 2019 at 6:58 AM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: > > On 21/02/2019 07.02, Kees Cook wrote: > > > P.S. Here's C string API Rant (I just had to get this out, please feel > > free to ignore): > > I'll bite. First, it's "linux kernel string API", only some of string.h > interfaces are in std C. Sure, none of those satisfy all use cases, but > adding Yet Another One also has its costs. Well, yes, strscpy and scnprintf appear to be only in the kernel (did the originate externally somewhere I'm not aware of?) > > strcpy returns dest ... which is already known, so it's a meaningless > > return value. > > > > strncpy returns dest (still meaningless) > > Yeah, same for memcpy, memset. Those are classic C interfaces. It does > allow some micro-optimizations in some cases - since one is likely to Right, yes, but this level of optimization is best left to the compiler. There's much more interesting results a caller should care about. :) > > strlcpy returns strlen(src) ... the length we WOULD have copied. Why > > would we care about that? I'm operating on dest. Were there callers > > that needed to both copy part of src and learn how long it was at the > > same time? > > The rationale is exactly the same as for snprintf - to allow you to Okay, but there's as separate function for that: strlen(). > > strscpy returns -E2BIG or non-NUL bytes copied: yay, a return about > > what actually happened from the operation! > > Well, strictly speaking, strlcpy()'s return value also tells you exactly > what happened, just not in the kernel "negative errno for error > condition" style. True, yes, but it's unfriendly: it requires careful math, just like snprintf, which depends on rechecking the args, making sure you're not doing an off-by-one from the NUL, etc etc. > > ... snprintf returns what it WOULD have written, much like strlcpy > > above. At least snprintf has an excuse: it can be used to calculate an > > allocation size (called with a NULL dest and 0 size) ... but shouldn't > > "how big is this format string going to be?" be a separate function? > > Why? Sure, you can make a wrapper vhowmuch(const char *fmt, va_args ap), > but its implementation would have to be "return vsnprintf(NULL, 0, fmt, > ap);", since we really must reuse the exact same logic for computing the > length as for doing the actual printf'ing (otherwise they'd get out of > sync). Well, I'd likely go the opposite directly: make snprintf() a wrapper and call vhowmuch(), etc, convert until snprintf could be removed. But really the best might be removal of snprintf first, then refactor it with vhowmuch() etc. Lots of ways to solve it, I suppose. But dropping the unfriendly API would be nice. > Please no. snprintf() is standardized, has sane semantics (though one No, it doesn't -- it has well-defined semantics, but using it correctly is too hard. It's a regular source of bugs. (Not *nearly* as bad as strncpy, so let's stick to dropping one bad API at a time...) > sometimes _want_ scnprintf semantics - in which case one can and should > use that...), and, importantly, gcc understands those semantics. So we > get optimizations and diagnostics that we'd miss if we kill off explicit > snprintf and replace with non-standard howmuch+scnprintf. Those diagnostics can be had with the __printf() markings already... > > So scnprintf() does the right thing (count of non-NUL bytes copied out). > > The right thing, when that's the thing you want to know. Which it is in > the "build a string in a buffer by repeated printf calls, perhaps > intermixed with a little flow control logic". But not so in a "format > this stuff to this fixed-size char buffer inside that struct, and let me > know [i.e., 'give me a way of knowing'] if it all fit". Do we need ssprintf() to get the -E2BIG result like strscpy()? > Hello, I'm you super-fantastic-one-size-fits-all string copying API. > Please carefully fill out the following form, and I'll get back to you ASAP. I mean, this is kinda what we have already. We just keep exposing too many knobs. :)
On Thu, Feb 21, 2019 at 8:07 AM Jann Horn <jannh@google.com> wrote: > > On Thu, Feb 21, 2019 at 7:02 AM Kees Cook <keescook@chromium.org> wrote: > > The tree-wide changes will likely take a while (and don't need to be > > part of this series unless you want to find a couple good examples) > > since we have to do them case-by-case: it's not always obvious when > > it's actually a non-string, so getting help from maintainers here will > > be needed. (And maybe some kind of flow chart added to > > Documentation/process/deprecated.rst for how to stop using strncpy() > > and strlcpy().) > > Wild brainstorming ahead... > > Such non-string character arrays are usually fixed-size, right? We > could use struct types around such character arrays to hide the actual > characters (so that people don't accidentally use them with C string > APIs), and enforce that the length is passed around as part of the > type... something like: > > #define char_array(len) struct { char __ca_data[len]; } Does this need __packed or will the compiler understand it's byte-aligned? And it needs __nonstring :) > #define __CA_UNWRAP(captr) (captr)->__ca_data, sizeof((captr)->__ca_data) > > void __str2ca(char *dst, size_t dst_len, char *src) { > size_t src_len = strlen(src); > if (WARN_ON(src_len > dst_len)) { > // or whatever else we think is the safest way to deal with this > // without crashing, if BUG() is not an option. > memset(dst, 0, dst_len); > return; > } > memcpy(dst, src, src_len); > memset(dst + src_len, 0, dst_len - src_len); > } > #define str2ca(dst, src) __str2ca(__CA_UNWRAP(dst), (src)) > > static inline void __ca2ca(char *dst, size_t dst_len, char *src, > size_t src_len) { > BUILD_BUG_ON(dst_len < src_len); > memcpy(dst, src, src_len); > if (src_len != dst_len) { > memset(dst + src_len, 0, dst_len - src_len); > } > } > #define ca2ca(dst, src) __ca2ca(__CA_UNWRAP(dst), __CA_UNWRAP(src)) Yeah, I was trying to think of ways to do this but I was thinking mostly about noticing the __nonstring mark. #define-ing this away might work, but it might also just annoy people more. At least GCC will yell about __nonstring use in some cases. > And if you want to do direct assignments instead of using helpers, you > make a typedef like this (since just assigning between two equivalent > structs doesn't work): > > typedef char_array(20) blah_name; > blah_name global_name; > int main(int argc, char **argv) { > blah_name local_name; > str2ca(&local_name, argv[1]); > global_name = local_name; > } > > You'd also have to use a typedef like this if you want to pass > references to other functions. Yeah, it might work. I need to go look through ext4 -- that's one place I know there were "legit" strncpy() uses... Converting existing non-string cases to this and see if it's worse would be educational. > Something like this would also be neat for classic C strings in some > situations - if you can put bounds in the types, or (if the string is > dynamically-sized) in the struct, you don't need to messily pass > around bounds for things like snprint() and so on. Yeah, I remain annoyed about string pointers having lost their allocation size meta data. The vast majority of str*() functions could just be "str, sizeof(str)" like you have for the CA_UNWRAP. > > So scnprintf() does the right thing (count of non-NUL bytes copied out). > > That seems kinda suboptimal. Wouldn't you ideally want to bail out > with an error, or at least do a WARN(), if you're trying to format a > string that doesn't fit into its destination? Like, for example, if > you're assembling a path, and the path doesn't fit into a buffer, and > so you just use half of it, you might end up in a very different place > from where you intended to go. ssprintf() with -E2BIG? Most correct users of snprintf()'s return value do some version of trying to detect if there wasn't enough space and then error out: static inline int usb_make_path(struct usb_device *dev, char *buf, size_t size) { int actual; actual = snprintf(buf, size, "usb-%s-%s", dev->bus->bus_name, dev->devpath); return (actual >= (int)size) ? -1 : actual; } a) this could just be "return ssprintf(..." b) as far as I see, there are literally 2 callers of usb_make_path() that check the return value Anyway, snprintf() is "next". I'd like to get through strcpy/strncpy/strlcpy removal first... :)
On Thu, Feb 21, 2019 at 12:27 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote: > > Hi Kees, > > On Wed, 20 Feb 2019 22:02:32 -0800 Kees Cook <keescook@chromium.org> wrote: > > > > What I can't quite figure out yet is how to find a way for sfr to flag > > newly added users of strcpy, strncpy, and strlcpy. We might need to > > bring back __deprecated, but hide it behind a W=linux-next flag or > > something crazy. Stephen, in your builds you're already injecting > > -Wimplicit-fallthrough: do you do W=1 or anything like that? If not, I > > think we need some W= setting for your linux-next builds that generate > > the maintainer-nag warnings... > > I just have a set of compiler flags that my build scripts explicitly > enable by setting KCFLAGS. Okay, so you could include some -D option to enable it. I'll see if I can cook something up.
On 22/02/2019 00.03, Kees Cook wrote: > On Thu, Feb 21, 2019 at 6:58 AM Rasmus Villemoes > <linux@rasmusvillemoes.dk> wrote: >> >> On 21/02/2019 07.02, Kees Cook wrote: >> > >>> ... snprintf returns what it WOULD have written, much like strlcpy >>> above. At least snprintf has an excuse: it can be used to calculate an >>> allocation size (called with a NULL dest and 0 size) ... but shouldn't >>> "how big is this format string going to be?" be a separate function? >> >> Why? Sure, you can make a wrapper vhowmuch(const char *fmt, va_args ap), >> but its implementation would have to be "return vsnprintf(NULL, 0, fmt, >> ap);", since we really must reuse the exact same logic for computing the >> length as for doing the actual printf'ing (otherwise they'd get out of >> sync). > > Well, I'd likely go the opposite directly: make snprintf() a wrapper > and call vhowmuch(), etc, convert until snprintf could be removed.But > really the best might be removal of snprintf first, then refactor it > with vhowmuch() etc. Lots of ways to solve it, I suppose. I'd really like to see how vsprintf.c would look in a world where the workhorse (which is vsnprintf() and its tree of helpers) would not format and measure at the same time. But dropping > the unfriendly API would be nice. > >> Please no. snprintf() is standardized, has sane semantics (though one > > No, it doesn't -- it has well-defined semantics, We'll just have to disagree on what sane means. > but using it > correctly is too hard. It's a regular source of bugs. (Not *nearly* as > bad as strncpy, so let's stick to dropping one bad API at a time...) > >> sometimes _want_ scnprintf semantics - in which case one can and should >> use that...), and, importantly, gcc understands those semantics. So we >> get optimizations and diagnostics that we'd miss if we kill off explicit >> snprintf and replace with non-standard howmuch+scnprintf. > > Those diagnostics can be had with the __printf() markings already... No. You're thinking of the type-checking things. Those are important, of course, but have nothing at all to do with the s or n parts of snprintf - what I'm thinking of is the fact that gcc knows that buf is a char-buffer of length len into which snprintf() may/will write, so we have the Wformat-truncation and Warray-bounds kind of warnings. And the optimization part is where gcc can replace a snprintf() with a simpler string op (e.g. a memcpy), or know that an overflow check can be elided because "%d" does fit in the buf[16], or... One thing I've had on my wishlist for a long time is a buffer_size(ptrarg, expr, r/w/rw) attribute that would say that the function treats the pointer argument ptrarg as a buffer of size given by the expr (in bytes), and accesses it according to the third parameter. Then the compiler could automatically check with its builtin_objsize machinery for mismatched buffer size computations (e.g., using sizeof() when the interface expects ARRAY_SIZE()) violations or uninitialized reads, or any number of optional run-time instrumentations in caller or callee... So memcpy(void *dst, const void *src, size_t len) __buffer_size(dst, len, "w") __buffer_size(src, len, "r") or int poll(struct pollfd *fds, nfds_t nfds, int timeout) __buffer_size(fds, nfds*sizeof(struct pollfd), "rw") the latter being an example of where an arbitrary expression in terms of the formal parameters is needed - but I don't think gcc's attribute machinery supports this syntax at all. Bonus points if one could attach buffer_size to a struct definition as well, saying that the ->foo member points to ->bar*4 of storage. Rasmus
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c index 58eacd41526c..11fe9a4a00fd 100644 --- a/lib/strncpy_from_user.c +++ b/lib/strncpy_from_user.c @@ -82,22 +82,19 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src, long } /** - * strncpy_from_user: - Copy a NUL terminated string from userspace. + * strncpy_from_user() - Copy a NUL terminated string from userspace. * @dst: Destination address, in kernel space. This buffer must be at * least @count bytes long. * @src: Source address, in user space. - * @count: Maximum number of bytes to copy, including the trailing NUL. + * @count: Maximum number of bytes to copy, including the trailing %NUL. * * Copies a NUL-terminated string from userspace to kernel space. * - * On success, returns the length of the string (not including the trailing - * NUL). - * - * If access to userspace fails, returns -EFAULT (some data may have been - * copied). - * - * If @count is smaller than the length of the string, copies @count bytes - * and returns @count. + * Return: If access to userspace fails, returns -EFAULT. Otherwise, + * return the number of characters copied excluding the trailing + * %NUL, if the length of the user string exceeds @count return + * -EFAULT (in which case @dst will be left without a %NUL + * terminator). */ long strncpy_from_user(char *dst, const char __user *src, long count) {
Current function documentation for strncpy_from_user() is incorrect. If @count (size of destination buffer) is less than the length of the user string the function does _not_ return @count but rather returns -EFAULT. Document correctly the function return value, also add note that string may be left non-null terminated. Signed-off-by: Tobin C. Harding <tobin@kernel.org> --- lib/strncpy_from_user.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)