mbox series

[RFC,0/7] runtime format string checking

Message ID 20181026232409.16100-1-linux@rasmusvillemoes.dk (mailing list archive)
Headers show
Series runtime format string checking | expand

Message

Rasmus Villemoes Oct. 26, 2018, 11:24 p.m. UTC
This is a resurrection of something I sent out about a year ago. In
the relatively few places where we use a non-literal as format string,
we can annotate the source with a fmtcheck() call that will (a) at
build time, allow the compiler to check the variadic arguments against
the template, and (b) at runtime, check that the format specifiers
present in the actual format string match those in the template (and
if not, WARN and use the template to ensure runtime type safety).

Finding places to annotate is just

  make -j8 KCFLAGS='-Wformat-nonliteral'

So far, in about half the places I looked, one might as well get
completely rid of the non-literal format string. Patches 5,6,7 are
some examples of where one might add fmtcheck() calls. I don't think
we can get to a state where we can unconditionally add
-Wformat-nonliteral to the build, but I think there's a lot of
low-hanging fruit.

This is on top of Miguel's compiler attributes series [1], which I
hope will land in mainline soon.

[1] https://github.com/ojeda/linux.git tags/compiler-attributes-for-linus-4.20-rc1

Rasmus Villemoes (7):
  compiler_attributes.h: add __attribute__((format_arg)) shorthand
  lib/vsprintf.c: add fmtcheck utility
  kernel.h: implement fmtmatch() wrapper around fmtcheck()
  lib/test_printf.c: add a few fmtcheck() test cases
  kernel/kthread.c: do runtime check of format string in
    kthread_create_on_cpu()
  nfs: use fmtcheck() in root_nfs_data
  drivers: hwmon: add runtime format string checking

 drivers/hwmon/hwmon.c               |  3 +-
 fs/nfs/nfsroot.c                    |  2 +-
 include/linux/compiler_attributes.h | 13 ++++++
 include/linux/kernel.h              | 25 +++++++++++
 kernel/kthread.c                    |  4 +-
 lib/Kconfig.debug                   |  9 ++++
 lib/test_printf.c                   | 43 +++++++++++++++++++
 lib/vsprintf.c                      | 65 +++++++++++++++++++++++++++++
 8 files changed, 160 insertions(+), 4 deletions(-)

Comments

Kees Cook Oct. 30, 2018, 8:58 p.m. UTC | #1
On Sat, Oct 27, 2018 at 12:24 AM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
> This is a resurrection of something I sent out about a year ago. In
> the relatively few places where we use a non-literal as format string,
> we can annotate the source with a fmtcheck() call that will (a) at
> build time, allow the compiler to check the variadic arguments against
> the template, and (b) at runtime, check that the format specifiers
> present in the actual format string match those in the template (and
> if not, WARN and use the template to ensure runtime type safety).
>
> Finding places to annotate is just
>
>   make -j8 KCFLAGS='-Wformat-nonliteral'
>
> So far, in about half the places I looked, one might as well get
> completely rid of the non-literal format string.

Agreed. When I was still updating format string sites regularly, I had
a couple "just make the build quiet" patches to do this:

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/format-security&id=ce9b938574042d09920650cb3c63ec29658edc87
The above seemed to "noisy" to send, but perhaps we should just land
it anyway. They really _should_ be const.

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/format-security&id=b7dcfc8f48caaafcc423e5793f7ef61b9bb5c458
This one covers cases where the pointer is pointing to a const string,
so really there's no sense in injecting the "%s", but I was collecting
them to make real ones stand out.

> Patches 5,6,7 are
> some examples of where one might add fmtcheck() calls. I don't think
> we can get to a state where we can unconditionally add
> -Wformat-nonliteral to the build, but I think there's a lot of
> low-hanging fruit.

How much work do you think it'd take to get to a
format-nonliteral-clean build? I think it's worth doing the work if
it's not totally intractable.

-Kees

>
> This is on top of Miguel's compiler attributes series [1], which I
> hope will land in mainline soon.
>
> [1] https://github.com/ojeda/linux.git tags/compiler-attributes-for-linus-4.20-rc1
>
> Rasmus Villemoes (7):
>   compiler_attributes.h: add __attribute__((format_arg)) shorthand
>   lib/vsprintf.c: add fmtcheck utility
>   kernel.h: implement fmtmatch() wrapper around fmtcheck()
>   lib/test_printf.c: add a few fmtcheck() test cases
>   kernel/kthread.c: do runtime check of format string in
>     kthread_create_on_cpu()
>   nfs: use fmtcheck() in root_nfs_data
>   drivers: hwmon: add runtime format string checking
>
>  drivers/hwmon/hwmon.c               |  3 +-
>  fs/nfs/nfsroot.c                    |  2 +-
>  include/linux/compiler_attributes.h | 13 ++++++
>  include/linux/kernel.h              | 25 +++++++++++
>  kernel/kthread.c                    |  4 +-
>  lib/Kconfig.debug                   |  9 ++++
>  lib/test_printf.c                   | 43 +++++++++++++++++++
>  lib/vsprintf.c                      | 65 +++++++++++++++++++++++++++++
>  8 files changed, 160 insertions(+), 4 deletions(-)
>
> --
> 2.19.1.6.gbde171bbf5
>
Rasmus Villemoes Nov. 1, 2018, 10:06 p.m. UTC | #2
On 2018-10-30 21:58, Kees Cook wrote:
> On Sat, Oct 27, 2018 at 12:24 AM, Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/format-security&id=ce9b938574042d09920650cb3c63ec29658edc87
> The above seemed to "noisy" to send, but perhaps we should just land
> it anyway. They really _should_ be const.
>

Isn't that 063246641d4a9e9de84a2466fbad50112faf88dc in mainline ;) ?
BTW, I don't agree with all the changes in there: For auto variables, this

-       const char *cur_drv, *drv = "acpi-cpufreq";
+       const char drv[] = "acpi-cpufreq";
+       const char *cur_drv;

makes gcc actually generate that string on the stack instead of just
referring to an anonymous object in .rodata; one gets code gen like

+:      31 c0                   xor    %eax,%eax
+:      48 b8 61 63 70 69 2d    movabs $0x7570632d69706361,%rax # "acpi-cpu"
+:      63 70 75
+:      c7 44 24 0b 66 72 65    movl   $0x71657266,0xb(%rsp) # "freq"
+:      71
+:      c6 44 24 0f 00          movb   $0x0,0xf(%rsp) "\0"
+:      48 89 44 24 03          mov    %rax,0x3(%rsp)

It's not the-end-of-the-world-horrible, but it's better avoided,
especially for patches that are not supposed to change anything. And
longer strings would of course produce even more gunk like the above.
A better fix which also silences -Wformat-security is to declare the
variable itself const, i.e.

const char *const drv = "acpi-cpufreq".

Yes, gcc should be able to infer the constness of drv from the fact that
it's never assigned to elsewhere in the function... I think I saw that
on some gcc todo list at some point.


> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/format-security&id=b7dcfc8f48caaafcc423e5793f7ef61b9bb5c458
> This one covers cases where the pointer is pointing to a const string,
> so really there's no sense in injecting the "%s", but I was collecting
> them to make real ones stand out.

I don't agree. Yes, a human can verify that _currently_, only "pencrypt"
and "pdecrypt" can ever reach pcrypt_sysfs_add(). But without the
compiler being smart enough to do that, one will never know if some new
caller shows up, or one of those literals grows a % for some reason.
Adding "%s" doesn't cost much, especially not in cases (like this one)
where the fmt+args end up at kobject_set_name_vargs - for a "%s" +
literal that does a (succesful) kstrdup_const(), so we never even hit
the vsnprintf() engine.

>> Patches 5,6,7 are
>> some examples of where one might add fmtcheck() calls. I don't think
>> we can get to a state where we can unconditionally add
>> -Wformat-nonliteral to the build, but I think there's a lot of
>> low-hanging fruit.
> 
> How much work do you think it'd take to get to a
> format-nonliteral-clean build? I think it's worth doing the work if
> it's not totally intractable.

Probably less than the VLA removal. But it kind of depends on which
tools one allows. I can't see how to do it without something like
fmtcheck() to annotate certain places (e.g. the nfs example). Maybe a
no_fmtcheck() to annotate places which have been manually verified
[modulo the above "but that may change..."] would also be needed
(no_fmtcheck would be the same as fmtcheck for at !CONFIG_FMTCHECK
kernel), similar to how we have no_printk.

I kind of agree with Guenther that the hwmon example is a bad one. It
would be better to have the compiler check all those string literals
against a pattern at build time. Probably the format template plugin can
be extended to apply to any "const char*" declaration, not just those
sitting inside structs. But I'd rather get fmtcheck() in first before
returning to work on that plugin.

Rasmus
Kees Cook Nov. 1, 2018, 10:57 p.m. UTC | #3
On Thu, Nov 1, 2018 at 3:06 PM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
> On 2018-10-30 21:58, Kees Cook wrote:
>> On Sat, Oct 27, 2018 at 12:24 AM, Rasmus Villemoes
>> <linux@rasmusvillemoes.dk> wrote:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/format-security&id=ce9b938574042d09920650cb3c63ec29658edc87
>> The above seemed to "noisy" to send, but perhaps we should just land
>> it anyway. They really _should_ be const.
>>
>
> Isn't that 063246641d4a9e9de84a2466fbad50112faf88dc in mainline ;) ?

Oh, hah, so it is. So long ago I forgot. :P

> BTW, I don't agree with all the changes in there: For auto variables, this
>
> -       const char *cur_drv, *drv = "acpi-cpufreq";
> +       const char drv[] = "acpi-cpufreq";
> +       const char *cur_drv;
>
> makes gcc actually generate that string on the stack instead of just
> referring to an anonymous object in .rodata; one gets code gen like
>
> +:      31 c0                   xor    %eax,%eax
> +:      48 b8 61 63 70 69 2d    movabs $0x7570632d69706361,%rax # "acpi-cpu"
> +:      63 70 75
> +:      c7 44 24 0b 66 72 65    movl   $0x71657266,0xb(%rsp) # "freq"
> +:      71
> +:      c6 44 24 0f 00          movb   $0x0,0xf(%rsp) "\0"
> +:      48 89 44 24 03          mov    %rax,0x3(%rsp)

Oh that is nasty. Ugh. I hate the "const but not really ha ha" optimizations. :(

> It's not the-end-of-the-world-horrible, but it's better avoided,
> especially for patches that are not supposed to change anything. And
> longer strings would of course produce even more gunk like the above.
> A better fix which also silences -Wformat-security is to declare the
> variable itself const, i.e.
>
> const char *const drv = "acpi-cpufreq".

Yes, that would be much better. Seems like we could do a really easy
Coccinelle script to fix all of those?

@@
identifier VAR;
expression STRING;
@@

- const char VAR[]
+ const char * const VAR
  = STRING;

yields:
 517 files changed, 890 insertions(+), 891 deletions(-)

Worth doing at the end of -rc2?

> Yes, gcc should be able to infer the constness of drv from the fact that
> it's never assigned to elsewhere in the function... I think I saw that
> on some gcc todo list at some point.

If you find that bug, I'll add it to my gcc bug tracking list. :P

>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/format-security&id=b7dcfc8f48caaafcc423e5793f7ef61b9bb5c458
>> This one covers cases where the pointer is pointing to a const string,
>> so really there's no sense in injecting the "%s", but I was collecting
>> them to make real ones stand out.
>
> I don't agree. Yes, a human can verify that _currently_, only "pencrypt"
> and "pdecrypt" can ever reach pcrypt_sysfs_add(). But without the
> compiler being smart enough to do that, one will never know if some new
> caller shows up, or one of those literals grows a % for some reason.
> Adding "%s" doesn't cost much, especially not in cases (like this one)
> where the fmt+args end up at kobject_set_name_vargs - for a "%s" +
> literal that does a (succesful) kstrdup_const(), so we never even hit
> the vsnprintf() engine.

Okay, then I'll forward this to akpm maybe?

>>> Patches 5,6,7 are
>>> some examples of where one might add fmtcheck() calls. I don't think
>>> we can get to a state where we can unconditionally add
>>> -Wformat-nonliteral to the build, but I think there's a lot of
>>> low-hanging fruit.
>>
>> How much work do you think it'd take to get to a
>> format-nonliteral-clean build? I think it's worth doing the work if
>> it's not totally intractable.
>
> Probably less than the VLA removal. But it kind of depends on which
> tools one allows. I can't see how to do it without something like
> fmtcheck() to annotate certain places (e.g. the nfs example). Maybe a
> no_fmtcheck() to annotate places which have been manually verified
> [modulo the above "but that may change..."] would also be needed
> (no_fmtcheck would be the same as fmtcheck for at !CONFIG_FMTCHECK
> kernel), similar to how we have no_printk.
>
> I kind of agree with Guenther that the hwmon example is a bad one. It
> would be better to have the compiler check all those string literals
> against a pattern at build time. Probably the format template plugin can
> be extended to apply to any "const char*" declaration, not just those
> sitting inside structs. But I'd rather get fmtcheck() in first before
> returning to work on that plugin.

Yeah, fair.

-Kees