[RFC/RFT] gcc-plugins: force initialize auto variables whose addresses are taken
diff mbox

Message ID 20170706101317.10382-1-ard.biesheuvel@linaro.org
State New
Headers show

Commit Message

Ard Biesheuvel July 6, 2017, 10:13 a.m. UTC
To prevent leaking stack contents in cases where it is not possible
for the compiler to figure out whether an automatic variable has been
initialized or not, add a plugin that forcibly initializes all automatic
variables of struct/union types if their address is taken at any point.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/Kconfig                               |   9 ++
 scripts/Makefile.gcc-plugins               |   3 +
 scripts/gcc-plugins/initautobyref_plugin.c | 159 ++++++++++++++++++++
 3 files changed, 171 insertions(+)

Comments

Arnd Bergmann July 6, 2017, 11:09 a.m. UTC | #1
On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> To prevent leaking stack contents in cases where it is not possible
> for the compiler to figure out whether an automatic variable has been
> initialized or not, add a plugin that forcibly initializes all automatic
> variables of struct/union types if their address is taken at any point.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

If we only do this for variables that have their address taken, we miss
a lot of cases that the compiler (both clang and gcc) decide not to warn
about but that can still cause undefined behavior, e.g.:

extern int g(void);
int f(void)
{
        int i;

        switch (g()) {
        case 1:
                i = 0;
        }

        return i;
}

which gets compiled without warning under the assumption that
g() always returns '1':

0000000000000000 <f>:
   0: 48 83 ec 08           sub    $0x8,%rsp
   4: e8 00 00 00 00       callq  9 <f+0x9>
5: R_X86_64_PC32 g-0x4
   9: b8 00 00 00 00       mov    $0x0,%eax
   e: 48 83 c4 08           add    $0x8,%rsp
  12: c3                   retq

Detecting those cases from the plugin may be a lot harder.

> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -443,6 +443,15 @@ config GCC_PLUGIN_STRUCTLEAK_VERBOSE
>           initialized. Since not all existing initializers are detected
>           by the plugin, this can produce false positive warnings.
>
> +config GCC_PLUGIN_INITAUTOBYREF
> +       bool "Force initialization of auto variables that have their address taken"
> +       depends on GCC_PLUGINS
> +       help
> +
> +config GCC_PLUGIN_INITAUTOBYREF_VERBOSE
> +       bool "Report uninitialized auto variables that have their address taken"
> +       depends on GCC_PLUGIN_INITAUTOBYREF

I think this should be

      depends on GCC_PLUGIN_INITAUTOBYREF || COMPILE_TEST

to avoid producing output in an allmodconfig build.

      Arnd
Arnd Bergmann July 6, 2017, 11:25 a.m. UTC | #2
On Thu, Jul 6, 2017 at 1:09 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> To prevent leaking stack contents in cases where it is not possible
>> for the compiler to figure out whether an automatic variable has been
>> initialized or not, add a plugin that forcibly initializes all automatic
>> variables of struct/union types if their address is taken at any point.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> If we only do this for variables that have their address taken, we miss
> a lot of cases that the compiler (both clang and gcc) decide not to warn
> about but that can still cause undefined behavior, e.g.:
>
> extern int g(void);
> int f(void)
> {
>         int i;
>
>         switch (g()) {
>         case 1:
>                 i = 0;
>         }
>
>         return i;
> }
>
> which gets compiled without warning under the assumption that
> g() always returns '1':
>
> 0000000000000000 <f>:
>    0: 48 83 ec 08           sub    $0x8,%rsp
>    4: e8 00 00 00 00       callq  9 <f+0x9>
> 5: R_X86_64_PC32 g-0x4
>    9: b8 00 00 00 00       mov    $0x0,%eax
>    e: 48 83 c4 08           add    $0x8,%rsp
>   12: c3                   retq
>
> Detecting those cases from the plugin may be a lot harder.

Sorry, bad example, that one is a bit less undefined than
I thought, as it will produce the same result every time,
regardless of the stack contents. I'll try to come up
with another test program instead.

       Arnd
Kees Cook July 6, 2017, 9:44 p.m. UTC | #3
On Thu, Jul 6, 2017 at 4:09 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> To prevent leaking stack contents in cases where it is not possible
>> for the compiler to figure out whether an automatic variable has been
>> initialized or not, add a plugin that forcibly initializes all automatic
>> variables of struct/union types if their address is taken at any point.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> If we only do this for variables that have their address taken, we miss
> a lot of cases that the compiler (both clang and gcc) decide not to warn
> about but that can still cause undefined behavior, e.g.:
>
> extern int g(void);
> int f(void)
> {
>         int i;
>
>         switch (g()) {
>         case 1:
>                 i = 0;
>         }
>
>         return i;
> }
>
> which gets compiled without warning under the assumption that
> g() always returns '1':
>
> 0000000000000000 <f>:
>    0: 48 83 ec 08           sub    $0x8,%rsp
>    4: e8 00 00 00 00       callq  9 <f+0x9>
> 5: R_X86_64_PC32 g-0x4
>    9: b8 00 00 00 00       mov    $0x0,%eax
>    e: 48 83 c4 08           add    $0x8,%rsp
>   12: c3                   retq
>
> Detecting those cases from the plugin may be a lot harder.
>
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -443,6 +443,15 @@ config GCC_PLUGIN_STRUCTLEAK_VERBOSE
>>           initialized. Since not all existing initializers are detected
>>           by the plugin, this can produce false positive warnings.
>>
>> +config GCC_PLUGIN_INITAUTOBYREF
>> +       bool "Force initialization of auto variables that have their address taken"
>> +       depends on GCC_PLUGINS
>> +       help
>> +
>> +config GCC_PLUGIN_INITAUTOBYREF_VERBOSE
>> +       bool "Report uninitialized auto variables that have their address taken"
>> +       depends on GCC_PLUGIN_INITAUTOBYREF
>
> I think this should be
>
>       depends on GCC_PLUGIN_INITAUTOBYREF || COMPILE_TEST
>
> to avoid producing output in an allmodconfig build.

Err, surely you mean:

depends on GCC_PLUGIN_INITAUTOBYREF && !COMPILE_TEST

?

I've done this with the other _VERBOSE settings, and yeah, this one
should do it too.

Ard, I'd be curious what you see for "size" difference between builds
and if it's visible with hackbench or other things?

-Kees
Arnd Bergmann July 6, 2017, 10 p.m. UTC | #4
On Thu, Jul 6, 2017 at 11:44 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 4:09 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel

>> I think this should be
>>
>>       depends on GCC_PLUGIN_INITAUTOBYREF || COMPILE_TEST
>>
>> to avoid producing output in an allmodconfig build.
>
> Err, surely you mean:
>
> depends on GCC_PLUGIN_INITAUTOBYREF && !COMPILE_TEST
>
> ?

Yes, sorry for the confusion.

        Arnd
Arnd Bergmann July 6, 2017, 10:08 p.m. UTC | #5
On Thu, Jul 6, 2017 at 1:25 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thu, Jul 6, 2017 at 1:09 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
>
> Sorry, bad example, that one is a bit less undefined than
> I thought, as it will produce the same result every time,
> regardless of the stack contents. I'll try to come up
> with another test program instead.

I've tried a few more things, but couldn't actually come up with an example
that ends up using uninitialized stack values without also warning about it,
so your plugin may actually cover the most important cases.

The remaining cases I found are either uninitialized uses that we get
a compile-time warning for, or other kinds of undefined behavior
(as in my earlier example).

         Arnd
Kees Cook July 6, 2017, 11:16 p.m. UTC | #6
On Thu, Jul 6, 2017 at 2:44 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 4:09 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> To prevent leaking stack contents in cases where it is not possible
>>> for the compiler to figure out whether an automatic variable has been
>>> initialized or not, add a plugin that forcibly initializes all automatic
>>> variables of struct/union types if their address is taken at any point.
>>>
>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> Ard, I'd be curious what you see for "size" difference between builds
> and if it's visible with hackbench or other things?

Hm, not all that bad on the size front:

   text                 data            bss                 dec
            hex filename
10950705        5592525 13955072        30498302        1d15dfe vmlinux
11048035        5592365 13955072        30595472        1d2d990
vmlinux.initautobyref

And yes, as expected, wow there are a lot of notices in verbose mode. ;)

My pet favorite, from the NAKed patch I sent forever ago, is covered
(as expected):

net/socket.c: In function ‘SYSC_getsockname’:
net/socket.c:1605:26: note: auto variable will be forcibly initialized
  struct sockaddr_storage address;
                          ^~~~~~~

-Kees
Ard Biesheuvel July 7, 2017, 8:16 a.m. UTC | #7
On 6 July 2017 at 23:08, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thu, Jul 6, 2017 at 1:25 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Thu, Jul 6, 2017 at 1:09 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
>>
>> Sorry, bad example, that one is a bit less undefined than
>> I thought, as it will produce the same result every time,
>> regardless of the stack contents. I'll try to come up
>> with another test program instead.
>
> I've tried a few more things, but couldn't actually come up with an example
> that ends up using uninitialized stack values without also warning about it,
> so your plugin may actually cover the most important cases.
>
> The remaining cases I found are either uninitialized uses that we get
> a compile-time warning for, or other kinds of undefined behavior
> (as in my earlier example).
>

Yes, so your original example is actually the opposite side of the
same coin. The compiler notices that 'i' may be returned
uninitialized, but since 0 is equally suitable as any other value in
this case, it just makes the assignment unconditional.

So imagine a function call h(&i) preceding the switch. This forces the
compiler to return whatever value i happens to have, because it cannot
know whether calling h() amounts to an initialization.

This all comes back to the way we pass structs by reference only,
which is why I retained the struct/union type test in the plugin, even
though stack buffers (i.e, u8 buf[xxx]) would probably deserve the
same treatment.

I think we should be able to deal with this more elegantly by
annotating function parameters that are guaranteed to be set by the
function, e.g,

extern h(int * __out i)

where we would on the one hand enforce that every code path in the
definition of h() contains an assignment of *i, and on the other hand,
omit the forced initialization of variables that appear as __out
parameters.

Alternatively (or to complement the effort), we could at least start
making the effort to convert byref initializers to struct assignment,
in cases where the function is a void type to begin with.
Daniel Micay July 9, 2017, 5 a.m. UTC | #8
On Fri, 2017-07-07 at 00:08 +0200, Arnd Bergmann wrote:
> On Thu, Jul 6, 2017 at 1:25 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thu, Jul 6, 2017 at 1:09 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
> > 
> > Sorry, bad example, that one is a bit less undefined than
> > I thought, as it will produce the same result every time,
> > regardless of the stack contents. I'll try to come up
> > with another test program instead.
> 
> I've tried a few more things, but couldn't actually come up with an
> example
> that ends up using uninitialized stack values without also warning
> about it,
> so your plugin may actually cover the most important cases.
> 
> The remaining cases I found are either uninitialized uses that we get
> a compile-time warning for, or other kinds of undefined behavior
> (as in my earlier example).
> 
>          Arnd

The compiler will optimize out zeroing that's clearly redundant, so zero
initialization of all uninitialized variables is not really all of them
but rather the set that the compiler thinks could be used before they
get initialized. It makes sense to have that as an option. It's an
aggressive non-heuristic-based approach and yet it isn't as heavy as it
seems due to optimization.

It also provides another baseline to compare a heuristic against. No
automatic zeroing vs. all uninitialized variables zeroed vs. proposed
heuristic. Definitely worth including even if the main purpose is to
figure what's *not* being covered by chosen heuristics, especially after
optimization where they'll be more similar. You could find the cases
you're talking about by comparing the generated code with the zeroing
guided by the reference taken heuristic.
Kees Cook Aug. 3, 2017, 4:35 a.m. UTC | #9
On Thu, Jul 6, 2017 at 4:16 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 2:44 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Jul 6, 2017 at 4:09 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
>>> <ard.biesheuvel@linaro.org> wrote:
>>>> To prevent leaking stack contents in cases where it is not possible
>>>> for the compiler to figure out whether an automatic variable has been
>>>> initialized or not, add a plugin that forcibly initializes all automatic
>>>> variables of struct/union types if their address is taken at any point.
>>>>
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>
>> Ard, I'd be curious what you see for "size" difference between builds
>> and if it's visible with hackbench or other things?
>
> Hm, not all that bad on the size front:
>
>    text                 data            bss                 dec
>             hex filename
> 10950705        5592525 13955072        30498302        1d15dfe vmlinux
> 11048035        5592365 13955072        30595472        1d2d990
> vmlinux.initautobyref
>
> And yes, as expected, wow there are a lot of notices in verbose mode. ;)
>
> My pet favorite, from the NAKed patch I sent forever ago, is covered
> (as expected):
>
> net/socket.c: In function ‘SYSC_getsockname’:
> net/socket.c:1605:26: note: auto variable will be forcibly initialized
>   struct sockaddr_storage address;
>                           ^~~~~~~

While this was an RFC, it seems to work well and, as Daniel mentioned,
provides another benchmark for future optimizations of this kind of
protection. Besides the COMPILE_TEST change already discussed, any
other changes or objections before I carry this in -next?

-Kees
Ard Biesheuvel Aug. 3, 2017, 6:27 p.m. UTC | #10
On 3 August 2017 at 05:35, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Jul 6, 2017 at 4:16 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Jul 6, 2017 at 2:44 PM, Kees Cook <keescook@chromium.org> wrote:
>>> On Thu, Jul 6, 2017 at 4:09 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
>>>> <ard.biesheuvel@linaro.org> wrote:
>>>>> To prevent leaking stack contents in cases where it is not possible
>>>>> for the compiler to figure out whether an automatic variable has been
>>>>> initialized or not, add a plugin that forcibly initializes all automatic
>>>>> variables of struct/union types if their address is taken at any point.
>>>>>
>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>
>>> Ard, I'd be curious what you see for "size" difference between builds
>>> and if it's visible with hackbench or other things?
>>
>> Hm, not all that bad on the size front:
>>
>>    text                 data            bss                 dec
>>             hex filename
>> 10950705        5592525 13955072        30498302        1d15dfe vmlinux
>> 11048035        5592365 13955072        30595472        1d2d990
>> vmlinux.initautobyref
>>
>> And yes, as expected, wow there are a lot of notices in verbose mode. ;)
>>
>> My pet favorite, from the NAKed patch I sent forever ago, is covered
>> (as expected):
>>
>> net/socket.c: In function ‘SYSC_getsockname’:
>> net/socket.c:1605:26: note: auto variable will be forcibly initialized
>>   struct sockaddr_storage address;
>>                           ^~~~~~~
>
> While this was an RFC, it seems to work well and, as Daniel mentioned,
> provides another benchmark for future optimizations of this kind of
> protection. Besides the COMPILE_TEST change already discussed, any
> other changes or objections before I carry this in -next?
>

Sounds reasonable to me.
Kees Cook Aug. 3, 2017, 10:14 p.m. UTC | #11
On Thu, Aug 3, 2017 at 11:27 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 3 August 2017 at 05:35, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Jul 6, 2017 at 4:16 PM, Kees Cook <keescook@chromium.org> wrote:
>>> On Thu, Jul 6, 2017 at 2:44 PM, Kees Cook <keescook@chromium.org> wrote:
>>>> On Thu, Jul 6, 2017 at 4:09 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
>>>>> <ard.biesheuvel@linaro.org> wrote:
>>>>>> To prevent leaking stack contents in cases where it is not possible
>>>>>> for the compiler to figure out whether an automatic variable has been
>>>>>> initialized or not, add a plugin that forcibly initializes all automatic
>>>>>> variables of struct/union types if their address is taken at any point.
>>>>>>
>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>
>>>> Ard, I'd be curious what you see for "size" difference between builds
>>>> and if it's visible with hackbench or other things?
>>>
>>> Hm, not all that bad on the size front:
>>>
>>>    text                 data            bss                 dec
>>>             hex filename
>>> 10950705        5592525 13955072        30498302        1d15dfe vmlinux
>>> 11048035        5592365 13955072        30595472        1d2d990
>>> vmlinux.initautobyref
>>>
>>> And yes, as expected, wow there are a lot of notices in verbose mode. ;)
>>>
>>> My pet favorite, from the NAKed patch I sent forever ago, is covered
>>> (as expected):
>>>
>>> net/socket.c: In function ‘SYSC_getsockname’:
>>> net/socket.c:1605:26: note: auto variable will be forcibly initialized
>>>   struct sockaddr_storage address;
>>>                           ^~~~~~~
>>
>> While this was an RFC, it seems to work well and, as Daniel mentioned,
>> provides another benchmark for future optimizations of this kind of
>> protection. Besides the COMPILE_TEST change already discussed, any
>> other changes or objections before I carry this in -next?
>>
>
> Sounds reasonable to me.

Actually, I just looked at the diff between structleak and
initautobyref, and it's essentially 1 test (and the removal of all the
__user-detection code):

-               /* if the type is of interest, examine the variable */
-               if (TYPE_USERSPACE(type))
+               /* initialize the variable if its address is taken */
+               if (TREE_ADDRESSABLE (var))

Perhaps instead of a whole new plugin, could we just add the
functionality to the existing structleak plugin as a Kconfig option?
Like maybe CONFIG_GCC_PLUGIN_STRUCTLEAK_TAKEN?

-Kees
Ard Biesheuvel Aug. 3, 2017, 10:42 p.m. UTC | #12
On 3 August 2017 at 23:14, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Aug 3, 2017 at 11:27 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 3 August 2017 at 05:35, Kees Cook <keescook@chromium.org> wrote:
>>> On Thu, Jul 6, 2017 at 4:16 PM, Kees Cook <keescook@chromium.org> wrote:
>>>> On Thu, Jul 6, 2017 at 2:44 PM, Kees Cook <keescook@chromium.org> wrote:
>>>>> On Thu, Jul 6, 2017 at 4:09 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>>> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
>>>>>> <ard.biesheuvel@linaro.org> wrote:
>>>>>>> To prevent leaking stack contents in cases where it is not possible
>>>>>>> for the compiler to figure out whether an automatic variable has been
>>>>>>> initialized or not, add a plugin that forcibly initializes all automatic
>>>>>>> variables of struct/union types if their address is taken at any point.
>>>>>>>
>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>>
>>>>> Ard, I'd be curious what you see for "size" difference between builds
>>>>> and if it's visible with hackbench or other things?
>>>>
>>>> Hm, not all that bad on the size front:
>>>>
>>>>    text                 data            bss                 dec
>>>>             hex filename
>>>> 10950705        5592525 13955072        30498302        1d15dfe vmlinux
>>>> 11048035        5592365 13955072        30595472        1d2d990
>>>> vmlinux.initautobyref
>>>>
>>>> And yes, as expected, wow there are a lot of notices in verbose mode. ;)
>>>>
>>>> My pet favorite, from the NAKed patch I sent forever ago, is covered
>>>> (as expected):
>>>>
>>>> net/socket.c: In function ‘SYSC_getsockname’:
>>>> net/socket.c:1605:26: note: auto variable will be forcibly initialized
>>>>   struct sockaddr_storage address;
>>>>                           ^~~~~~~
>>>
>>> While this was an RFC, it seems to work well and, as Daniel mentioned,
>>> provides another benchmark for future optimizations of this kind of
>>> protection. Besides the COMPILE_TEST change already discussed, any
>>> other changes or objections before I carry this in -next?
>>>
>>
>> Sounds reasonable to me.
>
> Actually, I just looked at the diff between structleak and
> initautobyref, and it's essentially 1 test (and the removal of all the
> __user-detection code):
>
> -               /* if the type is of interest, examine the variable */
> -               if (TYPE_USERSPACE(type))
> +               /* initialize the variable if its address is taken */
> +               if (TREE_ADDRESSABLE (var))
>
> Perhaps instead of a whole new plugin, could we just add the
> functionality to the existing structleak plugin as a Kconfig option?
> Like maybe CONFIG_GCC_PLUGIN_STRUCTLEAK_TAKEN?
>

Yeah, it is mostly the same code. As you know, it was mainly intended
as a PoC but given the interest to merge this functionality for real,
I will do another pass and try to incorporate it more cleanly.
Kees Cook Aug. 3, 2017, 10:43 p.m. UTC | #13
On Thu, Aug 3, 2017 at 3:42 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 3 August 2017 at 23:14, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Aug 3, 2017 at 11:27 AM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> On 3 August 2017 at 05:35, Kees Cook <keescook@chromium.org> wrote:
>>>> On Thu, Jul 6, 2017 at 4:16 PM, Kees Cook <keescook@chromium.org> wrote:
>>>>> On Thu, Jul 6, 2017 at 2:44 PM, Kees Cook <keescook@chromium.org> wrote:
>>>>>> On Thu, Jul 6, 2017 at 4:09 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>>>> On Thu, Jul 6, 2017 at 12:13 PM, Ard Biesheuvel
>>>>>>> <ard.biesheuvel@linaro.org> wrote:
>>>>>>>> To prevent leaking stack contents in cases where it is not possible
>>>>>>>> for the compiler to figure out whether an automatic variable has been
>>>>>>>> initialized or not, add a plugin that forcibly initializes all automatic
>>>>>>>> variables of struct/union types if their address is taken at any point.
>>>>>>>>
>>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>>>
>>>>>> Ard, I'd be curious what you see for "size" difference between builds
>>>>>> and if it's visible with hackbench or other things?
>>>>>
>>>>> Hm, not all that bad on the size front:
>>>>>
>>>>>    text                 data            bss                 dec
>>>>>             hex filename
>>>>> 10950705        5592525 13955072        30498302        1d15dfe vmlinux
>>>>> 11048035        5592365 13955072        30595472        1d2d990
>>>>> vmlinux.initautobyref
>>>>>
>>>>> And yes, as expected, wow there are a lot of notices in verbose mode. ;)
>>>>>
>>>>> My pet favorite, from the NAKed patch I sent forever ago, is covered
>>>>> (as expected):
>>>>>
>>>>> net/socket.c: In function ‘SYSC_getsockname’:
>>>>> net/socket.c:1605:26: note: auto variable will be forcibly initialized
>>>>>   struct sockaddr_storage address;
>>>>>                           ^~~~~~~
>>>>
>>>> While this was an RFC, it seems to work well and, as Daniel mentioned,
>>>> provides another benchmark for future optimizations of this kind of
>>>> protection. Besides the COMPILE_TEST change already discussed, any
>>>> other changes or objections before I carry this in -next?
>>>>
>>>
>>> Sounds reasonable to me.
>>
>> Actually, I just looked at the diff between structleak and
>> initautobyref, and it's essentially 1 test (and the removal of all the
>> __user-detection code):
>>
>> -               /* if the type is of interest, examine the variable */
>> -               if (TYPE_USERSPACE(type))
>> +               /* initialize the variable if its address is taken */
>> +               if (TREE_ADDRESSABLE (var))
>>
>> Perhaps instead of a whole new plugin, could we just add the
>> functionality to the existing structleak plugin as a Kconfig option?
>> Like maybe CONFIG_GCC_PLUGIN_STRUCTLEAK_TAKEN?
>>
>
> Yeah, it is mostly the same code. As you know, it was mainly intended
> as a PoC but given the interest to merge this functionality for real,
> I will do another pass and try to incorporate it more cleanly.

Yup, I just hadn't actually looked to see how much was reused. :) Thanks!

-Kees

Patch
diff mbox

diff --git a/arch/Kconfig b/arch/Kconfig
index 6c00e5b00f8b..273b66749ad8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -443,6 +443,15 @@  config GCC_PLUGIN_STRUCTLEAK_VERBOSE
 	  initialized. Since not all existing initializers are detected
 	  by the plugin, this can produce false positive warnings.
 
+config GCC_PLUGIN_INITAUTOBYREF
+	bool "Force initialization of auto variables that have their address taken"
+	depends on GCC_PLUGINS
+	help
+
+config GCC_PLUGIN_INITAUTOBYREF_VERBOSE
+	bool "Report uninitialized auto variables that have their address taken"
+	depends on GCC_PLUGIN_INITAUTOBYREF
+
 config HAVE_CC_STACKPROTECTOR
 	bool
 	help
diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
index 82335533620e..90f30622f86d 100644
--- a/scripts/Makefile.gcc-plugins
+++ b/scripts/Makefile.gcc-plugins
@@ -29,6 +29,9 @@  ifdef CONFIG_GCC_PLUGINS
   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE)	+= -fplugin-arg-structleak_plugin-verbose
   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STRUCTLEAK)	+= -DSTRUCTLEAK_PLUGIN
 
+  gcc-plugin-$(CONFIG_GCC_PLUGIN_INITAUTOBYREF) += initautobyref_plugin.so
+  gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_INITAUTOBYREF_VERBOSE)	+= -fplugin-arg-initautobyref_plugin-verbose
+
   GCC_PLUGINS_CFLAGS := $(strip $(addprefix -fplugin=$(objtree)/scripts/gcc-plugins/, $(gcc-plugin-y)) $(gcc-plugin-cflags-y))
 
   export PLUGINCC GCC_PLUGINS_CFLAGS GCC_PLUGIN GCC_PLUGIN_SUBDIR
diff --git a/scripts/gcc-plugins/initautobyref_plugin.c b/scripts/gcc-plugins/initautobyref_plugin.c
new file mode 100644
index 000000000000..afc8b8f22056
--- /dev/null
+++ b/scripts/gcc-plugins/initautobyref_plugin.c
@@ -0,0 +1,159 @@ 
+/*
+ * Copyright 2013-2017 by PaX Team <pageexec@freemail.hu>
+ * Copyright 2017 by Ard Biesheuvel <ard.biesheuvel@linaro.org>
+ * Licensed under the GPL v2
+ *
+ * Note: the choice of the license means that the compilation process is
+ *       NOT 'eligible' as defined by gcc's library exception to the GPL v3,
+ *       but for the kernel it doesn't matter since it doesn't link against
+ *       any of the gcc libraries
+ *
+ * gcc plugin to forcibly initialize local variables that have their address
+ * taken, to prevent leaking data if the variable is never initialized (which
+ * may be difficult to decide for the compiler if the address is passed outside
+ * of the compilation unit)
+ *
+ * Options:
+ * -fplugin-arg-initautobyref_plugin-disable
+ * -fplugin-arg-initautobyref_plugin-verbose
+ */
+
+#include "gcc-common.h"
+
+__visible int plugin_is_GPL_compatible;
+
+static struct plugin_info initautobyref_plugin_info = {
+	.version	= "0.1",
+	.help		= "disable\tdo not activate plugin\n"
+			   "verbose\tprint all initialized variables\n",
+};
+
+static bool verbose;
+
+static void initialize(tree var)
+{
+	basic_block bb;
+	gimple_stmt_iterator gsi;
+	tree initializer;
+	gimple init_stmt;
+
+	/* this is the original entry bb before the forced split */
+	bb = single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun));
+
+	/* first check if variable is already initialized, warn otherwise */
+	for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
+		gimple stmt = gsi_stmt(gsi);
+		tree rhs1;
+
+		/* we're looking for an assignment of a single rhs... */
+		if (!gimple_assign_single_p(stmt))
+			continue;
+		rhs1 = gimple_assign_rhs1(stmt);
+#if BUILDING_GCC_VERSION >= 4007
+		/* ... of a non-clobbering expression... */
+		if (TREE_CLOBBER_P(rhs1))
+			continue;
+#endif
+		/* ... to our variable... */
+		if (gimple_get_lhs(stmt) != var)
+			continue;
+		/* if it's an initializer then we're good */
+		if (TREE_CODE(rhs1) == CONSTRUCTOR)
+			return;
+	}
+
+	/* these aren't the 0days you're looking for */
+	if (verbose)
+		inform(DECL_SOURCE_LOCATION(var),
+			"auto variable will be forcibly initialized");
+
+	/* build the initializer expression */
+	initializer = build_constructor(TREE_TYPE(var), NULL);
+
+	/* build the initializer stmt */
+	init_stmt = gimple_build_assign(var, initializer);
+	gsi = gsi_after_labels(single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun)));
+	gsi_insert_before(&gsi, init_stmt, GSI_NEW_STMT);
+	update_stmt(init_stmt);
+}
+
+static unsigned int initautobyref_execute(void)
+{
+	basic_block bb;
+	unsigned int ret = 0;
+	tree var;
+	unsigned int i;
+
+	/* split the first bb where we can put the forced initializers */
+	gcc_assert(single_succ_p(ENTRY_BLOCK_PTR_FOR_FN(cfun)));
+	bb = single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun));
+	if (!single_pred_p(bb)) {
+		split_edge(single_succ_edge(ENTRY_BLOCK_PTR_FOR_FN(cfun)));
+		gcc_assert(single_succ_p(ENTRY_BLOCK_PTR_FOR_FN(cfun)));
+	}
+
+	/* enumerate all local variables and forcibly initialize our targets */
+	FOR_EACH_LOCAL_DECL(cfun, i, var) {
+		tree type = TREE_TYPE(var);
+
+		gcc_assert(DECL_P(var));
+		if (!auto_var_in_fn_p(var, current_function_decl))
+			continue;
+
+		/* only care about structure types */
+		if (TREE_CODE(type) != RECORD_TYPE && TREE_CODE(type) != UNION_TYPE)
+			continue;
+
+		/* initialize the variable if its address is taken */
+		if (TREE_ADDRESSABLE (var))
+			initialize(var);
+	}
+
+	return ret;
+}
+
+#define PASS_NAME initautobyref
+#define NO_GATE
+#define PROPERTIES_REQUIRED PROP_cfg
+#define TODO_FLAGS_FINISH TODO_verify_il | TODO_verify_ssa | TODO_verify_stmts | TODO_dump_func | TODO_remove_unused_locals | TODO_update_ssa | TODO_ggc_collect | TODO_verify_flow
+#include "gcc-generate-gimple-pass.h"
+
+__visible int plugin_init(struct plugin_name_args *plugin_info, struct plugin_gcc_version *version)
+{
+	int i;
+	const char * const plugin_name = plugin_info->base_name;
+	const int argc = plugin_info->argc;
+	const struct plugin_argument * const argv = plugin_info->argv;
+	bool enable = true;
+
+	PASS_INFO(initautobyref, "early_optimizations", 1, PASS_POS_INSERT_BEFORE);
+
+	if (!plugin_default_version_check(version, &gcc_version)) {
+		error(G_("incompatible gcc/plugin versions"));
+		return 1;
+	}
+
+	if (strncmp(lang_hooks.name, "GNU C", 5) && !strncmp(lang_hooks.name, "GNU C+", 6)) {
+		inform(UNKNOWN_LOCATION, G_("%s supports C only, not %s"), plugin_name, lang_hooks.name);
+		enable = false;
+	}
+
+	for (i = 0; i < argc; ++i) {
+		if (!strcmp(argv[i].key, "disable")) {
+			enable = false;
+			continue;
+		}
+		if (!strcmp(argv[i].key, "verbose")) {
+			verbose = true;
+			continue;
+		}
+		error(G_("unknown option '-fplugin-arg-%s-%s'"), plugin_name, argv[i].key);
+	}
+
+	register_callback(plugin_name, PLUGIN_INFO, NULL, &initautobyref_plugin_info);
+	if (enable) {
+		register_callback(plugin_name, PLUGIN_PASS_MANAGER_SETUP, NULL, &initautobyref_pass_info);
+	}
+
+	return 0;
+}