diff mbox

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

Message ID CAGXu5jLc_A1w4FM8FvQHGSPjLfOu80L_vbt_FkcE6+Bh_XMjxw@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Kees Cook Nov. 18, 2017, 5:14 a.m. UTC
On Fri, Nov 17, 2017 at 5:54 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
> On 2017-11-17 04:55 PM, Linus Torvalds wrote:
>> On Fri, Nov 17, 2017 at 4:27 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>>
>>> I am still getting the crash at d9e12200852d, I figured I would
>>> double-check the "good" and "bad" kernels before starting a full bisect.
>>
>> .. but without GCC_PLUGIN_RANDSTRUCT it's solid?
>
> Yes, without GCC_PLUGIN_RANDSTRUCT it's solid.

That's strange. With d9e12200852d the shuffle_seed variables won't
ever actually get used. (i.e. I wouldn't expect the seed to change any
behavior.)

Can you confirm with something like this:


        for (i = 0; i < 4; i++) {
                seed[i] = shuffle_seed[i];


You should see no reports of "Shuffling struct ..."

And if it reports nothing, and you're on d9e12200852d, can you confirm
that switching to a "good" seed fixes it? (If it _does_, then I
suspect a build artifact being left behind or something odd like
that.)

>> Kees removed even the baseline "randomize pure function pointer
>> structures", so at that commit, nothing should be randomized.
>>
>> But maybe the plugin code itself ends up confusing gcc somehow?
>>
>> Even when it doesn't actually do that "relayout_struct()" on the
>> structure, it always does those TYPE_ATTRIBUTES() games.

FWIW, myself doing a build at d9e12200852d with and without
GCC_PLUGIN_RANDSTRUCT _appears_ to produce identical objdump output
where I did spot-checks.

Also, do you have any other plugins enabled? (Can you send your .config?)

-Kees

Comments

Linus Torvalds Nov. 18, 2017, 5:29 a.m. UTC | #1
On Fri, Nov 17, 2017 at 9:14 PM, Kees Cook <keescook@chromium.org> wrote:
>
> FWIW, myself doing a build at d9e12200852d with and without
> GCC_PLUGIN_RANDSTRUCT _appears_ to produce identical objdump output
> where I did spot-checks.

That would probably be a good thing to check anyway - check the
difference between GCC_PLUGIN_RANDSTRUCT on and off at that commit.

Just do

   objdump --disassemble vmlinux > file

and compare the two files for where the differences start occurring.

                 Linus
Kees Cook Nov. 18, 2017, 8:20 a.m. UTC | #2
On Fri, Nov 17, 2017 at 9:29 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Nov 17, 2017 at 9:14 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> FWIW, myself doing a build at d9e12200852d with and without
>> GCC_PLUGIN_RANDSTRUCT _appears_ to produce identical objdump output
>> where I did spot-checks.
>
> That would probably be a good thing to check anyway - check the
> difference between GCC_PLUGIN_RANDSTRUCT on and off at that commit.
>
> Just do
>
>    objdump --disassemble vmlinux > file
>
> and compare the two files for where the differences start occurring.

Yeah, I was just doing that now. Looks like there _is_ something
getting changed just from having the plugin enabled, but it appears
localized. For me, the first non-offset change happens in
lookup_user_key and persists for a while.

-ffffffff813893a7:      0f 85 55 03 00 00       jne
ffffffff81389702 <lookup_user_key+0x3f2>
-ffffffff813893ad:      f0 41 ff 06             lock incl (%r14)
-ffffffff813893b1:      83 fb 07                cmp    $0x7,%ebx
-ffffffff813893b4:      4c 89 b5 70 ff ff ff    mov    %r14,-0x90(%rbp)
...
+ffffffff813893a7:      0f 85 35 03 00 00       jne
ffffffff813896e2 <lookup_user_key+0x3d2>
+ffffffff813893ad:      4d 89 f0                mov    %r14,%r8
+ffffffff813893b0:      f0 41 ff 06             lock incl (%r14)
+ffffffff813893b4:      83 fb 07                cmp    $0x7,%ebx
+ffffffff813893b7:      4c 89 b5 70 ff ff ff    mov    %r14,-0x90(%rbp)

And removing the TYPE_ATTRIBUTES() poking makes the register storage
differences go away, but there's still a 0x40 byte offset delta.

I'll continue looking at this tomorrow.

-Kees
Maciej S. Szmigiero Feb. 21, 2018, 10:19 p.m. UTC | #3
On 18.11.2017 06:14, Kees Cook wrote:
> On Fri, Nov 17, 2017 at 5:54 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>> On 2017-11-17 04:55 PM, Linus Torvalds wrote:
>>> On Fri, Nov 17, 2017 at 4:27 PM, Patrick McLean <chutzpah@gentoo.org> wrote:
>>>>
>>>> I am still getting the crash at d9e12200852d, I figured I would
>>>> double-check the "good" and "bad" kernels before starting a full bisect.
>>>
>>> .. but without GCC_PLUGIN_RANDSTRUCT it's solid?
>>
>> Yes, without GCC_PLUGIN_RANDSTRUCT it's solid.
> 
> That's strange. With d9e12200852d the shuffle_seed variables won't
> ever actually get used. (i.e. I wouldn't expect the seed to change any
> behavior.)
> 
> Can you confirm with something like this:
> 
> 
>         for (i = 0; i < 4; i++) {
>                 seed[i] = shuffle_seed[i];
> 
> 
> You should see no reports of "Shuffling struct ..."
> 
> And if it reports nothing, and you're on d9e12200852d, can you confirm
> that switching to a "good" seed fixes it? (If it _does_, then I
> suspect a build artifact being left behind or something odd like
> that.)
> 
>>> Kees removed even the baseline "randomize pure function pointer
>>> structures", so at that commit, nothing should be randomized.
>>>
>>> But maybe the plugin code itself ends up confusing gcc somehow?
>>>
>>> Even when it doesn't actually do that "relayout_struct()" on the
>>> structure, it always does those TYPE_ATTRIBUTES() games.
> 
> FWIW, myself doing a build at d9e12200852d with and without
> GCC_PLUGIN_RANDSTRUCT _appears_ to produce identical objdump output
> where I did spot-checks.
> 

I have also hit a GPF in nfsd4_encode_fattr() with RANDSTRUCT plugin
enabled.
This function is located in a fs/nfsd/nfs4xdr.c file.

The fault happened at "xdr_encode_hyper(p, exp->ex_path.mnt->mnt_sb->s_maxbytes)"
line, namely when accessing s_maxbytes.

exp->ex_path is of type struct path that has been annotated with
__randomize_layout.
It seems to me that this annotation isn't really taken into consideration
when compiling nfs4xdr.c.
This most likely results in dereferencing a value of exp->ex_path.dentry
instead of exp->ex_path.mnt. Then some member of struct dentry is
dereferenced as struct super_block to access its s_maxbytes member which
results in an oops if it happens to be an invalid pointer (which it was
in my case).

How to reproduce the problem statically (tested on current Linus's tree
and on 4.15.4, with gcc 7.3.0):
1) Enable RANDSTRUCT plugin,

2) Use a RANDSTRUCT seed that results in shuffling of struct path,
Example: "55e5fea7ff662b333a190209ab31f35b6f0f2470f7d0e3c64430936169571106".

3) make fs/nfsd/nfs4xdr.s and save the result,

4) Insert "#include <linux/compiler_types.h>" at the top of
fs/nfsd/nfs4xdr.c as the very first include directive.

5) make fs/nfsd/nfs4xdr.s and compare the result with the one from step 3.

One can see that offsets used to access various members of struct path are
different, and also that the original file from step 3 contains an object
named "__randomize_layout".

This is caused by a fact that the current version of nfs4xdr.c includes
linux/fs_struct.h as the very first included header which then includes
linux/path.h as the very first included header, which then defines
struct path, but without including any files on its own.

This results in __randomize_layout tag at the end of struct path
definition being treated as a variable name (since linux/compiler-gcc.h
that defines it as a type attribute has not been included yet).

It looks like to me that every header file that defines a randomized
struct also has to include linux/compiler_types.h or some other file
that ultimately results in that file inclusion in order to make
the RANDSTRUCT plugin work correctly.

Maciej
Kees Cook Feb. 21, 2018, 10:52 p.m. UTC | #4
On Wed, Feb 21, 2018 at 2:19 PM, Maciej S. Szmigiero
<mail@maciej.szmigiero.name> wrote:
> One can see that offsets used to access various members of struct path are
> different, and also that the original file from step 3 contains an object
> named "__randomize_layout".
>
> This is caused by a fact that the current version of nfs4xdr.c includes
> linux/fs_struct.h as the very first included header which then includes
> linux/path.h as the very first included header, which then defines
> struct path, but without including any files on its own.
>
> This results in __randomize_layout tag at the end of struct path
> definition being treated as a variable name (since linux/compiler-gcc.h
> that defines it as a type attribute has not been included yet).

Oh, well done! That would explain the code offset I was seeing when
the plugin on, but no-op, since the variable would still exist.

I'll play with Linus's suggestion and see what we get.

Thanks!

-Kees
Linus Torvalds Feb. 21, 2018, 11:24 p.m. UTC | #5
On Wed, Feb 21, 2018 at 2:52 PM, Kees Cook <keescook@chromium.org> wrote:
>
> I'll play with Linus's suggestion and see what we get.

It may be just as well to just include <linux/compiler_types.h> from
<linux/kconfig.h> and be done with it.

If you look at that hacky script I documented in commit 23c35f48f5fb
("pinctrl: remove include file from <linux/device.h>") and run it in a
fully built kernel tree, you'll see that that header is included from
pretty much every single file anyway. At least for me, for an
allmodconfig build, the top headers are

  23322 arch/x86/include/uapi/asm/types.h
  23322 include/asm-generic/int-ll64.h
  23322 include/linux/types.h
  23322 include/uapi/asm-generic/int-ll64.h
  23322 include/uapi/asm-generic/types.h
  23322 include/uapi/linux/types.h
  23323 arch/x86/include/uapi/asm/bitsperlong.h
  23323 include/asm-generic/bitsperlong.h
  23323 include/uapi/asm-generic/bitsperlong.h
  23326 include/linux/stringify.h
  23390 include/linux/compiler_types.h

and considering that I have 25949 object files in that tree, it really
means that just about every compile ended up including that
<linux/compiler_types.h> file anyway (yeah, the "orc_types.h" header
ends up being mentioned twice for most files, so it looks even more
hot, but that's not real data).

I do hate including unnecessary stuff because it makes builds slower,
but kernel header files probably don't get much more core than
<linux/compiler_types.h>.

              Linus
Kees Cook Feb. 22, 2018, 12:12 a.m. UTC | #6
On Wed, Feb 21, 2018 at 3:24 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Feb 21, 2018 at 2:52 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> I'll play with Linus's suggestion and see what we get.
>
> It may be just as well to just include <linux/compiler_types.h> from
> <linux/kconfig.h> and be done with it.

Hah, yeah, that would certainly solve it too. :)

> I do hate including unnecessary stuff because it makes builds slower,
> but kernel header files probably don't get much more core than
> <linux/compiler_types.h>.

It also has the benefit of not letting it "go wrong" in the first
place. (And the separate fix for nfs isn't needed...)

Do you want me to send the patch for this, or do you already have it
prepared? The body-fields I had prepared for the nfs were:

Reported-by: Patrick McLean <chutzpah@gentoo.org>
Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization")

-Kees
Linus Torvalds Feb. 22, 2018, 12:22 a.m. UTC | #7
On Wed, Feb 21, 2018 at 4:12 PM, Kees Cook <keescook@chromium.org> wrote:
>
> Do you want me to send the patch for this, or do you already have it
> prepared?

I'd rather get something explicitly tested. I tried my earlier patch
with "make allmodconfig" (and a fix to nfsd to make it compile), but
now I'm back to testing hjl's gas updates so it would be better to get
a tested commit with a good commit message.

> The body-fields I had prepared for the nfs were:
>
> Reported-by: Patrick McLean <chutzpah@gentoo.org>
> Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>

Oh, I think Maciej needs to get more than a "Reported-by:". This was a
really subtle thing that we didn't figure out in the original thread,
so give him a gold star in the form of "Root-caused-by:" or something.

*Fixing* this ends up being a one-liner or so. Finding the cause was
the painful part.

               Linus
Kees Cook Feb. 22, 2018, 12:23 a.m. UTC | #8
On Wed, Feb 21, 2018 at 4:22 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Feb 21, 2018 at 4:12 PM, Kees Cook <keescook@chromium.org> wrote:
>>
>> Do you want me to send the patch for this, or do you already have it
>> prepared?
>
> I'd rather get something explicitly tested. I tried my earlier patch
> with "make allmodconfig" (and a fix to nfsd to make it compile), but
> now I'm back to testing hjl's gas updates so it would be better to get
> a tested commit with a good commit message.
>
>> The body-fields I had prepared for the nfs were:
>>
>> Reported-by: Patrick McLean <chutzpah@gentoo.org>
>> Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>
> Oh, I think Maciej needs to get more than a "Reported-by:". This was a
> really subtle thing that we didn't figure out in the original thread,
> so give him a gold star in the form of "Root-caused-by:" or something.

Oops, I just sent this out. I will adjust a re-send. I couldn't find a
documented field name for this...

> *Fixing* this ends up being a one-liner or so. Finding the cause was
> the painful part.

Yes indeed!

-Kees
Kees Cook Feb. 22, 2018, 12:27 a.m. UTC | #9
On Wed, Feb 21, 2018 at 4:23 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Feb 21, 2018 at 4:22 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Wed, Feb 21, 2018 at 4:12 PM, Kees Cook <keescook@chromium.org> wrote:
>>>
>>> Do you want me to send the patch for this, or do you already have it
>>> prepared?
>>
>> I'd rather get something explicitly tested. I tried my earlier patch
>> with "make allmodconfig" (and a fix to nfsd to make it compile), but
>> now I'm back to testing hjl's gas updates so it would be better to get
>> a tested commit with a good commit message.
>>
>>> The body-fields I had prepared for the nfs were:
>>>
>>> Reported-by: Patrick McLean <chutzpah@gentoo.org>
>>> Reported-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>>
>> Oh, I think Maciej needs to get more than a "Reported-by:". This was a
>> really subtle thing that we didn't figure out in the original thread,
>> so give him a gold star in the form of "Root-caused-by:" or something.
>
> Oops, I just sent this out. I will adjust a re-send. I couldn't find a
> documented field name for this...

With the "root-cause" hint, I see we have used:

2    Root-cause-analysis-by:
2    Root-caused-by:
1    Root-cause-found-by:

I'll go with your "Root-caused-by" to tip the scale. :)

-Kees
diff mbox

Patch

diff --git a/scripts/gcc-plugins/randomize_layout_plugin.c
b/scripts/gcc-plugins/randomize_layout_plugin.c
index cdaac8c66734..aac570a57d7d 100644
--- a/scripts/gcc-plugins/randomize_layout_plugin.c
+++ b/scripts/gcc-plugins/randomize_layout_plugin.c
@@ -267,12 +267,10 @@  static void shuffle(const_tree type, tree
*newtree, unsigned long length)

        structname = ORIG_TYPE_NAME(type);

-#ifdef __DEBUG_PLUGIN
        fprintf(stderr, "Shuffling struct %s %p\n", (const char
*)structname, type);
 #ifdef __DEBUG_VERBOSE
        debug_tree((tree)type);
 #endif
-#endif