x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

Message ID 20180517233510.24996-1-dima@arista.com
State New

Commit Message

Dmitry Safonov May 17, 2018, 11:35 p.m. UTC
The x86 mmap() code selects the mmap base for an allocation depending on
the bitness of the syscall. For 64bit syscalls it selects mm->mmap_base and
for 32bit mm->mmap_compat_base.

exec() calls mmap() which in turn uses in_compat_syscall() to check whether
the mapping is for a 32bit or a 64bit task. The decision is made on the
following criteria:

  ia32    child->thread.status & TS_COMPAT
   x32    child->pt_regs.orig_ax & __X32_SYSCALL_BIT
  ia64    !ia32 && !x32

__set_personality_x32() was dropping the TS_COMPAT flag, but
set_personality_64bit() kept the compat syscall flag, making
in_compat_syscall() return true during the first exec() syscall.

This has user-visible effects, as reported by Alexey:
1) It breaks ASAN
$ gcc -fsanitize=address wrap.c -o wrap-asan
$ ./wrap32 ./wrap-asan true
==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==1217==Process memory map follows:
        0x000000400000-0x000000401000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000600000-0x000000601000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000601000-0x000000602000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x0000f7dbd000-0x0000f7de2000   /lib64/ld-2.27.so
        0x0000f7fe2000-0x0000f7fe3000   /lib64/ld-2.27.so
        0x0000f7fe3000-0x0000f7fe4000   /lib64/ld-2.27.so
        0x0000f7fe4000-0x0000f7fe5000
        0x7fed9abff000-0x7fed9af54000
        0x7fed9af54000-0x7fed9af6b000   /lib64/libgcc_s.so.1
[snip]

2) It doesn't seem to be great for security if an attacker always knows
that ld.so is going to be mapped into the first 4GB in this case
(the same thing happens for PIEs as well).

The testcase:
$ cat wrap.c
#include <unistd.h>

int main(int argc, char *argv[]) {
  execvp(argv[1], &argv[1]);
  return 127;
}

$ gcc wrap.c -o wrap
$ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
AT_BASE:         0x7f63b8309000
AT_BASE:         0x7faec143c000
AT_BASE:         0x7fbdb25fa000

$ gcc -m32 wrap.c -o wrap32
$ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
AT_BASE:         0xf7eff000
AT_BASE:         0xf7cee000
AT_BASE:         0x7f8b9774e000

Fixes:
commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
commit ada26481dfe6 ("x86/mm: Make in_compat_syscall() work during exec")

Cc: Borislav Petkov <bp@suse.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <linux-mm@kvack.org>
Cc: <x86@kernel.org>
Cc: <stable@vger.kernel.org> # v4.12+
Reported-by: Alexey Izbyshev <izbyshev@ispras.ru>
Bisected-by: Alexander Monakov <amonakov@ispras.ru>
Investigated-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/kernel/process_64.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Dmitry Safonov May 17, 2018, 11:40 p.m. UTC | #1
I've tested it on master with:
- the reproducer
- x86 selftests
- criu

Some selftests are failing, but in the same way as before the patch
(IOW, it's not a regression):
[root@localhost self]# grep FAIL out 
[FAIL]	Reg 1 mismatch: requested 0x0; got 0x3
[FAIL]	Reg 15 mismatch: requested 0x8badf00d5aadc0de; got 0xffffff425aadc0de
[FAIL]	Reg 15 mismatch: requested 0x8badf00d5aadc0de; got 0xffffff425aadc0de
[FAIL]	Reg 15 mismatch: requested 0x8badf00d5aadc0de; got 0xffffff425aadc0de
[FAIL]	f[u]comi[p] errors: 1
[FAIL]	fisttp errors: 1
[FAIL]	R8 has changed:0000000000000000
[FAIL]	R9 has changed:0000000000000000
[FAIL]	R10 has changed:0000000000000000
[FAIL]	R11 has changed:0000000000000000
[FAIL]	R8 has changed:0000000000000000
[FAIL]	R9 has changed:0000000000000000
[FAIL]	R10 has changed:0000000000000000
[FAIL]	R11 has changed:0000000000000000

I think R8-R11 are not preserved yet in master?
Not quite sure about the register mismatches :-/
Also, ia32-criu has a failure, which I need to look into (but it's not a
regression).
Cyrill Gorcunov May 18, 2018, 7:20 a.m. UTC | #2
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>

Thanks a lot! (At first I had to scratch my head for a second
to realize that the key point is executing a 64-bit application
from inside a compat process :-)
Andy Lutomirski May 18, 2018, 10:03 p.m. UTC | #3
On Thu, May 17, 2018 at 4:40 PM Dmitry Safonov <dima@arista.com> wrote:
> Some selftests are failing, but the same way as before the patch
> (ITOW, it's not regression):
> [root@localhost self]# grep FAIL out
> [FAIL]  Reg 1 mismatch: requested 0x0; got 0x3
> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
> 0xffffff425aadc0de
> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
> 0xffffff425aadc0de
> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
> 0xffffff425aadc0de

Are you on AMD?  Can you try this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=c88aa6d53840e48970c54f9ef70c79415033b32d

and give me a Tested-by if it fixes it for you?

> [FAIL]  f[u]comi[p] errors: 1
> [FAIL]  fisttp errors: 1'

I don't know about these.

> [FAIL]  R8 has changed:0000000000000000
> [FAIL]  R9 has changed:0000000000000000
> [FAIL]  R10 has changed:0000000000000000
> [FAIL]  R11 has changed:0000000000000000
> [FAIL]  R8 has changed:0000000000000000
> [FAIL]  R9 has changed:0000000000000000
> [FAIL]  R10 has changed:0000000000000000
> [FAIL]  R11 has changed:0000000000000000

The patch that added these test lines was the same patch that should have
made them pass.  Are you sure your tests match your running kernel?  You
need commit 8bb2610bc4967f19672444a7b0407367f1540028.

If you still have failures, can you send me the complete output from the
test_syscall_vdso test?

--Andy
Dmitry Safonov May 18, 2018, 11:10 p.m. UTC | #4
Hi Andy,

2018-05-18 23:03 GMT+01:00 Andy Lutomirski <luto@kernel.org>:
> On Thu, May 17, 2018 at 4:40 PM Dmitry Safonov <dima@arista.com> wrote:
>> Some selftests are failing, but the same way as before the patch
>> (ITOW, it's not regression):
>> [root@localhost self]# grep FAIL out
>> [FAIL]  Reg 1 mismatch: requested 0x0; got 0x3
>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>> 0xffffff425aadc0de
>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>> 0xffffff425aadc0de
>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>> 0xffffff425aadc0de
>
> Are you on AMD?  Can you try this patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=c88aa6d53840e48970c54f9ef70c79415033b32d
>
> and give me a Tested-by if it fixes it for you?

Sure.
I'm on Intel actually:
cpu family    : 6
model        : 142
model name    : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz

But I usually test kernels in a VM, and I use virt-manager as it makes it
easier to manage multiple VMs. I've chosen "Copy host CPU configuration"
and, for a reason I don't quite follow, virt-manager sets the model to "Opteron_G4".
I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1 (qemu-2.9.1-2.fc26).
So, cpuinfo in VM says:
cpu family    : 21
model        : 1
model name    : AMD Opteron 62xx class CPU

What's worse than the register changes is that some selftests actually lead to
Oopses. The criu-ia32 failures have the same cause.
So far I've tested the v4.15 and v4.16 releases besides master (2c71d338bef2),
so it doesn't look like a recent regression.

Full Oopses:
[  189.100174] BUG: unable to handle kernel paging request at 00000000417bafe8
[  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 PTE 6991f067
[  189.100174] Oops: 0001 [#3] SMP NOPTI
[  189.100174] Modules linked in:
[  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G D           4.17.0-rc5+ #11
[  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
[  189.103187] RIP: 0033:0x40085a
[  189.103187] RSP: 002b:00000000417bafe8 EFLAGS: 00000206
[  189.103187] RAX: 0000000000000000 RBX: 00000000000003e8 RCX: 0000000000000000
[  189.103187] RDX: 0000000000000000 RSI: 0000000000400830 RDI: 00000000417baff8
[  189.103187] RBP: 00000000417baff8 R08: 0000000000000000 R09: 0000000000000077
[  189.103187] R10: 0000000000000006 R11: 0000000000000000 R12: 00000000417ba000
[  189.103187] R13: 00007ffc05207840 R14: 0000000000000000 R15: 0000000000000000
[  189.103187] FS:  00007f98566ecb40(0000) GS:ffff9740ffc00000(0000) knlGS:0000000000000000
[  189.103187] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  189.103187] CR2: 00000000417bafe8 CR3: 0000000069dc4000 CR4: 00000000007406f0
[  189.103187] PKRU: 55555554
[  189.103187] RIP: 0x40085a RSP: 00000000417bafe8
[  189.103187] CR2: 00000000417bafe8
[  189.103187] ---[ end trace 8878c9a088d5f296 ]---
Killed
[  219.366814] BUG: unable to handle kernel paging request at 00000000ffd2874c
[  219.367040] PGD 69fbf067 P4D 69fbf067 PUD 69fa5067 PMD 69fa4067 PTE 6cb04067
[  219.367040] Oops: 0001 [#4] SMP NOPTI
[  219.367040] Modules linked in:
[  219.367040] CPU: 1 PID: 2497 Comm: test_syscall_vd Tainted: G D           4.17.0-rc5+ #11
[  219.367040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
[  219.367040] RIP: 0033:0x8048e9d
[  219.367040] RSP: 002b:00000000ffd2874c EFLAGS: 00000202
[  219.367040] RAX: 0000000008048778 RBX: 0000000000000000 RCX: 000000000000003f
[  219.367040] RDX: 0000000000000001 RSI: 00000000f7ff7b80 RDI: 0000000000000000
[  219.367040] RBP: 00000000ffd287c8 R08: 7f7f7f7f7f7f7f7f R09: 7f7f7f7f7f7f7f80
[  219.367040] R10: 7f7f7f7f7f7f7f81 R11: 7f7f7f7f7f7f7f82 R12: 7f7f7f7f7f7f7f83
[  219.367040] R13: 7f7f7f7f7f7f7f84 R14: 7f7f7f7f7f7f7f85 R15: 7f7f7f7f7f7f7f86
[  219.367040] FS:  0000000000000000(0000) GS:ffff9740ffd00000(0063) knlGS:00000000f7fc6700
[  219.367040] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[  219.367040] CR2: 00000000ffd2874c CR3: 000000006c4ca000 CR4: 00000000007406e0
[  219.367040] PKRU: 55555554
[  219.367040] RIP: 0x8048e9d RSP: 00000000ffd2874c
[  219.367040] CR2: 00000000ffd2874c
[  219.367040] ---[ end trace 8878c9a088d5f297 ]---
Killed

When I choose kvm64 (or qemu64) as the CPU model, the Oopses are gone, but the
tests still fail with register mismatches in the same way.
Possibly the Oopses are qemu faults?

>
>> [FAIL]  f[u]comi[p] errors: 1
>> [FAIL]  fisttp errors: 1'
>
> I don't know about these.
>
>> [FAIL]  R8 has changed:0000000000000000
>> [FAIL]  R9 has changed:0000000000000000
>> [FAIL]  R10 has changed:0000000000000000
>> [FAIL]  R11 has changed:0000000000000000
>> [FAIL]  R8 has changed:0000000000000000
>> [FAIL]  R9 has changed:0000000000000000
>> [FAIL]  R10 has changed:0000000000000000
>> [FAIL]  R11 has changed:0000000000000000
>
> The patch that added these test lines was the same patch that should have
> made them pass.  Are you sure your tests match your running kernel?  You
> need commit 8bb2610bc4967f19672444a7b0407367f1540028.

Yeah, it is already in the last master.

> If you still have failures, can you send me the complete output from the
> test_syscall_vdso test?

So, with the possibly lousy qemu (mis-)configuration that I have, your patch
applied on top of the latest master fixes "Reg 15 mismatch".
I still see the following failures:

======./sigreturn_32========
[OK]    set_thread_area refused 16-bit data
[OK]    set_thread_area refused 16-bit data
[RUN]    Valid sigreturn: 64-bit CS (33), 32-bit SS (2b, GDT)
[FAIL]    Reg 1 mismatch: requested 0x0; got 0x3
    SP: 5aadc0de -> 5aadc0de
[RUN]    Valid sigreturn: 32-bit CS (23), 32-bit SS (2b, GDT)
    SP: 5aadc0de -> 5aadc0de
[OK]    all registers okay
[RUN]    Valid sigreturn: 16-bit CS (37), 32-bit SS (2b, GDT)
    SP: 5aadc0de -> 5aadc0de
[OK]    all registers okay
[RUN]    Valid sigreturn: 64-bit CS (33), 16-bit SS (3f)
    SP: 5aadc0de -> 5aadc0de
[OK]    all registers okay
--
[RUN]    Testing fcmovCC instructions
[OK]    fcmovCC
======./test_syscall_vdso_32========
[RUN]    Executing 6-argument 32-bit syscall via VDSO
[OK]    Arguments are preserved across syscall
[NOTE]    R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK]    R8..R15 did not leak kernel data
[RUN]    Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
[FAIL]    R8 has changed:0000000000000000
[FAIL]    R9 has changed:0000000000000000
[FAIL]    R10 has changed:0000000000000000
[FAIL]    R11 has changed:0000000000000000
[RUN]    Executing 6-argument 32-bit syscall via VDSO
[OK]    Arguments are preserved across syscall
[NOTE]    R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK]    R8..R15 did not leak kernel data
[RUN]    Executing 6-argument 32-bit syscall via INT 80
[OK]    Arguments are preserved across syscall
[FAIL]    R8 has changed:0000000000000000
[FAIL]    R9 has changed:0000000000000000
[FAIL]    R10 has changed:0000000000000000
[FAIL]    R11 has changed:0000000000000000
[RUN]    Running tests under ptrace

Thanks,
             Dmitry
Dmitry Safonov May 18, 2018, 11:16 p.m. UTC | #5
2018-05-19 0:10 GMT+01:00 Dmitry Safonov <0x7f454c46@gmail.com>:
> Sure.
> I'm on Intel actually:
> cpu family    : 6
> model        : 142
> model name    : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
>
> But I usually test kernels in VM. So, I use virt-manager as it's
> easier to manage
> multiple VMs. The thing is that I've chosen "Copy host CPU configuration"
> and for some reason, I don't quite follow virt-manager makes model "Opteron_G4".
> I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-2.fc26).

Hmm, the reason it chooses AMD emulation looks like a bug in virt-manager.
When I try the IvyBridge CPU model, it gives the following error:
> Error starting domain: the CPU is incompatible with host CPU: Host CPU does not
> provide required features: vme, x2apic, tsc-deadline, avx, f16c, rdrand

To my naive mind, the reason is that "tsc-deadline" is not written with
a dash in cpuinfo:
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq
pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma
cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept
vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves
ibpb ibrs stibp dtherm ida arat pln pts hwp hwp_notify hwp_act_window
hwp_epp

But that's just my naive guess.

Thanks,
             Dmitry
Dmitry Safonov May 18, 2018, 11:25 p.m. UTC | #6
2018-05-19 0:16 GMT+01:00 Dmitry Safonov <0x7f454c46@gmail.com>:
> 2018-05-19 0:10 GMT+01:00 Dmitry Safonov <0x7f454c46@gmail.com>:
>> Sure.
>> I'm on Intel actually:
>> cpu family    : 6
>> model        : 142
>> model name    : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
>>
>> But I usually test kernels in VM. So, I use virt-manager as it's
>> easier to manage
>> multiple VMs. The thing is that I've chosen "Copy host CPU configuration"
>> and for some reason, I don't quite follow virt-manager makes model "Opteron_G4".
>> I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-2.fc26).
>
> Hmm, the reason it chooses AMD emulation looks like a bug in virt-manager:
> When I try IvyBridge CPU, it gives the following error:
>> Error starting domain: the CPU is incompatible with host CPU: Host CPU does not
>> provide required features: vme, x2apic, tsc-deadline, avx, f16c, rdrand
>
> Which to my naive mind is by the reason that "tsc-deadline" is not written with
> a dash in cpuinfo:
> flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq
> pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma
> cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
> tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
> cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept
> vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
> rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves
> ibpb ibrs stibp dtherm ida arat pln pts hwp hwp_notify hwp_act_window
> hwp_epp
>
> But that just my naive suppose.

Yeah, so they use cpuid there and I guess this one wasn't fixed for me:
https://bugzilla.redhat.com/show_bug.cgi?id=1467599

Thanks,
             Dmitry
Andy Lutomirski May 19, 2018, 2:05 a.m. UTC | #7
> On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454c46@gmail.com> wrote:

> Hi Andy,

> 2018-05-18 23:03 GMT+01:00 Andy Lutomirski <luto@kernel.org>:
>>> On Thu, May 17, 2018 at 4:40 PM Dmitry Safonov <dima@arista.com> wrote:
>>> Some selftests are failing, but the same way as before the patch
>>> (ITOW, it's not regression):
>>> [root@localhost self]# grep FAIL out
>>> [FAIL]  Reg 1 mismatch: requested 0x0; got 0x3
>>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>>> 0xffffff425aadc0de
>>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>>> 0xffffff425aadc0de
>>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>>> 0xffffff425aadc0de

>> Are you on AMD?  Can you try this patch:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=c88aa6d53840e48970c54f9ef70c79415033b32d
>>
>> and give me a Tested-by if it fixes it for you?

> Sure.
> I'm on Intel actually:
> cpu family    : 6
> model        : 142
> model name    : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz

> But I usually test kernels in VM. So, I use virt-manager as it's
> easier to manage
> multiple VMs. The thing is that I've chosen "Copy host CPU configuration"
> and for some reason, I don't quite follow virt-manager makes model "Opteron_G4".
> I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-2.fc26).
> So, cpuinfo in VM says:
> cpu family    : 21
> model        : 1
> model name    : AMD Opteron 62xx class CPU

What does guest cpuinfo say for vendor_id?

There are multiple potential screwups here.

1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior
that’s different from Intel’s, and the test case was definitely wrong. But
KVM has no way to influence it.  Are you sure you’re using KVM and not QEMU
TCG? Anyway, the IRET thing is minor compared to your other problems, so
let’s try to fix them first.

2. Compat fast syscalls are wildly different on AMD and Intel. Because of
this issue, QEMU with KVM is supposed to always report the real vendor_id
no matter -cpu asks for.  If we get the wrong vendor_id, then we’re at the
mercy of KVM’s emulation and performance will suck.  On older kernels, this
would cause hideous kernel crashes.  On new kernels, I would expect it to
merely crash 32-bit user programs or be slow.

> What's worse than registers changes is that some selftests actually lead to
> Oops's. The same reason for criu-ia32 fails.
> I've tested so far v4.15 and v4.16 releases besides master (2c71d338bef2),
> so it looks to be not a recent regression.

> Full Oopses:
> [  189.100174] BUG: unable to handle kernel paging request at 00000000417bafe8
> [  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 PTE 6991f067
> [  189.100174] Oops: 0001 [#3] SMP NOPTI

Whoa there!  0001 means a failed *kernel* access.

> [  189.100174] Modules linked in:
> [  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G

Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?

> D           4.17.0-rc5+ #11
> [  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.10.2-1.fc26 04/01/2014
> [  189.103187] RIP: 0033:0x40085a

The oops was caused from CPL 3 at what looks like a totally sensible user
address.  Can you disassemble the offending binary and tell me what the
code at 0x40085a is?

> [  189.103187] RSP: 002b:00000000417bafe8 EFLAGS: 00000206
> [  189.103187] RAX: 0000000000000000 RBX: 00000000000003e8 RCX: 0000000000000000
> [  189.103187] RDX: 0000000000000000 RSI: 0000000000400830 RDI: 00000000417baff8
> [  189.103187] RBP: 00000000417baff8 R08: 0000000000000000 R09: 0000000000000077
> [  189.103187] R10: 0000000000000006 R11: 0000000000000000 R12: 00000000417ba000
> [  189.103187] R13: 00007ffc05207840 R14: 0000000000000000 R15: 0000000000000000
> [  189.103187] FS:  00007f98566ecb40(0000) GS:ffff9740ffc00000(0000)
> knlGS:0000000000000000
> [  189.103187] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

CS here is the value of CS that the *kernel* has, so 0x10 is normal.

> [  189.103187] CR2: 00000000417bafe8 CR3: 0000000069dc4000 CR4: 00000000007406f0

CR2 is in user space.

So the big question is: what happened here?  Why did the CPU (or emulated
CPU) attempt a privileged access to a user address while running user code?
Dmitry Safonov May 19, 2018, 2:22 a.m. UTC | #8
On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote:
> > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454c46@gmail.com>
> > cpu family    : 6
> > model        : 142
> > model name    : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
> > But I usually test kernels in VM. So, I use virt-manager as it's
> > easier to manage
> > multiple VMs. The thing is that I've chosen "Copy host CPU configuration"
> > and for some reason, I don't quite follow virt-manager makes model "Opteron_G4".
> > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-2.fc26).
> > So, cpuinfo in VM says:
> > cpu family    : 21
> > model        : 1
> > model name    : AMD Opteron 62xx class CPU
> 
> What does guest cpuinfo say for vendor_id?
> 
> There are multiple potential screwups here.
> 
> 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior
> that’s different from Intel’s, and the test case was definitely wrong. But
> KVM has no way to influence it.  Are you sure you’re using KVM and not QEMU
> TCG? Anyway, the IRET thing is minor compared to your other problems, so
> let’s try to fix them first.
> 
> 2. Compat fast syscalls are wildly different on AMD and Intel. Because of
> this issue, QEMU with KVM is supposed to always report the real vendor_id
> no matter -cpu asks for.  If we get the wrong vendor_id, then we’re at the
> mercy of KVM’s emulation and performance will suck.  On older kernels, this
> would cause hideous kernel crashes.  On new kernels, I would expect it to
> merely crash 32-bit user programs or be slow.

Heh, I didn't know those details, so it looks like it's (2),
vendor_id	: AuthenticAMD
in guest.

> 
> > What's worse than registers changes is that some selftests actually lead to
> > Oops's. The same reason for criu-ia32 fails.
> > I've tested so far v4.15 and v4.16 releases besides master (2c71d338bef2),
> > so it looks to be not a recent regression.
> > Full Oopses:
> > [  189.100174] BUG: unable to handle kernel paging request at 00000000417bafe8
> > [  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 PTE 6991f067
> > [  189.100174] Oops: 0001 [#3] SMP NOPTI
> 
> Whoa there!  0001 means a failed *kernel* access.
> 
> > [  189.100174] Modules linked in:
> > [  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G
> 
> Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?

sysret_ss_attrs_32 survives

> 
> > D           4.17.0-rc5+ #11
> > [  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996),
> > BIOS 1.10.2-1.fc26 04/01/2014
> > [  189.103187] RIP: 0033:0x40085a
> 
> The oops was caused from CPL 3 at what looks like a totally sensible user
> address.  Can you disassemble the offending binary and tell me what the
> code at 0x40085a is?

Here is the function:
0000000000400842 <call32_from_64>:
  400842:       53                      push   %rbx
  400843:       55                      push   %rbp
  400844:       41 54                   push   %r12
  400846:       41 55                   push   %r13
  400848:       41 56                   push   %r14
  40084a:       41 57                   push   %r15
  40084c:       9c                      pushfq 
  40084d:       48 89 27                mov    %rsp,(%rdi)
  400850:       48 89 fc                mov    %rdi,%rsp
  400853:       6a 23                   pushq  $0x23
  400855:       68 5c 08 40 00          pushq  $0x40085c
  40085a:       48 cb                   lretq  
  40085c:       ff d6                   callq  *%rsi
  40085e:       ea                      (bad)  
  40085f:       65 08 40 00             or     %al,%gs:0x0(%rax)
  400863:       33 00                   xor    (%rax),%eax
  400865:       48 8b 24 24             mov    (%rsp),%rsp
  400869:       9d                      popfq  
  40086a:       41 5f                   pop    %r15
  40086c:       41 5e                   pop    %r14
  40086e:       41 5d                   pop    %r13
  400870:       41 5c                   pop    %r12
  400872:       5d                      pop    %rbp
  400873:       5b                      pop    %rbx
  400874:       c3                      retq   
  400875:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40087c:       00 00 00 
  40087f:       90                      nop

Looks like mov between registers caused it? The hell.

> 
> > [  189.103187] RSP: 002b:00000000417bafe8 EFLAGS: 00000206
> > [  189.103187] RAX: 0000000000000000 RBX: 00000000000003e8 RCX: 0000000000000000
> > [  189.103187] RDX: 0000000000000000 RSI: 0000000000400830 RDI: 00000000417baff8
> > [  189.103187] RBP: 00000000417baff8 R08: 0000000000000000 R09: 0000000000000077
> > [  189.103187] R10: 0000000000000006 R11: 0000000000000000 R12: 00000000417ba000
> > [  189.103187] R13: 00007ffc05207840 R14: 0000000000000000 R15: 0000000000000000
> > [  189.103187] FS:  00007f98566ecb40(0000) GS:ffff9740ffc00000(0000) knlGS:0000000000000000
> > [  189.103187] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 
> CS here is the value of CS that the *kernel* has, so 0x10 is normal.
> 
> > [  189.103187] CR2: 00000000417bafe8 CR3: 0000000069dc4000 CR4: 00000000007406f0
> 
> CR2 is in user space.
> 
> So the big question is: what happened here?  Why did the CPU (or emulated
> CPU) attempt a privileged access to a user address while running user
> code?

No idea, but looks like it's not a kernel fault.
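
For anyone decoding the oops by hand, here is a minimal sketch of how the error code 0001 and the PTE value 6991f067 quoted above break down. It assumes only the architectural x86 bit layout; the macro, struct and function names are made up for illustration and are not the kernel's identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error-code bits (illustrative names). */
#define PF_ERR_PROT  (1u << 0) /* 1: protection violation, 0: page not present */
#define PF_ERR_WRITE (1u << 1) /* 1: write access, 0: read access */
#define PF_ERR_USER  (1u << 2) /* 1: access made in user mode, 0: supervisor */

/* Low flag bits of an x86-64 page-table entry (illustrative names). */
#define PTE_PRESENT (1ull << 0)
#define PTE_RW      (1ull << 1)
#define PTE_USER    (1ull << 2)

struct pf_decoded {
	bool protection_violation;
	bool write;
	bool user_mode;
};

/* Split a page-fault error code into its three lowest flag bits. */
struct pf_decoded decode_pf_error(unsigned int code)
{
	struct pf_decoded d = {
		.protection_violation = (code & PF_ERR_PROT) != 0,
		.write = (code & PF_ERR_WRITE) != 0,
		.user_mode = (code & PF_ERR_USER) != 0,
	};
	return d;
}

/* True if the PTE maps a present page that user mode may access. */
bool pte_user_accessible(uint64_t pte)
{
	return (pte & PTE_PRESENT) && (pte & PTE_USER);
}
```

With these definitions, decode_pf_error(0x0001) reports a protection violation on a read made in supervisor mode, and pte_user_accessible(0x6991f067) is true, i.e. the faulting address is backed by a present, user-accessible page. That matches the "failed *kernel* access to a user address" reading above, but it does not by itself explain why the access was treated as privileged.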
Dmitry Safonov May 19, 2018, 2:25 a.m. UTC | #9
2018-05-19 3:22 GMT+01:00 Dmitry Safonov <dima@arista.com>:
> On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote:
>> > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454c46@gmail.com>
>> > cpu family    : 6
>> > model        : 142
>> > model name    : Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
>> > But I usually test kernels in VM. So, I use virt-manager as it's easier to
>> > manage multiple VMs. The thing is that I've chosen "Copy host CPU
>> > configuration" and for some reason, I don't quite follow virt-manager
>> > makes model "Opteron_G4".
>> > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-2.fc26).
>> > So, cpuinfo in VM says:
>> > cpu family    : 21
>> > model        : 1
>> > model name    : AMD Opteron 62xx class CPU
>>
>> What does guest cpuinfo say for vendor_id?
>>
>> There are multiple potential screwups here.
>>
>> 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior
>> that’s different from Intel’s, and the test case was definitely wrong. But
>> KVM has no way to influence it.  Are you sure you’re using KVM and not QEMU
>> TCG? Anyway, the IRET thing is minor compared to your other problems, so
>> let’s try to fix them first.
>>
>> 2. Compat fast syscalls are wildly different on AMD and Intel.  Because of
>> this issue, QEMU with KVM is supposed to always report the real vendor_id
>> no matter -cpu asks for.  If we get the wrong vendor_id, then we’re at the
>> mercy of KVM’s emulation and performance will suck.  On older kernels, this
>> would cause hideous kernel crashes.  On new kernels, I would expect it to
>> merely crash 32-bit user programs or be slow.
>
> Heh, I didn't know those details, so it looks like it's (2),
> vendor_id       : AuthenticAMD
> in guest.
>
>>
>> > What's worse than registers changes is that some selftests actually lead
>> > to Oops's. The same reason for criu-ia32 fails.
>> > I've tested so far v4.15 and v4.16 releases besides master (2c71d338bef2),
>> > so it looks to be not a recent regression.
>> > Full Oopses:
>> > [  189.100174] BUG: unable to handle kernel paging request at 00000000417bafe8
>> > [  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 PTE 6991f067
>> > [  189.100174] Oops: 0001 [#3] SMP NOPTI
>>
>> Whoa there!  0001 means a failed *kernel* access.
>>
>> > [  189.100174] Modules linked in:
>> > [  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G
>>
>> Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?
>
> sysret_ss_attrs_32 survives
>
>>
>> > D           4.17.0-rc5+ #11
>> > [  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
>> > [  189.103187] RIP: 0033:0x40085a
>>
>> The oops was caused from CPL 3 at what looks like a totally sensible user
>> address.  Can you disassemble the offending binary and tell me what the
>> code at 0x40085a is?
>
> Here is the function:
> 0000000000400842 <call32_from_64>:
>   400842:       53                      push   %rbx
>   400843:       55                      push   %rbp
>   400844:       41 54                   push   %r12
>   400846:       41 55                   push   %r13
>   400848:       41 56                   push   %r14
>   40084a:       41 57                   push   %r15
>   40084c:       9c                      pushfq
>   40084d:       48 89 27                mov    %rsp,(%rdi)
>   400850:       48 89 fc                mov    %rdi,%rsp
>   400853:       6a 23                   pushq  $0x23
>   400855:       68 5c 08 40 00          pushq  $0x40085c
>   40085a:       48 cb                   lretq
>   40085c:       ff d6                   callq  *%rsi
>   40085e:       ea                      (bad)
>   40085f:       65 08 40 00             or     %al,%gs:0x0(%rax)
>   400863:       33 00                   xor    (%rax),%eax
>   400865:       48 8b 24 24             mov    (%rsp),%rsp
>   400869:       9d                      popfq
>   40086a:       41 5f                   pop    %r15
>   40086c:       41 5e                   pop    %r14
>   40086e:       41 5d                   pop    %r13
>   400870:       41 5c                   pop    %r12
>   400872:       5d                      pop    %rbp
>   400873:       5b                      pop    %rbx
>   400874:       c3                      retq
>   400875:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
>   40087c:       00 00 00
>   40087f:       90                      nop
>
> Looks like mov between registers caused it? The hell.

Oh, it's not 400850, I misread it; it's 40085a, so lretq might have caused it.
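
The stack arithmetic is at least consistent with lretq being the instruction that touched the faulting address. A minimal sketch, assuming RDI still holds the value from the register dump (0x417baff8) when it is moved into RSP at 400850, and that each pushq and each lretq slot is 8 bytes wide in 64-bit mode; the helper names are made up for illustration:

```c
#include <stdint.h>

/* Model of N 64-bit pushes: each push lowers RSP by 8 bytes.
 * Only the pointer arithmetic is modelled; no memory is touched. */
uint64_t rsp_after_pushes(uint64_t rsp, unsigned int n_pushes)
{
	return rsp - 8ull * n_pushes;
}

/* lretq (far return with REX.W) to the same privilege level reads two
 * 8-byte slots: the new RIP at [RSP] and the new CS at [RSP + 8]. */
struct lret_frame {
	uint64_t rip_slot; /* address the new RIP is read from */
	uint64_t cs_slot;  /* address the new CS is read from */
};

struct lret_frame lretq_frame_at(uint64_t rsp)
{
	struct lret_frame f = { .rip_slot = rsp, .cs_slot = rsp + 8 };
	return f;
}
```

Starting from 0x417baff8, the two pushq instructions (0x23, then 0x40085c) leave RSP at 0x417bafe8, which is both the RSP and the CR2 value reported in the oops. So the faulting address is exactly where lretq would read its return RIP from.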

Dmitry Safonov May 19, 2018, 2:33 a.m. UTC | #10
2018-05-19 3:25 GMT+01:00 Dmitry Safonov <0x7f454c46@gmail.com>:
>> Here is the function:
>> 0000000000400842 <call32_from_64>:
>>   400842:       53                      push   %rbx
>>   400843:       55                      push   %rbp
>>   400844:       41 54                   push   %r12
>>   400846:       41 55                   push   %r13
>>   400848:       41 56                   push   %r14
>>   40084a:       41 57                   push   %r15
>>   40084c:       9c                      pushfq
>>   40084d:       48 89 27                mov    %rsp,(%rdi)
>>   400850:       48 89 fc                mov    %rdi,%rsp
>>   400853:       6a 23                   pushq  $0x23
>>   400855:       68 5c 08 40 00          pushq  $0x40085c
>>   40085a:       48 cb                   lretq
>>   40085c:       ff d6                   callq  *%rsi
>>   40085e:       ea                      (bad)
>>   40085f:       65 08 40 00             or     %al,%gs:0x0(%rax)
>>   400863:       33 00                   xor    (%rax),%eax
>>   400865:       48 8b 24 24             mov    (%rsp),%rsp
>>   400869:       9d                      popfq
>>   40086a:       41 5f                   pop    %r15
>>   40086c:       41 5e                   pop    %r14
>>   40086e:       41 5d                   pop    %r13
>>   400870:       41 5c                   pop    %r12
>>   400872:       5d                      pop    %rbp
>>   400873:       5b                      pop    %rbx
>>   400874:       c3                      retq
>>   400875:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
>>   40087c:       00 00 00
>>   40087f:       90                      nop
>>
>> Looks like mov between registers caused it? The hell.
>
> Oh, it's not 400850, I misread it; it's 40085a, so lretq might have caused it.

But RSP is 002b:00000000417bafe8, which is USER_DS and a sensible address,
so I still have no idea.
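
To make the selector values in this thread easier to read, here is a small sketch that splits a segment selector into its index, table-indicator and RPL fields, and decodes the bytes at 40085e as 32-bit code. The struct and function names are hypothetical; the field layout and the `EA` (JMP ptr16:32) encoding are architectural.

```c
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector {
	unsigned int index; /* descriptor table index (bits 15..3) */
	unsigned int ti;    /* table indicator: 0 = GDT, 1 = LDT (bit 2) */
	unsigned int rpl;   /* requested privilege level (bits 1..0) */
};

struct selector decode_selector(uint16_t sel)
{
	struct selector s = {
		.index = sel >> 3,
		.ti = (sel >> 2) & 1,
		.rpl = sel & 3,
	};
	return s;
}

/* Target of a direct far jump. */
struct far_ptr {
	uint32_t offset;
	uint16_t selector;
};

/* Decode a 7-byte `EA` far jump as 32-bit code: opcode, 4-byte
 * little-endian offset, 2-byte little-endian selector.
 * Returns 0 on success, -1 if the first byte is not 0xEA. */
int decode_ljmp32(const uint8_t insn[7], struct far_ptr *out)
{
	if (insn[0] != 0xea)
		return -1;
	out->offset = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
		      (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
	out->selector = (uint16_t)(insn[5] | insn[6] << 8);
	return 0;
}
```

Under this decoding, 0x2b (the RSP selector) and 0x23 (pushed before lretq) are GDT entries 5 and 4 with RPL 3, and 0x10 (the kernel CS in the oops) is GDT entry 2 with RPL 0. The bytes `ea 65 08 40 00 33 00`, which objdump shows as "(bad)", "or" and "xor" because it disassembles the whole function as 64-bit code, read as a far jump to 0x33:0x400865 when interpreted as 32-bit code, i.e. back to the `mov (%rsp),%rsp` at 400865 with the same CS (0x33) that the oops reports for RIP.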

Patch

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4b100fe0f508..12bb445fb98d 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -542,6 +542,7 @@  void set_personality_64bit(void)
 	clear_thread_flag(TIF_X32);
 	/* Pretend that this comes from a 64bit execve */
 	task_pt_regs(current)->orig_ax = __NR_execve;
+	current_thread_info()->status &= ~TS_COMPAT;
 
 	/* Ensure the corresponding mm is not marked. */
 	if (current->mm)
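
The effect of the added line can be illustrated with a small user-space model of the criteria listed in the commit message (ia32 if TS_COMPAT is set, x32 if orig_ax carries __X32_SYSCALL_BIT, 64-bit otherwise). This is only a sketch: the struct, the function names and the `patched` switch are hypothetical, and the constant values are assumptions taken to mirror the kernel's definitions.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's TS_COMPAT, __X32_SYSCALL_BIT and the
 * x86-64 __NR_execve; treat these values as illustrative. */
#define MODEL_TS_COMPAT       0x0002u
#define MODEL_X32_SYSCALL_BIT 0x40000000ul
#define MODEL_NR_EXECVE_64    59ul

/* Hypothetical stand-in for the two pieces of per-task state involved. */
struct task_model {
	unsigned int status;   /* models thread_info->status */
	unsigned long orig_ax; /* models pt_regs->orig_ax */
};

/* Models the decision: a compat syscall is either ia32 or x32. */
bool model_in_compat_syscall(const struct task_model *t)
{
	bool ia32 = (t->status & MODEL_TS_COMPAT) != 0;
	bool x32 = (t->orig_ax & MODEL_X32_SYSCALL_BIT) != 0;

	return ia32 || x32;
}

/* Models set_personality_64bit() with and without the added line. */
void model_set_personality_64bit(struct task_model *t, bool patched)
{
	/* Pretend that this comes from a 64bit execve */
	t->orig_ax = MODEL_NR_EXECVE_64;
	if (patched)
		t->status &= ~MODEL_TS_COMPAT;
}
```

For a task that entered exec() through a 32-bit syscall (TS_COMPAT set), the unpatched model still reports a compat syscall after the personality switch, which is the condition under which the compat mmap base gets used for the new 64-bit image; with the patched path the flag is cleared and the model reports a native 64-bit syscall.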