Message ID | CA+55aFyONQeyHit6LcQz_JM5+C9pr61fcWFSpdn=7K_2mZN+oA@mail.gmail.com (mailing list archive)
---|---
State | New, archived
On Thu, Jan 25, 2018 at 10:48 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> So the biggest impact of this is the extra register saves

Actually, the other noticeable part is the reloading of the argument registers from ptregs. Together with just the extra level of 'call/ret' and the stack setup, I'm guessing we're talking maybe 20 cycles or so.

So there's the extra register saves, and simply the fact that the fastpath had a flatter calling structure.

It still feels worth it. And if we do decide that we want to do the register clearing on kernel entry for some paranoid mode, we'd pretty much have to do this anyway.

Linus
On Thu, Jan 25, 2018 at 2:16 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Jan 25, 2018 at 10:48 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> So the biggest impact of this is the extra register saves
>
> Actually, the other noticeable part is the reloading of the argument
> registers from ptregs. Together with just the extra level of
> 'call/ret' and the stack setup, I'm guessing we're talking maybe 20
> cycles or so.
>
> So there's the extra register saves, and simply the fact that the
> fastpath had a flatter calling structure.
>
> It still feels worth it. And if we do decide that we want to do the
> register clearing on kernel entry for some paranoid mode, we'd pretty
> much have to do this anyway.
>
> Linus

Another extra step the slow path does is checking to see if ptregs is safe for SYSRET. I think that can be mitigated by moving the check to the places that do modify ptregs (ptrace, sigreturn, and exec), which would set a flag to force return with IRET if the modified regs do not satisfy the criteria for SYSRET.

--
Brian Gerst
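A minimal sketch of the scheme Brian describes, with an invented `TIF_FORCE_IRET` flag and helper name; the SYSRET criteria shown (RCX must match RIP, R11 must match RFLAGS, and RIP must be canonical) are the usual ones the exit path checks:

```c
/*
 * Hypothetical sketch of Brian's idea: validate pt_regs once, at the
 * places that modify them (ptrace, sigreturn, exec), instead of on
 * every syscall exit.  TIF_FORCE_IRET and the helper name are
 * invented for illustration.
 */
static void flag_regs_unsafe_for_sysret(struct task_struct *task,
					struct pt_regs *regs)
{
	/*
	 * SYSRET restores RIP from RCX and RFLAGS from R11, and it
	 * faults on a non-canonical RIP, so any mismatch must force
	 * the exit path to use IRET instead.
	 */
	if (regs->cx != regs->ip || regs->r11 != regs->flags ||
	    regs->ip >= TASK_SIZE_MAX)
		set_tsk_thread_flag(task, TIF_FORCE_IRET);
}
```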
On Thu, Jan 25, 2018 at 12:04 PM, Brian Gerst <brgerst@gmail.com> wrote:
>
> Another extra step the slow path does is checking to see if ptregs is
> safe for SYSRET. I think that can be mitigated by moving the check to
> the places that do modify ptregs (ptrace, sigreturn, and exec), which
> would set a flag to force return with IRET if the modified regs do not
> satisfy the criteria for SYSRET.

I tried to do some profiling, and none of that shows up for me.

That said, what _also_ doesn't show up is the actual page table switch on entry. And that seems to be because the per-cpu trampoline code isn't captured by perf (or at least not shown). Oh well.

What _does_ show up a bit is this in prepare_exit_to_usermode():

    #ifdef CONFIG_COMPAT
    	/*
    	 * Compat syscalls set TS_COMPAT.  Make sure we clear it before
    	 * returning to user mode.  We need to clear it *after* signal
    	 * handling, because syscall restart has a fixup for compat
    	 * syscalls.  The fixup is exercised by the ptrace_syscall_32
    	 * selftest.
    	 *
    	 * We also need to clear TS_REGS_POKED_I386: the 32-bit tracer
    	 * special case only applies after poking regs and before the
    	 * very next return to user mode.
    	 */
    	current->thread.status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
    #endif

and I think the problem there is that it is unnecessarily dirtying that cacheline. Afaik, those bits are already clear 99.999% of the time.

So things would be better if that 'status' would be in the thread-info (to keep cachelines close to the other stuff we already touch) and the code should have something like

    if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED)))

or whatever.

There might be other similar small tuning issues going on. So there is room for improvement there in the slow path.

Linus
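A minimal sketch of the fix Linus is suggesting, assuming 'status' has already been moved into struct thread_info: test the rarely-set bits before clearing them, so the common case only reads the cacheline instead of dirtying it with a redundant store.

```c
/* Sketch only: branch first so the 99.999% case is a load, not a store. */
struct thread_info *ti = current_thread_info();

if (unlikely(ti->status & (TS_COMPAT | TS_I386_REGS_POKED)))
	ti->status &= ~(TS_COMPAT | TS_I386_REGS_POKED);
```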
On Thu, Jan 25, 2018 at 10:48 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Jan 22, 2018 at 10:55 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Honestly, I'd rather get rid of the fast-path entirely. Compared to
>> all the PTI mess, it's not even noticeable.
>
> So I looked at how that would be.
>
> Patch attached. Not really "tested", but I'm running the kernel with
> this patch now, and 'strace' etc works, and honestly, it seems very
> obvious.
>
> Also, code generation for 'do_syscall_64()' does not look horrible. In
> fact, it doesn't look all that much worse than the fast-path ever did.
>
> So the biggest impact of this is the extra register saves
> (SAVE_EXTRA_REGS) from setting up the full ptregs. And honestly, I
> hate how that stupid macro still uses "movq reg,off(%rsp)" instead of
> "pushq %reg".
>
> Considering the diffstat:
>
>   2 files changed, 2 insertions(+), 121 deletions(-)
>
> and how those 100+ lines are nasty assembly code, I do think we should
> just do it.

Feel free to Acked-by: Andy Lutomirski <luto@kernel.org> that patch.

Or I can grab it and send it to -tip.

Re: the trampoline not showing up: if I find some time, I'll try to wire it up correctly in kallsyms.

--Andy
On Thu, 25 Jan 2018, Andy Lutomirski wrote:
> On Thu, Jan 25, 2018 at 10:48 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Mon, Jan 22, 2018 at 10:55 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> Honestly, I'd rather get rid of the fast-path entirely. Compared to
> >> all the PTI mess, it's not even noticeable.
> >
> > So I looked at how that would be.
> >
> > Patch attached. Not really "tested", but I'm running the kernel with
> > this patch now, and 'strace' etc works, and honestly, it seems very
> > obvious.
> >
> > Also, code generation for 'do_syscall_64()' does not look horrible. In
> > fact, it doesn't look all that much worse than the fast-path ever did.
> >
> > So the biggest impact of this is the extra register saves
> > (SAVE_EXTRA_REGS) from setting up the full ptregs. And honestly, I
> > hate how that stupid macro still uses "movq reg,off(%rsp)" instead of
> > "pushq %reg".
> >
> > Considering the diffstat:
> >
> >   2 files changed, 2 insertions(+), 121 deletions(-)
> >
> > and how those 100+ lines are nasty assembly code, I do think we should
> > just do it.
>
> Feel free to Acked-by: Andy Lutomirski <luto@kernel.org> that patch.
>
> Or I can grab it and send it to -tip.

That would be nice, so we can route it through x86/pti which provides it for backporting cleanly.

Thanks,

	tglx
On Thu, Jan 25, 2018 at 1:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> Feel free to Acked-by: Andy Lutomirski <luto@kernel.org> that patch.
>
> Or I can grab it and send it to -tip.

I'm not going to apply it for 4.15, I just wanted to see how it looked, and do some minimal profiling.

From the profiles, as mentioned, moving 'status' from thread_struct to thread_info is probably worth doing. But I didn't look at the impact of that at all.

So it should go through all the normal channels in -tip for 4.16.

I'll happily sign off on the patch, but it was really pretty mindless, so I'm not sure I need the authorship either.

> Re: the trampoline not showing up: if I find some time, I'll try to
> wire it up correctly in kallsyms.

That would be lovely. Right now the system call exit shows up pretty clearly in profiles, and most of it is (obviously) the cr3 write. So the missing entry trampoline is not insignificant.

Linus
On Thu, Jan 25, 2018 at 1:06 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Jan 25, 2018 at 1:02 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> Feel free to Acked-by: Andy Lutomirski <luto@kernel.org> that patch.
>>
>> Or I can grab it and send it to -tip.
>
> I'm not going to apply it for 4.15, I just wanted to see how it
> looked, and do some minimal profiling.
>
> From the profiles, as mentioned, moving 'status' from thread_struct to
> thread_info is probably worth doing. But I didn't look at the impact
> of that at all.
>
> So it should go through all the normal channels in -tip for 4.16.
>
> I'll happily sign off on the patch, but it was really pretty mindless,
> so I'm not sure I need the authorship either.
>
>> Re: the trampoline not showing up: if I find some time, I'll try to
>> wire it up correctly in kallsyms.
>
> That would be lovely. Right now the system call exit shows up pretty
> clearly in profiles, and most of it is (obviously) the cr3 write. So
> the missing entry trampoline is not insignificant.

With retpoline, the retpoline in the trampoline sucks. I don't need perf for that -- I've benchmarked it both ways. It sucks. I'll fix it, but it'll be kind of complicated.

--Andy
On Thu, Jan 25, 2018 at 1:08 PM, Andy Lutomirski <luto@kernel.org> wrote:
>
> With retpoline, the retpoline in the trampoline sucks. I don't need
> perf for that -- I've benchmarked it both ways. It sucks. I'll fix
> it, but it'll be kind of complicated.

Ahh, I'd forgotten about that (and obviously didn't see it in the profiles).

But yeah, that is fixable even if it does require a page per CPU. Or did you have some clever scheme in mind?

But that's independent of the slow/fast path.

Linus
On Thu, Jan 25, 2018 at 1:20 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Jan 25, 2018 at 1:08 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> With retpoline, the retpoline in the trampoline sucks. I don't need
>> perf for that -- I've benchmarked it both ways. It sucks. I'll fix
>> it, but it'll be kind of complicated.
>
> Ahh, I'd forgotten about that (and obviously didn't see it in the profiles).
>
> But yeah, that is fixable even if it does require a page per CPU. Or
> did you have some clever scheme in mind?

Nothing clever. I was going to see if I could get actual binutils-generated relocations to work in the trampoline. We already have code to parse ELF relocations and turn them into a simple table, and it shouldn't be *that* hard to run a separate pass on the entry trampoline.

Another potentially useful if rather minor optimization would be to rejigger the SYSCALL_DEFINE macros a bit. Currently we treat all syscalls like this:

    long func(long arg0, long arg1, long arg2, long arg3, long arg4, long arg5);

I wonder if we'd be better off doing:

    long func(const struct pt_regs *regs);

and autogenerating:

    static long SyS_read(const struct pt_regs *regs)
    {
    	return sys_read(regs->di, ...);
    }
On Thu, Jan 25, 2018 at 1:31 PM, Andy Lutomirski <luto@kernel.org> wrote: > On Thu, Jan 25, 2018 at 1:20 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> On Thu, Jan 25, 2018 at 1:08 PM, Andy Lutomirski <luto@kernel.org> wrote: >>> >>> With retpoline, the retpoline in the trampoline sucks. I don't need >>> perf for that -- I've benchmarked it both ways. It sucks. I'll fix >>> it, but it'll be kind of complicated. >> >> Ahh, I'd forgotten about that (and obviously didn't see it in the profiles). >> >> But yeah, that is fixable even if it does require a page per CPU. Or >> did you have some clever scheme in mind? > > Nothing clever. I was going to see if I could get actual > binutils-generated relocations to work in the trampoline. We already > have code to parse ELF relocations and turn them into a simple table, > and it shouldn't be *that* hard to run a separate pass on the entry > trampoline. > > Another potentially useful if rather minor optimization would be to > rejigger the SYSCALL_DEFINE macros a bit. Currently we treat all > syscalls like this: > > long func(long arg0, long arg1, long arg2, long arg3, long arg4, long arg5); > > I wonder if we'd be better off doing: > > long func(const struct pt_regs *regs); > > and autogenerating: > > static long SyS_read(const struct pt_regs *regs) > { > return sys_reg(regs->di, ...); > } If you're rejiggering, can we also put in a mechanism for detecting which registers to clear so that userspace can't inject useful values into speculation paths? https://patchwork.kernel.org/patch/10153753/
On Thu, Jan 25, 2018 at 1:39 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Jan 25, 2018 at 1:31 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> [...]
>> Another potentially useful if rather minor optimization would be to
>> rejigger the SYSCALL_DEFINE macros a bit. Currently we treat all
>> syscalls like this:
>>
>> long func(long arg0, long arg1, long arg2, long arg3, long arg4, long arg5);
>>
>> I wonder if we'd be better off doing:
>>
>> long func(const struct pt_regs *regs);
>>
>> and autogenerating:
>>
>> static long SyS_read(const struct pt_regs *regs)
>> {
>>     return sys_read(regs->di, ...);
>> }
>
> If you're rejiggering, can we also put in a mechanism for detecting
> which registers to clear so that userspace can't inject useful values
> into speculation paths?
>
> https://patchwork.kernel.org/patch/10153753/

My SYSCALL_DEFINE rejigger suggestion up-thread does this for free as a side effect.

That being said, I think this would be more accurately characterized as "so that userspace has a somewhat harder time injecting useful values into speculation paths".
On Thu, Jan 25, 2018 at 1:39 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>
> If you're rejiggering, can we also put in a mechanism for detecting
> which registers to clear so that userspace can't inject useful values
> into speculation paths?

That actually becomes trivial with just the "no fastpath" patch I sent out. You can just clear all of them.

Sure, then do_syscall_64() will reload the six first ones, but since those are the argument registers anyway, and since they are caller-clobbered, they have very short lifetimes. So it would effectively not really be an issue.

But yes, SYSCALL_DEFINEx() rejiggery would close even that tiny hole.

Linus
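For context, a simplified sketch of the C dispatcher that the "no fastpath" patch leaves as the only path (entry work, x32 masking, and error handling elided; see arch/x86/entry/common.c for the real 4.15-era code). All six argument registers are reloaded from pt_regs in C, which is why the entry asm can zero them at essentially no cost:

```c
/*
 * Simplified sketch of do_syscall_64() with the fast path removed.
 * Details are elided; this shows only the shape of the dispatch.
 */
__visible void do_syscall_64(struct pt_regs *regs)
{
	unsigned long nr = regs->orig_ax;

	enter_from_user_mode();
	local_irq_enable();

	if (likely(nr < NR_syscalls)) {
		/* The six argument registers are reloaded from ptregs. */
		regs->ax = sys_call_table[nr](regs->di, regs->si, regs->dx,
					      regs->r10, regs->r8, regs->r9);
	}

	syscall_return_slowpath(regs);
}
```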
From: Andy Lutomirski
> Sent: 25 January 2018 21:31
...
> Another potentially useful if rather minor optimization would be to
> rejigger the SYSCALL_DEFINE macros a bit. Currently we treat all
> syscalls like this:
>
> long func(long arg0, long arg1, long arg2, long arg3, long arg4, long arg5);
>
> I wonder if we'd be better off doing:
>
> long func(const struct pt_regs *regs);
>
> and autogenerating:
>
> static long SyS_read(const struct pt_regs *regs)
> {
>     return sys_read(regs->di, ...);
> }

Hmmm.... NetBSD (and the other BSD?) defines a structure for the arguments to each syscall. On systems where all function arguments are put on stack, the user stack that contains the arguments is copied into a kernel buffer.

For amd64 I changed the register save area layout so that the arguments were in the right order [1]. Then added an extra area for the extra arguments that had to be read from the user stack. Just passing a pointer into the save area has to be better than reading all the values again.

[1] There was some horrid fallout from that :-(

	David
> NetBSD (and the other BSD?) defines a structure for the arguments to
> each syscall.

Goes back to v7 or so, but they put the syscall arguments into the uarea so that no pointers were needed (the uarea being a per-process mapping at a fixed address), in order to also reduce pointer dereferencing costs (not that those matter much on modern processors).

Alan.
On Fri, Jan 26, 2018 at 6:24 AM, Alan Cox <alan@linux.intel.com> wrote:
>> NetBSD (and the other BSD?) defines a structure for the arguments to
>> each syscall.
>
> Goes back to v7 or so, but they put the syscall arguments into the uarea
> so that no pointers were needed (the uarea being a per-process mapping at
> a fixed address), in order to also reduce pointer dereferencing costs
> (not that those matter much on modern processors).

I gave a rearrangement like this a try yesterday and it's a bit of a mess. Part of the problem is that there are a bunch of pieces of code that expect sys_xyz() to be actual callable functions.

The best way to deal with that is probably to switch to calling normal functions.
On Fri, Jan 26, 2018 at 7:57 AM, Andy Lutomirski <luto@kernel.org> wrote:
>
> I gave a rearrangement like this a try yesterday and it's a bit of a
> mess. Part of the problem is that there are a bunch of pieces of code
> that expect sys_xyz() to be actual callable functions.

That's not supposed to be a mess.

That's part of why we do that whole indirection through SYSC##xyz to sys##_xyz: the asm-callable ones will do the full casting of troublesome arguments (some architectures have C calling sequence rules that have security issues, so we need to make sure that the arguments actually follow the right rules and 'int' arguments are properly sign-extended etc).

So that whole indirection could be made to *also* create another version of the syscall that instead took the arguments from ptregs.

We already do exactly that for the tracing events: look how FTRACE_SYSCALLS ends up creating that extra metadata.

The ptregs version should be done the same way: don't make 'sys_xyz()' take a struct ptregs; instead make those SYSCALL_DEFINE*() macros create a _new_ function called 'ptregs_xyz()' and then that function does the argument unpacking.

Then the x86 system call table can just be switched over to call those ptregs versions instead.

Linus
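A minimal sketch, with invented macro names, of the extra expansion Linus describes; the real SYSCALL_DEFINEx() machinery has more layers of token pasting and type checking, but the shape is this:

```c
/*
 * Invented illustration: alongside the asm-callable sys_foo(), the
 * SYSCALL_DEFINEx() macros could also emit a ptregs_foo() that does
 * the argument unpacking.  The (di, si, dx, r10, r8, r9) mapping
 * would come from an arch-provided macro.
 */
#define PTREGS_SYSCALL2(name, t1, t2)					\
	asmlinkage long ptregs_##name(const struct pt_regs *regs)	\
	{								\
		return sys_##name((t1)regs->di, (t2)regs->si);		\
	}

PTREGS_SYSCALL2(foo, int, int)	/* emits ptregs_foo(), calling sys_foo() */
```

The x86 syscall table would then point at `ptregs_foo()` while `sys_foo()` keeps its normal C signature for in-kernel callers.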
On Fri, Jan 26, 2018 at 09:40:23AM -0800, Linus Torvalds wrote:
> On Fri, Jan 26, 2018 at 7:57 AM, Andy Lutomirski <luto@kernel.org> wrote:
> >
> > I gave a rearrangement like this a try yesterday and it's a bit of a
> > mess. Part of the problem is that there are a bunch of pieces of code
> > that expect sys_xyz() to be actual callable functions.
>
> That's not supposed to be a mess.
>
> That's part of why we do that whole indirection through SYSC##xyz to
> sys##_xyz: the asm-callable ones will do the full casting of
> troublesome arguments (some architectures have C calling sequence
> rules that have security issues, so we need to make sure that the
> arguments actually follow the right rules and 'int' arguments are
> properly sign-extended etc).
>
> So that whole indirection could be made to *also* create another
> version of the syscall that instead took the arguments from ptregs.
>
> We already do exactly that for the tracing events: look how
> FTRACE_SYSCALLS ends up creating that extra metadata.
>
> The ptregs version should be done the same way: don't make 'sys_xyz()'
> take a struct ptregs; instead make those SYSCALL_DEFINE*() macros
> create a _new_ function called 'ptregs_xyz()' and then that function
> does the argument unpacking.
>
> Then the x86 system call table can just be switched over to call those
> ptregs versions instead.

Umm... What about other architectures? Or do you want SYSCALL_DEFINE... to be per-arch? I wonder how much would that "go through pt_regs" hurt on something like sparc...
On Fri, Jan 26, 2018 at 10:07 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm... What about other architectures? Or do you want SYSCALL_DEFINE...
> to be per-arch? I wonder how much would that "go through pt_regs" hurt
> on something like sparc...

No, but I just talked to Will Deacon about register clearing on entry, and so I suspect that arm64 might want something similar too.

So I think some opt-in for letting architectures add their own function would be good. Because it wouldn't be all architectures, but it probably _would_ be more than just x86.

You need to add architecture-specific "load argX from ptregs" macros anyway.

Linus
On Fri, Jan 26, 2018 at 10:13 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Jan 26, 2018 at 10:07 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> Umm... What about other architectures? Or do you want SYSCALL_DEFINE...
>> to be per-arch? I wonder how much would that "go through pt_regs" hurt
>> on something like sparc...
>
> No, but I just talked to Will Deacon about register clearing on entry,
> and so I suspect that arm64 might want something similar too.
>
> So I think some opt-in for letting architectures add their own
> function would be good. Because it wouldn't be all architectures, but
> it probably _would_ be more than just x86.
>
> You need to add architecture-specific "load argX from ptregs" macros anyway.

I mocked that up, and it's straightforward. I ended up with something like:

    #define __ARCH_SYSCALL_ARGS(n, ...) (regs->di, ...)

(obviously modified so it actually compiles.)

The issue is that doing it this way gives us, effectively:

    long sys_foo(int a, int b)
    {
    	body here;
    }

    long SyS_foo(const struct pt_regs *regs)
    {
    	return sys_foo(regs->di, regs->si);
    }

whereas what we want is *static* long sys_foo(...). So I could split the macros into:

    DEFINE_SYSCALL2(foo, ....)

and

    DEFINE_EXTERN_SYSCALL2(foo, ...)

or I could just fix up all the code that expects calling sys_foo() across files to work.

My mockup patch doesn't actually work because of compat crap, but that's manageable.

--Andy
On Fri, Jan 26, 2018 at 10:23 AM, Andy Lutomirski <luto@kernel.org> wrote:
>
> The issue is that doing it this way gives us, effectively:
>
> long sys_foo(int a, int b)
> {
>     body here;
> }
>
> long SyS_foo(const struct pt_regs *regs)
> {
>     return sys_foo(regs->di, regs->si);
> }
>
> whereas what we want is *static* long sys_foo(...).

How about just marking 'sys_foo()' as being always_inline (but still not static)? Because the case that _matters_ is that SyS_foo() thing when this is enabled.

Sure, you'll get two copies of the code (one in SyS_foo(), the other being the callable-from-C 'sys_foo()' that is exported and almost never used). But that seems a fairly small price to pay. We could leave it for later to try to get rid of the unused copies entirely.

Linus
On Fri, Jan 26, 2018 at 10:54 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Jan 26, 2018 at 10:23 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>
>> The issue is that doing it this way gives us, effectively:
>>
>> long sys_foo(int a, int b)
>> {
>>     body here;
>> }
>>
>> long SyS_foo(const struct pt_regs *regs)
>> {
>>     return sys_foo(regs->di, regs->si);
>> }
>>
>> whereas what we want is *static* long sys_foo(...).
>
> How about just marking 'sys_foo()' as being always_inline (but still
> not static)? Because the case that _matters_ is that SyS_foo() thing
> when this is enabled.
>
> Sure, you'll get two copies of the code (one in SyS_foo(), the other
> being the callable-from-C 'sys_foo()' that is exported and almost
> never used). But that seems a fairly small price to pay. We could
> leave it for later to try to get rid of the unused copies entirely.

I could do that, but Josh Triplett will yell at me.

Anyway, I'll fiddle with it. This isn't exactly high priority.
Hi Andy,

On Fri, Jan 26, 2018 at 10:23:23AM -0800, Andy Lutomirski wrote:
> On Fri, Jan 26, 2018 at 10:13 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Fri, Jan 26, 2018 at 10:07 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >>
> >> Umm... What about other architectures? Or do you want SYSCALL_DEFINE...
> >> to be per-arch? I wonder how much would that "go through pt_regs" hurt
> >> on something like sparc...
> >
> > No, but I just talked to Will Deacon about register clearing on entry,
> > and so I suspect that arm64 might want something similar too.
> >
> > So I think some opt-in for letting architectures add their own
> > function would be good. Because it wouldn't be all architectures, but
> > it probably _would_ be more than just x86.
> >
> > You need to add architecture-specific "load argX from ptregs" macros anyway.
>
> I mocked that up, and it's straightforward. I ended up with something like:
>
> #define __ARCH_SYSCALL_ARGS(n, ...) (regs->di, ...)
>
> (obviously modified so it actually compiles.)
>
> The issue is that doing it this way gives us, effectively:
>
> long sys_foo(int a, int b)
> {
>     body here;
> }
>
> long SyS_foo(const struct pt_regs *regs)
> {
>     return sys_foo(regs->di, regs->si);
> }
>
> whereas what we want is *static* long sys_foo(...). So I could split
> the macros into:
>
> DEFINE_SYSCALL2(foo, ....)
>
> and
>
> DEFINE_EXTERN_SYSCALL2(foo, ...)
>
> or I could just fix up all the code that expects calling sys_foo()
> across files to work.

Another issue with this style of macro definition exists on architectures where the calling convention needs you to carry state around depending on how you packed the previous parameters. For example, on 32-bit ARM, 64-bit values are passed in adjacent pairs of registers but the low numbered register needs to be even. This is what stopped me from trying to use existing helpers such as syscall_get_arguments to unpack the pt_regs, and it generally means that anything that says "get me argument n" is going to require constructing arguments 0..n-1 first.

To do this properly I think we'll either need to pass back the size and current register offset to the arch code, or just allow the thing to be overridden per syscall (the case above isn't especially frequent).

Will
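To make Will's constraint concrete, a small illustration (not kernel code; the helper name is invented, little-endian register order assumed) using pread64(2), the classic case of a 64-bit argument landing in the {r4,r5} pair on 32-bit ARM:

```c
/*
 * pread64(fd, buf, count, pos) on 32-bit ARM EABI: fd -> r0, buf -> r1,
 * count -> r2, then r3 is skipped so the 64-bit pos can start in an
 * even-numbered register, landing in {r4,r5}.  A naive "argument n is
 * in rn" helper would wrongly read pos from r3/r4 — unpacking argument
 * n requires knowing the sizes of arguments 0..n-1.
 */
static long arm_unpack_pread64(const struct pt_regs *regs)	/* invented */
{
	int fd = regs->uregs[0];
	void __user *buf = (void __user *)regs->uregs[1];
	size_t count = regs->uregs[2];
	loff_t pos = ((loff_t)regs->uregs[5] << 32) | regs->uregs[4];

	return sys_pread64(fd, buf, count, pos);
}
```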
From: Will Deacon
> Sent: 29 January 2018 13:20
...
> Another issue with this style of macro definition exists on architectures
> where the calling convention needs you to carry state around depending on
> how you packed the previous parameters. For example, on 32-bit ARM, 64-bit
> values are passed in adjacent pairs of registers but the low numbered
> register needs to be even. This is what stopped me from trying to use
> existing helpers such as syscall_get_arguments to unpack the pt_regs,
> and it generally means that anything that says "get me argument n" is going
> to require constructing arguments 0..n-1 first.
>
> To do this properly I think we'll either need to pass back the size and
> current register offset to the arch code, or just allow the thing to be
> overridden per syscall (the case above isn't especially frequent).

If you generate a structure from the argument list, that might work 'by magic'. Certainly you can add explicit pads to any structure.

	David
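A sketch of David's structure idea for the same hypothetical pread64 example: an explicit pad mirrors the register the EABI skips, so a pointer into a suitably laid-out register save area lines up with the struct "by magic".

```c
/*
 * Invented illustration: a per-syscall argument structure for pread64
 * on 32-bit ARM.  The explicit pad occupies the slot of the skipped
 * r3, so the layout matches r0..r5 in the save area; natural 8-byte
 * alignment of the loff_t member would insert the same gap anyway.
 */
struct pread64_args {
	int		fd;	/* r0 */
	void __user	*buf;	/* r1 */
	size_t		count;	/* r2 */
	unsigned long	__pad;	/* r3, skipped by the EABI */
	loff_t		pos;	/* r4/r5 */
};
```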
```diff
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ff6f8022612c..65f4d29be69b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -241,80 +241,6 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	TRACE_IRQS_OFF
 
-	/*
-	 * If we need to do entry work or if we guess we'll need to do
-	 * exit work, go straight to the slow path.
-	 */
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	entry_SYSCALL64_slow_path
-
-entry_SYSCALL_64_fastpath:
-	/*
-	 * Easy case: enable interrupts and issue the syscall.  If the syscall
-	 * needs pt_regs, we'll call a stub that disables interrupts again
-	 * and jumps to the slow path.
-	 */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-#if __SYSCALL_MASK == ~0
-	cmpq	$__NR_syscall_max, %rax
-#else
-	andl	$__SYSCALL_MASK, %eax
-	cmpl	$__NR_syscall_max, %eax
-#endif
-	ja	1f			/* return -ENOSYS (already in pt_regs->ax) */
-	movq	%r10, %rcx
-
-	/*
-	 * This call instruction is handled specially in stub_ptregs_64.
-	 * It might end up jumping to the slow path.  If it jumps, RAX
-	 * and all argument registers are clobbered.
-	 */
-#ifdef CONFIG_RETPOLINE
-	movq	sys_call_table(, %rax, 8), %rax
-	call	__x86_indirect_thunk_rax
-#else
-	call	*sys_call_table(, %rax, 8)
-#endif
-.Lentry_SYSCALL_64_after_fastpath_call:
-
-	movq	%rax, RAX(%rsp)
-1:
-
-	/*
-	 * If we get here, then we know that pt_regs is clean for SYSRET64.
-	 * If we see that no exit work is required (which we are required
-	 * to check with IRQs off), then we can go straight to SYSRET64.
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	movq	PER_CPU_VAR(current_task), %r11
-	testl	$_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
-	jnz	1f
-
-	LOCKDEP_SYS_EXIT
-	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
-	movq	RIP(%rsp), %rcx
-	movq	EFLAGS(%rsp), %r11
-	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
-	UNWIND_HINT_EMPTY
-	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
-
-1:
-	/*
-	 * The fast path looked good when we started, but something changed
-	 * along the way and we need to switch to the slow path.  Calling
-	 * raise(3) will trigger this, for example.  IRQs are off.
-	 */
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_ANY)
-	SAVE_EXTRA_REGS
-	movq	%rsp, %rdi
-	call	syscall_return_slowpath	/* returns with IRQs disabled */
-	jmp	return_from_SYSCALL_64
-
-entry_SYSCALL64_slow_path:
 	/* IRQs are off. */
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
@@ -393,7 +319,6 @@ syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	UNWIND_HINT_EMPTY
 	POP_EXTRA_REGS
-.Lpop_c_regs_except_rcx_r11_and_sysret:
 	popq	%rsi	/* skip r11 */
 	popq	%r10
 	popq	%r9
@@ -424,47 +349,6 @@ syscall_return_via_sysret:
 	USERGS_SYSRET64
 END(entry_SYSCALL_64)
 
-ENTRY(stub_ptregs_64)
-	/*
-	 * Syscalls marked as needing ptregs land here.
-	 * If we are on the fast path, we need to save the extra regs,
-	 * which we achieve by trying again on the slow path.  If we are on
-	 * the slow path, the extra regs are already saved.
-	 *
-	 * RAX stores a pointer to the C function implementing the syscall.
-	 * IRQs are on.
-	 */
-	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
-	jne	1f
-
-	/*
-	 * Called from fast path -- disable IRQs again, pop return address
-	 * and jump to slow path
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	popq	%rax
-	UNWIND_HINT_REGS extra=0
-	jmp	entry_SYSCALL64_slow_path
-
-1:
-	JMP_NOSPEC %rax			/* Called from C */
-END(stub_ptregs_64)
-
-.macro ptregs_stub func
-ENTRY(ptregs_\func)
-	UNWIND_HINT_FUNC
-	leaq	\func(%rip), %rax
-	jmp	stub_ptregs_64
-END(ptregs_\func)
-.endm
-
-/* Instantiate ptregs_stub for each ptregs-using syscall */
-#define __SYSCALL_64_QUAL_(sym)
-#define __SYSCALL_64_QUAL_ptregs(sym)	ptregs_stub sym
-#define __SYSCALL_64(nr, sym, qual)	__SYSCALL_64_QUAL_##qual(sym)
-#include <asm/syscalls_64.h>
-
 /*
  * %rdi: prev task
  * %rsi: next task
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 9c09775e589d..c176d2fab1da 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -7,14 +7,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64_QUAL_(sym) sym
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
-
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
+#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
```