From patchwork Thu Jul 16 18:22:09 2020
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11668373
Message-Id: <20200716185424.011950288@linutronix.de>
Date: Thu, 16 Jul 2020 20:22:09 +0200
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon, Arnd Bergmann,
    Mark Rutland, Kees Cook, Keno Fischer, Paolo Bonzini, kvm@vger.kernel.org
Subject: [patch V3 01/13] entry: Provide generic syscall entry functionality
References: <20200716182208.180916541@linutronix.de>

From: Thomas Gleixner

On syscall entry certain work needs to be done:

   - Establish state (lockdep, context tracking, tracing)
   - Conditional work (ptrace, seccomp, audit...)

This code is needlessly duplicated and different in all architectures.

Provide a generic version based on the x86 implementation which has all the
RCU and instrumentation bits right.

As interrupt/exception entry from user space needs parts of the same
functionality, provide a function for this as well.

syscall_enter_from_user_mode() and irqentry_enter_from_user_mode() must be
called right after the low level ASM entry. The calling code must be
non-instrumentable. After these functions return, all state is correct and
the subsequent functions can be instrumented.
Signed-off-by: Thomas Gleixner
---
V3: Adapt to noinstr changes. Add interrupt/exception entry function.

V2: Fix function documentation (Mike)
    Add comment about return value (Andy)
---
 arch/Kconfig                 |    3 +
 include/linux/entry-common.h |  156 +++++++++++++++++++++++++++++++++++++++++++
 kernel/Makefile              |    1 +
 kernel/entry/Makefile        |    3 +
 kernel/entry/common.c        |   78 +++++++++++++++++++++
 5 files changed, 241 insertions(+)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -27,6 +27,9 @@ config HAVE_IMA_KEXEC
 config HOTPLUG_SMT
 	bool
 
+config GENERIC_ENTRY
+	bool
+
 config OPROFILE
 	tristate "OProfile system profiling"
 	depends on PROFILING
--- /dev/null
+++ b/include/linux/entry-common.h
@@ -0,0 +1,156 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_ENTRYCOMMON_H
+#define __LINUX_ENTRYCOMMON_H
+
+#include <linux/tracehook.h>
+#include <linux/syscalls.h>
+#include <linux/seccomp.h>
+#include <linux/sched.h>
+
+#include <asm/entry-common.h>
+
+/*
+ * Define dummy _TIF work flags if not defined by the architecture or for
+ * disabled functionality.
+ */
+#ifndef _TIF_SYSCALL_TRACE
+# define _TIF_SYSCALL_TRACE		(0)
+#endif
+
+#ifndef _TIF_SYSCALL_EMU
+# define _TIF_SYSCALL_EMU		(0)
+#endif
+
+#ifndef _TIF_SYSCALL_TRACEPOINT
+# define _TIF_SYSCALL_TRACEPOINT	(0)
+#endif
+
+#ifndef _TIF_SECCOMP
+# define _TIF_SECCOMP			(0)
+#endif
+
+#ifndef _TIF_SYSCALL_AUDIT
+# define _TIF_SYSCALL_AUDIT		(0)
+#endif
+
+/*
+ * TIF flags handled in syscall_enter_from_user_mode()
+ */
+#ifndef ARCH_SYSCALL_ENTER_WORK
+# define ARCH_SYSCALL_ENTER_WORK	(0)
+#endif
+
+#define SYSCALL_ENTER_WORK						\
+	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SECCOMP |	\
+	 _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU |			\
+	 ARCH_SYSCALL_ENTER_WORK)
+
+/**
+ * arch_check_user_regs - Architecture specific sanity check for user mode regs
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Defaults to an empty implementation. Can be replaced by architecture
+ * specific code.
+ *
+ * Invoked from syscall_enter_from_user_mode() in the non-instrumentable
+ * section. Use __always_inline so the compiler cannot push it out of line
+ * and make it instrumentable.
+ */
+static __always_inline void arch_check_user_regs(struct pt_regs *regs);
+
+#ifndef arch_check_user_regs
+static __always_inline void arch_check_user_regs(struct pt_regs *regs) {}
+#endif
+
+/**
+ * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry()
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Returns: 0 on success or an error code to skip the syscall.
+ *
+ * Defaults to tracehook_report_syscall_entry(). Can be replaced by
+ * architecture specific code.
+ *
+ * Invoked from syscall_enter_from_user_mode()
+ */
+static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs);
+
+#ifndef arch_syscall_enter_tracehook
+static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs)
+{
+	return tracehook_report_syscall_entry(regs);
+}
+#endif
+
+/**
+ * arch_syscall_enter_seccomp - Architecture specific seccomp invocation
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Returns: The original or a modified syscall number
+ *
+ * Invoked from syscall_enter_from_user_mode(). Can be replaced by
+ * architecture specific code.
+ */
+static inline long arch_syscall_enter_seccomp(struct pt_regs *regs);
+
+#ifndef arch_syscall_enter_seccomp
+static inline long arch_syscall_enter_seccomp(struct pt_regs *regs)
+{
+	return secure_computing(NULL);
+}
+#endif
+
+/**
+ * arch_syscall_enter_audit - Architecture specific audit invocation
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Invoked from syscall_enter_from_user_mode(). Must be replaced by
+ * architecture specific code if the architecture supports audit.
+ */
+static inline void arch_syscall_enter_audit(struct pt_regs *regs);
+
+#ifndef arch_syscall_enter_audit
+static inline void arch_syscall_enter_audit(struct pt_regs *regs) { }
+#endif
+
+/**
+ * syscall_enter_from_user_mode - Check and handle work before invoking
+ *				  a syscall
+ * @regs:	Pointer to current's pt_regs
+ * @syscall:	The syscall number
+ *
+ * Invoked from architecture specific syscall entry code with interrupts
+ * disabled. The calling code has to be non-instrumentable. When the
+ * function returns all state is correct and the subsequent functions can be
+ * instrumented.
+ *
+ * Returns: The original or a modified syscall number
+ *
+ * If the returned syscall number is -1 then the syscall should be
+ * skipped. In this case the caller may invoke syscall_set_error() or
+ * syscall_set_return_value() first. If neither of those are called and -1
+ * is returned, then the syscall will fail with ENOSYS.
+ *
+ * The following functionality is handled here:
+ *
+ *  1) Establish state (lockdep, RCU (context tracking), tracing)
+ *  2) TIF flag dependent invocations of arch_syscall_enter_tracehook(),
+ *     arch_syscall_enter_seccomp(), trace_sys_enter()
+ *  3) Invocation of arch_syscall_enter_audit()
+ */
+long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall);
+
+/**
+ * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Invoked from architecture specific entry code with interrupts disabled.
+ * Can only be called when the interrupt entry came from user mode. The
+ * calling code must be non-instrumentable. When the function returns all
+ * state is correct and the subsequent functions can be instrumented.
+ *
+ * The function establishes state (lockdep, RCU (context tracking), tracing)
+ */
+void irqentry_enter_from_user_mode(struct pt_regs *regs);
+
+#endif
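[The #ifndef arch_* pattern above lets an architecture replace any of these
hooks by implementing the function in its <asm/entry-common.h> and then
defining a macro of the same name to suppress the generic default. A minimal
sketch for a hypothetical architecture; the header guard and the check
performed are invented for illustration, not part of this patch:

	/* Hypothetical <asm/entry-common.h> fragment */
	#ifndef _ASM_FOO_ENTRY_COMMON_H
	#define _ASM_FOO_ENTRY_COMMON_H

	static __always_inline void arch_check_user_regs(struct pt_regs *regs)
	{
		/* Cheap sanity check: the saved state must say "user mode" */
		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
			WARN_ON_ONCE(!user_mode(regs));
	}
	/* Tell linux/entry-common.h not to emit the empty default */
	#define arch_check_user_regs arch_check_user_regs

	#endif /* _ASM_FOO_ENTRY_COMMON_H */
]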
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -48,6 +48,7 @@ obj-y += irq/
 obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
+obj-y += entry/
 
 obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
--- /dev/null
+++ b/kernel/entry/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_GENERIC_ENTRY) += common.o
--- /dev/null
+++ b/kernel/entry/common.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/context_tracking.h>
+#include <linux/entry-common.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/syscalls.h>
+
+/**
+ * enter_from_user_mode - Establish state when coming from user mode
+ *
+ * Syscall/interrupt entry disables interrupts, but user mode is traced as
+ * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
+ *
+ * 1) Tell lockdep that interrupts are disabled
+ * 2) Invoke context tracking if enabled to reactivate RCU
+ * 3) Trace interrupts off state
+ */
+static __always_inline void enter_from_user_mode(struct pt_regs *regs)
+{
+	arch_check_user_regs(regs);
+	lockdep_hardirqs_off(CALLER_ADDR0);
+
+	CT_WARN_ON(ct_state() != CONTEXT_USER);
+	user_exit_irqoff();
+
+	instrumentation_begin();
+	trace_hardirqs_off_finish();
+	instrumentation_end();
+}
+
+static long syscall_trace_enter(struct pt_regs *regs, long syscall,
+				unsigned long ti_work)
+{
+	long ret = 0;
+
+	/* Handle ptrace */
+	if (ti_work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
+		ret = arch_syscall_enter_tracehook(regs);
+		if (ret || (ti_work & _TIF_SYSCALL_EMU))
+			return -1L;
+	}
+
+	/* Do seccomp after ptrace, to catch any tracer changes. */
+	if (ti_work & _TIF_SECCOMP) {
+		ret = arch_syscall_enter_seccomp(regs);
+		if (ret == -1L)
+			return ret;
+	}
+
+	if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
+		trace_sys_enter(regs, syscall);
+
+	arch_syscall_enter_audit(regs);
+
+	return ret ? : syscall;
+}
+
+noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
+{
+	unsigned long ti_work;
+
+	enter_from_user_mode(regs);
+	instrumentation_begin();
+
+	local_irq_enable();
+	ti_work = READ_ONCE(current_thread_info()->flags);
+	if (ti_work & SYSCALL_ENTER_WORK)
+		syscall = syscall_trace_enter(regs, syscall, ti_work);
+	instrumentation_end();
+
+	return syscall;
+}
+
+noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
+{
+	enter_from_user_mode(regs);
+}
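[For illustration, this is roughly how an architecture's C entry point would
sit on top of syscall_enter_from_user_mode(), including the -1 "skip"
convention documented above. A sketch only: the function name is invented
and the dispatch table is an x86-flavoured stand-in, not code from this
series:

	/* Hypothetical arch syscall entry built on the generic function */
	__visible noinstr void arch_do_syscall(struct pt_regs *regs, long nr)
	{
		/* The non-instrumentable region ends inside this call */
		nr = syscall_enter_from_user_mode(regs, nr);

		instrumentation_begin();
		if (nr >= 0 && nr < NR_syscalls) {
			nr = array_index_nospec(nr, NR_syscalls);
			regs->ax = sys_call_table[nr](regs);
		} else if (nr != -1) {
			/* Out of range and not explicitly skipped: fail */
			regs->ax = -ENOSYS;
		}
		/* nr == -1: skipped; ptrace may have set the return value */
		instrumentation_end();

		/* Exit work is covered by the next patch in the series */
	}
]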
From patchwork Thu Jul 16 18:22:10 2020
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11668381
Message-Id: <20200716185424.116500611@linutronix.de>
Date: Thu, 16 Jul 2020 20:22:10 +0200
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon, Arnd Bergmann,
    Mark Rutland, Kees Cook, Keno Fischer, Paolo Bonzini, kvm@vger.kernel.org
Subject: [patch V3 02/13] entry: Provide generic syscall exit function
References: <20200716182208.180916541@linutronix.de>

From: Thomas Gleixner

Like syscall entry, all architectures have similar and pointlessly different
code to handle pending work before returning from a syscall to user space.

 1) One-time syscall exit work:
     - rseq syscall exit
     - audit
     - syscall tracing
     - tracehook (single stepping)

 2) Preparatory work
     - Exit to user mode loop (common TIF handling).
     - Architecture specific one time work arch_exit_to_user_mode_prepare()
     - Address limit and lockdep checks

 3) Final transition (lockdep, tracing, context tracking, RCU). Invokes
    arch_exit_to_user_mode() to handle e.g. speculation mitigations.

Provide a generic version based on the x86 code which has all the RCU and
instrumentation protections right.

Provide a variant for interrupt return to user mode as well, which shares
work items #2 and #3 above.

After syscall_exit_to_user_mode() and irqentry_exit_to_user_mode() the
architecture code just has to return to user space. The code after
returning from these functions must not be instrumented.
Signed-off-by: Thomas Gleixner
Reviewed-by: Kees Cook
---
 include/linux/entry-common.h |  193 +++++++++++++++++++++++++++++++++++++++++++
 kernel/entry/common.c        |  169 +++++++++++++++++++++++++++++++++++++
 2 files changed, 362 insertions(+)

--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -33,6 +33,18 @@
 # define _TIF_SYSCALL_AUDIT	(0)
 #endif
 
+#ifndef _TIF_PATCH_PENDING
+# define _TIF_PATCH_PENDING	(0)
+#endif
+
+#ifndef _TIF_UPROBE
+# define _TIF_UPROBE		(0)
+#endif
+
+#ifndef _TIF_NOTIFY_RESUME
+# define _TIF_NOTIFY_RESUME	(0)
+#endif
+
 /*
  * TIF flags handled in syscall_enter_from_user_mode()
  */
@@ -45,6 +57,29 @@
 	 _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU |			\
 	 ARCH_SYSCALL_ENTER_WORK)
 
+/*
+ * TIF flags handled in syscall_exit_to_user_mode()
+ */
+#ifndef ARCH_SYSCALL_EXIT_WORK
+# define ARCH_SYSCALL_EXIT_WORK		(0)
+#endif
+
+#define SYSCALL_EXIT_WORK						\
+	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT |			\
+	 _TIF_SYSCALL_TRACEPOINT | ARCH_SYSCALL_EXIT_WORK)
+
+/*
+ * TIF flags handled in exit_to_user_mode_loop()
+ */
+#ifndef ARCH_EXIT_TO_USER_MODE_WORK
+# define ARCH_EXIT_TO_USER_MODE_WORK	(0)
+#endif
+
+#define EXIT_TO_USER_MODE_WORK						\
+	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
+	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING |			\
+	 ARCH_EXIT_TO_USER_MODE_WORK)
+
 /**
  * arch_check_user_regs - Architecture specific sanity check for user mode regs
  * @regs:	Pointer to current's pt_regs
@@ -132,6 +167,149 @@ static inline void arch_syscall_enter_au
 long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall);
 
 /**
+ * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
+ * @ti_work:	Cached TIF flags gathered with interrupts disabled
+ *
+ * Defaults to local_irq_enable(). Can be supplied by architecture specific
+ * code.
+ */
+static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
+
+#ifndef local_irq_enable_exit_to_user
+static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
+{
+	local_irq_enable();
+}
+#endif
+
+/**
+ * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
+ *
+ * Defaults to local_irq_disable(). Can be supplied by architecture specific
+ * code.
+ */
+static inline void local_irq_disable_exit_to_user(void);
+
+#ifndef local_irq_disable_exit_to_user
+static inline void local_irq_disable_exit_to_user(void)
+{
+	local_irq_disable();
+}
+#endif
+
+/**
+ * arch_exit_to_user_mode_work - Architecture specific TIF work for exit
+ *				 to user mode.
+ * @regs:	Pointer to current's pt_regs
+ * @ti_work:	Cached TIF flags gathered with interrupts disabled
+ *
+ * Invoked from exit_to_user_mode_loop() with interrupts enabled
+ *
+ * Defaults to NOOP. Can be supplied by architecture specific code.
+ */
+static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
+					       unsigned long ti_work);
+
+#ifndef arch_exit_to_user_mode_work
+static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
+					       unsigned long ti_work)
+{
+}
+#endif
+
+/**
+ * arch_exit_to_user_mode_prepare - Architecture specific preparation for
+ *				    exit to user mode.
+ * @regs:	Pointer to current's pt_regs
+ * @ti_work:	Cached TIF flags gathered with interrupts disabled
+ *
+ * Invoked from exit_to_user_mode_prepare() with interrupts disabled as the
+ * last function before return. Defaults to NOOP.
+ */
+static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
+						  unsigned long ti_work);
+
+#ifndef arch_exit_to_user_mode_prepare
+static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
+						  unsigned long ti_work)
+{
+}
+#endif
+
+/**
+ * arch_exit_to_user_mode - Architecture specific final work before
+ *			    exit to user mode.
+ *
+ * Invoked from exit_to_user_mode() with interrupts disabled as the last
+ * function before return. Defaults to NOOP.
+ *
+ * This needs to be __always_inline because it is non-instrumentable code
+ * invoked after context tracking switched to user mode.
+ *
+ * An architecture implementation must not do anything complex, no locking
+ * etc. The main purpose is for speculation mitigations.
+ */
+static __always_inline void arch_exit_to_user_mode(void);
+
+#ifndef arch_exit_to_user_mode
+static __always_inline void arch_exit_to_user_mode(void) { }
+#endif
+
+/**
+ * arch_do_signal - Architecture specific signal delivery function
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Invoked from exit_to_user_mode_loop().
+ */
+void arch_do_signal(struct pt_regs *regs);
+
+/**
+ * arch_syscall_exit_tracehook - Wrapper around tracehook_report_syscall_exit()
+ * @regs:	Pointer to current's pt_regs
+ * @step:	Indicator for single step
+ *
+ * Defaults to tracehook_report_syscall_exit(). Can be replaced by
+ * architecture specific code.
+ *
+ * Invoked from syscall_exit_to_user_mode()
+ */
+static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step);
+
+#ifndef arch_syscall_exit_tracehook
+static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step)
+{
+	tracehook_report_syscall_exit(regs, step);
+}
+#endif
+
+/**
+ * syscall_exit_to_user_mode - Handle work before returning to user mode
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Invoked with interrupts enabled and fully valid regs. Returns with all
+ * work handled and interrupts disabled, such that the caller can
+ * immediately switch to user mode. Called from architecture specific
+ * syscall and ret from fork code.
+ *
+ * The call order is:
+ *  1) One-time syscall exit work:
+ *	- rseq syscall exit
+ *	- audit
+ *	- syscall tracing
+ *	- tracehook (single stepping)
+ *
+ *  2) Preparatory work
+ *	- Exit to user mode loop (common TIF handling). Invokes
+ *	  arch_exit_to_user_mode_work() for architecture specific TIF work
+ *	- Architecture specific one time work arch_exit_to_user_mode_prepare()
+ *	- Address limit and lockdep checks
+ *
+ *  3) Final transition (lockdep, tracing, context tracking, RCU). Invokes
+ *     arch_exit_to_user_mode() to handle e.g. speculation mitigations
+ */
void syscall_exit_to_user_mode(struct pt_regs *regs);

Correction: the declaration above belongs in the diff:
+void syscall_exit_to_user_mode(struct pt_regs *regs);
+
+/**
  * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
  * @regs:	Pointer to current's pt_regs
  *
@@ -140,4 +318,19 @@ long syscall_enter_from_user_mode(struct
  */
 void irqentry_enter_from_user_mode(struct pt_regs *regs);
 
+/**
+ * irqentry_exit_to_user_mode - Interrupt exit work
+ * @regs:	Pointer to current's pt_regs
+ *
+ * Invoked with interrupts disabled and fully valid regs. Returns with all
+ * work handled and interrupts disabled, such that the caller can
+ * immediately switch to user mode. Called from architecture specific
+ * interrupt handling code.
+ *
+ * The call order is #2 and #3 as described in syscall_exit_to_user_mode().
+ * Interrupt exit is not invoking #1, which is the syscall specific one-time
+ * work.
+ */
+void irqentry_exit_to_user_mode(struct pt_regs *regs);
+
 #endif
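[As with the entry hooks, an architecture extends the exit work handling by
contributing TIF bits and an arch_exit_to_user_mode_work() implementation.
A sketch of a hypothetical <asm/entry-common.h> fragment; the flag
_TIF_FOO_FLUSH and the helper foo_flush_user_state() are invented for
illustration:

	#define ARCH_EXIT_TO_USER_MODE_WORK	(_TIF_FOO_FLUSH)

	static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
						       unsigned long ti_work)
	{
		/* Runs inside exit_to_user_mode_loop(), interrupts enabled */
		if (ti_work & _TIF_FOO_FLUSH)
			foo_flush_user_state();
	}
	#define arch_exit_to_user_mode_work arch_exit_to_user_mode_work
]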
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -2,6 +2,8 @@
 
 #include <linux/context_tracking.h>
 #include <linux/entry-common.h>
+#include <linux/livepatch.h>
+#include <linux/audit.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
@@ -72,7 +74,174 @@ noinstr long syscall_enter_from_user_mod
 	return syscall;
 }
 
+/**
+ * exit_to_user_mode - Fixup state when exiting to user mode
+ *
+ * Syscall/interrupt exit enables interrupts, but the kernel state is
+ * interrupts disabled when this is invoked. Also tell RCU about it.
+ *
+ * 1) Trace interrupts on state
+ * 2) Invoke context tracking if enabled to adjust RCU state
+ * 3) Invoke architecture specific last minute exit code, e.g. speculation
+ *    mitigations, etc.
+ * 4) Tell lockdep that interrupts are enabled
+ */
+static __always_inline void exit_to_user_mode(void)
+{
+	instrumentation_begin();
+	trace_hardirqs_on_prepare();
+	lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+	instrumentation_end();
+
+	user_enter_irqoff();
+	arch_exit_to_user_mode();
+	lockdep_hardirqs_on(CALLER_ADDR0);
+}
+
+/* Workaround to allow gradual conversion of architecture code */
+void __weak arch_do_signal(struct pt_regs *regs) { }
+
+static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
+					    unsigned long ti_work)
+{
+	/*
+	 * Before returning to user space ensure that all pending work
+	 * items have been completed.
+	 */
+	while (ti_work & EXIT_TO_USER_MODE_WORK) {
+
+		local_irq_enable_exit_to_user(ti_work);
+
+		if (ti_work & _TIF_NEED_RESCHED)
+			schedule();
+
+		if (ti_work & _TIF_UPROBE)
+			uprobe_notify_resume(regs);
+
+		if (ti_work & _TIF_PATCH_PENDING)
+			klp_update_patch_state(current);
+
+		if (ti_work & _TIF_SIGPENDING)
+			arch_do_signal(regs);
+
+		if (ti_work & _TIF_NOTIFY_RESUME) {
+			clear_thread_flag(TIF_NOTIFY_RESUME);
+			tracehook_notify_resume(regs);
+			rseq_handle_notify_resume(NULL, regs);
+		}
+
+		/* Architecture specific TIF work */
+		arch_exit_to_user_mode_work(regs, ti_work);
+
+		/*
+		 * Disable interrupts and reevaluate the work flags as they
+		 * might have changed while interrupts and preemption were
+		 * enabled above.
+		 */
+		local_irq_disable_exit_to_user();
+		ti_work = READ_ONCE(current_thread_info()->flags);
+	}
+
+	/* Return the latest work state for arch_exit_to_user_mode() */
+	return ti_work;
+}
+
+static void exit_to_user_mode_prepare(struct pt_regs *regs)
+{
+	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+
+	lockdep_assert_irqs_disabled();
+
+	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
+		ti_work = exit_to_user_mode_loop(regs, ti_work);
+
+	arch_exit_to_user_mode_prepare(regs, ti_work);
+
+	/* Ensure that the address limit is intact and no locks are held */
+	addr_limit_user_check();
+	lockdep_assert_irqs_disabled();
+	lockdep_sys_exit();
+}
+
+#ifndef _TIF_SINGLESTEP
+static inline bool report_single_step(unsigned long ti_work)
+{
+	return false;
+}
+#else
+/*
+ * If TIF_SYSCALL_EMU is set, then the only reason to report is when
+ * TIF_SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP). This syscall
+ * instruction has already been reported in syscall_enter_from_user_mode().
+ */
+#define SYSEMU_STEP	(_TIF_SINGLESTEP | _TIF_SYSCALL_EMU)
+
+static inline bool report_single_step(unsigned long ti_work)
+{
+	return (ti_work & SYSEMU_STEP) == _TIF_SINGLESTEP;
+}
+#endif
+
+static void syscall_exit_work(struct pt_regs *regs, unsigned long ti_work)
+{
+	bool step;
+
+	audit_syscall_exit(regs);
+
+	if (ti_work & _TIF_SYSCALL_TRACEPOINT)
+		trace_sys_exit(regs, regs_syscall_retval(regs));
+
+	step = report_single_step(ti_work);
+	if (step || ti_work & _TIF_SYSCALL_TRACE)
+		arch_syscall_exit_tracehook(regs, step);
+}
+
+/*
+ * Syscall specific exit to user mode preparation. Runs with interrupts
+ * enabled.
+ */
+static void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
+{
+	u32 cached_flags = READ_ONCE(current_thread_info()->flags);
+	unsigned long nr = regs_syscall_nr(regs);
+
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+
+	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
+		if (WARN(irqs_disabled(), "syscall %lu left IRQs disabled", nr))
+			local_irq_enable();
+	}
+
+	rseq_syscall(regs);
+
+	/*
+	 * Do one-time syscall specific work. If these work items are
+	 * enabled, we want to run them exactly once per syscall exit with
+	 * interrupts enabled.
+	 */
+	if (unlikely(cached_flags & SYSCALL_EXIT_WORK))
+		syscall_exit_work(regs, cached_flags);
+}
+
+__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
+{
+	instrumentation_begin();
+	syscall_exit_to_user_mode_prepare(regs);
+	local_irq_disable_exit_to_user();
+	exit_to_user_mode_prepare(regs);
+	instrumentation_end();
+	exit_to_user_mode();
+}
+
 noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
 {
 	enter_from_user_mode(regs);
 }
+
+noinstr void irqentry_exit_to_user_mode(struct pt_regs *regs)
+{
+	instrumentation_begin();
+	exit_to_user_mode_prepare(regs);
+	instrumentation_end();
+	exit_to_user_mode();
+}
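[Putting patches 1 and 2 together, an architecture's complete C-level
syscall path becomes a symmetric pair of generic calls. A sketch, reusing
the hypothetical names from the example after patch 1:

	__visible noinstr void arch_do_syscall(struct pt_regs *regs, long nr)
	{
		nr = syscall_enter_from_user_mode(regs, nr);

		instrumentation_begin();
		if (nr >= 0 && nr < NR_syscalls) {
			nr = array_index_nospec(nr, NR_syscalls);
			regs->ax = sys_call_table[nr](regs);
		}
		instrumentation_end();

		/* Returns with interrupts disabled and all work handled */
		syscall_exit_to_user_mode(regs);
		/* Low level ASM only restores registers and returns to user */
	}
]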
From patchwork Thu Jul 16 18:22:11 2020
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11668377
Message-Id: <20200716185424.225286036@linutronix.de>
Date: Thu, 16 Jul 2020 20:22:11 +0200
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon, Arnd Bergmann,
    Mark Rutland, Kees Cook, Keno Fischer, Paolo Bonzini, kvm@vger.kernel.org
Subject: [patch V3 03/13] entry: Provide generic interrupt entry/exit code
References: <20200716182208.180916541@linutronix.de>

Like the syscall entry/exit code, interrupt/exception entry after the real
low level ASM bits should not be different across architectures.

Provide a generic version based on the x86 code.

irqentry_enter() is called after the low level entry code and
irqentry_exit() must be invoked right before returning to the low level
code, which just contains the actual return logic. The code before
irqentry_enter() and after irqentry_exit() must not be instrumented; code
between irqentry_enter() and irqentry_exit() can be instrumented.

irqentry_enter() invokes irqentry_enter_from_user_mode() if the
interrupt/exception came from user mode. If it entered from kernel mode it
handles the kernel mode variant of establishing state for lockdep, RCU and
tracing, depending on the kernel context it interrupted (idle, non-idle).

Signed-off-by: Thomas Gleixner
---
 include/linux/entry-common.h |   62 ++++++++++++++++++++
 kernel/entry/common.c        |  117 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 179 insertions(+)
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -346,4 +346,66 @@ void irqentry_enter_from_user_mode(struc
  */
 void irqentry_exit_to_user_mode(struct pt_regs *regs);
 
+#ifndef irqentry_state
+typedef struct irqentry_state {
+	bool	exit_rcu;
+} irqentry_state_t;
+#endif
+
+/**
+ * irqentry_enter - Handle state tracking on ordinary interrupt entries
+ * @regs:	Pointer to pt_regs of interrupted context
+ *
+ * Invokes:
+ *  - lockdep irqflag state tracking as low level ASM entry disabled
+ *    interrupts.
+ *
+ *  - Context tracking if the exception hit user mode.
+ *
+ *  - The hardirq tracer to keep the state consistent as low level ASM
+ *    entry disabled interrupts.
+ *
+ * As a precondition, this requires that the entry came from user mode,
+ * idle, or a kernel context in which RCU is watching.
+ *
+ * For kernel mode entries RCU handling is done conditionally. If RCU is
+ * watching then the only RCU requirement is to check whether the tick has
+ * to be restarted. If RCU is not watching then rcu_irq_enter() has to be
+ * invoked on entry and rcu_irq_exit() on exit.
+ *
+ * Avoiding the rcu_irq_enter/exit() calls is an optimization but also
+ * solves the problem of kernel mode pagefaults which can schedule, which
+ * is not possible after invoking rcu_irq_enter() without undoing it.
+ *
+ * For user mode entries irqentry_enter_from_user_mode() is invoked to
+ * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
+ * would not be possible.
+ *
+ * Returns: An opaque object that must be passed to irqentry_exit()
+ */
+irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
+
+/**
+ * irqentry_exit_cond_resched - Conditionally reschedule on return from interrupt
+ *
+ * Conditional reschedule with additional sanity checks.
+ */
+void irqentry_exit_cond_resched(void);
+
+/**
+ * irqentry_exit - Handle return from exception that used irqentry_enter()
+ * @regs:	Pointer to pt_regs (exception entry regs)
+ * @state:	Return value from matching call to irqentry_enter()
+ *
+ * Depending on the return target (kernel/user) this runs the necessary
+ * preemption and work checks if possible and required, and returns to
+ * the caller with interrupts disabled and no further work pending.
+ *
+ * This is the last action before returning to the low level ASM code which
+ * just needs to return to the appropriate context.
+ *
+ * Counterpart to irqentry_enter().
+ */
+void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state);
+
 #endif
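[The intended usage pattern for the two functions declared above is a
bracket around the instrumentable handler body. A sketch of a hypothetical
interrupt entry point; the function name and the dispatch helper are
invented stand-ins:

	__visible noinstr void arch_handle_irq(struct pt_regs *regs, int vector)
	{
		irqentry_state_t state = irqentry_enter(regs);

		instrumentation_begin();
		arch_irq_dispatch(regs, vector);	/* invented helper */
		instrumentation_end();

		irqentry_exit(regs, state);
	}
]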
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -245,3 +245,120 @@ noinstr void irqentry_exit_to_user_mode(
 	instrumentation_end();
 	exit_to_user_mode();
 }
+
+irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs)
+{
+	irqentry_state_t ret = {
+		.exit_rcu = false,
+	};
+
+	if (user_mode(regs)) {
+		irqentry_enter_from_user_mode(regs);
+		return ret;
+	}
+
+	/*
+	 * If this entry hit the idle task, invoke rcu_irq_enter() whether
+	 * RCU is watching or not.
+	 *
+	 * Interrupts can nest when the first interrupt invokes softirq
+	 * processing on return, which enables interrupts.
+	 *
+	 * Scheduler ticks in the idle task can mark quiescent state and
+	 * terminate a grace period, if and only if the timer interrupt is
+	 * not nested into another interrupt.
+	 *
+	 * Checking for __rcu_is_watching() here would prevent the nesting
+	 * interrupt from invoking rcu_irq_enter(). If that nested interrupt
+	 * is the tick then rcu_flavor_sched_clock_irq() would wrongly
+	 * assume that it is the first interrupt and eventually claim
+	 * quiescent state and end grace periods prematurely.
+	 *
+	 * Unconditionally invoke rcu_irq_enter() so RCU state stays
+	 * consistent.
+	 *
+	 * TINY_RCU does not support EQS, so let the compiler eliminate
+	 * this part when enabled.
+	 */
+	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
+		/*
+		 * If RCU is not watching then the same careful
+		 * sequence vs. lockdep and tracing is required
+		 * as in irqentry_enter_from_user_mode().
+		 */
+		lockdep_hardirqs_off(CALLER_ADDR0);
+		rcu_irq_enter();
+		instrumentation_begin();
+		trace_hardirqs_off_finish();
+		instrumentation_end();
+
+		ret.exit_rcu = true;
+		return ret;
+	}
+
+	/*
+	 * If RCU is watching then RCU only wants to check whether it needs
+	 * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick()
+	 * already contains a warning when RCU is not watching, so no point
+	 * in having another one here.
+	 */
+	instrumentation_begin();
+	rcu_irq_enter_check_tick();
+	/* Use the combo lockdep/tracing function */
+	trace_hardirqs_off();
+	instrumentation_end();
+
+	return ret;
+}
+
+void irqentry_exit_cond_resched(void)
+{
+	if (!preempt_count()) {
+		/* Sanity check RCU and thread stack */
+		rcu_irq_exit_check_preempt();
+		if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
+			WARN_ON_ONCE(!on_thread_stack());
+		if (need_resched())
+			preempt_schedule_irq();
+	}
+}
+
+void noinstr irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
+{
+	lockdep_assert_irqs_disabled();
+
+	/* Check whether this returns to user mode */
+	if (user_mode(regs)) {
+		irqentry_exit_to_user_mode(regs);
+	} else if (!regs_irqs_disabled(regs)) {
+		/*
+		 * If RCU was not watching on entry this needs to be done
+		 * carefully and needs the same ordering of lockdep/tracing
+		 * and RCU as the return to user mode path.
+		 */
+		if (state.exit_rcu) {
+			instrumentation_begin();
+			/* Tell the tracer that IRET will enable interrupts */
+			trace_hardirqs_on_prepare();
+			lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+			instrumentation_end();
+			rcu_irq_exit();
+			lockdep_hardirqs_on(CALLER_ADDR0);
+			return;
+		}
+
+		instrumentation_begin();
+		if (IS_ENABLED(CONFIG_PREEMPTION))
+			irqentry_exit_cond_resched();
+		/* Covers both tracing and lockdep */
+		trace_hardirqs_on();
+		instrumentation_end();
+	} else {
+		/*
+		 * IRQ flags state is correct already. Just tell RCU if it
+		 * was not watching on entry.
+		 */
+		if (state.exit_rcu)
+			rcu_irq_exit();
+	}
+}
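[Because irqentry_enter() avoids rcu_irq_enter() for kernel mode entries
where RCU is already watching, a handler written this way may still
schedule, which matters for kernel mode page faults. A sketch; the handler
name and the fault helper are invented:

	__visible noinstr void arch_handle_page_fault(struct pt_regs *regs,
						      unsigned long address)
	{
		irqentry_state_t state = irqentry_enter(regs);

		instrumentation_begin();
		/*
		 * Kernel mode faults may block and schedule here: in that
		 * case irqentry_enter() did not invoke rcu_irq_enter(), so
		 * the normal scheduling rules still apply.
		 */
		handle_fault(regs, address);	/* invented helper */
		instrumentation_end();

		irqentry_exit(regs, state);
	}
]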
From patchwork Thu Jul 16 18:22:12 2020
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11668375
Message-Id: <20200716185424.333483453@linutronix.de>
Date: Thu, 16 Jul 2020 20:22:12 +0200
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon, Arnd Bergmann,
    Mark Rutland, Kees Cook, Keno Fischer, Paolo Bonzini, kvm@vger.kernel.org
Subject: [patch V3 04/13] entry: Provide infrastructure for work before exiting to guest mode
References: <20200716182208.180916541@linutronix.de>

From: Thomas Gleixner

Entering a guest is similar to exiting to user space. Pending work like
handling signals, rescheduling, task work etc. needs to be handled before
that.

Provide generic infrastructure to avoid duplication of the same handling
code all over the place.

The exit to guest mode handling is different from the exit to user mode
handling, e.g. vs. rseq and live patching, so a separate function is used.

The initial list of work items handled is:

    TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME

Architecture specific TIF flags can be added via defines in the
architecture specific include files.

The calling convention is also different from the syscall/interrupt entry
functions as KVM invokes this from the outer vcpu_run() loop with
interrupts and preemption enabled. To prevent missing a pending work item
it invokes a check for pending TIF work from interrupt disabled code right
before exiting to guest mode. The lockdep, RCU and tracing state handling
is also done directly around the switch to and from guest mode.

Signed-off-by: Thomas Gleixner
Cc: Paolo Bonzini
Cc: kvm@vger.kernel.org
---
V3: Reworked and simplified version adapted to recent x86 and KVM changes

V2: Moved KVM specific functions to kvm (Paolo)
    Added lockdep assert (Andy)
    Dropped live patching from enter guest mode work (Miroslav)
---
 include/linux/entry-kvm.h |   80 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h  |    8 ++++
 kernel/entry/Makefile     |    3 +-
 kernel/entry/kvm.c        |   51 +++++++++++++++++++++++++++++
 virt/kvm/Kconfig          |    3 +
 5 files changed, 144 insertions(+), 1 deletion(-)
--- /dev/null
+++ b/include/linux/entry-kvm.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_ENTRYKVM_H
+#define __LINUX_ENTRYKVM_H
+
+#include <linux/entry-common.h>
+
+/* Exit to guest mode work */
+#ifdef CONFIG_KVM_EXIT_TO_GUEST_WORK
+
+#ifndef ARCH_EXIT_TO_GUEST_MODE_WORK
+# define ARCH_EXIT_TO_GUEST_MODE_WORK	(0)
+#endif
+
+#define EXIT_TO_GUEST_MODE_WORK						\
+	(_TIF_NEED_RESCHED | _TIF_SIGPENDING |				\
+	 _TIF_NOTIFY_RESUME | ARCH_EXIT_TO_GUEST_MODE_WORK)
+
+struct kvm_vcpu;
+
+/**
+ * arch_exit_to_guest_mode_work - Architecture specific exit to guest mode
+ *				  work function.
+ * @vcpu:	Pointer to current's VCPU data
+ * @ti_work:	Cached TIF flags gathered in exit_to_guest_mode()
+ *
+ * Invoked from exit_to_guest_mode_work(). Defaults to NOOP. Can be
+ * replaced by architecture specific code.
+ */
+static inline int arch_exit_to_guest_mode_work(struct kvm_vcpu *vcpu,
+					       unsigned long ti_work);
+
+#ifndef arch_exit_to_guest_mode_work
+static inline int arch_exit_to_guest_mode_work(struct kvm_vcpu *vcpu,
+					       unsigned long ti_work)
+{
+	return 0;
+}
+#endif
+
+/**
+ * exit_to_guest_mode - Check and handle pending work which needs to be
+ *			handled before returning to guest mode
+ * @vcpu:	Pointer to current's VCPU data
+ *
+ * Returns: 0 or an error code
+ */
+int exit_to_guest_mode(struct kvm_vcpu *vcpu);
+
+/**
+ * __exit_to_guest_mode_work_pending - Check if work is pending
+ *
+ * Returns: True if work pending, False otherwise.
+ *
+ * Bare variant of exit_to_guest_mode_work_pending(). Can be called from
+ * interrupt enabled code for racy quick checks with care.
+ */
+static inline bool __exit_to_guest_mode_work_pending(void)
+{
+	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+
+	return !!(ti_work & EXIT_TO_GUEST_MODE_WORK);
+}
+
+/**
+ * exit_to_guest_mode_work_pending - Check if work is pending which needs to be
+ *				     handled before returning to guest mode
+ *
+ * Returns: True if work pending, False otherwise.
+ *
+ * Has to be invoked with interrupts disabled before the transition to
+ * guest mode.
+ */
+static inline bool exit_to_guest_mode_work_pending(void)
+{
+	lockdep_assert_irqs_disabled();
+	return __exit_to_guest_mode_work_pending();
+}
+#endif /* CONFIG_KVM_EXIT_TO_GUEST_WORK */
+
+#endif
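[Mirroring the user mode exit hooks, an architecture can feed extra TIF
bits into the guest mode exit loop. A sketch of a hypothetical arch header
fragment; _TIF_FOO_NOTIFY and foo_handle_vcpu_notify() are invented for
illustration:

	#define ARCH_EXIT_TO_GUEST_MODE_WORK	(_TIF_FOO_NOTIFY)

	static inline int arch_exit_to_guest_mode_work(struct kvm_vcpu *vcpu,
						       unsigned long ti_work)
	{
		if (ti_work & _TIF_FOO_NOTIFY)
			return foo_handle_vcpu_notify(vcpu);
		return 0;
	}
	#define arch_exit_to_guest_mode_work arch_exit_to_guest_mode_work
]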
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1439,4 +1439,12 @@ int kvm_vm_create_worker_thread(struct k
 				uintptr_t data, const char *name,
 				struct task_struct **thread_ptr);
 
+#ifdef CONFIG_KVM_EXIT_TO_GUEST_WORK
+static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
+{
+	vcpu->run->exit_reason = KVM_EXIT_INTR;
+	vcpu->stat.signal_exits++;
+}
+#endif /* CONFIG_KVM_EXIT_TO_GUEST_WORK */
+
 #endif
--- a/kernel/entry/Makefile
+++ b/kernel/entry/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-$(CONFIG_GENERIC_ENTRY) += common.o
+obj-$(CONFIG_GENERIC_ENTRY)		+= common.o
+obj-$(CONFIG_KVM_EXIT_TO_GUEST_WORK)	+= kvm.o
--- /dev/null
+++ b/kernel/entry/kvm.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/entry-kvm.h>
+#include <linux/kvm_host.h>
+
+static int exit_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
+{
+	do {
+		int ret;
+
+		if (ti_work & _TIF_SIGPENDING) {
+			kvm_handle_signal_exit(vcpu);
+			return -EINTR;
+		}
+
+		if (ti_work & _TIF_NEED_RESCHED)
+			schedule();
+
+		if (ti_work & _TIF_NOTIFY_RESUME) {
+			clear_thread_flag(TIF_NOTIFY_RESUME);
+			tracehook_notify_resume(NULL);
+		}
+
+		ret = arch_exit_to_guest_mode_work(vcpu, ti_work);
+		if (ret)
+			return ret;
+
+		ti_work = READ_ONCE(current_thread_info()->flags);
+	} while (ti_work & EXIT_TO_GUEST_MODE_WORK || need_resched());
+	return 0;
+}
+
+int exit_to_guest_mode(struct kvm_vcpu *vcpu)
+{
+	unsigned long ti_work;
+
+	/*
+	 * This is invoked from the outer guest loop with interrupts and
+	 * preemption enabled.
+	 *
+	 * KVM invokes exit_to_guest_mode_work_pending() with interrupts
+	 * disabled in the inner loop before going into guest mode. No need
+	 * to disable interrupts here.
+	 */
+	ti_work = READ_ONCE(current_thread_info()->flags);
+	if (!(ti_work & EXIT_TO_GUEST_MODE_WORK))
+		return 0;
+
+	return exit_to_guest_mode_work(vcpu, ti_work);
+}
+EXPORT_SYMBOL_GPL(exit_to_guest_mode);
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -60,3 +60,6 @@ config HAVE_KVM_VCPU_RUN_PID_CHANGE
 
 config HAVE_KVM_NO_POLL
 	bool
+
+config KVM_EXIT_TO_GUEST_WORK
+	bool
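[The calling convention described in the changelog translates into a
two-level loop on the KVM side. An outline of how a vcpu run loop might use
the new helpers; this is a simplification with an invented VM-entry helper,
not KVM code:

	static int vcpu_run_sketch(struct kvm_vcpu *vcpu)
	{
		int ret = 1;

		while (ret > 0) {
			/* Outer loop: interrupts and preemption enabled */
			int r = exit_to_guest_mode(vcpu);
			if (r)
				return r;	/* e.g. -EINTR on a signal */

			local_irq_disable();
			if (exit_to_guest_mode_work_pending()) {
				/* Work sneaked in: go around again */
				local_irq_enable();
				continue;
			}
			ret = enter_guest(vcpu);	/* invented helper */
			local_irq_enable();
		}
		return ret;
	}
]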
From patchwork Thu Jul 16 18:22:13 2020
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11668379
Message-Id: <20200716185424.442693273@linutronix.de>
Date: Thu, 16 Jul 2020 20:22:13 +0200
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon, Arnd Bergmann,
    Mark Rutland, Kees Cook, Keno Fischer, Paolo Bonzini, kvm@vger.kernel.org
Subject: [patch V3 05/13] x86/entry: Consolidate check_user_regs()
References: <20200716182208.180916541@linutronix.de>

The user register sanity check is sprinkled all over the place. Move it
into enter_from_user_mode().
Signed-off-by: Thomas Gleixner
Reviewed-by: Kees Cook
---
 arch/x86/entry/common.c |   24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -82,10 +82,11 @@ static noinstr void check_user_regs(stru
  * 2) Invoke context tracking if enabled to reactivate RCU
  * 3) Trace interrupts off state
  */
-static noinstr void enter_from_user_mode(void)
+static noinstr void enter_from_user_mode(struct pt_regs *regs)
 {
 	enum ctx_state state = ct_state();
 
+	check_user_regs(regs);
 	lockdep_hardirqs_off(CALLER_ADDR0);
 	user_exit_irqoff();
 
@@ -95,8 +96,9 @@ static noinstr void enter_from_user_mode
 	instrumentation_end();
 }
 #else
-static __always_inline void enter_from_user_mode(void)
+static __always_inline void enter_from_user_mode(struct pt_regs *regs)
 {
+	check_user_regs(regs);
 	lockdep_hardirqs_off(CALLER_ADDR0);
 	instrumentation_begin();
 	trace_hardirqs_off_finish();
@@ -369,9 +371,7 @@ static void __syscall_return_slowpath(st
 {
 	struct thread_info *ti;
 
-	check_user_regs(regs);
-
-	enter_from_user_mode();
+	enter_from_user_mode(regs);
 	instrumentation_begin();
 
 	local_irq_enable();
@@ -434,9 +434,7 @@ static void do_syscall_32_irqs_on(struct
 /* Handles int $0x80 */
 __visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
 {
-	check_user_regs(regs);
-
-	enter_from_user_mode();
+	enter_from_user_mode(regs);
 	instrumentation_begin();
 
 	local_irq_enable();
@@ -487,8 +485,6 @@ static bool __do_fast_syscall_32(struct
 		vdso_image_32.sym_int80_landing_pad;
 	bool success;
 
-	check_user_regs(regs);
-
 	/*
 	 * SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward
 	 * so that 'regs->ip -= 2' lands back on an int $0x80 instruction.
@@ -496,7 +492,7 @@ static bool __do_fast_syscall_32(struct
 	 */
 	regs->ip = landing_pad;
 
-	enter_from_user_mode();
+	enter_from_user_mode(regs);
 	instrumentation_begin();
 
 	local_irq_enable();
@@ -599,8 +595,7 @@ idtentry_state_t noinstr idtentry_enter(
 	};
 
 	if (user_mode(regs)) {
-		check_user_regs(regs);
-		enter_from_user_mode();
+		enter_from_user_mode(regs);
 		return ret;
 	}
 
@@ -733,8 +728,7 @@ void noinstr idtentry_exit(struct pt_reg
  */
 void noinstr idtentry_enter_user(struct pt_regs *regs)
 {
-	check_user_regs(regs);
-	enter_from_user_mode();
+	enter_from_user_mode(regs);
 }
 
 /**
From patchwork Thu Jul 16 18:22:14 2020
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 11668371
Message-Id: <20200716185424.550204929@linutronix.de>
Date: Thu, 16 Jul 2020 20:22:14 +0200
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon, Arnd Bergmann,
    Mark Rutland, Kees Cook, Keno Fischer, Paolo Bonzini, kvm@vger.kernel.org
Subject: [patch V3 06/13] x86/entry: Consolidate 32/64 bit syscall entry
References: <20200716182208.180916541@linutronix.de>

The 64-bit and 32-bit entry code have the same open-coded syscall entry
handling after the bitwidth-specific bits. Move it to a helper function and
share the code.

Signed-off-by: Thomas Gleixner
---
 arch/x86/entry/common.c |   93 +++++++++++++++++++++---------------------------
 1 file changed, 41 insertions(+), 52 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -366,8 +366,7 @@ static void __syscall_return_slowpath(st
 	exit_to_user_mode();
 }
 
-#ifdef CONFIG_X86_64
-__visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs)
+static noinstr long syscall_enter(struct pt_regs *regs, unsigned long nr)
 {
 	struct thread_info *ti;
 
@@ -379,6 +378,16 @@ static void __syscall_return_slowpath(st
 	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
 		nr = syscall_trace_enter(regs);
 
+	instrumentation_end();
+	return nr;
+}
+
+#ifdef CONFIG_X86_64
+__visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs)
+{
+	nr = syscall_enter(regs, nr);
+
+	instrumentation_begin();
 	if (likely(nr < NR_syscalls)) {
 		nr = array_index_nospec(nr, NR_syscalls);
 		regs->ax = sys_call_table[nr](regs);
@@ -390,64 +399,53 @@ static void __syscall_return_slowpath(st
 		regs->ax = x32_sys_call_table[nr](regs);
 #endif
 	}
-	__syscall_return_slowpath(regs);
-	instrumentation_end();
-	exit_to_user_mode();
+	syscall_return_slowpath(regs);
 }
 #endif
 
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
+static __always_inline unsigned int syscall_32_enter(struct pt_regs *regs)
+{
+	if (IS_ENABLED(CONFIG_IA32_EMULATION))
+		current_thread_info()->status |= TS_COMPAT;
+
+	/*
+	 * Subtlety here: if ptrace pokes something larger than 2^32-1 into
+	 * orig_ax, the unsigned int return value truncates it. This may
+	 * or may not be necessary, but it matches the old asm behavior.
+	 */
+	return syscall_enter(regs, (unsigned int)regs->orig_ax);
+}
+
 /*
- * Does a 32-bit syscall.  Called with IRQs on in CONTEXT_KERNEL.  Does
- * all entry and exit work and returns with IRQs off.  This function is
- * extremely hot in workloads that use it, and it's usually called from
- * do_fast_syscall_32, so forcibly inline it to improve performance.
+ * Invoke a 32-bit syscall.  Called with IRQs on in CONTEXT_KERNEL.
  */
-static void do_syscall_32_irqs_on(struct pt_regs *regs)
+static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs,
+						  unsigned int nr)
 {
-	struct thread_info *ti = current_thread_info();
-	unsigned int nr = (unsigned int)regs->orig_ax;
-
-#ifdef CONFIG_IA32_EMULATION
-	ti->status |= TS_COMPAT;
-#endif
-
-	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
-		/*
-		 * Subtlety here: if ptrace pokes something larger than
-		 * 2^32-1 into orig_ax, this truncates it.  This may or
-		 * may not be necessary, but it matches the old asm
-		 * behavior.
-		 */
-		nr = syscall_trace_enter(regs);
-	}
-
 	if (likely(nr < IA32_NR_syscalls)) {
+		instrumentation_begin();
 		nr = array_index_nospec(nr, IA32_NR_syscalls);
 		regs->ax = ia32_sys_call_table[nr](regs);
+		instrumentation_end();
 	}
-
-	__syscall_return_slowpath(regs);
 }
 
 /* Handles int $0x80 */
 __visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
 {
-	enter_from_user_mode(regs);
-	instrumentation_begin();
-
-	local_irq_enable();
-	do_syscall_32_irqs_on(regs);
+	unsigned int nr = syscall_32_enter(regs);
 
-	instrumentation_end();
-	exit_to_user_mode();
+	do_syscall_32_irqs_on(regs, nr);
+	syscall_return_slowpath(regs);
 }
 
-static bool __do_fast_syscall_32(struct pt_regs *regs)
+static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
 {
+	unsigned int nr = syscall_32_enter(regs);
 	int res;
 
+	instrumentation_begin();
 	/* Fetch EBP from where the vDSO stashed it. */
 	if (IS_ENABLED(CONFIG_X86_64)) {
 		/*
@@ -460,17 +458,18 @@ static bool __do_fast_syscall_32(struct
 		res = get_user(*(u32 *)&regs->bp,
 			       (u32 __user __force *)(unsigned long)(u32)regs->sp);
 	}
+	instrumentation_end();
 
 	if (res) {
 		/* User code screwed up. */
 		regs->ax = -EFAULT;
-		local_irq_disable();
-		__prepare_exit_to_usermode(regs);
+		syscall_return_slowpath(regs);
 		return false;
 	}
 
 	/* Now this is just like a normal syscall. */
-	do_syscall_32_irqs_on(regs);
+	do_syscall_32_irqs_on(regs, nr);
+	syscall_return_slowpath(regs);
 	return true;
 }
 
@@ -483,7 +482,6 @@ static bool __do_fast_syscall_32(struct
 	 */
 	unsigned long landing_pad = (unsigned long)current->mm->context.vdso +
 		vdso_image_32.sym_int80_landing_pad;
-	bool success;
 
 	/*
 	 * SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward
@@ -492,17 +490,8 @@ static bool __do_fast_syscall_32(struct
 	 */
 	regs->ip = landing_pad;
 
-	enter_from_user_mode(regs);
-	instrumentation_begin();
-
-	local_irq_enable();
-	success = __do_fast_syscall_32(regs);
-
-	instrumentation_end();
-	exit_to_user_mode();
-
-	/* If it failed, keep it simple: use IRET. */
-	if (!success)
+	/* Invoke the syscall. If it failed, keep it simple: use IRET. */
+	if (!__do_fast_syscall_32(regs))
 		return 0;
 
 #ifdef CONFIG_X86_64
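[The truncation subtlety called out in syscall_32_enter() is easy to see
with concrete values; a standalone illustration, with an arbitrary poked
value:

	/* A 64-bit tracer pokes a value wider than 32 bits into orig_ax: */
	unsigned long orig_ax = 0x100000003UL;

	/*
	 * The 32-bit path sees only the low 32 bits, i.e. syscall 3,
	 * exactly as the old asm entry code did:
	 */
	unsigned int nr = (unsigned int)orig_ax;	/* == 3 */
]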
*/ + if (!__do_fast_syscall_32(regs)) return 0; #ifdef CONFIG_X86_64

From patchwork Thu Jul 16 18:22:15 2020
Message-Id: <20200716185424.658427667@linutronix.de>
Date: Thu, 16 Jul 2020 20:22:15 +0200
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon, Arnd Bergmann, Mark Rutland, Kees Cook, Keno Fischer, Paolo Bonzini, kvm@vger.kernel.org
Subject: [patch V3 07/13] x86/ptrace: Provide pt_regs helpers for entry/exit
References: <20200716182208.180916541@linutronix.de>

As a preparatory step for moving the syscall and interrupt entry/exit
handling into generic code, provide pt_regs helpers which make it
possible to:

- Retrieve the syscall number from pt_regs
- Retrieve the syscall return value from pt_regs
- Retrieve the interrupt state from pt_regs to check whether interrupts
  are re-enabled by a return from interrupt/exception.
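For illustration, this lets arch-agnostic entry/exit code operate on
pt_regs without knowing the register layout. The consumer below is
hypothetical; only the three helpers added by this patch are real:

#include <linux/bug.h>
#include <linux/printk.h>
#include <asm/ptrace.h>

/* Sketch: sanity check a syscall return using the new accessors. */
static void report_syscall_exit(struct pt_regs *regs)
{
	/* Interrupts must be re-enabled when going back to user space. */
	WARN_ON_ONCE(regs_irqs_disabled(regs));

	pr_debug("syscall %lu returned %ld\n",
		 regs_syscall_nr(regs), regs_syscall_retval(regs));
}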
Signed-off-by: Thomas Gleixner Reviewed-by: Kees Cook --- arch/x86/include/asm/ptrace.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -209,6 +209,21 @@ static inline void user_stack_pointer_se regs->sp = val; } +static inline unsigned long regs_syscall_nr(struct pt_regs *regs) +{ + return regs->orig_ax; +} + +static inline long regs_syscall_retval(struct pt_regs *regs) +{ + return regs->ax; +} + +static __always_inline bool regs_irqs_disabled(struct pt_regs *regs) +{ + return !(regs->flags & X86_EFLAGS_IF); +} + /* Query offset/name of register from its name/offset */ extern int regs_query_register_offset(const char *name); extern const char *regs_query_register_name(unsigned int offset); From patchwork Thu Jul 16 18:22:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11668367 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B51F360D for ; Thu, 16 Jul 2020 19:51:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8F126207BC for ; Thu, 16 Jul 2020 19:51:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="rRjEwfs0"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Tm+S07QO" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729834AbgGPTv1 (ORCPT ); Thu, 16 Jul 2020 15:51:27 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:35282 "EHLO galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729693AbgGPTuz (ORCPT ); Thu, 16 Jul 2020 15:50:55 -0400 Message-Id: <20200716185424.765294277@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1594929052; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: references:references; bh=WQ8/w8anIf65NuZQ59f92ctNO50e6ggjwkWnIr+VLZ4=; b=rRjEwfs0vT9c2R/Sn1+dyi+wKBSmHKYnR7hjsaDrQVkvv2oryOPIAvUbcFWKQMdLA3wKaz gnD5mGGqwcMaxQdX0kRIkgsfhSNTXpblI7/orVt5y3o/+ZfnLWtxSdmWaAkzY/3jkECmLv D2cynPDCxtJFCRoVcVLRquAvTxr+kC5utv+POYnimeGxWEpUWgzR2lfEEk7mvKbf5Ehg0G IAUIW+g51s76tVHWj5XLb6NJ+mVmO9c36MfaY6o8o1dueGXVFBm6BF183AZLuIX0tC3BSq PWUUAm8lNjSPIpaEzG1U6esV5Q5q5lm8Aq8/mN1P9v0whQixbTxn0vIRB9WKGw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1594929052; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: references:references; bh=WQ8/w8anIf65NuZQ59f92ctNO50e6ggjwkWnIr+VLZ4=; b=Tm+S07QOkfzoXivMaEcqb22uEsX1t08+lURcWRQBfFPdDgO8UXjvCHI0yhIAVOloWeKZ2E rvF2i+m2F+EiEyCw== Date: Thu, 16 Jul 2020 20:22:16 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon , Arnd Bergmann , Mark Rutland , Kees Cook , Keno Fischer , Paolo Bonzini , kvm@vger.kernel.org Subject: [patch V3 08/13] x86/entry: Use generic syscall entry function References: <20200716182208.180916541@linutronix.de> MIME-Version: 1.0 Content-transfer-encoding: 8-bit Sender: 
kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Thomas Gleixner Replace the syscall entry work handling with the generic version. Provide the necessary helper inlines to handle the real architecture specific parts, e.g. audit and seccomp invocations. Use a temporary define for idtentry_enter_user which will be cleaned up seperately. Signed-off-by: Thomas Gleixner --- V3: Adapt to the noinstr changes --- arch/x86/Kconfig | 1 arch/x86/entry/common.c | 181 +----------------------------------- arch/x86/include/asm/entry-common.h | 86 +++++++++++++++++ arch/x86/include/asm/idtentry.h | 5 arch/x86/include/asm/thread_info.h | 5 5 files changed, 99 insertions(+), 179 deletions(-) --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -115,6 +115,7 @@ config X86 select GENERIC_CPU_AUTOPROBE select GENERIC_CPU_VULNERABILITIES select GENERIC_EARLY_IOREMAP + select GENERIC_ENTRY select GENERIC_FIND_FIRST_BIT select GENERIC_IOMAP select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -10,13 +10,13 @@ #include #include #include +#include #include #include #include #include #include #include -#include #include #include #include @@ -42,70 +42,8 @@ #include #include -#define CREATE_TRACE_POINTS #include -/* Check that the stack and regs on entry from user mode are sane. */ -static noinstr void check_user_regs(struct pt_regs *regs) -{ - if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) { - /* - * Make sure that the entry code gave us a sensible EFLAGS - * register. Native because we want to check the actual CPU - * state, not the interrupt state as imagined by Xen. - */ - unsigned long flags = native_save_fl(); - WARN_ON_ONCE(flags & (X86_EFLAGS_AC | X86_EFLAGS_DF | - X86_EFLAGS_NT)); - - /* We think we came from user mode. Make sure pt_regs agrees. */ - WARN_ON_ONCE(!user_mode(regs)); - - /* - * All entries from user mode (except #DF) should be on the - * normal thread stack and should have user pt_regs in the - * correct location. - */ - WARN_ON_ONCE(!on_thread_stack()); - WARN_ON_ONCE(regs != task_pt_regs(current)); - } -} - -#ifdef CONFIG_CONTEXT_TRACKING -/** - * enter_from_user_mode - Establish state when coming from user mode - * - * Syscall entry disables interrupts, but user mode is traced as interrupts - * enabled. Also with NO_HZ_FULL RCU might be idle. 
- * - * 1) Tell lockdep that interrupts are disabled - * 2) Invoke context tracking if enabled to reactivate RCU - * 3) Trace interrupts off state - */ -static noinstr void enter_from_user_mode(struct pt_regs *regs) -{ - enum ctx_state state = ct_state(); - - check_user_regs(regs); - lockdep_hardirqs_off(CALLER_ADDR0); - user_exit_irqoff(); - - instrumentation_begin(); - CT_WARN_ON(state != CONTEXT_USER); - trace_hardirqs_off_finish(); - instrumentation_end(); -} -#else -static __always_inline void enter_from_user_mode(struct pt_regs *regs) -{ - check_user_regs(regs); - lockdep_hardirqs_off(CALLER_ADDR0); - instrumentation_begin(); - trace_hardirqs_off_finish(); - instrumentation_end(); -} -#endif - /** * exit_to_user_mode - Fixup state when exiting to user mode * @@ -129,83 +67,6 @@ static __always_inline void exit_to_user lockdep_hardirqs_on(CALLER_ADDR0); } -static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch) -{ -#ifdef CONFIG_X86_64 - if (arch == AUDIT_ARCH_X86_64) { - audit_syscall_entry(regs->orig_ax, regs->di, - regs->si, regs->dx, regs->r10); - } else -#endif - { - audit_syscall_entry(regs->orig_ax, regs->bx, - regs->cx, regs->dx, regs->si); - } -} - -/* - * Returns the syscall nr to run (which should match regs->orig_ax) or -1 - * to skip the syscall. - */ -static long syscall_trace_enter(struct pt_regs *regs) -{ - u32 arch = in_ia32_syscall() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64; - - struct thread_info *ti = current_thread_info(); - unsigned long ret = 0; - u32 work; - - work = READ_ONCE(ti->flags); - - if (work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) { - ret = tracehook_report_syscall_entry(regs); - if (ret || (work & _TIF_SYSCALL_EMU)) - return -1L; - } - -#ifdef CONFIG_SECCOMP - /* - * Do seccomp after ptrace, to catch any tracer changes. 
- */ - if (work & _TIF_SECCOMP) { - struct seccomp_data sd; - - sd.arch = arch; - sd.nr = regs->orig_ax; - sd.instruction_pointer = regs->ip; -#ifdef CONFIG_X86_64 - if (arch == AUDIT_ARCH_X86_64) { - sd.args[0] = regs->di; - sd.args[1] = regs->si; - sd.args[2] = regs->dx; - sd.args[3] = regs->r10; - sd.args[4] = regs->r8; - sd.args[5] = regs->r9; - } else -#endif - { - sd.args[0] = regs->bx; - sd.args[1] = regs->cx; - sd.args[2] = regs->dx; - sd.args[3] = regs->si; - sd.args[4] = regs->di; - sd.args[5] = regs->bp; - } - - ret = __secure_computing(&sd); - if (ret == -1) - return ret; - } -#endif - - if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) - trace_sys_enter(regs, regs->orig_ax); - - do_audit_syscall_entry(regs, arch); - - return ret ?: regs->orig_ax; -} - #define EXIT_TO_USERMODE_LOOP_FLAGS \ (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING) @@ -366,26 +227,10 @@ static void __syscall_return_slowpath(st exit_to_user_mode(); } -static noinstr long syscall_enter(struct pt_regs *regs, unsigned long nr) -{ - struct thread_info *ti; - - enter_from_user_mode(regs); - instrumentation_begin(); - - local_irq_enable(); - ti = current_thread_info(); - if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) - nr = syscall_trace_enter(regs); - - instrumentation_end(); - return nr; -} - #ifdef CONFIG_X86_64 __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs) { - nr = syscall_enter(regs, nr); + nr = syscall_enter_from_user_mode(regs, nr); instrumentation_begin(); if (likely(nr < NR_syscalls)) { @@ -407,6 +252,8 @@ static noinstr long syscall_enter(struct #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION) static __always_inline unsigned int syscall_32_enter(struct pt_regs *regs) { + unsigned int nr = (unsigned int)regs->orig_ax; + if (IS_ENABLED(CONFIG_IA32_EMULATION)) current_thread_info()->status |= TS_COMPAT; /* @@ -414,7 +261,7 @@ static __always_inline unsigned int sysc * orig_ax, the unsigned int return value truncates it. This may * or may not be necessary, but it matches the old asm behavior. */ - return syscall_enter(regs, (unsigned int)regs->orig_ax); + return (unsigned int)syscall_enter_from_user_mode(regs, nr); } /* @@ -568,7 +415,7 @@ SYSCALL_DEFINE0(ni_syscall) * solves the problem of kernel mode pagefaults which can schedule, which * is not possible after invoking rcu_irq_enter() without undoing it. * - * For user mode entries enter_from_user_mode() must be invoked to + * For user mode entries irqentry_enter_from_user_mode() must be invoked to * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit * would not be possible. * @@ -584,7 +431,7 @@ idtentry_state_t noinstr idtentry_enter( }; if (user_mode(regs)) { - enter_from_user_mode(regs); + irqentry_enter_from_user_mode(regs); return ret; } @@ -615,7 +462,7 @@ idtentry_state_t noinstr idtentry_enter( /* * If RCU is not watching then the same careful * sequence vs. lockdep and tracing is required - * as in enter_from_user_mode(). + * as in irqentry_enter_from_user_mode(). */ lockdep_hardirqs_off(CALLER_ADDR0); rcu_irq_enter(); @@ -709,18 +556,6 @@ void noinstr idtentry_exit(struct pt_reg } /** - * idtentry_enter_user - Handle state tracking on idtentry from user mode - * @regs: Pointer to pt_regs of interrupted context - * - * Invokes enter_from_user_mode() to establish the proper context for - * NOHZ_FULL. Otherwise scheduling on exit would not be possible. 
- */ -void noinstr idtentry_enter_user(struct pt_regs *regs) -{ - enter_from_user_mode(regs); -} - -/** * idtentry_exit_user - Handle return from exception to user mode * @regs: Pointer to pt_regs (exception entry regs) * --- /dev/null +++ b/arch/x86/include/asm/entry-common.h @@ -0,0 +1,86 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _ASM_X86_ENTRY_COMMON_H +#define _ASM_X86_ENTRY_COMMON_H + +#include +#include + +/* Check that the stack and regs on entry from user mode are sane. */ +static __always_inline void arch_check_user_regs(struct pt_regs *regs) +{ + if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) { + /* + * Make sure that the entry code gave us a sensible EFLAGS + * register. Native because we want to check the actual CPU + * state, not the interrupt state as imagined by Xen. + */ + unsigned long flags = native_save_fl(); + WARN_ON_ONCE(flags & (X86_EFLAGS_AC | X86_EFLAGS_DF | + X86_EFLAGS_NT)); + + /* We think we came from user mode. Make sure pt_regs agrees. */ + WARN_ON_ONCE(!user_mode(regs)); + + /* + * All entries from user mode (except #DF) should be on the + * normal thread stack and should have user pt_regs in the + * correct location. + */ + WARN_ON_ONCE(!on_thread_stack()); + WARN_ON_ONCE(regs != task_pt_regs(current)); + } +} +#define arch_check_user_regs arch_check_user_regs + +static inline long arch_syscall_enter_seccomp(struct pt_regs *regs) +{ +#ifdef CONFIG_SECCOMP + u32 arch = in_ia32_syscall() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64; + struct seccomp_data sd; + + sd.arch = arch; + sd.nr = regs->orig_ax; + sd.instruction_pointer = regs->ip; + +#ifdef CONFIG_X86_64 + if (arch == AUDIT_ARCH_X86_64) { + sd.args[0] = regs->di; + sd.args[1] = regs->si; + sd.args[2] = regs->dx; + sd.args[3] = regs->r10; + sd.args[4] = regs->r8; + sd.args[5] = regs->r9; + } else +#endif + { + sd.args[0] = regs->bx; + sd.args[1] = regs->cx; + sd.args[2] = regs->dx; + sd.args[3] = regs->si; + sd.args[4] = regs->di; + sd.args[5] = regs->bp; + } + + return __secure_computing(&sd); +#else + return 0; +#endif +} +#define arch_syscall_enter_seccomp arch_syscall_enter_seccomp + +static inline void arch_syscall_enter_audit(struct pt_regs *regs) +{ +#ifdef CONFIG_X86_64 + if (in_ia32_syscall()) { + audit_syscall_entry(regs->orig_ax, regs->di, + regs->si, regs->dx, regs->r10); + } else +#endif + { + audit_syscall_entry(regs->orig_ax, regs->bx, + regs->cx, regs->dx, regs->si); + } +} +#define arch_syscall_enter_audit arch_syscall_enter_audit + +#endif --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -6,11 +6,14 @@ #include #ifndef __ASSEMBLY__ +#include #include #include -void idtentry_enter_user(struct pt_regs *regs); +/* Temporary define */ +#define idtentry_enter_user irqentry_enter_from_user_mode + void idtentry_exit_user(struct pt_regs *regs); typedef struct idtentry_state { --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -133,11 +133,6 @@ struct thread_info { #define _TIF_X32 (1 << TIF_X32) #define _TIF_FSCHECK (1 << TIF_FSCHECK) -/* Work to do before invoking the actual syscall. 
*/ -#define _TIF_WORK_SYSCALL_ENTRY \ - (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT | \ - _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT) - /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW_BASE \ (_TIF_NOCPUID | _TIF_NOTSC | _TIF_BLOCKSTEP | \

From patchwork Thu Jul 16 18:22:17 2020
Message-Id: <20200716185424.872183268@linutronix.de>
Date: Thu, 16 Jul 2020 20:22:17 +0200
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon, Arnd Bergmann, Mark Rutland, Kees Cook, Keno Fischer, Paolo Bonzini, kvm@vger.kernel.org
Subject: [patch V3 09/13] x86/entry: Use generic syscall exit functionality
References: <20200716182208.180916541@linutronix.de>

Replace the x86 variant with the generic version. Provide the relevant
architecture-specific helper functions and defines. Use a temporary
define for idtentry_exit_user which will be cleaned up separately.
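For orientation, the generic exit path has the same shape as the x86
exit_to_usermode_loop() removed below: with interrupts enabled, handle
one round of work, then re-check the thread flags until none are left.
A condensed sketch, assuming the generic work mask is named
EXIT_TO_USERMODE_WORK to match the ARCH_EXIT_TO_USERMODE_WORK hook
added below (the real loop covers more flags, e.g. uprobes and live
patching):

/* Condensed sketch of the generic exit work loop, not the literal code. */
static unsigned long exit_to_usermode_work_loop(struct pt_regs *regs,
						unsigned long ti_work)
{
	while (ti_work & EXIT_TO_USERMODE_WORK) {
		local_irq_enable();

		if (ti_work & _TIF_NEED_RESCHED)
			schedule();

		if (ti_work & _TIF_SIGPENDING)
			arch_do_signal(regs);

		if (ti_work & _TIF_NOTIFY_RESUME) {
			clear_thread_flag(TIF_NOTIFY_RESUME);
			tracehook_notify_resume(regs);
		}

		/* Architecture hook, e.g. user-return notifiers on x86: */
		arch_exit_to_user_mode_work(regs, ti_work);

		/* Re-evaluate with interrupts off to avoid losing work. */
		local_irq_disable();
		ti_work = READ_ONCE(current_thread_info()->flags);
	}
	return ti_work;
}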
Signed-off-by: Thomas Gleixner --- arch/x86/entry/common.c | 221 ------------------------------------ arch/x86/entry/entry_32.S | 2 arch/x86/entry/entry_64.S | 2 arch/x86/include/asm/entry-common.h | 49 +++++++ arch/x86/include/asm/idtentry.h | 3 arch/x86/include/asm/signal.h | 1 arch/x86/kernel/signal.c | 2 7 files changed, 58 insertions(+), 222 deletions(-) --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -15,15 +15,8 @@ #include #include #include -#include -#include -#include #include -#include -#include #include -#include -#include #include #include @@ -42,191 +35,6 @@ #include #include -#include - -/** - * exit_to_user_mode - Fixup state when exiting to user mode - * - * Syscall exit enables interrupts, but the kernel state is interrupts - * disabled when this is invoked. Also tell RCU about it. - * - * 1) Trace interrupts on state - * 2) Invoke context tracking if enabled to adjust RCU state - * 3) Clear CPU buffers if CPU is affected by MDS and the migitation is on. - * 4) Tell lockdep that interrupts are enabled - */ -static __always_inline void exit_to_user_mode(void) -{ - instrumentation_begin(); - trace_hardirqs_on_prepare(); - lockdep_hardirqs_on_prepare(CALLER_ADDR0); - instrumentation_end(); - - user_enter_irqoff(); - mds_user_clear_cpu_buffers(); - lockdep_hardirqs_on(CALLER_ADDR0); -} - -#define EXIT_TO_USERMODE_LOOP_FLAGS \ - (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE | \ - _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING) - -static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags) -{ - /* - * In order to return to user mode, we need to have IRQs off with - * none of EXIT_TO_USERMODE_LOOP_FLAGS set. Several of these flags - * can be set at any time on preemptible kernels if we have IRQs on, - * so we need to loop. Disabling preemption wouldn't help: doing the - * work to clear some of the flags can sleep. - */ - while (true) { - /* We have work to do. */ - local_irq_enable(); - - if (cached_flags & _TIF_NEED_RESCHED) - schedule(); - - if (cached_flags & _TIF_UPROBE) - uprobe_notify_resume(regs); - - if (cached_flags & _TIF_PATCH_PENDING) - klp_update_patch_state(current); - - /* deal with pending signal delivery */ - if (cached_flags & _TIF_SIGPENDING) - do_signal(regs); - - if (cached_flags & _TIF_NOTIFY_RESUME) { - clear_thread_flag(TIF_NOTIFY_RESUME); - tracehook_notify_resume(regs); - rseq_handle_notify_resume(NULL, regs); - } - - if (cached_flags & _TIF_USER_RETURN_NOTIFY) - fire_user_return_notifiers(); - - /* Disable IRQs and retry */ - local_irq_disable(); - - cached_flags = READ_ONCE(current_thread_info()->flags); - - if (!(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS)) - break; - } -} - -static void __prepare_exit_to_usermode(struct pt_regs *regs) -{ - struct thread_info *ti = current_thread_info(); - u32 cached_flags; - - addr_limit_user_check(); - - lockdep_assert_irqs_disabled(); - lockdep_sys_exit(); - - cached_flags = READ_ONCE(ti->flags); - - if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS)) - exit_to_usermode_loop(regs, cached_flags); - - /* Reload ti->flags; we may have rescheduled above. */ - cached_flags = READ_ONCE(ti->flags); - - if (unlikely(cached_flags & _TIF_IO_BITMAP)) - tss_update_io_bitmap(); - - fpregs_assert_state_consistent(); - if (unlikely(cached_flags & _TIF_NEED_FPU_LOAD)) - switch_fpu_return(); - -#ifdef CONFIG_COMPAT - /* - * Compat syscalls set TS_COMPAT. Make sure we clear it before - * returning to user mode. 
We need to clear it *after* signal - * handling, because syscall restart has a fixup for compat - * syscalls. The fixup is exercised by the ptrace_syscall_32 - * selftest. - * - * We also need to clear TS_REGS_POKED_I386: the 32-bit tracer - * special case only applies after poking regs and before the - * very next return to user mode. - */ - ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED); -#endif -} - -static noinstr void prepare_exit_to_usermode(struct pt_regs *regs) -{ - instrumentation_begin(); - __prepare_exit_to_usermode(regs); - instrumentation_end(); - exit_to_user_mode(); -} - -#define SYSCALL_EXIT_WORK_FLAGS \ - (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \ - _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT) - -static void syscall_slow_exit_work(struct pt_regs *regs, u32 cached_flags) -{ - bool step; - - audit_syscall_exit(regs); - - if (cached_flags & _TIF_SYSCALL_TRACEPOINT) - trace_sys_exit(regs, regs->ax); - - /* - * If TIF_SYSCALL_EMU is set, we only get here because of - * TIF_SINGLESTEP (i.e. this is PTRACE_SYSEMU_SINGLESTEP). - * We already reported this syscall instruction in - * syscall_trace_enter(). - */ - step = unlikely( - (cached_flags & (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU)) - == _TIF_SINGLESTEP); - if (step || cached_flags & _TIF_SYSCALL_TRACE) - tracehook_report_syscall_exit(regs, step); -} - -static void __syscall_return_slowpath(struct pt_regs *regs) -{ - struct thread_info *ti = current_thread_info(); - u32 cached_flags = READ_ONCE(ti->flags); - - CT_WARN_ON(ct_state() != CONTEXT_KERNEL); - - if (IS_ENABLED(CONFIG_PROVE_LOCKING) && - WARN(irqs_disabled(), "syscall %ld left IRQs disabled", regs->orig_ax)) - local_irq_enable(); - - rseq_syscall(regs); - - /* - * First do one-time work. If these work items are enabled, we - * want to run them exactly once per syscall exit with IRQs on. - */ - if (unlikely(cached_flags & SYSCALL_EXIT_WORK_FLAGS)) - syscall_slow_exit_work(regs, cached_flags); - - local_irq_disable(); - __prepare_exit_to_usermode(regs); -} - -/* - * Called with IRQs on and fully valid regs. Returns with IRQs off in a - * state such that we can immediately switch to user mode. - */ -__visible noinstr void syscall_return_slowpath(struct pt_regs *regs) -{ - instrumentation_begin(); - __syscall_return_slowpath(regs); - instrumentation_end(); - exit_to_user_mode(); -} - #ifdef CONFIG_X86_64 __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs) { @@ -245,7 +53,7 @@ static void __syscall_return_slowpath(st #endif } instrumentation_end(); - syscall_return_slowpath(regs); + syscall_exit_to_user_mode(regs); } #endif @@ -284,7 +92,7 @@ static __always_inline void do_syscall_3 unsigned int nr = syscall_32_enter(regs); do_syscall_32_irqs_on(regs, nr); - syscall_return_slowpath(regs); + syscall_exit_to_user_mode(regs); } static noinstr bool __do_fast_syscall_32(struct pt_regs *regs) @@ -310,13 +118,13 @@ static noinstr bool __do_fast_syscall_32 if (res) { /* User code screwed up. */ regs->ax = -EFAULT; - syscall_return_slowpath(regs); + syscall_exit_to_user_mode(regs); return false; } /* Now this is just like a normal syscall. 
*/ do_syscall_32_irqs_on(regs, nr); - syscall_return_slowpath(regs); + syscall_exit_to_user_mode(regs); return true; } @@ -524,7 +332,7 @@ void noinstr idtentry_exit(struct pt_reg /* Check whether this returns to user mode */ if (user_mode(regs)) { - prepare_exit_to_usermode(regs); + irqentry_exit_to_user_mode(regs); } else if (regs->flags & X86_EFLAGS_IF) { /* * If RCU was not watching on entry this needs to be done @@ -555,25 +363,6 @@ void noinstr idtentry_exit(struct pt_reg } } -/** - * idtentry_exit_user - Handle return from exception to user mode - * @regs: Pointer to pt_regs (exception entry regs) - * - * Runs the necessary preemption and work checks and returns to the caller - * with interrupts disabled and no further work pending. - * - * This is the last action before returning to the low level ASM code which - * just needs to return to the appropriate context. - * - * Counterpart to idtentry_enter_user(). - */ -void noinstr idtentry_exit_user(struct pt_regs *regs) -{ - lockdep_assert_irqs_disabled(); - - prepare_exit_to_usermode(regs); -} - #ifdef CONFIG_XEN_PV #ifndef CONFIG_PREEMPTION /* --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -846,7 +846,7 @@ SYM_CODE_START(ret_from_fork) 2: /* When we fork, we trace the syscall return in the child, too. */ movl %esp, %eax - call syscall_return_slowpath + call syscall_exit_to_user_mode jmp .Lsyscall_32_done /* kernel thread */ --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -283,7 +283,7 @@ SYM_CODE_START(ret_from_fork) 2: UNWIND_HINT_REGS movq %rsp, %rdi - call syscall_return_slowpath /* returns with IRQs disabled */ + call syscall_exit_to_user_mode /* returns with IRQs disabled */ jmp swapgs_restore_regs_and_return_to_usermode 1: --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -2,9 +2,14 @@ #ifndef _ASM_X86_ENTRY_COMMON_H #define _ASM_X86_ENTRY_COMMON_H +#include +#include #include #include +#include +#include + /* Check that the stack and regs on entry from user mode are sane. */ static __always_inline void arch_check_user_regs(struct pt_regs *regs) { @@ -83,4 +88,48 @@ static inline void arch_syscall_enter_au } #define arch_syscall_enter_audit arch_syscall_enter_audit +#define ARCH_SYSCALL_EXIT_WORK (_TIF_SINGLESTEP) + +#define ARCH_EXIT_TO_USERMODE_WORK (_TIF_USER_RETURN_NOTIFY) + +#define ARCH_EXIT_TO_USER_FROM_SYSCALL_EXIT + +static inline void arch_exit_to_user_mode_work(struct pt_regs *regs, + unsigned long ti_work) +{ + if (ti_work & _TIF_USER_RETURN_NOTIFY) + fire_user_return_notifiers(); +} +#define arch_exit_to_user_mode_work arch_exit_to_user_mode_work + +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, + unsigned long ti_work) +{ + fpregs_assert_state_consistent(); + if (unlikely(ti_work & _TIF_NEED_FPU_LOAD)) + switch_fpu_return(); + +#ifdef CONFIG_COMPAT + /* + * Compat syscalls set TS_COMPAT. Make sure we clear it before + * returning to user mode. We need to clear it *after* signal + * handling, because syscall restart has a fixup for compat + * syscalls. The fixup is exercised by the ptrace_syscall_32 + * selftest. + * + * We also need to clear TS_REGS_POKED_I386: the 32-bit tracer + * special case only applies after poking regs and before the + * very next return to user mode. 
+ */ + current_thread_info()->status &= ~(TS_COMPAT | TS_I386_REGS_POKED); +#endif +} +#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare + +static __always_inline void arch_exit_to_user_mode(void) +{ + mds_user_clear_cpu_buffers(); +} +#define arch_exit_to_user_mode arch_exit_to_user_mode + #endif --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -13,8 +13,7 @@ /* Temporary define */ #define idtentry_enter_user irqentry_enter_from_user_mode - -void idtentry_exit_user(struct pt_regs *regs); +#define idtentry_exit_user irqentry_exit_to_user_mode typedef struct idtentry_state { bool exit_rcu; --- a/arch/x86/include/asm/signal.h +++ b/arch/x86/include/asm/signal.h @@ -35,7 +35,6 @@ typedef sigset_t compat_sigset_t; #endif /* __ASSEMBLY__ */ #include #ifndef __ASSEMBLY__ -extern void do_signal(struct pt_regs *regs); #define __ARCH_HAS_SA_RESTORER --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -803,7 +803,7 @@ static inline unsigned long get_nr_resta * want to handle. Thus you cannot kill init even with a SIGKILL even by * mistake. */ -void do_signal(struct pt_regs *regs) +void arch_do_signal(struct pt_regs *regs) { struct ksignal ksig; From patchwork Thu Jul 16 18:22:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11668369 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9A4AE60D for ; Thu, 16 Jul 2020 19:51:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7E3B2207E8 for ; Thu, 16 Jul 2020 19:51:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Kv1LVKj3"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="uFh4Wbtt" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729812AbgGPTv0 (ORCPT ); Thu, 16 Jul 2020 15:51:26 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:35274 "EHLO galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729718AbgGPTu4 (ORCPT ); Thu, 16 Jul 2020 15:50:56 -0400 Message-Id: <20200716185424.980336370@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1594929054; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: references:references; bh=TW2YREFZRpmpV7lTAtkkgutVDumUXCXPLDpmUk0C+O8=; b=Kv1LVKj3M48EaAVRlz3/IW4tZuacIc+gqhsM0b7h4CDi70/Bme8m/APPR7VrQtAx1BFSKG aAexOvv/vawtRMevc/PkwmHR02f1nnyahLC1AvnrwiQ11a3I9ai+s5kym/E0hw+i6c4x+6 pv9w6GOG5EhVeKkx6BDZ8IZKdM9ZdCs+Pc6whWz3evgTboXJ2rd00L8QDV2V2lxxehmOTQ zbiw6xfMR2bJ+b7wjXLSiKu2/crkmSPdPVWawbTkDsUI7SXMGgOJ4fp4nZBQM/bfGlNDwD QNP0uUYH8Fi3saagAcTFgQsRns/fOyCM4wfP0iuNv8q1n8Xar9uG6C8NcirGkg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1594929054; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: references:references; bh=TW2YREFZRpmpV7lTAtkkgutVDumUXCXPLDpmUk0C+O8=; b=uFh4Wbtt8O7CAnX4F4BeD2cBAoNzMxd7LJrrxfK7MbD4F4lj10vl62eg5JB4tByajyizPP 
iStBrfEZt+/lU2Cw== Date: Thu, 16 Jul 2020 20:22:18 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon , Arnd Bergmann , Mark Rutland , Kees Cook , Keno Fischer , Paolo Bonzini , kvm@vger.kernel.org Subject: [patch V3 10/13] x86/entry: Cleanup idtentry_entry/exit_user References: <20200716182208.180916541@linutronix.de> MIME-Version: 1.0 Content-transfer-encoding: 8-bit Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Cleanup the temporary defines and use irqentry_ instead of idtentry_. Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/idtentry.h | 4 ---- arch/x86/kernel/cpu/mce/core.c | 4 ++-- arch/x86/kernel/traps.c | 18 +++++++++--------- 3 files changed, 11 insertions(+), 15 deletions(-) --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -11,10 +11,6 @@ #include -/* Temporary define */ -#define idtentry_enter_user irqentry_enter_from_user_mode -#define idtentry_exit_user irqentry_exit_to_user_mode - typedef struct idtentry_state { bool exit_rcu; } idtentry_state_t; --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1927,11 +1927,11 @@ static __always_inline void exc_machine_ static __always_inline void exc_machine_check_user(struct pt_regs *regs) { - idtentry_enter_user(regs); + irqentry_enter_from_user_mode(regs); instrumentation_begin(); machine_check_vector(regs); instrumentation_end(); - idtentry_exit_user(regs); + irqentry_exit_to_user_mode(regs); } #ifdef CONFIG_X86_64 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -638,18 +638,18 @@ DEFINE_IDTENTRY_RAW(exc_int3) return; /* - * idtentry_enter_user() uses static_branch_{,un}likely() and therefore - * can trigger INT3, hence poke_int3_handler() must be done - * before. If the entry came from kernel mode, then use nmi_enter() - * because the INT3 could have been hit in any context including - * NMI. + * irqentry_enter_from_user_mode() uses static_branch_{,un}likely() + * and therefore can trigger INT3, hence poke_int3_handler() must + * be done before. If the entry came from kernel mode, then use + * nmi_enter() because the INT3 could have been hit in any context + * including NMI. 
*/ if (user_mode(regs)) { - idtentry_enter_user(regs); + irqentry_enter_from_user_mode(regs); instrumentation_begin(); do_int3_user(regs); instrumentation_end(); - idtentry_exit_user(regs); + irqentry_exit_to_user_mode(regs); } else { nmi_enter(); instrumentation_begin(); @@ -901,12 +901,12 @@ static __always_inline void exc_debug_us */ WARN_ON_ONCE(!user_mode(regs)); - idtentry_enter_user(regs); + irqentry_enter_from_user_mode(regs); instrumentation_begin(); handle_debug(regs, dr6, true); instrumentation_end(); - idtentry_exit_user(regs); + irqentry_exit_to_user_mode(regs); } #ifdef CONFIG_X86_64

From patchwork Thu Jul 16 18:22:19 2020
Message-Id: <20200716185425.089131416@linutronix.de>
Date: Thu, 16 Jul 2020 20:22:19 +0200
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon, Arnd Bergmann, Mark Rutland, Kees Cook, Keno Fischer, Paolo Bonzini, kvm@vger.kernel.org
Subject: [patch V3 11/13] x86/entry: Use generic interrupt entry/exit code
References: <20200716182208.180916541@linutronix.de>

Replace the x86 code with the generic variant.
Use temporary defines for idtentry_* which will be cleaned up in the next step. Signed-off-by: Thomas Gleixner --- arch/x86/entry/common.c | 167 ---------------------------------------- arch/x86/include/asm/idtentry.h | 10 -- 2 files changed, 5 insertions(+), 172 deletions(-) --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -198,171 +198,6 @@ SYSCALL_DEFINE0(ni_syscall) return -ENOSYS; } -/** - * idtentry_enter - Handle state tracking on ordinary idtentries - * @regs: Pointer to pt_regs of interrupted context - * - * Invokes: - * - lockdep irqflag state tracking as low level ASM entry disabled - * interrupts. - * - * - Context tracking if the exception hit user mode. - * - * - The hardirq tracer to keep the state consistent as low level ASM - * entry disabled interrupts. - * - * As a precondition, this requires that the entry came from user mode, - * idle, or a kernel context in which RCU is watching. - * - * For kernel mode entries RCU handling is done conditional. If RCU is - * watching then the only RCU requirement is to check whether the tick has - * to be restarted. If RCU is not watching then rcu_irq_enter() has to be - * invoked on entry and rcu_irq_exit() on exit. - * - * Avoiding the rcu_irq_enter/exit() calls is an optimization but also - * solves the problem of kernel mode pagefaults which can schedule, which - * is not possible after invoking rcu_irq_enter() without undoing it. - * - * For user mode entries irqentry_enter_from_user_mode() must be invoked to - * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit - * would not be possible. - * - * Returns: An opaque object that must be passed to idtentry_exit() - * - * The return value must be fed into the state argument of - * idtentry_exit(). - */ -idtentry_state_t noinstr idtentry_enter(struct pt_regs *regs) -{ - idtentry_state_t ret = { - .exit_rcu = false, - }; - - if (user_mode(regs)) { - irqentry_enter_from_user_mode(regs); - return ret; - } - - /* - * If this entry hit the idle task invoke rcu_irq_enter() whether - * RCU is watching or not. - * - * Interupts can nest when the first interrupt invokes softirq - * processing on return which enables interrupts. - * - * Scheduler ticks in the idle task can mark quiescent state and - * terminate a grace period, if and only if the timer interrupt is - * not nested into another interrupt. - * - * Checking for __rcu_is_watching() here would prevent the nesting - * interrupt to invoke rcu_irq_enter(). If that nested interrupt is - * the tick then rcu_flavor_sched_clock_irq() would wrongfully - * assume that it is the first interupt and eventually claim - * quiescient state and end grace periods prematurely. - * - * Unconditionally invoke rcu_irq_enter() so RCU state stays - * consistent. - * - * TINY_RCU does not support EQS, so let the compiler eliminate - * this part when enabled. - */ - if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) { - /* - * If RCU is not watching then the same careful - * sequence vs. lockdep and tracing is required - * as in irqentry_enter_from_user_mode(). - */ - lockdep_hardirqs_off(CALLER_ADDR0); - rcu_irq_enter(); - instrumentation_begin(); - trace_hardirqs_off_finish(); - instrumentation_end(); - - ret.exit_rcu = true; - return ret; - } - - /* - * If RCU is watching then RCU only wants to check whether it needs - * to restart the tick in NOHZ mode. rcu_irq_enter_check_tick() - * already contains a warning when RCU is not watching, so no point - * in having another one here. 
- */ - instrumentation_begin(); - rcu_irq_enter_check_tick(); - /* Use the combo lockdep/tracing function */ - trace_hardirqs_off(); - instrumentation_end(); - - return ret; -} - -static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched) -{ - if (may_sched && !preempt_count()) { - /* Sanity check RCU and thread stack */ - rcu_irq_exit_check_preempt(); - if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) - WARN_ON_ONCE(!on_thread_stack()); - if (need_resched()) - preempt_schedule_irq(); - } - /* Covers both tracing and lockdep */ - trace_hardirqs_on(); -} - -/** - * idtentry_exit - Handle return from exception that used idtentry_enter() - * @regs: Pointer to pt_regs (exception entry regs) - * @state: Return value from matching call to idtentry_enter() - * - * Depending on the return target (kernel/user) this runs the necessary - * preemption and work checks if possible and reguired and returns to - * the caller with interrupts disabled and no further work pending. - * - * This is the last action before returning to the low level ASM code which - * just needs to return to the appropriate context. - * - * Counterpart to idtentry_enter(). The return value of the entry - * function must be fed into the @state argument. - */ -void noinstr idtentry_exit(struct pt_regs *regs, idtentry_state_t state) -{ - lockdep_assert_irqs_disabled(); - - /* Check whether this returns to user mode */ - if (user_mode(regs)) { - irqentry_exit_to_user_mode(regs); - } else if (regs->flags & X86_EFLAGS_IF) { - /* - * If RCU was not watching on entry this needs to be done - * carefully and needs the same ordering of lockdep/tracing - * and RCU as the return to user mode path. - */ - if (state.exit_rcu) { - instrumentation_begin(); - /* Tell the tracer that IRET will enable interrupts */ - trace_hardirqs_on_prepare(); - lockdep_hardirqs_on_prepare(CALLER_ADDR0); - instrumentation_end(); - rcu_irq_exit(); - lockdep_hardirqs_on(CALLER_ADDR0); - return; - } - - instrumentation_begin(); - idtentry_exit_cond_resched(regs, IS_ENABLED(CONFIG_PREEMPTION)); - instrumentation_end(); - } else { - /* - * IRQ flags state is correct already. Just tell RCU if it - * was not watching on entry. 
- */ - if (state.exit_rcu) - rcu_irq_exit(); - } -} - #ifdef CONFIG_XEN_PV #ifndef CONFIG_PREEMPTION /* @@ -427,7 +262,7 @@ static void __xen_pv_evtchn_do_upcall(vo inhcall = get_and_clear_inhcall(); if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) { instrumentation_begin(); - idtentry_exit_cond_resched(regs, true); + irqentry_exit_cond_resched(); instrumentation_end(); restore_inhcall(inhcall); } else { --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -11,12 +11,10 @@ #include -typedef struct idtentry_state { - bool exit_rcu; -} idtentry_state_t; - -idtentry_state_t idtentry_enter(struct pt_regs *regs); -void idtentry_exit(struct pt_regs *regs, idtentry_state_t state); +/* Temporary defines */ +typedef irqentry_state_t idtentry_state_t; +#define idtentry_enter irqentry_enter +#define idtentry_exit irqentry_exit /** * DECLARE_IDTENTRY - Declare functions for simple IDT entry points From patchwork Thu Jul 16 18:22:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11668363 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0752B60D for ; Thu, 16 Jul 2020 19:51:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DADD32083B for ; Thu, 16 Jul 2020 19:51:21 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="qxwl67rG"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="e6UtrLMD" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729784AbgGPTvH (ORCPT ); Thu, 16 Jul 2020 15:51:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729682AbgGPTu6 (ORCPT ); Thu, 16 Jul 2020 15:50:58 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6034BC08C5C0; Thu, 16 Jul 2020 12:50:58 -0700 (PDT) Message-Id: <20200716185425.198080509@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1594929057; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: references:references; bh=+01HYjN82B7X1zjGIPOd0Udv26XprPXO1fOLOOWkUwo=; b=qxwl67rGF65/8LpO2yQMYsuxcjnpUIKwp5Z9V+k2X422hUmOfzwpjqMivirp2hxsxKINCn E6gL8K9v+nwdSgH1Jl1RRoLfAyH7qCkDXeFwe4zeBP4LrsTVhLYZaulH0QV/zNuqeMZS+J G1/JnCLaNZyEnR6XIggP8J/CS4GCk2aKudmNVUoXqTKGFMvFz9spgW2kN1ZVJuRSpkCdFW pdx9aHdCrHsTScyM+oKUBTsI4CsXoEdkB+HtizJRjCPgMcgnf7UUJR6emlX2XWr0MG16Jf IqTW5mJviAo9EC6IPBf6DyPdF6FmODq6D/RVdjs5pO/aNHzLMdM67rjMCa5k4Q== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1594929057; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: references:references; bh=+01HYjN82B7X1zjGIPOd0Udv26XprPXO1fOLOOWkUwo=; b=e6UtrLMDT0xDjU4UI6NodhXXUo01vylAcJW7xLjbV5cUufGN6+kv8tRG1d3YmM+/9lndt9 EYKdiwj4qrRGsxBQ== Date: Thu, 16 Jul 2020 20:22:20 +0200 From: Thomas Gleixner To: LKML Cc: 
x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon , Arnd Bergmann , Mark Rutland , Kees Cook , Keno Fischer , Paolo Bonzini , kvm@vger.kernel.org Subject: [patch V3 12/13] x86/entry: Cleanup idtentry_enter/exit References: <20200716182208.180916541@linutronix.de> MIME-Version: 1.0 Content-transfer-encoding: 8-bit Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Remove the temporary defines and fixup all references. Signed-off-by: Thomas Gleixner --- arch/x86/entry/common.c | 6 +++--- arch/x86/include/asm/idtentry.h | 33 ++++++++++++++------------------- arch/x86/kernel/kvm.c | 6 +++--- arch/x86/kernel/traps.c | 6 +++--- arch/x86/mm/fault.c | 6 +++--- 5 files changed, 26 insertions(+), 31 deletions(-) --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -248,9 +248,9 @@ static void __xen_pv_evtchn_do_upcall(vo { struct pt_regs *old_regs; bool inhcall; - idtentry_state_t state; + irqentry_state_t state; - state = idtentry_enter(regs); + state = irqentry_enter(regs); old_regs = set_irq_regs(regs); instrumentation_begin(); @@ -266,7 +266,7 @@ static void __xen_pv_evtchn_do_upcall(vo instrumentation_end(); restore_inhcall(inhcall); } else { - idtentry_exit(regs, state); + irqentry_exit(regs, state); } } #endif /* CONFIG_XEN_PV */ --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -11,11 +11,6 @@ #include -/* Temporary defines */ -typedef irqentry_state_t idtentry_state_t; -#define idtentry_enter irqentry_enter -#define idtentry_exit irqentry_exit - /** * DECLARE_IDTENTRY - Declare functions for simple IDT entry points * No error code pushed by hardware @@ -45,8 +40,8 @@ typedef irqentry_state_t idtentry_state_ * The macro is written so it acts as function definition. Append the * body with a pair of curly brackets. * - * idtentry_enter() contains common code which has to be invoked before - * arbitrary code in the body. idtentry_exit() contains common code + * irqentry_enter() contains common code which has to be invoked before + * arbitrary code in the body. irqentry_exit() contains common code * which has to run before returning to the low level assembly code. */ #define DEFINE_IDTENTRY(func) \ @@ -54,12 +49,12 @@ static __always_inline void __##func(str \ __visible noinstr void func(struct pt_regs *regs) \ { \ - idtentry_state_t state = idtentry_enter(regs); \ + irqentry_state_t state = irqentry_enter(regs); \ \ instrumentation_begin(); \ __##func (regs); \ instrumentation_end(); \ - idtentry_exit(regs, state); \ + irqentry_exit(regs, state); \ } \ \ static __always_inline void __##func(struct pt_regs *regs) @@ -101,12 +96,12 @@ static __always_inline void __##func(str __visible noinstr void func(struct pt_regs *regs, \ unsigned long error_code) \ { \ - idtentry_state_t state = idtentry_enter(regs); \ + irqentry_state_t state = irqentry_enter(regs); \ \ instrumentation_begin(); \ __##func (regs, error_code); \ instrumentation_end(); \ - idtentry_exit(regs, state); \ + irqentry_exit(regs, state); \ } \ \ static __always_inline void __##func(struct pt_regs *regs, \ @@ -161,7 +156,7 @@ static __always_inline void __##func(str * body with a pair of curly brackets. * * Contrary to DEFINE_IDTENTRY_ERRORCODE() this does not invoke the - * idtentry_enter/exit() helpers before and after the body invocation. This + * irqentry_enter/exit() helpers before and after the body invocation. This * needs to be done in the body itself if applicable. 
Use if extra work * is required before the enter/exit() helpers are invoked. */ @@ -197,7 +192,7 @@ static __always_inline void __##func(str __visible noinstr void func(struct pt_regs *regs, \ unsigned long error_code) \ { \ - idtentry_state_t state = idtentry_enter(regs); \ + irqentry_state_t state = irqentry_enter(regs); \ \ instrumentation_begin(); \ irq_enter_rcu(); \ @@ -205,7 +200,7 @@ static __always_inline void __##func(str __##func (regs, (u8)error_code); \ irq_exit_rcu(); \ instrumentation_end(); \ - idtentry_exit(regs, state); \ + irqentry_exit(regs, state); \ } \ \ static __always_inline void __##func(struct pt_regs *regs, u8 vector) @@ -229,7 +224,7 @@ static __always_inline void __##func(str * DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points * @func: Function name of the entry point * - * idtentry_enter/exit() and irq_enter/exit_rcu() are invoked before the + * irqentry_enter/exit() and irq_enter/exit_rcu() are invoked before the * function body. KVM L1D flush request is set. * * Runs the function on the interrupt stack if the entry hit kernel mode @@ -239,7 +234,7 @@ static void __##func(struct pt_regs *reg \ __visible noinstr void func(struct pt_regs *regs) \ { \ - idtentry_state_t state = idtentry_enter(regs); \ + irqentry_state_t state = irqentry_enter(regs); \ \ instrumentation_begin(); \ irq_enter_rcu(); \ @@ -247,7 +242,7 @@ static void __##func(struct pt_regs *reg run_on_irqstack_cond(__##func, regs, regs); \ irq_exit_rcu(); \ instrumentation_end(); \ - idtentry_exit(regs, state); \ + irqentry_exit(regs, state); \ } \ \ static noinline void __##func(struct pt_regs *regs) @@ -268,7 +263,7 @@ static __always_inline void __##func(str \ __visible noinstr void func(struct pt_regs *regs) \ { \ - idtentry_state_t state = idtentry_enter(regs); \ + irqentry_state_t state = irqentry_enter(regs); \ \ instrumentation_begin(); \ __irq_enter_raw(); \ @@ -276,7 +271,7 @@ static __always_inline void __##func(str __##func (regs); \ __irq_exit_raw(); \ instrumentation_end(); \ - idtentry_exit(regs, state); \ + irqentry_exit(regs, state); \ } \ \ static __always_inline void __##func(struct pt_regs *regs) --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -233,7 +233,7 @@ EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token) { u32 reason = kvm_read_and_reset_apf_flags(); - idtentry_state_t state; + irqentry_state_t state; switch (reason) { case KVM_PV_REASON_PAGE_NOT_PRESENT: @@ -243,7 +243,7 @@ noinstr bool __kvm_handle_async_pf(struc return false; } - state = idtentry_enter(regs); + state = irqentry_enter(regs); instrumentation_begin(); /* @@ -264,7 +264,7 @@ noinstr bool __kvm_handle_async_pf(struc } instrumentation_end(); - idtentry_exit(regs, state); + irqentry_exit(regs, state); return true; } --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -245,7 +245,7 @@ static noinstr bool handle_bug(struct pt DEFINE_IDTENTRY_RAW(exc_invalid_op) { - idtentry_state_t state; + irqentry_state_t state; /* * We use UD2 as a short encoding for 'CALL __WARN', as such @@ -255,11 +255,11 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op) if (!user_mode(regs) && handle_bug(regs)) return; - state = idtentry_enter(regs); + state = irqentry_enter(regs); instrumentation_begin(); handle_invalid_op(regs); instrumentation_end(); - idtentry_exit(regs, state); + irqentry_exit(regs, state); } DEFINE_IDTENTRY(exc_coproc_segment_overrun) --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1377,7 +1377,7 @@ 
handle_page_fault(struct pt_regs *regs, DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) { unsigned long address = read_cr2(); - idtentry_state_t state; + irqentry_state_t state; prefetchw(¤t->mm->mmap_lock); @@ -1412,11 +1412,11 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_f * code reenabled RCU to avoid subsequent wreckage which helps * debugability. */ - state = idtentry_enter(regs); + state = irqentry_enter(regs); instrumentation_begin(); handle_page_fault(regs, error_code, address); instrumentation_end(); - idtentry_exit(regs, state); + irqentry_exit(regs, state); } From patchwork Thu Jul 16 18:22:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 11668365 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5533E60D for ; Thu, 16 Jul 2020 19:51:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 38F2A2083B for ; Thu, 16 Jul 2020 19:51:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="D7FuUAYt"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="/vm5ygBP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729687AbgGPTvH (ORCPT ); Thu, 16 Jul 2020 15:51:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41012 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729727AbgGPTvA (ORCPT ); Thu, 16 Jul 2020 15:51:00 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFE37C08C5CE; Thu, 16 Jul 2020 12:50:59 -0700 (PDT) Message-Id: <20200716185425.307587523@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1594929058; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: references:references; bh=9xkAs7jG5EBlpBrFKUyEy4hqz+L6GrXswNue6258hSg=; b=D7FuUAYthunwvlhxyDHFNWxdUIk+TTqXfqddyjMDbxw+FGiYiviomWcOF9jyUZdTqCv1FH KNVuVw8VQXyoSW4+y3Y8SgzpeEM7se6uVodXFIJkKQXt7dym+na+BClavT4KmTd4ZsBENT 1CsN51U85hqJQ85ylh+y+BlMgjwwD57D3i9gjkQVCttMNWqgm+FL/EghUSXqbIXroSKT47 Z1Efk495z2Y56LloFcnWl8VOPL6O+qljnxzk9R62zoeQCCb6fALbUyQRB5eO4pBLVCr11x c2vfDn/IKyUWf2FBPv7m/I+10+NFyFr2c2hpQDZ28LK0Zi3q1Orlog9PB2GgXQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1594929058; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: references:references; bh=9xkAs7jG5EBlpBrFKUyEy4hqz+L6GrXswNue6258hSg=; b=/vm5ygBP046ZbGw4YdWZBN+fbSUqdCpS9v7s3dN4aTrESeC/JaK2E/UCk4ht37doKd7DBC SsjBKMh/ITvg4wDg== Date: Thu, 16 Jul 2020 20:22:21 +0200 From: Thomas Gleixner To: LKML Cc: x86@kernel.org, linux-arch@vger.kernel.org, Will Deacon , Arnd Bergmann , Mark Rutland , Kees Cook , Keno Fischer , Paolo Bonzini , kvm@vger.kernel.org Subject: [patch V3 13/13] x86/kvm: Use generic exit to guest work function References: <20200716182208.180916541@linutronix.de> MIME-Version: 1.0 Content-transfer-encoding: 8-bit Sender: 
kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Use the generic infrastructure to check for and handle pending work before entering into guest mode. This now handles TIF_NOTIFY_RESUME as well which was ignored so far. Handling it is important as this covers task work and task work will be used to offload the heavy lifting of POSIX CPU timers to thread context. Signed-off-by: Thomas Gleixner Cc: Paolo Bonzini Cc: kvm@vger.kernel.org --- arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/vmx/vmx.c | 11 +++++------ arch/x86/kvm/x86.c | 15 ++++++--------- 3 files changed, 12 insertions(+), 15 deletions(-) --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -42,6 +42,7 @@ config KVM select HAVE_KVM_MSI select HAVE_KVM_CPU_RELAX_INTERCEPT select HAVE_KVM_NO_POLL + select KVM_EXIT_TO_GUEST_WORK select KVM_GENERIC_DIRTYLOG_READ_PROTECT select KVM_VFIO select SRCU --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -5376,14 +5377,12 @@ static int handle_invalid_guest_state(st } /* - * Note, return 1 and not 0, vcpu_run() is responsible for - * morphing the pending signal into the proper return code. + * Note, return 1 and not 0, vcpu_run() will invoke + * exit_to_guest_mode() which will create a proper return + * code. */ - if (signal_pending(current)) + if (__exit_to_guest_mode_work_pending()) return 1; - - if (need_resched()) - schedule(); } return 1; --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -56,6 +56,7 @@ #include #include #include +#include #include @@ -1585,7 +1586,7 @@ EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr); bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu) { return vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) || - need_resched() || signal_pending(current); + exit_to_guest_mode_work_pending(); } EXPORT_SYMBOL_GPL(kvm_vcpu_exit_request); @@ -8676,15 +8677,11 @@ static int vcpu_run(struct kvm_vcpu *vcp break; } - if (signal_pending(current)) { - r = -EINTR; - vcpu->run->exit_reason = KVM_EXIT_INTR; - ++vcpu->stat.signal_exits; - break; - } - if (need_resched()) { + if (exit_to_guest_mode_work_pending()) { srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); - cond_resched(); + r = exit_to_guest_mode(vcpu); + if (r) + return r; vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); } }
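To make the new flow concrete: the open-coded signal_pending() and
need_resched() checks removed above collapse into one generic work loop
that runs before reentering the guest. A simplified sketch, assuming the
series names the work mask EXIT_TO_GUEST_MODE_WORK; the signal handling
mirrors the code deleted from vcpu_run():

/* Sketch: work handled by exit_to_guest_mode() before guest entry. */
static int exit_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
{
	do {
		if (ti_work & _TIF_SIGPENDING) {
			vcpu->run->exit_reason = KVM_EXIT_INTR;
			++vcpu->stat.signal_exits;
			return -EINTR;
		}

		if (ti_work & _TIF_NEED_RESCHED)
			schedule();

		if (ti_work & _TIF_NOTIFY_RESUME) {
			clear_thread_flag(TIF_NOTIFY_RESUME);
			tracehook_notify_resume(NULL);
		}

		ti_work = READ_ONCE(current_thread_info()->flags);
	} while (ti_work & EXIT_TO_GUEST_MODE_WORK);

	return 0;
}

Handling TIF_NOTIFY_RESUME in this loop is what makes it possible to
offload the heavy lifting of POSIX CPU timers to task work.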