From patchwork Fri Feb 6 18:56:12 2015
X-Patchwork-Submitter: Andy Lutomirski
X-Patchwork-Id: 5794021
From: Andy Lutomirski
Date: Fri, 6 Feb 2015 10:56:12 -0800
Subject: Fwd: [tip:x86/asm] x86_64, entry: Use sysret to return to userspace
 when possible
To: kvm list
X-Mailing-List: kvm@vger.kernel.org

In case you're interested, this change (queued for 3.20) should cut a
couple hundred cycles off of KVM heavyweight exits.
--Andy


---------- Forwarded message ----------
From: tip-bot for Andy Lutomirski
Date: Tue, Feb 3, 2015 at 10:01 PM
Subject: [tip:x86/asm] x86_64, entry: Use sysret to return to userspace when possible
To: linux-tip-commits@vger.kernel.org
Cc: tglx@linutronix.de, luto@amacapital.net, linux-kernel@vger.kernel.org,
    mingo@kernel.org, hpa@zytor.com

Commit-ID:  2a23c6b8a9c42620182a2d2cfc7c16f6ff8c42b4
Gitweb:     http://git.kernel.org/tip/2a23c6b8a9c42620182a2d2cfc7c16f6ff8c42b4
Author:     Andy Lutomirski
AuthorDate: Tue, 22 Jul 2014 12:46:50 -0700
Committer:  Andy Lutomirski
CommitDate: Sun, 1 Feb 2015 04:03:01 -0800

x86_64, entry: Use sysret to return to userspace when possible

The x86_64 entry code currently jumps through complex and inconsistent
hoops to try to minimize the impact of syscall exit work.  For a true
fast-path syscall, almost nothing needs to be done, so returning is
just a check for exit work and sysret.  For a full slow-path return
from a syscall, the C exit hook is invoked if needed and we join the
iret path.

Using iret to return to userspace is very slow, so the entry code has
accumulated various special cases to try to do certain forms of exit
work without invoking iret.  This is error-prone, since it duplicates
assembly code paths, and it's dangerous, since sysret can malfunction
in interesting ways if used carelessly.  It's also inefficient, since
a lot of useful cases aren't optimized and therefore force an iret out
of a combination of paranoia and the fact that no one has bothered to
write even more asm code to avoid it.

I would argue that this approach is backwards.  Rather than trying to
avoid the iret path, we should instead try to make the iret path fast.
Under a specific set of conditions, iret is unnecessary.  In
particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not set,
and both SS and CS are as expected, then movq 32(%rsp),%rsp;sysret
does the same thing as iret.  This set of conditions is nearly always
satisfied on return from syscalls, and it can even occasionally be
satisfied on return from an irq.

Even with the careful checks for sysret applicability, this cuts
nearly 80ns off of the overhead from syscalls with unoptimized exit
work.  This includes tracing and context tracking, and any return that
invokes KVM's user return notifier.  For example, the cost of getpid
with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to ~280ns on my
computer.

This may allow the removal and even eventual conversion to C of a
respectable amount of exit asm.

This may require further tweaking to give the full benefit on Xen.

It may be worthwhile to adjust signal delivery and exec to try to hit
the sysret path.

This does not optimize returns to 32-bit userspace.  Making the same
optimization for CS == __USER32_CS is conceptually straightforward,
but it will require some tedious code to handle the differences
between sysretl and sysexitl.

Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.net
Signed-off-by: Andy Lutomirski
---
 arch/x86/kernel/entry_64.S | 54 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 501212f..eeab4cf 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -794,6 +794,60 @@ retint_swapgs:         /* return to user-space */
          */
         DISABLE_INTERRUPTS(CLBR_ANY)
         TRACE_IRQS_IRETQ
+
+        /*
+         * Try to use SYSRET instead of IRET if we're returning to
+         * a completely clean 64-bit userspace context.
+         */
+        movq (RCX-R11)(%rsp), %rcx
+        cmpq %rcx,(RIP-R11)(%rsp)               /* RCX == RIP */
+        jne opportunistic_sysret_failed
+
+        /*
+         * On Intel CPUs, sysret with non-canonical RCX/RIP will #GP
+         * in kernel space.  This essentially lets the user take over
+         * the kernel, since userspace controls RSP.  It's not worth
+         * testing for canonicalness exactly -- this check detects any
+         * of the 17 high bits set, which is true for non-canonical
+         * or kernel addresses.  (This will pessimize vsyscall=native.
+         * Big deal.)
+         *
+         * If virtual addresses ever become wider, this will need
+         * to be updated to remain correct on both old and new CPUs.
+         */
+        .ifne __VIRTUAL_MASK_SHIFT - 47
+        .error "virtual address width changed -- sysret checks need update"
+        .endif
+        shr $__VIRTUAL_MASK_SHIFT, %rcx
+        jnz opportunistic_sysret_failed
+
+        cmpq $__USER_CS,(CS-R11)(%rsp)          /* CS must match SYSRET */
+        jne opportunistic_sysret_failed
+
+        movq (R11-ARGOFFSET)(%rsp), %r11
+        cmpq %r11,(EFLAGS-ARGOFFSET)(%rsp)      /* R11 == RFLAGS */
+        jne opportunistic_sysret_failed
+
+        testq $X86_EFLAGS_RF,%r11               /* sysret can't restore RF */
+        jnz opportunistic_sysret_failed
+
+        /* nothing to check for RSP */
+
+        cmpq $__USER_DS,(SS-ARGOFFSET)(%rsp)    /* SS must match SYSRET */
+        jne opportunistic_sysret_failed
+
+        /*
+         * We win!  This label is here just for ease of understanding
+         * perf profiles.  Nothing jumps here.
+         */
+irq_return_via_sysret:
+        CFI_REMEMBER_STATE
+        RESTORE_ARGS 1,8,1
+        movq (RSP-RIP)(%rsp),%rsp
+        USERGS_SYSRET64
+        CFI_RESTORE_STATE
+
+opportunistic_sysret_failed:
+        SWAPGS
+        jmp restore_args
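
A side note on the "17 high bits" check in the patch: with a 48-bit
virtual address width (__VIRTUAL_MASK_SHIFT == 47), a single right
shift by 47 is non-zero exactly when the address is a kernel address
or non-canonical, which is all the patch needs.  Here is a minimal
user-space sketch of that reasoning; it is not part of the patch, and
VIRTUAL_MASK_SHIFT, rip_ok_for_sysret, and the sample addresses are
illustrative assumptions:

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed value: matches __VIRTUAL_MASK_SHIFT on 48-bit-VA kernels. */
    #define VIRTUAL_MASK_SHIFT 47

    /*
     * Mirrors "shr $__VIRTUAL_MASK_SHIFT, %rcx; jnz opportunistic_sysret_failed":
     * the shift is non-zero iff any of the 17 high bits is set, i.e. the
     * address is a kernel address or non-canonical.
     */
    static int rip_ok_for_sysret(uint64_t rip)
    {
            return (rip >> VIRTUAL_MASK_SHIFT) == 0;
    }

    int main(void)
    {
            printf("%d\n", rip_ok_for_sysret(0x00007fffffffe000ULL)); /* user RIP: 1 */
            printf("%d\n", rip_ok_for_sysret(0x0000800000000000ULL)); /* non-canonical: 0 */
            printf("%d\n", rip_ok_for_sysret(0xffffffff81000000ULL)); /* kernel text: 0 */
            return 0;
    }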
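
The ~360ns to ~280ns getpid figure quoted in the commit message is
from the author's own machine; a rough, hedged sketch of how such a
number could be reproduced is below.  This is not the tool used for
the commit, and ITERS and the choice of syscall(SYS_getpid) (to
bypass any libc-side pid caching) are my assumptions:

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #define ITERS 1000000L  /* assumed iteration count */

    int main(void)
    {
            struct timespec start, end;
            long i;

            clock_gettime(CLOCK_MONOTONIC, &start);
            for (i = 0; i < ITERS; i++)
                    syscall(SYS_getpid);   /* raw syscall, no libc pid cache */
            clock_gettime(CLOCK_MONOTONIC, &end);

            double ns = (end.tv_sec - start.tv_sec) * 1e9 +
                        (end.tv_nsec - start.tv_nsec);
            printf("getpid: %.1f ns/call\n", ns / ITERS);
            return 0;
    }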