From patchwork Fri Dec 9 18:24:08 2016
X-Patchwork-Submitter: Andy Lutomirski
X-Patchwork-Id: 9468887
From: Andy Lutomirski
To: x86@kernel.org
Date: Fri, 9 Dec 2016 10:24:08 -0800
Message-Id: <5c79f0225f68bc8c40335612bf624511abb78941.1481307769.git.luto@kernel.org>
X-Mailer: git-send-email 2.9.3
Cc: Juergen Gross , One Thousand Gnomes , Andy Lutomirski , Peter Zijlstra , Brian Gerst , "H. 
Peter Anvin" , "linux-kernel@vger.kernel.org" , Matthew Whitehead ,
 Borislav Petkov , Henrique de Moraes Holschuh , Andrew Cooper ,
 Boris Ostrovsky , xen-devel
Subject: [Xen-devel] [PATCH v4 4/4] x86/asm: Rewrite sync_core() to use IRET-to-self
List-Id: Xen developer discussion

Aside from being excessively slow, CPUID is problematic: Linux runs on a
handful of CPUs that don't have CPUID.  Use IRET-to-self instead.
IRET-to-self works everywhere, so it makes testing easy.

For reference, on my laptop, IRET-to-self is ~110ns, CPUID(eax=1, ecx=0)
is ~83ns on native and very, very slow under KVM, and MOV-to-CR2 is
~42ns.

While we're at it: sync_core() serves a very specific purpose.
Document it.

Cc: "H. Peter Anvin"
Signed-off-by: Andy Lutomirski
---
 arch/x86/include/asm/processor.h | 80 +++++++++++++++++++++++++++++-----------
 1 file changed, 58 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 64fbc937d586..ceb1f4d3f3fa 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -590,33 +590,69 @@ static __always_inline void cpu_relax(void)
 
 #define cpu_relax_lowlatency() cpu_relax()
 
-/* Stop speculative execution and prefetching of modified code. */
+/*
+ * This function forces the icache and prefetched instruction stream to
+ * catch up with reality in two very specific cases:
+ *
+ * a) Text was modified using one virtual address and is about to be executed
+ *    from the same physical page at a different virtual address.
+ *
+ * b) Text was modified on a different CPU, may subsequently be
+ *    executed on this CPU, and you want to make sure the new version
+ *    gets executed.  This generally means you're calling this in an IPI.
+ *
+ * If you're calling this for a different reason, you're probably doing
+ * it wrong.
+ */
 static inline void sync_core(void)
 {
-	int tmp;
-
-#ifdef CONFIG_X86_32
 	/*
-	 * Do a CPUID if available, otherwise do a jump.  The jump
-	 * can conveniently enough be the jump around CPUID.
+	 * There are quite a few ways to do this.  IRET-to-self is nice
+	 * because it works on every CPU, at any CPL (so it's compatible
+	 * with paravirtualization), and it never exits to a hypervisor.
+	 * The only down sides are that it's a bit slow (it seems to be
+	 * a bit more than 2x slower than the fastest options) and that
+	 * it unmasks NMIs.  The "push %cs" is needed because, in
+	 * paravirtual environments, __KERNEL_CS may not be a valid CS
+	 * value when we do IRET directly.
+	 *
+	 * In case NMI unmasking or performance ever becomes a problem,
+	 * the next best option appears to be MOV-to-CR2 and an
+	 * unconditional jump.  That sequence also works on all CPUs,
+	 * but it will fault at CPL3 (i.e. Xen PV and lguest).
+	 *
+	 * CPUID is the conventional way, but it's nasty: it doesn't
+	 * exist on some 486-like CPUs, and it usually exits to a
+	 * hypervisor.
+	 *
+	 * Like all of Linux's memory ordering operations, this is a
+	 * compiler barrier as well.
 	 */
-	asm volatile("cmpl %2,%1\n\t"
-		     "jl 1f\n\t"
-		     "cpuid\n"
-		     "1:"
-		     : "=a" (tmp)
-		     : "rm" (boot_cpu_data.cpuid_level), "ri" (0), "0" (1)
-		     : "ebx", "ecx", "edx", "memory");
+	register void *__sp asm(_ASM_SP);
+
+#ifdef CONFIG_X86_32
+	asm volatile (
+		"pushfl\n\t"
+		"pushl %%cs\n\t"
+		"pushl $1f\n\t"
+		"iret\n\t"
+		"1:"
+		: "+r" (__sp) : : "memory");
 #else
-	/*
-	 * CPUID is a barrier to speculative execution.
-	 * Prefetched instructions are automatically
-	 * invalidated when modified.
-	 */
-	asm volatile("cpuid"
-		     : "=a" (tmp)
-		     : "0" (1)
-		     : "ebx", "ecx", "edx", "memory");
+	unsigned int tmp;
+
+	asm volatile (
+		"mov %%ss, %0\n\t"
+		"pushq %q0\n\t"
+		"pushq %%rsp\n\t"
+		"addq $8, (%%rsp)\n\t"
+		"pushfq\n\t"
+		"mov %%cs, %0\n\t"
+		"pushq %q0\n\t"
+		"pushq $1f\n\t"
+		"iretq\n\t"
+		"1:"
+		: "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
 #endif
 }