From patchwork Sun Jun 13 15:03:47 2010
From: Avi Kivity <avi@redhat.com>
X-Patchwork-Id: 105829
To: Ingo Molnar, "H. Peter Anvin"
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 4/4] x86, fpu: don't save fpu state when switching from a task
Date: Sun, 13 Jun 2010 18:03:47 +0300
Message-Id: <1276441427-31514-5-git-send-email-avi@redhat.com>
In-Reply-To: <1276441427-31514-1-git-send-email-avi@redhat.com>
References: <1276441427-31514-1-git-send-email-avi@redhat.com>

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8d12878..4cb5bc4 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -302,10 +302,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * If the task has used fpu the last 5 timeslices, just do a full
 	 * restore of the math state immediately to avoid the trap; the
 	 * chances of needing FPU soon are obviously high now
+	 *
+	 * If the fpu is remote, we can't preload it since that requires an
+	 * IPI.  Let a math exception move it locally.
 	 */
-	preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;
-
-	__unlazy_fpu(prev_p);
+	preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5
+		&& !fpu_remote(&next->fpu);
 
 	/* we're going to use this soon, after a few expensive things */
 	if (preload_fpu)
@@ -351,8 +353,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	/* If we're going to preload the fpu context, make sure clts
 	   is run while we're batching the cpu state updates. */
-	if (preload_fpu)
+	if (preload_fpu || fpu_loaded(&next->fpu))
 		clts();
+	else
+		stts();
 
 	/*
 	 * Leave lazy mode, flushing any hypercalls made here.
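The fpu_remote()/fpu_loaded() tests come from the per-cpu fpu ownership
tracking introduced earlier in this series.  For readers without patches
1-3 at hand, a minimal sketch of the idea; the 'cpu' field and the -1
"not loaded anywhere" sentinel are assumptions of this sketch, not
necessarily the representation the series actually uses:

#include <linux/smp.h>		/* smp_processor_id() */
#include <linux/types.h>	/* bool */

/*
 * Sketch only: assume struct fpu records the id of the cpu whose
 * registers currently hold the state, or -1 if the state lives
 * only in memory.
 */
static inline bool fpu_loaded(struct fpu *fpu)
{
	/* the state is live in this cpu's registers */
	return fpu->cpu == smp_processor_id();
}

static inline bool fpu_remote(struct fpu *fpu)
{
	/* the state is live in some other cpu's registers */
	return fpu->cpu != -1 && fpu->cpu != smp_processor_id();
}

Under that reading, preloading a remote fpu would first require an IPI
to make the owning cpu flush the state to memory, which is why the
preload is skipped and the math exception (#NM) path is left to migrate
the state on first use.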
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3c2422a..65d2130 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -383,8 +383,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * If the task has used fpu the last 5 timeslices, just do a full
 	 * restore of the math state immediately to avoid the trap; the
 	 * chances of needing FPU soon are obviously high now
+	 *
+	 * If the fpu is remote, we can't preload it since that requires an
+	 * IPI.  Let a math exception move it locally.
 	 */
-	preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;
+	preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5
+		&& !fpu_remote(&next->fpu);
 
 	/* we're going to use this soon, after a few expensive things */
 	if (preload_fpu)
@@ -418,12 +422,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	load_TLS(next, cpu);
 
-	/* Must be after DS reload */
-	unlazy_fpu(prev_p);
-
 	/* Make sure cpu is ready for new context */
-	if (preload_fpu)
+	if (preload_fpu || fpu_loaded(&next->fpu))
 		clts();
+	else
+		stts();
 
 	/*
 	 * Leave lazy mode, flushing any hypercalls made here.
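Since unlazy_fpu(prev_p) no longer runs at switch time, prev's state may
still be live in this cpu's registers when next is scheduled in, and
CR0.TS becomes the only guard.  The invariant both hunks encode: TS may
be clear only if next's state is already in the registers (fpu_loaded())
or is about to be put there (preload_fpu); otherwise TS must be set so
that next's first fpu instruction raises #NM and the exception handler
can fetch the state, possibly from a remote cpu.  The same decision as
an illustrative helper (no such helper exists in this patch):

/* Illustrative only: the CR0.TS policy at context switch after this patch. */
static inline void switch_fpu_set_ts(bool preload_fpu, struct fpu *next_fpu)
{
	if (preload_fpu || fpu_loaded(next_fpu))
		clts();		/* registers hold, or will hold, next's state */
	else
		stts();		/* force #NM so the fault path loads the state */
}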