From patchwork Thu May 28 09:56:32 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andre Przywara
X-Patchwork-Id: 26689
From: Andre Przywara
To: avi@redhat.com
CC: kvm@vger.kernel.org, Andre Przywara, Amit Shah, Christoph Egger
Subject: [PATCH 2/2] add sysenter/syscall emulation for 32bit compat mode
Date: Thu, 28 May 2009 11:56:32 +0200
Message-ID: <1243504592-5112-2-git-send-email-andre.przywara@amd.com>
X-Mailer: git-send-email 1.6.1.3
In-Reply-To: <1243504592-5112-1-git-send-email-andre.przywara@amd.com>
References: <1243504592-5112-1-git-send-email-andre.przywara@amd.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

sysenter/sysexit are not supported in AMD's 32bit compat mode, whereas
syscall is not supported in Intel's 32bit compat mode. To allow
cross-vendor migration we emulate the missing instructions by setting up
the processor state according to the other call.

The sysenter code was originally sketched by Amit Shah; it was completed,
debugged, extended with the syscall emulation and made to work by
Christoph Egger, and polished up by Andre Przywara.
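As a quick illustration of the state the emulation installs (this sketch is
not part of the patch; the struct, helper name and example STAR value are
made up here), the legacy-mode SYSCALL path derives the flat CS/SS selectors
and the target EIP from MSR_STAR, just as the case 0x05 code in the diff
below does:

/* Illustrative sketch only: selector/EIP math for a legacy-mode guest
 * executing SYSCALL, mirroring the case 0x05 emulation in the patch.
 */
#include <stdint.h>
#include <stdio.h>

struct syscall_state {
	uint16_t cs_sel;	/* flat kernel code segment */
	uint16_t ss_sel;	/* stack segment = next GDT entry */
	uint32_t eip;		/* legacy-mode SYSCALL target */
};

static struct syscall_state syscall_target(uint64_t star)
{
	struct syscall_state s;
	uint64_t sel = star >> 32;		/* STAR[47:32]: SYSCALL CS base */

	s.cs_sel = (uint16_t)(sel & 0xfffc);	/* RPL forced to 0 */
	s.ss_sel = (uint16_t)(sel + 8);		/* SS follows CS */
	s.eip = (uint32_t)star;			/* STAR[31:0] */
	return s;
}

int main(void)
{
	/* hypothetical STAR: CS selector 0x0010, legacy entry 0xc0100000 */
	struct syscall_state s =
		syscall_target((0x0010ULL << 32) | 0xc0100000ULL);

	printf("CS=%#x SS=%#x EIP=%#x\n",
	       (unsigned)s.cs_sel, (unsigned)s.ss_sel, (unsigned)s.eip);
	return 0;
}

Building and running this prints the segment state the emulator would load
before it transfers control to the guest kernel's syscall entry point.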
Please note that sysret does not need to be emulated, because it will be
executed in 64bit mode, and returning to 32bit compat mode works on Intel.

Signed-off-by: Amit Shah
Signed-off-by: Christoph Egger
Signed-off-by: Andre Przywara
---
 arch/x86/kvm/x86.c         |   37 ++++-
 arch/x86/kvm/x86_emulate.c |  349 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 380 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d44dd5..dae7726 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2593,11 +2593,38 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		/* Reject the instructions other than VMCALL/VMMCALL when
 		 * try to emulate invalid opcode */
 		c = &vcpu->arch.emulate_ctxt.decode;
-		if ((emulation_type & EMULTYPE_TRAP_UD) &&
-		    (!(c->twobyte && c->b == 0x01 &&
-		      (c->modrm_reg == 0 || c->modrm_reg == 3) &&
-		      c->modrm_mod == 3 && c->modrm_rm == 1)))
-			return EMULATE_FAIL;
+
+		if (emulation_type & EMULTYPE_TRAP_UD) {
+			if (!c->twobyte)
+				return EMULATE_FAIL;
+			switch (c->b) {
+			case 0x01: /* VMMCALL */
+				if (c->modrm_mod != 3)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 1)
+					return EMULATE_FAIL;
+				break;
+			case 0x34: /* sysenter */
+			case 0x35: /* sysexit */
+				if (c->modrm_mod != 0)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 0)
+					return EMULATE_FAIL;
+				break;
+			case 0x05: /* syscall */
+				r = 0;
+				if (c->modrm_mod != 0)
+					return EMULATE_FAIL;
+				if (c->modrm_rm != 0)
+					return EMULATE_FAIL;
+				break;
+			default:
+				return EMULATE_FAIL;
+			}
+
+			if (!(c->modrm_reg == 0 || c->modrm_reg == 3))
+				return EMULATE_FAIL;
+		}
 
 		++vcpu->stat.insn_emulation;
 		if (r) {
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 22c765d..41b78fa 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -32,6 +32,8 @@
 #include 
 #include 
 
+#include "mmu.h"
+
 /*
  * Opcode effective-address decode tables.
  * Note that we only emulate instructions that have at least one memory
@@ -217,7 +219,9 @@ static u32 twobyte_table[256] = {
 	ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x30 - 0x3F */
-	ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+	ImplicitOps, 0, ImplicitOps, 0,
+	ImplicitOps, ImplicitOps, 0, 0,
+	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x40 - 0x47 */
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -320,8 +324,11 @@ static u32 group2_table[] = {
 };
 
 /* EFLAGS bit definitions. */
+#define EFLG_VM (1<<17)
+#define EFLG_RF (1<<16)
 #define EFLG_OF (1<<11)
 #define EFLG_DF (1<<10)
+#define EFLG_IF (1<<9)
 #define EFLG_SF (1<<7)
 #define EFLG_ZF (1<<6)
 #define EFLG_AF (1<<4)
@@ -1985,10 +1992,114 @@ twobyte_insn:
 			goto cannot_emulate;
 		}
 		break;
+	case 0x05: { /* syscall */
+		unsigned long cr0 = ctxt->vcpu->arch.cr0;
+		struct kvm_segment cs, ss;
+
+		memset(&cs, 0, sizeof(struct kvm_segment));
+		memset(&ss, 0, sizeof(struct kvm_segment));
+
+		/* inject #UD if
+		 * 1. we are in real mode
+		 * 2. protected mode is not enabled
+		 * 3. LOCK prefix is used
+		 */
+		if ((ctxt->mode == X86EMUL_MODE_REAL)
+		    || (!(cr0 & X86_CR0_PE))
+		    || (c->lock_prefix)) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		/* inject #UD if syscall/sysret are disabled. */
+		kvm_x86_ops->get_msr(ctxt->vcpu, MSR_K6_EFER, &msr_data);
+		if ((msr_data & EFER_SCE) == 0) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+		msr_data >>= 32;
+
+		cs.selector = (u16)(msr_data & 0xfffc);
+		cs.l = 0;		/* will be adjusted later */
+		cs.base = 0;		/* flat segment */
+		cs.g = 1;		/* 4kb granularity */
+		cs.limit = 0xfffff;	/* 4GB limit */
+		cs.type = 0x0b;		/* Read, Execute, Accessed */
+		cs.dpl = 0;		/* will be adjusted later */
+		cs.present = 1;
+		cs.s = 1;
+		cs.db = 1;
+
+		ss.unusable = 0;
+		ss.selector = (u16)(msr_data + 8);
+		ss.base = 0;
+		ss.type = 0x03;		/* Read/Write, Expand-Up, Accessed */
+		ss.present = 1;
+		ss.s = 1;
+		ss.db = 1;
+
+		if (is_long_mode(ctxt->vcpu)) {
+
+			cs.db = 0;
+			cs.l = 1;	/* long mode */
+
+			c->regs[VCPU_REGS_RCX] = c->eip;
+			c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF;
+
+			switch (ctxt->mode) {
+			case X86EMUL_MODE_PROT64:
+				/* Intel cares about granularity (g bit),
+				 * so we don't set the effective limit.
+				 */
+				cs.g = 1;
+				cs.limit = 0xffffffff;
+
+				kvm_x86_ops->get_msr(ctxt->vcpu,
+					MSR_LSTAR, &msr_data);
+				break;
+			case X86EMUL_MODE_PROT32:
+				/* compat mode */
+				kvm_x86_ops->get_msr(ctxt->vcpu,
+					MSR_CSTAR, &msr_data);
+				break;
+			}
+
+			c->eip = msr_data;
+
+			kvm_x86_ops->get_msr(ctxt->vcpu,
+				MSR_SYSCALL_MASK, &msr_data);
+			ctxt->eflags &= ~(msr_data | EFLG_RF);
+		} else {
+			/* legacy mode */
+
+			kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+			c->regs[VCPU_REGS_RCX] = c->eip;
+			c->eip = (u32)msr_data;
+
+			ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
+		}
+
+		kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+		kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
+
+		goto writeback;
+		break;
+	}
 	case 0x06:
 		emulate_clts(ctxt->vcpu);
 		c->dst.type = OP_NONE;
 		break;
+	case 0x07: /* sysret */
 	case 0x08: /* invd */
 	case 0x09: /* wbinvd */
 	case 0x0d: /* GrpP (prefetch) */
@@ -2051,6 +2162,242 @@ twobyte_insn:
 		rc = X86EMUL_CONTINUE;
 		c->dst.type = OP_NONE;
 		break;
+	case 0x34: { /* sysenter */
+		/* Intel manual Vol 2b */
+		unsigned long cr0 = ctxt->vcpu->arch.cr0;
+		struct kvm_segment cs, ss;
+
+		memset(&cs, 0, sizeof(struct kvm_segment));
+		memset(&ss, 0, sizeof(struct kvm_segment));
+
+		/* XXX sysenter/sysexit have not been tested in 64bit mode.
+		 * Therefore, we inject an #UD.
+		 */
+		if (ctxt->mode == X86EMUL_MODE_PROT64) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		if ((ctxt->mode == X86EMUL_MODE_REAL) ||
+		    (!(cr0 & X86_CR0_PE))) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		}
+
+		/* inject #UD if LOCK prefix is used */
+		if (c->lock_prefix) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		kvm_x86_ops->get_msr(ctxt->vcpu,
+			MSR_IA32_SYSENTER_CS, &msr_data);
+		switch (ctxt->mode) {
+		case X86EMUL_MODE_PROT32:
+			if ((msr_data & 0xfffc) != 0x0)
+				break;
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		case X86EMUL_MODE_PROT64:
+			if (msr_data != 0x0)
+				break;
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		}
+
+		ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
+
+		kvm_x86_ops->get_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+
+		cs.selector = (u16)msr_data;
+		cs.base = 0;		/* flat segment */
+		cs.limit = 0xfffff;	/* 4GB limit */
+		cs.g = 1;		/* 4kb granularity */
+		cs.s = 1;
+		cs.type = 0x0b;		/* Execute + Read, Accessed */
+		cs.db = 1;		/* 32bit code segment */
+		cs.dpl = 0;
+		cs.selector &= ~SELECTOR_RPL_MASK;
+		cs.present = 1;
+
+		/* No need to set cpl explicitely here. set_segment()
+		 * does this below based on the cs.dpl value.
+		 */
+
+		ss.unusable = 0;
+		ss.selector = cs.selector + 8;
+		ss.base = 0;		/* flat segment */
+		ss.limit = 0xfffff;	/* 4GB limit */
+		ss.g = 1;		/* 4kb granularity */
+		ss.s = 1;
+		ss.type = 0x03;		/* Read/Write, Accessed */
+		ss.db = 1;		/* 32bit stack segment */
+		ss.dpl = 0;
+		ss.selector &= ~SELECTOR_RPL_MASK;
+		ss.present = 1;
+
+		switch (ctxt->mode) {
+		case X86EMUL_MODE_PROT32:
+			if (!is_long_mode(ctxt->vcpu))
+				break;
+			/* fallthrough */
+		case X86EMUL_MODE_PROT64:
+			cs.base = 0;
+			cs.db = 0;
+			cs.l = 1;
+			cs.limit = 0xffffffff;
+			ss.base = 0;
+			ss.limit = 0xffffffff;
+			break;
+		default:
+			break;
+		}
+
+		kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+		kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
+
+		kvm_x86_ops->get_msr(ctxt->vcpu,
+			MSR_IA32_SYSENTER_EIP, &msr_data);
+		c->eip = msr_data;
+
+		kvm_x86_ops->get_msr(ctxt->vcpu,
+			MSR_IA32_SYSENTER_ESP, &msr_data);
+		c->regs[VCPU_REGS_RSP] = msr_data;
+
+		goto writeback;
+		break;
+	}
+	case 0x35: { /* sysexit */
+		/* Intel manual Vol 2b */
+		unsigned long cr0 = ctxt->vcpu->arch.cr0;
+		struct kvm_segment cs, ss;
+		int usermode;
+
+		memset(&cs, 0, sizeof(struct kvm_segment));
+		memset(&ss, 0, sizeof(struct kvm_segment));
+
+		if ((ctxt->mode == X86EMUL_MODE_REAL)
+		    || (!(cr0 & X86_CR0_PE))
+		    || (kvm_x86_ops->get_cpl(ctxt->vcpu) != 0)) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		}
+
+		/* inject #UD if LOCK prefix is used */
+		if (c->lock_prefix) {
+			/* we don't need to inject #UD here, because
+			 * when emulate_instruction() returns something else
+			 * than EMULATE_DONE, then svm.c:ud_interception()
+			 * will do that for us.
+			 */
+			goto cannot_emulate;
+		}
+
+		/* TODO: Check if rip and rsp are canonical.
+		 * inject_gp() if not
+		 */
+
+		/* if REX.W bit is set ... */
+		if ((c->rex_prefix & 0x8) != 0x0) {
+			/* Application is in 64bit mode */
+			usermode = X86EMUL_MODE_PROT64;
+		} else {
+			/* Application is in 32bit legacy/compat mode */
+			usermode = X86EMUL_MODE_PROT32;
+		}
+
+		kvm_x86_ops->get_msr(ctxt->vcpu,
+			MSR_IA32_SYSENTER_CS, &msr_data);
+		kvm_x86_ops->get_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+		switch (usermode) {
+		case X86EMUL_MODE_PROT32:
+			cs.selector = (u16)(msr_data + 16);
+			if ((msr_data & 0xfffc) != 0x0)
+				break;
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		case X86EMUL_MODE_PROT64:
+			cs.selector = (u16)(msr_data + 32);
+			if (msr_data != 0x0)
+				break;
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto cannot_emulate;
+		}
+
+		cs.base = 0;		/* flat segment */
+		cs.limit = 0xfffff;	/* 4GB limit */
+		cs.g = 1;		/* 4kb granularity */
+		cs.s = 1;
+		cs.type = 0x0b;		/* Execute, Read, Non-conforming code */
+		cs.db = 1;		/* 32bit code segment */
+		cs.dpl = 3;
+		cs.selector |= SELECTOR_RPL_MASK;
+		cs.present = 1;
+		cs.l = 0;		/* For return to compatibility mode */
+
+		/* No need to set cpl explicitely here. set_segment()
+		 * does this below based on the cs.dpl value.
+		 */
+
+		switch (usermode) {
+		case X86EMUL_MODE_PROT32:
+			ss.selector = (u16)(msr_data + 24);
+			break;
+		case X86EMUL_MODE_PROT64:
+			ss.selector = (cs.selector + 8);
+			break;
+		}
+		ss.base = 0;		/* flat segment */
+		ss.limit = 0xfffff;	/* 4GB limit */
+		ss.g = 1;		/* 4kb granularity */
+		ss.s = 1;
+		ss.type = 0x03;		/* Expand Up, Read/Write, Data */
+		ss.db = 1;		/* 32bit stack segment */
+		ss.dpl = 3;
+		ss.selector |= SELECTOR_RPL_MASK;
+		ss.present = 1;
+
+		switch (usermode) {
+		case X86EMUL_MODE_PROT32:
+			/* We don't care about cs.g/ss.g bits
+			 * (= 4kb granularity) so we have to set the effective
+			 * limit here or we get a #GP in the guest, otherwise.
+			 */
+			cs.limit = 0xffffffff;
+			ss.limit = 0xffffffff;
+			break;
+		case X86EMUL_MODE_PROT64:
+			/* We don't care about cs.g/ss.g bits
+			 * (= 4kb granularity) so we have to set the effective
+			 * limit here or we get a #GP in the guest, otherwise.
+			 */
+			cs.base = 0;
+			cs.db = 0;
+			cs.l = 1;
+			cs.limit = 0xffffffff;
+			ss.base = 0;
+			ss.limit = 0xffffffff;
+			break;
+		default:
+			break;
+		}
+		kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
+		kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);
+
+		c->eip = ctxt->vcpu->arch.regs[VCPU_REGS_RDX];
+		c->regs[VCPU_REGS_RSP] = ctxt->vcpu->arch.regs[VCPU_REGS_RCX];
+
+		goto writeback;
+		break;
+	}
 	case 0x40 ... 0x4f:	/* cmov */
 		c->dst.val = c->dst.orig_val = c->src.val;
 		if (!test_cc(c->b, ctxt->eflags))