From patchwork Thu Nov 21 22:15:51 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrew Cooper
X-Patchwork-Id: 11256997
From: Andrew Cooper
To: Xen-devel
Date: Thu, 21 Nov 2019 22:15:51 +0000
Message-ID: <20191121221551.1175-3-andrew.cooper3@citrix.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20191121221551.1175-1-andrew.cooper3@citrix.com>
References: <20191121221551.1175-1-andrew.cooper3@citrix.com>
Subject: [Xen-devel] [PATCH 2/2] x86/svm: Write the correct %eip into the
 outgoing task
X-BeenThere: xen-devel@lists.xenproject.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Xen developer discussion
Cc: Juergen Gross , Andrew Cooper , Wei Liu , Jan Beulich ,
 =?utf-8?q?Roger_Pau_Monn=C3=A9?=
Errors-To: xen-devel-bounces@lists.xenproject.org
Sender: "Xen-devel"

The TASK_SWITCH vmexit has fault semantics, and doesn't provide any NRIPs
assistance with instruction length. As a result, any instruction-induced task
switch has the outgoing task's %eip pointing at the instruction which caused
the switch, rather than after it.

This causes explicit use of task gates to livelock (as when the task returns,
it executes the task-switching instruction again), and any restartable task to
become a nop after its first instantiation (the entry state points at the
ret/iret instruction used to exit the task).

32bit Windows in particular is known to use task gates for NMI handling, and
to use NMI IPIs.

In the task switch handler, distinguish instruction-induced from
interrupt/exception-induced task switches, and decode the instruction under
%rip to calculate its length.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
CC: Wei Liu
CC: Roger Pau Monné
CC: Juergen Gross

The implementation of svm_get_task_switch_insn_len() is bug-compatible with
svm_get_insn_len() when it comes to conditional #GP'ing. I still haven't had
time to address this more thoroughly.

AMD does permit TASK_SWITCH not to be intercepted and, I'm informed, does do
the right thing when it comes to a TSS crossing a page boundary. However, it
is not actually safe to leave task switches unintercepted. Any NPT or shadow
page fault, even from logdirty/paging/etc, will corrupt guest state in an
unrecoverable manner.
---
 xen/arch/x86/hvm/svm/emulate.c        | 55 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c            | 46 ++++++++++++++++++++++-------
 xen/include/asm-x86/hvm/svm/emulate.h |  1 +
 3 files changed, 92 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
index 3e52592847..176c25f60d 100644
--- a/xen/arch/x86/hvm/svm/emulate.c
+++ b/xen/arch/x86/hvm/svm/emulate.c
@@ -117,6 +117,61 @@ unsigned int svm_get_insn_len(struct vcpu *v, unsigned int instr_enc)
 }
 
 /*
+ * TASK_SWITCH vmexits never provide an instruction length. We must always
+ * decode under %rip to find the answer.
+ */
+unsigned int svm_get_task_switch_insn_len(struct vcpu *v)
+{
+    struct hvm_emulate_ctxt ctxt;
+    struct x86_emulate_state *state;
+    unsigned int emul_len, modrm_reg;
+
+    ASSERT(v == current);
+    hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
+    hvm_emulate_init_per_insn(&ctxt, NULL, 0);
+    state = x86_decode_insn(&ctxt.ctxt, hvmemul_insn_fetch);
+    if ( IS_ERR_OR_NULL(state) )
+        return 0;
+
+    emul_len = x86_insn_length(state, &ctxt.ctxt);
+
+    /*
+     * Check for an instruction which can cause a task switch. Any far
+     * jmp/call/ret, any software interrupt/exception, and iret.
+     */
+    switch ( ctxt.ctxt.opcode )
+    {
+    case 0xff: /* Grp 5 */
+        /* call / jmp (far, absolute indirect) */
+        if ( x86_insn_modrm(state, NULL, &modrm_reg) == 3 ||
+             (modrm_reg != 3 && modrm_reg != 5) )
+        {
+            /* Wrong instruction. Throw #GP back for now. */
+    default:
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            emul_len = 0;
+            break;
+        }
+        /* Fallthrough */
+    case 0x62: /* bound */
+    case 0x9a: /* call (far, absolute) */
+    case 0xca: /* ret imm16 (far) */
+    case 0xcb: /* ret (far) */
+    case 0xcc: /* int3 */
+    case 0xcd: /* int imm8 */
+    case 0xce: /* into */
+    case 0xcf: /* iret */
+    case 0xea: /* jmp (far, absolute) */
+    case 0xf1: /* icebp */
+        break;
+    }
+
+    x86_emulate_free_state(state);
+
+    return emul_len;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 049b800e20..ba9c24a70c 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2776,7 +2776,41 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
 
     case VMEXIT_TASK_SWITCH: {
         enum hvm_task_switch_reason reason;
-        int32_t errcode = -1;
+        int32_t errcode = -1, insn_len = -1;
+
+        /*
+         * All TASK_SWITCH intercepts have fault-like semantics. NRIP is
+         * never provided, even for instruction-induced task switches, but we
+         * need to know the instruction length in order to set %eip suitably
+         * in the outgoing TSS.
+         *
+         * For a task switch which vectored through the IDT, look at the type
+         * to distinguish interrupts/exceptions from instruction based
+         * switches.
+         */
+        if ( vmcb->eventinj.fields.v )
+        {
+            /*
+             * HW_EXCEPTION, NMI and EXT_INTR are not instruction based. All
+             * others are.
+             */
+            if ( vmcb->eventinj.fields.type <= X86_EVENTTYPE_HW_EXCEPTION )
+                insn_len = 0;
+
+            /*
+             * Clobber the vectoring information, as we are going to emulate
+             * the task switch in full.
+             */
+            vmcb->eventinj.bytes = 0;
+        }
+
+        /*
+         * insn_len being -1 indicates that we have an instruction-induced
+         * task switch. Decode under %rip to find its length.
+         */
+        if ( insn_len < 0 && (insn_len = svm_get_task_switch_insn_len(v)) == 0 )
+            break;
+
         if ( (vmcb->exitinfo2 >> 36) & 1 )
             reason = TSW_iret;
         else if ( (vmcb->exitinfo2 >> 38) & 1 )
@@ -2786,15 +2820,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
         if ( (vmcb->exitinfo2 >> 44) & 1 )
             errcode = (uint32_t)vmcb->exitinfo2;
 
-        /*
-         * Some processors set the EXITINTINFO field when the task switch
-         * is caused by a task gate in the IDT. In this case we will be
-         * emulating the event injection, so we do not want the processor
-         * to re-inject the original event!
-         */
-        vmcb->eventinj.bytes = 0;
-
-        hvm_task_switch(vmcb->exitinfo1, reason, errcode, 0);
+        hvm_task_switch(vmcb->exitinfo1, reason, errcode, insn_len);
         break;
     }
 
diff --git a/xen/include/asm-x86/hvm/svm/emulate.h b/xen/include/asm-x86/hvm/svm/emulate.h
index 9af10061c5..d7364f774a 100644
--- a/xen/include/asm-x86/hvm/svm/emulate.h
+++ b/xen/include/asm-x86/hvm/svm/emulate.h
@@ -51,6 +51,7 @@
 struct vcpu;
 
 unsigned int svm_get_insn_len(struct vcpu *v, unsigned int instr_enc);
+unsigned int svm_get_task_switch_insn_len(struct vcpu *v);
 
 #endif /* __ASM_X86_HVM_SVM_EMULATE_H__ */
 
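
For readers who want to experiment with the classification logic outside of
Xen, the following is a minimal standalone sketch, not part of the patch. The
helper names (initial_insn_len, may_task_switch, outgoing_eip) and the EVT_*
macros are hypothetical. The EVT_* values are assumed to match Xen's
X86_EVENTTYPE_* constants, and the outgoing %eip is assumed to be the
faulting %eip plus the instruction length. The Grp 5 check follows the
architectural encoding, where far call (/3) and far jmp (/5) take a memory
operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Event types; values assumed to match Xen's X86_EVENTTYPE_* constants. */
#define EVT_EXT_INTR      0
#define EVT_NMI           2
#define EVT_HW_EXCEPTION  3
#define EVT_SW_INTERRUPT  4

/*
 * Hypothetical helper. Returns 0 if the task switch was delivered by an
 * external interrupt, NMI or hardware exception (nothing to skip), or -1 if
 * it was instruction-induced and the instruction must still be decoded.
 */
static int initial_insn_len(bool vectoring_valid, unsigned int type)
{
    if ( vectoring_valid && type <= EVT_HW_EXCEPTION )
        return 0;

    return -1;
}

/*
 * Hypothetical helper. Can this primary opcode cause a task switch?
 * modrm_mod and modrm_reg are only consulted for Grp 5 (0xff), where the far
 * call (/3) and far jmp (/5) forms require a memory operand (mod != 3).
 */
static bool may_task_switch(uint8_t opcode, unsigned int modrm_mod,
                            unsigned int modrm_reg)
{
    switch ( opcode )
    {
    case 0xff: /* Grp 5 */
        return modrm_mod != 3 && (modrm_reg == 3 || modrm_reg == 5);

    case 0x62: /* bound */
    case 0x9a: /* call (far, absolute) */
    case 0xca: /* ret imm16 (far) */
    case 0xcb: /* ret (far) */
    case 0xcc: /* int3 */
    case 0xcd: /* int imm8 */
    case 0xce: /* into */
    case 0xcf: /* iret */
    case 0xea: /* jmp (far, absolute) */
    case 0xf1: /* icebp */
        return true;

    default:
        return false;
    }
}

/* Hypothetical helper. %eip to record in the outgoing TSS. */
static uint32_t outgoing_eip(uint32_t eip, int insn_len)
{
    return eip + (uint32_t)insn_len;
}
```

With these assumptions, an NMI vectoring through a task gate leaves the
outgoing %eip untouched, while a 7-byte "jmp far ptr16:32" at 0x1000 records
0x1007, so the task resumes after the jump instead of re-executing it.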