From patchwork Mon Feb 3 15:16:03 2020
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Andy Lutomirski
Cc: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    David Laight, Xiaoyao Li
Subject: [PATCH v2 1/6] x86/split_lock: Add and export get_split_lock_detect_state()
Date: Mon, 3 Feb 2020 23:16:03 +0800
Message-Id: <20200203151608.28053-2-xiaoyao.li@intel.com>
In-Reply-To: <20200203151608.28053-1-xiaoyao.li@intel.com>
References: <20200203151608.28053-1-xiaoyao.li@intel.com>

get_split_lock_detect_state() will be used by the KVM module to query
sld_state.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/cpu.h  | 12 ++++++++++++
 arch/x86/kernel/cpu/intel.c | 12 ++++++------
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index ff6f3ca649b3..167d0539e0ad 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,11 +40,23 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
 #ifdef CONFIG_CPU_SUP_INTEL
+extern enum split_lock_detect_state get_split_lock_detect_state(void);
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 #else
+static inline enum split_lock_detect_state get_split_lock_detect_state(void)
+{
+	return sld_off;
+}
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
 static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index db3e745e5d47..a810cd022db5 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -33,12 +33,6 @@
 #include
 #endif
 
-enum split_lock_detect_state {
-	sld_off = 0,
-	sld_warn,
-	sld_fatal,
-};
-
 /*
  * Default to sld_off because most systems do not support split lock detection
  * split_lock_setup() will switch this to sld_warn on systems that support
@@ -968,6 +962,12 @@ cpu_dev_register(intel_cpu_dev);
 #undef pr_fmt
 #define pr_fmt(fmt) "x86/split lock detection: " fmt
 
+enum split_lock_detect_state get_split_lock_detect_state(void)
+{
+	return sld_state;
+}
+EXPORT_SYMBOL_GPL(get_split_lock_detect_state);
+
 static const struct {
 	const char			*option;
 	enum split_lock_detect_state	state;
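As a rough sketch of how the new export is meant to be consumed
(illustrative module code, not part of this series; only
get_split_lock_detect_state() and the sld_* enum come from the patch
above):

/*
 * Illustration only: a KVM-like consumer deciding whether the host has
 * the split-lock #AC trap armed. sld_warn and sld_fatal both arm it.
 */
#include <asm/cpu.h>

static bool host_sld_armed(void)
{
	return get_split_lock_detect_state() != sld_off;
}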
From patchwork Mon Feb 3 15:16:04 2020
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Andy Lutomirski
Cc: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    David Laight, Xiaoyao Li
Subject: [PATCH v2 2/6] x86/split_lock: Add and export split_lock_detect_set()
Date: Mon, 3 Feb 2020 23:16:04 +0800
Message-Id: <20200203151608.28053-3-xiaoyao.li@intel.com>
In-Reply-To: <20200203151608.28053-1-xiaoyao.li@intel.com>
References: <20200203151608.28053-1-xiaoyao.li@intel.com>

Add and export split_lock_detect_set(), which will be used by the KVM
module to change the MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit and thereby
switch split lock detection (SLD) on and off.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/cpu.h  | 1 +
 arch/x86/kernel/cpu/intel.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 167d0539e0ad..b46262afa6c1 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -52,6 +52,7 @@ extern enum split_lock_detect_state get_split_lock_detect_state(void);
 extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void split_lock_detect_set(bool on);
 #else
 static inline enum split_lock_detect_state get_split_lock_detect_state(void)
 {
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index a810cd022db5..44138dd64808 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1088,6 +1088,12 @@ void switch_to_sld(unsigned long tifn)
 	__sld_msr_set(!(tifn & _TIF_SLD));
 }
 
+void split_lock_detect_set(bool on)
+{
+	__sld_msr_set(on);
+}
+EXPORT_SYMBOL_GPL(split_lock_detect_set);
+
 #define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
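For reference, the bit this helper flips is
MSR_TEST_CTRL.SPLIT_LOCK_DETECT; the MSR index (0x33) and bit position
(29) below are assumed from the upstream definitions, not taken from
this diff. The pure bit manipulation, runnable in userspace:

#include <assert.h>
#include <stdint.h>

#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT (1ULL << 29)	/* assumed bit 29 */

/* Loosely mirrors __sld_msr_set(): update one bit in a TEST_CTRL image. */
static uint64_t sld_update(uint64_t test_ctrl, int on)
{
	if (on)
		return test_ctrl | MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
	return test_ctrl & ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
}

int main(void)
{
	assert(sld_update(0, 1) == 0x20000000ULL);
	assert(sld_update(0x20000000ULL, 0) == 0);
	return 0;
}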
From patchwork Mon Feb 3 15:16:05 2020
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Andy Lutomirski
Cc: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    David Laight, Xiaoyao Li
Subject: [PATCH v2 3/6] kvm: x86: Emulate split-lock access as a write
Date: Mon, 3 Feb 2020 23:16:05 +0800
Message-Id: <20200203151608.28053-4-xiaoyao.li@intel.com>
In-Reply-To: <20200203151608.28053-1-xiaoyao.li@intel.com>
References: <20200203151608.28053-1-xiaoyao.li@intel.com>

If split lock detection is enabled (warn/fatal), the #AC handler calls
die() when a split lock happens in the kernel. A sane guest should
never trigger emulation on a split-lock access, but that cannot stop a
malicious guest from doing so. So emulate the access as a write if it
is a split-lock access, to keep a malicious guest from polluting the
kernel log.

A more detailed analysis can be found at:
https://lkml.kernel.org/r/20200131200134.GD18946@linux.intel.com

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/x86.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2d3be7f3ad67..821b7404c0fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5847,6 +5847,13 @@ static int emulator_write_emulated(struct x86_emulate_ctxt *ctxt,
 	(cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
 #endif
 
+static inline bool across_cache_line_access(gpa_t gpa, unsigned int bytes)
+{
+	unsigned int cache_line_size = cache_line_size();
+
+	return (gpa & (cache_line_size - 1)) + bytes > cache_line_size;
+}
+
 static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 				     unsigned long addr,
 				     const void *old,
@@ -5873,6 +5880,10 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 	if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
 		goto emul_write;
 
+	if (get_split_lock_detect_state() != sld_off &&
+	    across_cache_line_access(gpa, bytes))
+		goto emul_write;
+
 	if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
 		goto emul_write;
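The check in across_cache_line_access() is plain modular arithmetic: an
access splits a cache line when its offset within the line plus its
length spills past the line boundary. A self-contained rendering with
an assumed 64-byte line size (the patch queries the real value via
cache_line_size()):

#include <assert.h>
#include <stdint.h>

static int crosses_line(uint64_t gpa, unsigned int bytes)
{
	const unsigned int line = 64;	/* assumed; the kernel asks the CPU */

	return (gpa & (line - 1)) + bytes > line;
}

int main(void)
{
	assert(!crosses_line(0x1000, 8));	/* offset 0:  0 + 8 <= 64  */
	assert(!crosses_line(0x103c, 4));	/* offset 60: 60 + 4 == 64 */
	assert(crosses_line(0x103d, 4));	/* offset 61: 61 + 4 > 64  */
	return 0;
}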
From patchwork Mon Feb 3 15:16:06 2020
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Andy Lutomirski
Cc: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    David Laight, Xiaoyao Li
Subject: [PATCH v2 4/6] kvm: vmx: Extend VMX's #AC handling for split lock in guest
Date: Mon, 3 Feb 2020 23:16:06 +0800
Message-Id: <20200203151608.28053-5-xiaoyao.li@intel.com>
In-Reply-To: <20200203151608.28053-1-xiaoyao.li@intel.com>
References: <20200203151608.28053-1-xiaoyao.li@intel.com>

Two types of #AC can be generated on Intel CPUs:
1. legacy alignment check #AC
2. split lock #AC

A legacy alignment check #AC can be injected into the guest if the
guest has enabled alignment checking.

When the host enables split lock detection, i.e., split_lock_detect !=
off, the guest receives an unexpected #AC when a split lock happens,
because KVM does not virtualize this feature to the guest: the hardware
value of the MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit stays unchanged while
the vcpu is running. Old guests lack a split-lock #AC handler and may
have split lock bugs. To let them survive a split lock, apply a policy
that mirrors the host's split lock detect configuration:

- host split lock detect is sld_warn: warn that the split lock happened
  in the guest, and disable split lock detection while the vcpu is
  running, so the guest can continue.
- host split lock detect is sld_fatal: forward the #AC to userspace,
  somewhat similar to sending SIGBUS.

Please note:
1. With sld_warn and SMT enabled, a split lock in a guest vcpu disables
   split lock detection on the sibling CPU thread for as long as the
   vcpu is running.
2. When the host is sld_warn, the guest is allowed to generate split
   locks, which also opens the door for a malicious guest to mount a
   DoS attack; in sld_warn mode a userspace application can mount the
   same attack.
3. To prevent DoS attacks from guests, the host must use sld_fatal
   mode.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 48 +++++++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/vmx.h |  3 +++
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c475fa2aaae0..93e3370c5f84 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4233,6 +4233,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmx->msr_ia32_umwait_control = 0;
 
+	vmx->disable_split_lock_detect = false;
+
 	vcpu->arch.microcode_version = 0x100000000ULL;
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
 	vmx->hv_deadline_tsc = -1;
@@ -4557,6 +4559,12 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
+{
+	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
+	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
+}
+
 static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4622,9 +4630,6 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		return handle_rmode_exception(vcpu, ex_no, error_code);
 
 	switch (ex_no) {
-	case AC_VECTOR:
-		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
-		return 1;
 	case DB_VECTOR:
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
@@ -4653,6 +4658,33 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
 		kvm_run->debug.arch.exception = ex_no;
 		break;
+	case AC_VECTOR:
+		/*
+		 * Inject #AC back to guest only when legacy alignment check
+		 * enabled.
+		 * Otherwise, it must be an unexpected split-lock #AC for guest
+		 * since KVM keeps hardware SPLIT_LOCK_DETECT bit unchanged
+		 * when vcpu is running.
+		 * - If sld_state == sld_warn, treat it similar to a user
+		 *   space process: warn and allow it to continue running.
+		 *   In this case, set vmx->disable_split_lock_detect to
+		 *   true so that the MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit
+		 *   is toggled on every following VM entry and exit;
+		 * - If sld_state == sld_fatal, forward the #AC to
+		 *   userspace, similar to sending SIGBUS.
+		 */
+		if (guest_cpu_alignment_check_enabled(vcpu) ||
+		    WARN_ON(get_split_lock_detect_state() == sld_off)) {
+			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+			return 1;
+		}
+		if (get_split_lock_detect_state() == sld_warn) {
+			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
+				current->comm, current->pid);
+			vmx->disable_split_lock_detect = true;
+			return 1;
+		}
+		/* fall through */
 	default:
 		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
 		kvm_run->ex.exception = ex_no;
@@ -6530,6 +6562,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+	    unlikely(vmx->disable_split_lock_detect) &&
+	    !test_tsk_thread_flag(current, TIF_SLD))
+		split_lock_detect_set(false);
+
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
@@ -6564,6 +6601,11 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
 
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+	    unlikely(vmx->disable_split_lock_detect) &&
+	    !test_tsk_thread_flag(current, TIF_SLD))
+		split_lock_detect_set(true);
+
 	/* All fields are clean at this point */
 	if (static_branch_unlikely(&enable_evmcs))
 		current_evmcs->hv_clean_fields |=
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 7f42cf3dcd70..912eba66c5d5 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -274,6 +274,9 @@ struct vcpu_vmx {
 
 	bool req_immediate_exit;
 
+	/* Disable split-lock detection when running the vCPU */
+	bool disable_split_lock_detect;
+
 	/* Support for PML */
 #define PML_ENTITY_NUM 512
 	struct page *pml_pg;
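The net effect of the sld_warn path can be modeled as a small state
machine: the #AC handler latches disable_split_lock_detect, VM entry
drops the hardware bit so the guest's split access can retire, and VM
exit re-arms it for the host. A standalone model (the field names
follow the patch; everything else is illustrative):

#include <assert.h>
#include <stdbool.h>

struct sld_model {
	bool hw_bit;		/* MSR_TEST_CTRL.SPLIT_LOCK_DETECT */
	bool disable_sld;	/* vmx->disable_split_lock_detect  */
	bool tif_sld;		/* host task already runs with SLD off */
};

static void vm_entry(struct sld_model *m)
{
	if (m->disable_sld && !m->tif_sld)
		m->hw_bit = false;	/* let the guest continue */
}

static void vm_exit(struct sld_model *m)
{
	if (m->disable_sld && !m->tif_sld)
		m->hw_bit = true;	/* restore host protection */
}

int main(void)
{
	struct sld_model m = { .hw_bit = true };

	m.disable_sld = true;	/* guest split lock -> #AC handler fired */
	vm_entry(&m);
	assert(!m.hw_bit);
	vm_exit(&m);
	assert(m.hw_bit);
	return 0;
}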
From patchwork Mon Feb 3 15:16:07 2020
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Andy Lutomirski
Cc: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    David Laight, Xiaoyao Li
Subject: [PATCH v2 5/6] kvm: x86: Emulate MSR IA32_CORE_CAPABILITIES
Date: Mon, 3 Feb 2020 23:16:07 +0800
Message-Id: <20200203151608.28053-6-xiaoyao.li@intel.com>
In-Reply-To: <20200203151608.28053-1-xiaoyao.li@intel.com>
References: <20200203151608.28053-1-xiaoyao.li@intel.com>

Emulate MSR_IA32_CORE_CAPABILITIES in software and unconditionally
advertise its support to userspace. Like MSR_IA32_ARCH_CAPABILITIES, it
is a feature-enumerating MSR and can be fully emulated regardless of
hardware support. Existence of CORE_CAPABILITIES is enumerated via
CPUID.(EAX=7H,ECX=0):EDX[30].

Note, support for individual features enumerated via CORE_CAPABILITIES,
e.g., split lock detection, will be added in future patches.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  5 +++--
 arch/x86/kvm/x86.c              | 22 ++++++++++++++++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 329d01c689b7..dc231240102f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -591,6 +591,7 @@ struct kvm_vcpu_arch {
 	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
+	u64 core_capabilities;
 
 	/*
 	 * Paging state of the vcpu
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b1c469446b07..7282d04f3a6b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -409,10 +409,11 @@ static inline void do_cpuid_7_mask(struct kvm_cpuid_entry2 *entry, int index)
 		    boot_cpu_has(X86_FEATURE_AMD_SSBD))
 			entry->edx |= F(SPEC_CTRL_SSBD);
 		/*
-		 * We emulate ARCH_CAPABILITIES in software even
-		 * if the host doesn't support it.
+		 * ARCH_CAPABILITIES and CORE_CAPABILITIES are emulated in
+		 * software regardless of host support.
 		 */
 		entry->edx |= F(ARCH_CAPABILITIES);
+		entry->edx |= F(CORE_CAPABILITIES);
 		break;
 	case 1:
 		entry->eax &= kvm_cpuid_7_1_eax_x86_features;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 821b7404c0fd..a97a8f5dd1df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1222,6 +1222,7 @@ static const u32 emulated_msrs_all[] = {
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
 	MSR_IA32_ARCH_CAPABILITIES,
+	MSR_IA32_CORE_CAPS,
 	MSR_IA32_MISC_ENABLE,
 	MSR_IA32_MCG_STATUS,
 	MSR_IA32_MCG_CTL,
@@ -1288,6 +1289,7 @@ static const u32 msr_based_features_all[] = {
 	MSR_F10H_DECFG,
 	MSR_IA32_UCODE_REV,
 	MSR_IA32_ARCH_CAPABILITIES,
+	MSR_IA32_CORE_CAPS,
 };
 
 static u32 msr_based_features[ARRAY_SIZE(msr_based_features_all)];
@@ -1341,12 +1343,20 @@ static u64 kvm_get_arch_capabilities(void)
 	return data;
 }
 
+static u64 kvm_get_core_capabilities(void)
+{
+	return 0;
+}
+
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
 	case MSR_IA32_ARCH_CAPABILITIES:
 		msr->data = kvm_get_arch_capabilities();
 		break;
+	case MSR_IA32_CORE_CAPS:
+		msr->data = kvm_get_core_capabilities();
+		break;
 	case MSR_IA32_UCODE_REV:
 		rdmsrl_safe(msr->index, &msr->data);
 		break;
@@ -2716,6 +2726,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vcpu->arch.arch_capabilities = data;
 		break;
+	case MSR_IA32_CORE_CAPS:
+		if (!msr_info->host_initiated)
+			return 1;
+		vcpu->arch.core_capabilities = data;
+		break;
 	case MSR_EFER:
 		return set_efer(vcpu, msr_info);
 	case MSR_K7_HWCR:
@@ -3044,6 +3059,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		msr_info->data = vcpu->arch.arch_capabilities;
 		break;
+	case MSR_IA32_CORE_CAPS:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITIES))
+			return 1;
+		msr_info->data = vcpu->arch.core_capabilities;
+		break;
 	case MSR_IA32_POWER_CTL:
 		msr_info->data = vcpu->arch.msr_ia32_power_ctl;
 		break;
@@ -9288,6 +9309,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		goto free_guest_fpu;
 
 	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
+	vcpu->arch.core_capabilities = kvm_get_core_capabilities();
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
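From inside a guest, the emulated MSR can be probed by checking
CPUID.(EAX=7,ECX=0):EDX[30] and then reading the MSR through the Linux
msr driver. A minimal sketch; the MSR index 0xcf and split-lock bit 5
are assumed from the upstream definitions rather than taken from this
diff:

#include <cpuid.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;
	uint64_t caps = 0;
	int fd;

	/* CPUID.(EAX=7,ECX=0):EDX[30] enumerates IA32_CORE_CAPABILITIES. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
	    !(edx & (1u << 30))) {
		puts("IA32_CORE_CAPABILITIES not enumerated");
		return 1;
	}

	/* Needs root and the msr kernel module (modprobe msr). */
	fd = open("/dev/cpu/0/msr", O_RDONLY);
	if (fd < 0 || pread(fd, &caps, sizeof(caps), 0xcf) != sizeof(caps)) {
		perror("read MSR 0xcf");
		return 1;
	}
	printf("IA32_CORE_CAPABILITIES = %#llx, split lock detect: %s\n",
	       (unsigned long long)caps,
	       (caps & (1ULL << 5)) ? "yes" : "no");
	close(fd);
	return 0;
}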
From patchwork Mon Feb 3 15:16:08 2020
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Andy Lutomirski
Cc: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    David Laight, Xiaoyao Li
Subject: [PATCH v2 6/6] x86: vmx: virtualize split lock detection
Date: Mon, 3 Feb 2020 23:16:08 +0800
Message-Id: <20200203151608.28053-7-xiaoyao.li@intel.com>
In-Reply-To: <20200203151608.28053-1-xiaoyao.li@intel.com>
References: <20200203151608.28053-1-xiaoyao.li@intel.com>

Because MSR_TEST_CTRL is per-core scoped, i.e., sibling threads in the
same physical CPU core share the same MSR, only advertise the split
lock detection feature to the guest when SMT is disabled or
unsupported, for simplicity.

Only when the host is sld_off can the guest control the hardware value
of MSR_TEST_CTRL, i.e., KVM loads the guest's value into hardware while
the vcpu is running.

vmx->disable_split_lock_detect can be set to true after an unhandled
split-lock #AC in the guest only when the host is in sld_warn mode.
This keeps old guests running, though a malicious guest can of course
exploit it for a DoS attack. To prevent DoS attacks from malicious
guests, the host must use sld_fatal mode; when the host is sld_fatal,
the hardware value of MSR_TEST_CTRL.SPLIT_LOCK_DETECT is never cleared.

The table below summarizes how a Linux guest behaves when SMT is off:

-----------------------------------------------------------------------
   Host  | Guest | Guest behavior
-----------------------------------------------------------------------
1. off   |       | same as in bare metal
-----------------------------------------------------------------------
2. warn  | off   | hardware bit is set initially. Once a split lock
         |       | happens, KVM sets vmx->disable_split_lock_detect,
         |       | which causes the hardware bit to be cleared while
         |       | the vcpu is running.
         |       | So it is the same as in bare metal.
-----------------------------------------------------------------------
3.       | warn  | - user space: gets #AC on a split lock, then clears
         |       |   its MSR bit, but the hardware bit is not cleared.
         |       |   #AC again, which finally sets
         |       |   vmx->disable_split_lock_detect and causes the
         |       |   hardware bit to be cleared while the vcpu runs.
         |       |   After the userspace process finishes, the guest
         |       |   sets its MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit
         |       |   again, which sets vmx->disable_split_lock_detect
         |       |   back to false.
         |       |   So it is roughly the same as in bare metal.
         |       | - kernel: same as in bare metal.
-----------------------------------------------------------------------
4.       | fatal | same as in bare metal
-----------------------------------------------------------------------
5. fatal | off   | #AC reported to userspace
-----------------------------------------------------------------------
6.       | warn  | - user space: gets #AC on a split lock, then clears
         |       |   its MSR bit, but the hardware bit is not cleared;
         |       |   #AC again, and the #AC is reported to userspace.
         |       | - kernel: same as in bare metal, calls die().
-----------------------------------------------------------------------
7.
         | fatal | same as in bare metal
-----------------------------------------------------------------------

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 72 +++++++++++++++++++++++++++++++++++-------
 arch/x86/kvm/vmx/vmx.h |  1 +
 arch/x86/kvm/x86.c     | 13 ++++++--
 3 files changed, 73 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 93e3370c5f84..a0c3f579ecb6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1781,6 +1781,26 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 	}
 }
 
+/*
+ * Note: for the guest, feature split lock detection can only be enumerated
+ * via MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT. The FMS enumeration is invalid.
+ */
+static inline bool guest_has_feature_split_lock_detect(struct kvm_vcpu *vcpu)
+{
+	return !!(vcpu->arch.core_capabilities &
+		  MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT);
+}
+
+static inline u64 vmx_msr_test_ctrl_valid_bits(struct kvm_vcpu *vcpu)
+{
+	u64 valid_bits = 0;
+
+	if (guest_has_feature_split_lock_detect(vcpu))
+		valid_bits |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	return valid_bits;
+}
+
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -1793,6 +1813,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_info->index) {
+	case MSR_TEST_CTRL:
+		if (!msr_info->host_initiated &&
+		    !guest_has_feature_split_lock_detect(vcpu))
+			return 1;
+		msr_info->data = vmx->msr_test_ctrl;
+		break;
 #ifdef CONFIG_X86_64
 	case MSR_FS_BASE:
 		msr_info->data = vmcs_readl(GUEST_FS_BASE);
@@ -1934,6 +1960,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_index) {
+	case MSR_TEST_CTRL:
+		if (!msr_info->host_initiated &&
+		    (!guest_has_feature_split_lock_detect(vcpu) ||
+		     data & ~vmx_msr_test_ctrl_valid_bits(vcpu)))
+			return 1;
+		if (data & MSR_TEST_CTRL_SPLIT_LOCK_DETECT)
+			vmx->disable_split_lock_detect = false;
+		vmx->msr_test_ctrl = data;
+		break;
 	case MSR_EFER:
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
@@ -4233,6 +4268,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmx->msr_ia32_umwait_control = 0;
 
+	vmx->msr_test_ctrl = 0;
 	vmx->disable_split_lock_detect = false;
 
 	vcpu->arch.microcode_version = 0x100000000ULL;
@@ -4565,6 +4601,11 @@ static inline bool guest_cpu_alignment_check_enabled(struct kvm_vcpu *vcpu)
 	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
 }
 
+static inline bool guest_cpu_split_lock_detect_enabled(struct vcpu_vmx *vmx)
+{
+	return !!(vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
+}
+
 static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4660,8 +4701,8 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		break;
 	case AC_VECTOR:
 		/*
-		 * Inject #AC back to guest only when legacy alignment check
-		 * enabled.
+		 * Inject #AC back to guest only when guest is expecting it,
+		 * i.e., legacy alignment check or split lock #AC enabled.
 		 * Otherwise, it must be an unexpected split-lock #AC for guest
 		 * since KVM keeps hardware SPLIT_LOCK_DETECT bit unchanged
 		 * when vcpu is running.
@@ -4674,12 +4715,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		 *   userspace, similar to sending SIGBUS.
 		 */
 		if (guest_cpu_alignment_check_enabled(vcpu) ||
+		    guest_cpu_split_lock_detect_enabled(vmx) ||
 		    WARN_ON(get_split_lock_detect_state() == sld_off)) {
 			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
 			return 1;
 		}
 		if (get_split_lock_detect_state() == sld_warn) {
-			pr_warn("kvm: split lock #AC happened in %s [%d]\n",
+			pr_warn_ratelimited("kvm: split lock #AC happened in %s [%d]\n",
 				current->comm, current->pid);
 			vmx->disable_split_lock_detect = true;
 			return 1;
@@ -6491,6 +6533,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long cr3, cr4;
+	bool host_sld_enabled, guest_sld_enabled;
 
 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!enable_vnmi &&
@@ -6562,10 +6605,15 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	x86_spec_ctrl_set_guest(vmx->spec_ctrl, 0);
 
-	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
-	    unlikely(vmx->disable_split_lock_detect) &&
-	    !test_tsk_thread_flag(current, TIF_SLD))
-		split_lock_detect_set(false);
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		host_sld_enabled = get_split_lock_detect_state() &&
+				   !test_tsk_thread_flag(current, TIF_SLD);
+		guest_sld_enabled = guest_cpu_split_lock_detect_enabled(vmx);
+		if (host_sld_enabled && unlikely(vmx->disable_split_lock_detect))
+			split_lock_detect_set(false);
+		else if (!host_sld_enabled && guest_sld_enabled)
+			split_lock_detect_set(true);
+	}
 
 	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
@@ -6601,10 +6649,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
 
-	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
-	    unlikely(vmx->disable_split_lock_detect) &&
-	    !test_tsk_thread_flag(current, TIF_SLD))
-		split_lock_detect_set(true);
+	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		if (host_sld_enabled && unlikely(vmx->disable_split_lock_detect))
+			split_lock_detect_set(true);
+		else if (!host_sld_enabled && guest_sld_enabled)
+			split_lock_detect_set(false);
+	}
 
 	/* All fields are clean at this point */
 	if (static_branch_unlikely(&enable_evmcs))
 		current_evmcs->hv_clean_fields |=
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 912eba66c5d5..c36c663f4bae 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -222,6 +222,7 @@ struct vcpu_vmx {
 #endif
 
 	u64		      spec_ctrl;
+	u64		      msr_test_ctrl;
 	u32		      msr_ia32_umwait_control;
 
 	u32 secondary_exec_control;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a97a8f5dd1df..56e799981d53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1163,7 +1163,7 @@ static const u32 msrs_to_save_all[] = {
 #endif
 	MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 	MSR_IA32_FEAT_CTL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
-	MSR_IA32_SPEC_CTRL,
+	MSR_IA32_SPEC_CTRL, MSR_TEST_CTRL,
 	MSR_IA32_RTIT_CTL, MSR_IA32_RTIT_STATUS, MSR_IA32_RTIT_CR3_MATCH,
 	MSR_IA32_RTIT_OUTPUT_BASE, MSR_IA32_RTIT_OUTPUT_MASK,
 	MSR_IA32_RTIT_ADDR0_A, MSR_IA32_RTIT_ADDR0_B,
@@ -1345,7 +1345,12 @@ static u64 kvm_get_arch_capabilities(void)
 
 static u64 kvm_get_core_capabilities(void)
 {
-	return 0;
+	u64 data = 0;
+
+	if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && !cpu_smt_possible())
+		data |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+
+	return data;
 }
 
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
@@ -5259,6 +5264,10 @@ static void kvm_init_msr_list(void)
 		 * to the guests in some cases.
 		 */
 		switch (msrs_to_save_all[i]) {
+		case MSR_TEST_CTRL:
+			if (!(kvm_get_core_capabilities() &
+			      MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT))
+				continue;
 		case MSR_IA32_BNDCFGS:
 			if (!kvm_mpx_supported())
 				continue;