From patchwork Wed Dec 8 00:03:41 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662065
Subject: [PATCH 01/19] x86/fpu: Extend prctl() with guest permissions
From: Yang Zhong
To: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com
Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com, jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com
Date: Tue, 7 Dec 2021 19:03:41 -0500
Message-Id: <20211208000359.2853257-2-yang.zhong@intel.com>
In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com>

From: Thomas Gleixner

Add guest permission control for dynamic XSTATE components: extend
prctl() with two new options (ARCH_GET_XCOMP_GUEST_PERM and
ARCH_REQ_XCOMP_GUEST_PERM) and add a new member (guest_perm) to
struct fpu.

The userspace VMM has to request guest permissions before it exposes
any XSAVE feature which uses dynamic XSTATE components. The permission
can be set only once, when the first vCPU is created; a new flag,
FPU_GUEST_PERM_LOCKED, is introduced to lock the permissions at that
point.

As with native permissions, this does not actually enable the
permitted feature. KVM is expected to install a larger kernel buffer
and enable the feature when it detects the intention from the guest.

Signed-off-by: Thomas Gleixner
Signed-off-by: Jing Liu
Signed-off-by: Yang Zhong
---
(To Thomas) We moved the declaration of xstate_get_guest_group_perm()
from xstate.h to api.h since it will be called by KVM.
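For orientation, here is a minimal sketch (not part of the patch) of how
a userspace VMM might drive the new interface before creating its first
vCPU. The option values are duplicated from the uapi header below so the
example is standalone; the XFEATURE_XTILE_DATA bit number (18) and the
error handling are illustrative assumptions:

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Duplicated from the uapi additions below, for a standalone example. */
#define ARCH_GET_XCOMP_GUEST_PERM	0x1024
#define ARCH_REQ_XCOMP_GUEST_PERM	0x1025
#define XFEATURE_XTILE_DATA		18	/* assumed AMX tile data bit */

int main(void)
{
	unsigned long long perm = 0;

	/* Request permission for the dynamically-enabled AMX state. */
	if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM,
		    XFEATURE_XTILE_DATA))
		perror("ARCH_REQ_XCOMP_GUEST_PERM");

	/* Read back the permitted guest xfeature bitmap. */
	if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_GUEST_PERM, &perm))
		perror("ARCH_GET_XCOMP_GUEST_PERM");

	printf("guest xstate permissions: 0x%llx\n", perm);
	return 0;
}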
 arch/x86/include/asm/fpu/api.h    |  2 ++
 arch/x86/include/asm/fpu/types.h  |  9 ++++++
 arch/x86/include/uapi/asm/prctl.h | 26 ++++++++--------
 arch/x86/kernel/fpu/core.c        |  3 ++
 arch/x86/kernel/fpu/xstate.c      | 50 +++++++++++++++++++++++--------
 arch/x86/kernel/fpu/xstate.h      | 13 ++++++--
 arch/x86/kernel/process.c         |  2 ++
 7 files changed, 78 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 6053674f9132..7532f73c82a6 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -138,6 +138,8 @@ static inline void fpstate_free(struct fpu *fpu) { }
 /* fpstate-related functions which are exported to KVM */
 extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature);
 
+extern inline u64 xstate_get_guest_group_perm(void);
+
 /* KVM specific functions */
 extern bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu);
 extern void fpu_free_guest_fpstate(struct fpu_guest *gfpu);

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 3c06c82ab355..6ddf80637697 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -387,6 +387,8 @@ struct fpstate {
 	/* @regs is dynamically sized! Don't add anything after @regs! */
 } __aligned(64);
 
+#define FPU_GUEST_PERM_LOCKED		BIT_ULL(63)
+
 struct fpu_state_perm {
 	/*
 	 * @__state_perm:
@@ -476,6 +478,13 @@ struct fpu {
 	 */
 	struct fpu_state_perm		perm;
 
+	/*
+	 * @guest_perm:
+	 *
+	 *	Permission related information for guest pseudo FPUs
+	 */
+	struct fpu_state_perm		guest_perm;
+
 	/*
 	 * @__fpstate:
 	 *

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 754a07856817..500b96e71f18 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -2,20 +2,22 @@
 #ifndef _ASM_X86_PRCTL_H
 #define _ASM_X86_PRCTL_H
 
-#define ARCH_SET_GS		0x1001
-#define ARCH_SET_FS		0x1002
-#define ARCH_GET_FS		0x1003
-#define ARCH_GET_GS		0x1004
+#define ARCH_SET_GS			0x1001
+#define ARCH_SET_FS			0x1002
+#define ARCH_GET_FS			0x1003
+#define ARCH_GET_GS			0x1004
 
-#define ARCH_GET_CPUID		0x1011
-#define ARCH_SET_CPUID		0x1012
+#define ARCH_GET_CPUID			0x1011
+#define ARCH_SET_CPUID			0x1012
 
-#define ARCH_GET_XCOMP_SUPP	0x1021
-#define ARCH_GET_XCOMP_PERM	0x1022
-#define ARCH_REQ_XCOMP_PERM	0x1023
+#define ARCH_GET_XCOMP_SUPP		0x1021
+#define ARCH_GET_XCOMP_PERM		0x1022
+#define ARCH_REQ_XCOMP_PERM		0x1023
+#define ARCH_GET_XCOMP_GUEST_PERM	0x1024
+#define ARCH_REQ_XCOMP_GUEST_PERM	0x1025
 
-#define ARCH_MAP_VDSO_X32	0x2001
-#define ARCH_MAP_VDSO_32	0x2002
-#define ARCH_MAP_VDSO_64	0x2003
+#define ARCH_MAP_VDSO_X32		0x2001
+#define ARCH_MAP_VDSO_32		0x2002
+#define ARCH_MAP_VDSO_64		0x2003
 
 #endif /* _ASM_X86_PRCTL_H */

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8ea306b1bf8e..ab19b3d8b2f7 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -450,6 +450,8 @@ void fpstate_reset(struct fpu *fpu)
 	fpu->perm.__state_perm		= fpu_kernel_cfg.default_features;
 	fpu->perm.__state_size		= fpu_kernel_cfg.default_size;
 	fpu->perm.__user_state_size	= fpu_user_cfg.default_size;
+	/* Same defaults for guests */
+	fpu->guest_perm			= fpu->perm;
 }
 
 static inline void fpu_inherit_perms(struct fpu *dst_fpu)
@@ -460,6 +462,7 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu)
 		spin_lock_irq(&current->sighand->siglock);
 		/* Fork also inherits the permissions of the parent */
 		dst_fpu->perm = src_fpu->perm;
+		dst_fpu->guest_perm = src_fpu->guest_perm;
 		spin_unlock_irq(&current->sighand->siglock);
 	}
 }

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d28829403ed0..9856d579aa6e 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1595,7 +1595,7 @@ static int validate_sigaltstack(unsigned int usize)
 	return 0;
 }
 
-static int __xstate_request_perm(u64 permitted, u64 requested)
+static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
 {
 	/*
 	 * This deliberately does not exclude !XSAVES as we still might
@@ -1605,6 +1605,7 @@ static int __xstate_request_perm(u64 permitted, u64 requested)
 	 */
 	bool compacted = cpu_feature_enabled(X86_FEATURE_XSAVES);
 	struct fpu *fpu = &current->group_leader->thread.fpu;
+	struct fpu_state_perm *perm;
 	unsigned int ksize, usize;
 	u64 mask;
 	int ret;
@@ -1621,15 +1622,18 @@ static int __xstate_request_perm(u64 permitted, u64 requested)
 	mask &= XFEATURE_MASK_USER_SUPPORTED;
 	usize = xstate_calculate_size(mask, false);
 
-	ret = validate_sigaltstack(usize);
-	if (ret)
-		return ret;
+	if (!guest) {
+		ret = validate_sigaltstack(usize);
+		if (ret)
+			return ret;
+	}
 
+	perm = guest ? &fpu->guest_perm : &fpu->perm;
 	/* Pairs with the READ_ONCE() in xstate_get_group_perm() */
-	WRITE_ONCE(fpu->perm.__state_perm, requested);
+	WRITE_ONCE(perm->__state_perm, requested);
 	/* Protected by sighand lock */
-	fpu->perm.__state_size = ksize;
-	fpu->perm.__user_state_size = usize;
+	perm->__state_size = ksize;
+	perm->__user_state_size = usize;
 
 	return ret;
 }
@@ -1640,7 +1644,7 @@ static const u64 xstate_prctl_req[XFEATURE_MAX] = {
 	[XFEATURE_XTILE_DATA] = XFEATURE_MASK_XTILE_DATA,
 };
 
-static int xstate_request_perm(unsigned long idx)
+static int xstate_request_perm(unsigned long idx, bool guest)
 {
 	u64 permitted, requested;
 	int ret;
@@ -1661,14 +1665,19 @@ static int xstate_request_perm(unsigned long idx)
 		return -EOPNOTSUPP;
 
 	/* Lockless quick check */
-	permitted = xstate_get_host_group_perm();
+	permitted = xstate_get_group_perm(guest);
 	if ((permitted & requested) == requested)
 		return 0;
 
 	/* Protect against concurrent modifications */
	spin_lock_irq(&current->sighand->siglock);
-	permitted = xstate_get_host_group_perm();
-	ret = __xstate_request_perm(permitted, requested);
+	permitted = xstate_get_group_perm(guest);
+
+	/* First vCPU allocation locks the permissions. */
+	if (guest && (permitted & FPU_GUEST_PERM_LOCKED))
+		ret = -EBUSY;
+	else
+		ret = __xstate_request_perm(permitted, requested, guest);
 	spin_unlock_irq(&current->sighand->siglock);
 	return ret;
 }
@@ -1713,12 +1722,17 @@ int xfd_enable_feature(u64 xfd_err)
 	return 0;
 }
 #else /* CONFIG_X86_64 */
-static inline int xstate_request_perm(unsigned long idx)
+static inline int xstate_request_perm(unsigned long idx, bool guest)
 {
 	return -EPERM;
 }
 #endif  /* !CONFIG_X86_64 */
 
+inline u64 xstate_get_guest_group_perm(void)
+{
+	return xstate_get_group_perm(true);
+}
+EXPORT_SYMBOL_GPL(xstate_get_guest_group_perm);
+
 /**
  * fpu_xstate_prctl - xstate permission operations
  * @tsk:	Redundant pointer to current
@@ -1742,6 +1756,7 @@ long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2)
 	u64 __user *uptr = (u64 __user *)arg2;
 	u64 permitted, supported;
 	unsigned long idx = arg2;
+	bool guest = false;
 
 	if (tsk != current)
 		return -EPERM;
@@ -1760,11 +1775,20 @@ long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2)
 		permitted &= XFEATURE_MASK_USER_SUPPORTED;
 		return put_user(permitted, uptr);
 
+	case ARCH_GET_XCOMP_GUEST_PERM:
+		permitted = xstate_get_guest_group_perm();
+		permitted &= XFEATURE_MASK_USER_SUPPORTED;
+		return put_user(permitted, uptr);
+
+	case ARCH_REQ_XCOMP_GUEST_PERM:
+		guest = true;
+		fallthrough;
+
 	case ARCH_REQ_XCOMP_PERM:
 		if (!IS_ENABLED(CONFIG_X86_64))
 			return -EOPNOTSUPP;
 
-		return xstate_request_perm(idx);
+		return xstate_request_perm(idx, guest);
 
 	default:
 		return -EINVAL;

diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 86ea7c0fa2f6..98a472775c97 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -20,10 +20,19 @@ static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
 	xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
 }
 
-static inline u64 xstate_get_host_group_perm(void)
+static inline u64 xstate_get_group_perm(bool guest)
 {
+	struct fpu *fpu = &current->group_leader->thread.fpu;
+	struct fpu_state_perm *perm;
+
 	/* Pairs with WRITE_ONCE() in xstate_request_perm() */
-	return READ_ONCE(current->group_leader->thread.fpu.perm.__state_perm);
+	perm = guest ? &fpu->guest_perm : &fpu->perm;
+	return READ_ONCE(perm->__state_perm);
+}
+
+static inline u64 xstate_get_host_group_perm(void)
+{
+	return xstate_get_group_perm(false);
 }
 
 enum xstate_copy_mode {

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 04143a653a8a..d7bc23589062 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -993,6 +993,8 @@ long do_arch_prctl_common(struct task_struct *task, int option,
 	case ARCH_GET_XCOMP_SUPP:
 	case ARCH_GET_XCOMP_PERM:
 	case ARCH_REQ_XCOMP_PERM:
+	case ARCH_GET_XCOMP_GUEST_PERM:
+	case ARCH_REQ_XCOMP_GUEST_PERM:
 		return fpu_xstate_prctl(task, option, arg2);
 	}

From patchwork Wed Dec 8 00:03:42 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662067
Subject: [PATCH 02/19] x86/fpu: Prepare KVM for dynamically enabled states
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:42 -0500
Message-Id: <20211208000359.2853257-3-yang.zhong@intel.com>

From: Thomas Gleixner

Add fields to struct fpu_guest for tracking per-vCPU permissions for
dynamic XSTATE components:

- user_xfeatures:
	Tracks which features are currently enabled for the vCPU.

- user_perm:
	Copied from guest_perm of the group leader thread. The first
	vCPU which does the copy locks guest_perm.

- realloc_request:
	KVM sets this field to request dynamically-enabled features
	which require reallocation of @fpstate.

Initialize those fields properly; a brief illustration of how the
fields interact follows.
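A rough sketch of how the three fields are meant to interact, simplified
from later patches in this series; the struct below is a stand-in defined
only so the example compiles on its own:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal stand-in for the new struct fpu_guest fields. */
struct fpu_guest_sketch {
	uint64_t user_xfeatures;	/* currently enabled for the vCPU */
	uint64_t user_perm;		/* permitted via the guest prctl() */
	uint64_t realloc_request;	/* pending fpstate reallocation */
};

/* Returns true when the caller should exit to userspace so the fpstate
 * buffer can be reallocated (cf. fpu_swap_kvm_fpstate() later on). */
static bool needs_realloc(struct fpu_guest_sketch *g, uint64_t requested)
{
	if ((requested & g->user_perm) != requested)
		return false;	/* not permitted: guest gets an error instead */
	if ((requested & g->user_xfeatures) == requested)
		return false;	/* already enabled: buffer is large enough */
	g->realloc_request |= requested;
	return true;
}

int main(void)
{
	struct fpu_guest_sketch g = {
		.user_xfeatures	= 0x3,			/* x87 + SSE */
		.user_perm	= 0x3 | (1ULL << 18),	/* + AMX tile data */
	};

	printf("realloc? %d\n", needs_realloc(&g, 1ULL << 18));	/* 1 */
	return 0;
}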
Signed-off-by: Thomas Gleixner
Signed-off-by: Jing Liu
Signed-off-by: Yang Zhong
---
 arch/x86/include/asm/fpu/types.h | 23 +++++++++++++++++++++
 arch/x86/kernel/fpu/core.c       | 26 +++++++++++++++++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 6ddf80637697..861cffca3209 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -504,6 +504,29 @@ struct fpu {
  * Guest pseudo FPU container
  */
 struct fpu_guest {
+	/*
+	 * @user_xfeatures:	xfeature bitmap of features which are
+	 *			currently enabled for the guest vCPU.
+	 */
+	u64				user_xfeatures;
+
+	/*
+	 * @user_perm:		xfeature bitmap of features which are
+	 *			permitted to be enabled for the guest
+	 *			vCPU.
+	 */
+	u64				user_perm;
+
+	/*
+	 * @realloc_request:	xfeature bitmap of features which are
+	 *			requested to be enabled dynamically
+	 *			which requires reallocation of @fpstate
+	 *
+	 *			Set by an intercept handler and
+	 *			evaluated in fpu_swap_kvm_fpstate()
+	 */
+	u64				realloc_request;
+
 	/*
 	 * @fpstate:			Pointer to the allocated guest fpstate
 	 */

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index ab19b3d8b2f7..fe592799508c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -201,6 +201,26 @@ void fpu_reset_from_exception_fixup(void)
 #if IS_ENABLED(CONFIG_KVM)
 static void __fpstate_reset(struct fpstate *fpstate);
 
+static void fpu_init_guest_permissions(struct fpu_guest *gfpu)
+{
+	struct fpu_state_perm *fpuperm;
+	u64 perm;
+
+	if (!IS_ENABLED(CONFIG_X86_64))
+		return;
+
+	spin_lock_irq(&current->sighand->siglock);
+	fpuperm = &current->group_leader->thread.fpu.guest_perm;
+	perm = fpuperm->__state_perm;
+
+	/* First fpstate allocation locks down permissions. */
+	WRITE_ONCE(fpuperm->__state_perm, perm | FPU_GUEST_PERM_LOCKED);
+
+	spin_unlock_irq(&current->sighand->siglock);
+
+	gfpu->user_perm = perm & ~FPU_GUEST_PERM_LOCKED;
+}
+
 bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 {
 	struct fpstate *fpstate;
@@ -216,7 +236,11 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 	fpstate->is_valloc	= true;
 	fpstate->is_guest	= true;
 
-	gfpu->fpstate = fpstate;
+	gfpu->fpstate		= fpstate;
+	gfpu->user_xfeatures	= fpu_user_cfg.default_features;
+	gfpu->user_perm		= fpu_user_cfg.default_features;
+	fpu_init_guest_permissions(gfpu);
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(fpu_alloc_guest_fpstate);

From patchwork Wed Dec 8 00:03:43 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662069
Subject: [PATCH 03/19] kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:43 -0500
Message-Id: <20211208000359.2853257-4-yang.zhong@intel.com>

From: Jing Liu

CPUID.0xD.1.EBX enumerates the size of the XSAVE area (in compacted
format) required by XSAVES. If CPUID.0xD.i.ECX[1] is set for a state
component (i), that state component must be located on the next
64-byte boundary following the preceding state component in the
compacted layout.

Fix xstate_required_size() to follow this alignment rule. AMX is the
first state component with 64-byte alignment, which is what exposed
the bug. A worked example follows below.
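A hypothetical worked example of the fixed computation (standalone; the
offset value is made up, and 8192 bytes is assumed for the XTILE_DATA
component size):

#include <stdio.h>

#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned int ret = 2696;	/* assumed end of preceding component */
	unsigned int eax = 8192;	/* assumed CPUID.0xD.18.EAX: XTILE_DATA size */
	unsigned int ecx = 0x2;		/* ECX[1] set: 64-byte alignment required */
	unsigned int offset;

	/* Mirrors the fixed line in xstate_required_size() for compacted form */
	offset = (ecx & 0x2) ? ALIGN(ret, 64) : ret;
	printf("offset %u -> %u, required size %u\n", ret, offset, offset + eax);
	/* Without ALIGN() the computed buffer would be 56 bytes too small. */
	return 0;
}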
Signed-off-by: Jing Liu
Signed-off-by: Yang Zhong
---
 arch/x86/kvm/cpuid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 07e9215e911d..148003e26cbb 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -42,7 +42,8 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted)
 		if (xstate_bv & 0x1) {
 			u32 eax, ebx, ecx, edx, offset;
 			cpuid_count(0xD, feature_bit, &eax, &ebx, &ecx, &edx);
-			offset = compacted ? ret : ebx;
+			/* ECX[1]: 64B alignment in compacted form */
+			offset = compacted ? ((ecx & 0x2) ? ALIGN(ret, 64) : ret) : ebx;
 			ret = max(ret, offset + eax);
 		}

From patchwork Wed Dec 8 00:03:44 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662071
Subject: [PATCH 04/19] kvm: x86: Check guest xstate permissions when KVM_SET_CPUID2
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:44 -0500
Message-Id: <20211208000359.2853257-5-yang.zhong@intel.com>

From: Jing Liu

Guest xstate permissions should be set by the userspace VMM before
vCPU creation. Extend KVM to check the guest permissions in the
KVM_SET_CPUID2 ioctl, to avoid permission failures at guest run time
(e.g. when the reallocation path is triggered).

Signed-off-by: Jing Liu
Signed-off-by: Yang Zhong
---
 arch/x86/kvm/cpuid.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 148003e26cbb..f3c61205bbf4 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -18,6 +18,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <...>
 #include <...>
 #include "cpuid.h"
 #include "lapic.h"
@@ -97,6 +98,17 @@ static int kvm_check_cpuid(struct kvm_cpuid_entry2 *entries, int nent)
 			return -EINVAL;
 	}
 
+	/*
+	 * Check guest permissions for XSTATE features which must
+	 * be enabled dynamically.
+	 */
+	best = cpuid_entry2_find(entries, nent, 7, 0);
+	if (best && cpuid_entry_has(best, X86_FEATURE_AMX_TILE)) {
+		if (!(xstate_get_guest_group_perm() &
+		      XFEATURE_MASK_XTILE_DATA))
+			return -EINVAL;
+	}
+
 	return 0;
 }

From patchwork Wed Dec 8 00:03:45 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662073
Subject: [PATCH 05/19] x86/fpu: Move xfd initialization out of __fpstate_reset() to the callers
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:45 -0500
Message-Id: <20211208000359.2853257-6-yang.zhong@intel.com>

From: Jing Liu

vCPU threads differ from native tasks in their initial xfd value.
While all native tasks follow a fixed value (init_fpstate::xfd)
defined by the fpu core, vCPU threads need to obey the reset value
(i.e. ZERO) defined by the spec, to meet the expectation of the
guest.

Move xfd initialization out of __fpstate_reset() to the callers, so
each caller can choose the appropriate value.
Signed-off-by: Jing Liu
Signed-off-by: Yang Zhong
---
 arch/x86/kernel/fpu/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index fe592799508c..fae44fa27cdb 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -231,6 +231,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 	if (!fpstate)
 		return false;
 
+	/* Leave xfd to 0 (the reset value defined by spec) */
 	__fpstate_reset(fpstate);
 	fpstate_init_user(fpstate);
 	fpstate->is_valloc	= true;
@@ -461,7 +462,6 @@ static void __fpstate_reset(struct fpstate *fpstate)
 	fpstate->user_size	= fpu_user_cfg.default_size;
 	fpstate->xfeatures	= fpu_kernel_cfg.default_features;
 	fpstate->user_xfeatures	= fpu_user_cfg.default_features;
-	fpstate->xfd		= init_fpstate.xfd;
 }
 
 void fpstate_reset(struct fpu *fpu)
@@ -469,6 +469,7 @@ void fpstate_reset(struct fpu *fpu)
 	/* Set the fpstate pointer to the default fpstate */
 	fpu->fpstate = &fpu->__fpstate;
 	__fpstate_reset(fpu->fpstate);
+	fpu->fpstate->xfd = init_fpstate.xfd;
 
 	/* Initialize the permission related info in fpu */
 	fpu->perm.__state_perm		= fpu_kernel_cfg.default_features;

From patchwork Wed Dec 8 00:03:46 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662075
Subject: [PATCH 06/19] x86/fpu: Add reallocation mechanism for KVM
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:46 -0500
Message-Id: <20211208000359.2853257-7-yang.zhong@intel.com>

From: Thomas Gleixner

Extend the fpstate reallocation mechanism to cover the guest fpu.
Unlike native tasks, where reallocation is triggered from the #NM
handler, guest fpstate reallocation is requested by KVM when it
detects the guest's intention to use a dynamically-enabled XSAVE
feature.
Since KVM currently swaps host/guest fpstate when exiting to the
userspace VMM (see fpu_swap_kvm_fpstate()), handle fpstate
reallocation at this point as well. The implication: KVM must break
the vcpu_run() loop and exit to the userspace VMM, instead of
immediately returning to the guest, when fpstate requires
reallocation. In that case KVM sets guest_fpu::realloc_request to
mark the requested features in the related VM-exit handlers.

Signed-off-by: Thomas Gleixner
Signed-off-by: Jing Liu
Signed-off-by: Yang Zhong
---
 arch/x86/kernel/fpu/core.c   | 26 +++++++++++++++++++---
 arch/x86/kernel/fpu/xstate.c | 43 ++++++++++++++++++++++++++++++------
 arch/x86/kernel/fpu/xstate.h |  2 ++
 3 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index fae44fa27cdb..7a0436a0cb2c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -261,11 +261,31 @@ void fpu_free_guest_fpstate(struct fpu_guest *gfpu)
 }
 EXPORT_SYMBOL_GPL(fpu_free_guest_fpstate);
 
+static int fpu_guest_realloc_fpstate(struct fpu_guest *guest_fpu,
+				     bool enter_guest)
+{
+	/*
+	 * Reallocation requests can only be handled when
+	 * switching from guest to host mode.
+	 */
+	if (WARN_ON_ONCE(enter_guest || !IS_ENABLED(CONFIG_X86_64))) {
+		guest_fpu->realloc_request = 0;
+		return -EUNATCH;
+	}
+	return xfd_enable_guest_features(guest_fpu);
+}
+
 int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
 {
-	struct fpstate *guest_fps = guest_fpu->fpstate;
+	struct fpstate *guest_fps, *cur_fps;
 	struct fpu *fpu = &current->thread.fpu;
-	struct fpstate *cur_fps = fpu->fpstate;
+	int ret = 0;
+
+	if (unlikely(guest_fpu->realloc_request))
+		ret = fpu_guest_realloc_fpstate(guest_fpu, enter_guest);
+
+	guest_fps = guest_fpu->fpstate;
+	cur_fps = fpu->fpstate;
 
 	fpregs_lock();
 	if (!cur_fps->is_confidential && !test_thread_flag(TIF_NEED_FPU_LOAD))
@@ -298,7 +318,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
 	fpregs_mark_activate();
 	fpregs_unlock();
 
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpstate);

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9856d579aa6e..fe3d8ed3db0e 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1529,6 +1529,7 @@ static struct fpstate *fpu_install_fpstate(struct fpu *fpu,
  *	of that task
  * @ksize:	The required size for the kernel buffer
  * @usize:	The required size for user space buffers
+ * @guest_fpu:	Pointer to a guest FPU container. NULL for host allocations
  *
 * Note vs. vmalloc(): If the task with a vzalloc()-allocated buffer
 * terminates quickly, vfree()-induced IPIs may be a concern, but tasks
 *
 * Returns: 0 on success, -ENOMEM on allocation error.
 */
static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
-			   unsigned int usize)
+			   unsigned int usize, struct fpu_guest *guest_fpu)
 {
 	struct fpu *fpu = &current->thread.fpu;
 	struct fpstate *curfps, *newfps = NULL;
@@ -1553,6 +1554,12 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
 	newfps->user_size = usize;
 	newfps->is_valloc = true;
 
+	if (guest_fpu) {
+		newfps->is_guest = true;
+		newfps->is_confidential = curfps->is_confidential;
+		guest_fpu->user_xfeatures |= xfeatures;
+	}
+
 	fpregs_lock();
 	/*
 	 * Ensure that the current state is in the registers before
@@ -1566,12 +1573,14 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
 	newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
 	newfps->xfd = curfps->xfd & ~xfeatures;
 
+	if (guest_fpu)
+		guest_fpu->fpstate = newfps;
+
 	curfps = fpu_install_fpstate(fpu, newfps);
 
 	/* Do the final updates within the locked region */
 	xstate_init_xcomp_bv(&newfps->regs.xsave, newfps->xfeatures);
 	xfd_update_state(newfps);
-
 	fpregs_unlock();
 
 	vfree(curfps);
@@ -1682,9 +1691,10 @@ static int xstate_request_perm(unsigned long idx, bool guest)
 	return ret;
 }
 
-int xfd_enable_feature(u64 xfd_err)
+static int __xfd_enable_feature(u64 xfd_err, struct fpu_guest *guest_fpu)
 {
 	u64 xfd_event = xfd_err & XFEATURE_MASK_USER_DYNAMIC;
+	struct fpu_state_perm *perm;
 	unsigned int ksize, usize;
 	struct fpu *fpu;
 
@@ -1697,14 +1707,16 @@ int xfd_enable_feature(u64 xfd_err)
	spin_lock_irq(&current->sighand->siglock);
 
 	/* If not permitted let it die */
-	if ((xstate_get_host_group_perm() & xfd_event) != xfd_event) {
+	if ((xstate_get_group_perm(!!guest_fpu) & xfd_event) != xfd_event) {
		spin_unlock_irq(&current->sighand->siglock);
 		return -EPERM;
 	}
 
 	fpu = &current->group_leader->thread.fpu;
-	ksize = fpu->perm.__state_size;
-	usize = fpu->perm.__user_state_size;
+	perm = guest_fpu ? &fpu->guest_perm : &fpu->perm;
+	ksize = perm->__state_size;
+	usize = perm->__user_state_size;
+
 	/*
 	 * The feature is permitted. State size is sufficient. Dropping
 	 * the lock is safe here even if more features are added from
@@ -1717,10 +1729,27 @@
 	 * Try to allocate a new fpstate. If that fails there is no way
 	 * out.
 	 */
-	if (fpstate_realloc(xfd_event, ksize, usize))
+	if (fpstate_realloc(xfd_event, ksize, usize, guest_fpu))
 		return -EFAULT;
 	return 0;
 }
+
+int xfd_enable_feature(u64 xfd_err)
+{
+	return __xfd_enable_feature(xfd_err, NULL);
+}
+
+int xfd_enable_guest_features(struct fpu_guest *guest_fpu)
+{
+	u64 xfd_err = guest_fpu->realloc_request & XFEATURE_MASK_USER_SUPPORTED;
+
+	guest_fpu->realloc_request = 0;
+
+	if (!xfd_err)
+		return 0;
+	return __xfd_enable_feature(xfd_err, guest_fpu);
+}
+
 #else /* CONFIG_X86_64 */
 static inline int xstate_request_perm(unsigned long idx, bool guest)
 {

diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 98a472775c97..3254e2b5f17f 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -55,6 +55,8 @@ extern void fpu__init_system_xstate(unsigned int legacy_size);
 
 extern void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
 
+extern int xfd_enable_guest_features(struct fpu_guest *guest_fpu);
+
 static inline u64 xfeatures_mask_supervisor(void)
 {
 	return fpu_kernel_cfg.max_features & XFEATURE_MASK_SUPERVISOR_SUPPORTED;

From patchwork Wed Dec 8 00:03:47 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662077
Subject: [PATCH 07/19] kvm: x86: Propagate fpstate reallocation error to userspace
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:47 -0500
Message-Id: <20211208000359.2853257-8-yang.zhong@intel.com>

From: Jing Liu

fpstate reallocation is handled when the vCPU thread returns to
userspace. Since reallocation can fail (e.g. for lack of memory),
extend kvm_put_guest_fpu() to return an integer value that carries
the error code to the userspace VMM. The userspace VMM is expected
to handle any error caused by fpstate reallocation, as sketched
below.
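For completeness, a sketch of what this contract looks like from the
VMM side (hypothetical userspace code, not part of the patch; note that
KVM_EXIT_FPU_REALLOC is only introduced by patch 10 of this series, and
that a failing ioctl(KVM_RUN) returns -1 with errno set):

#include <errno.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Added by patch 10; value duplicated here for illustration. */
#define KVM_EXIT_FPU_REALLOC	36

int vcpu_run_once(int vcpu_fd, struct kvm_run *run)
{
	if (ioctl(vcpu_fd, KVM_RUN, 0) < 0) {
		/* May now carry an fpstate reallocation failure (e.g. ENOMEM) */
		perror("KVM_RUN");
		return -errno;
	}

	if (run->exit_reason == KVM_EXIT_FPU_REALLOC) {
		/* Reallocation already succeeded in-kernel; just re-enter. */
		return 0;
	}

	/* ... dispatch the other exit reasons as usual ... */
	return 0;
}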
Signed-off-by: Jing Liu
Signed-off-by: Yang Zhong
---
 arch/x86/kvm/x86.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0ee1a039b490..05f2cda73d69 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10171,17 +10171,21 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 }
 
 /* When vcpu_run ends, restore user space FPU context. */
-static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
+static int kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
+	int ret;
+
+	ret = fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
+
+	return ret;
 }
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 {
 	struct kvm_run *kvm_run = vcpu->run;
-	int r;
+	int r, ret;
 
 	vcpu_load(vcpu);
 	kvm_sigset_activate(vcpu);
@@ -10243,7 +10247,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	r = vcpu_run(vcpu);
 
 out:
-	kvm_put_guest_fpu(vcpu);
+	ret = kvm_put_guest_fpu(vcpu);
+	if ((r >= 0) && (ret < 0))
+		r = ret;
+
 	if (kvm_run->kvm_valid_regs)
 		store_regs(vcpu);
 	post_kvm_run_save(vcpu);

From patchwork Wed Dec 8 00:03:48 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662081
Subject: [PATCH 08/19] x86/fpu: Move xfd_update_state() to xstate.c and export symbol
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:48 -0500
Message-Id: <20211208000359.2853257-9-yang.zhong@intel.com>

From: Jing Liu

xfd_update_state() is the interface for updating IA32_XFD and its
per-CPU cache. All callers of this interface currently live in the
fpu core. KVM only triggers an IA32_XFD update indirectly, via the
helper fpu_swap_kvm_fpstate(), when switching between the user fpu
and the guest fpu.
Supporting AMX in guests now requires KVM to update IA32_XFD directly
with the guest value (when emulating WRMSR) so XSAVE/XRSTOR can manage
the XSTATE components correctly inside the guest.

Move xfd_update_state() from fpu/xstate.h to fpu/xstate.c and export
it for reference outside of the fpu core.

Signed-off-by: Jing Liu
Signed-off-by: Yang Zhong
---
 arch/x86/include/asm/fpu/api.h |  2 ++
 arch/x86/kernel/fpu/xstate.c   | 12 ++++++++++++
 arch/x86/kernel/fpu/xstate.h   | 14 +-------------
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 7532f73c82a6..999d89026be9 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -131,8 +131,10 @@ DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
 /* Process cleanup */
 #ifdef CONFIG_X86_64
 extern void fpstate_free(struct fpu *fpu);
+extern void xfd_update_state(struct fpstate *fpstate);
 #else
 static inline void fpstate_free(struct fpu *fpu) { }
+static inline void xfd_update_state(struct fpstate *fpstate) { }
 #endif
 
 /* fpstate-related functions which are exported to KVM */

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index fe3d8ed3db0e..3c39789deeb9 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1750,6 +1750,18 @@ int xfd_enable_guest_features(struct fpu_guest *guest_fpu)
 	return __xfd_enable_feature(xfd_err, guest_fpu);
 }
 
+void xfd_update_state(struct fpstate *fpstate)
+{
+	if (fpu_state_size_dynamic()) {
+		u64 xfd = fpstate->xfd;
+
+		if (__this_cpu_read(xfd_state) != xfd) {
+			wrmsrl(MSR_IA32_XFD, xfd);
+			__this_cpu_write(xfd_state, xfd);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(xfd_update_state);
+
 #else /* CONFIG_X86_64 */
 static inline int xstate_request_perm(unsigned long idx, bool guest)
 {

diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 3254e2b5f17f..651bd29977b9 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -149,19 +149,7 @@ static inline void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rs
 #endif
 
 #ifdef CONFIG_X86_64
-static inline void xfd_update_state(struct fpstate *fpstate)
-{
-	if (fpu_state_size_dynamic()) {
-		u64 xfd = fpstate->xfd;
-
-		if (__this_cpu_read(xfd_state) != xfd) {
-			wrmsrl(MSR_IA32_XFD, xfd);
-			__this_cpu_write(xfd_state, xfd);
-		}
-	}
-}
-#else
-static inline void xfd_update_state(struct fpstate *fpstate) { }
+extern void xfd_update_state(struct fpstate *fpstate);
 #endif

From patchwork Wed Dec 8 00:03:49 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662079
Subject: [PATCH 09/19] kvm: x86: Prepare reallocation check
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:49 -0500
Message-Id: <20211208000359.2853257-10-yang.zhong@intel.com>

From: Jing Liu

On native systems, fpstate reallocation is triggered from the #NM
handler, because IA32_XFD is initialized to 1 for all native tasks.
However, #NM in the guest is not trapped by KVM. Instead, guest
enabling of a dynamic extended feature can be captured via emulation
of IA32_XFD and XSETBV: having guest XCR0[i]=1 and XFD[i]=0 indicates
that feature[i] has been activated by the guest.

Provide a helper function for this check, invoked when either XCR0 or
XFD is changed in the emulation path.

Signed-off-by: Jing Liu
Signed-off-by: Kevin Tian
Signed-off-by: Yang Zhong
---
 arch/x86/kvm/x86.c | 24 ++++++++++++++++++++++++
 arch/x86/kvm/x86.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05f2cda73d69..91cc6f69a7ca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -956,6 +956,30 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_load_host_xsave_state);
 
+bool kvm_check_guest_realloc_fpstate(struct kvm_vcpu *vcpu, u64 xfd)
+{
+	u64 xcr0 = vcpu->arch.xcr0 & XFEATURE_MASK_USER_DYNAMIC;
+
+	/* For any state which is enabled dynamically */
+	if ((xfd & xcr0) != xcr0) {
+		u64 request = (xcr0 ^ xfd) & xcr0;
+		struct fpu_guest *guest_fpu = &vcpu->arch.guest_fpu;
+
+		/*
+		 * If requested features haven't been enabled, update
+		 * the request bitmap and tell the caller to request
+		 * dynamic buffer reallocation.
+		 */
+		if ((guest_fpu->user_xfeatures & request) != request) {
+			vcpu->arch.guest_fpu.realloc_request = request;
+			return true;
+		}
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(kvm_check_guest_realloc_fpstate);
+
 static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
 {
 	u64 xcr0 = xcr;

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 4abcd8d9836d..24a323980146 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -445,6 +445,7 @@ static inline void kvm_machine_check(void)
 
 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
+bool kvm_check_guest_realloc_fpstate(struct kvm_vcpu *vcpu, u64 new_xfd);
 int kvm_spec_ctrl_test_value(u64 value);
 bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,

From patchwork Wed Dec 8 00:03:50 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662083
Subject: [PATCH 10/19] kvm: x86: Emulate WRMSR of guest IA32_XFD
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:50 -0500
Message-Id: <20211208000359.2853257-11-yang.zhong@intel.com>

From: Jing Liu

Intel's eXtended Feature Disable (XFD) feature allows software to
dynamically adjust the fpstate buffer size for XSAVE features which
have large state.

WRMSR to IA32_XFD is intercepted, so that if the written value enables
a dynamic XSAVE feature, the emulation code can exit to userspace to
trigger fpstate reallocation for that state. Introduce a new KVM exit
reason (KVM_EXIT_FPU_REALLOC) for this purpose. If reallocation
succeeds in fpu_swap_kvm_fpstate(), this exit just bounces to
userspace and back; otherwise the userspace VMM should handle the
error properly.
Using a new exit reason (instead of KVM_EXIT_X86_WRMSR) is clearer and
can be shared between WRMSR(IA32_XFD) and XSETBV. It also avoids mixing
with the userspace MSR machinery, which is tied to KVM_EXIT_X86_WRMSR
today.

Also introduce a new MSR return type (KVM_MSR_RET_USERSPACE).
Currently, MSR emulation returns to userspace only upon error or per
certain filtering rules via the userspace MSR machinery. The new
return type indicates that emulation of a certain MSR has its own
specific reason to bounce to userspace.

IA32_XFD is updated in two ways:

- If reallocation is not required, the emulation code directly updates
  guest_fpu::xfd and then calls xfd_update_state() to update IA32_XFD
  and the per-cpu cache;

- If reallocation is triggered, the above updates are completed as
  part of the fpstate reallocation process, if it succeeds.

RDMSR to IA32_XFD is not intercepted: fpu_swap_kvm_fpstate() ensures
the guest XFD value is loaded into the MSR before re-entering the
guest, so not intercepting it just saves an unnecessary VM exit.

Signed-off-by: Jing Liu
Signed-off-by: Kevin Tian
Signed-off-by: Yang Zhong
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/vmx.c   |  8 ++++++++
 arch/x86/kvm/x86.c       | 48 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h       |  1 +
 include/uapi/linux/kvm.h |  1 +
 4 files changed, 58 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 70d86ffbccf7..971d60980d5b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7141,6 +7141,11 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 		vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
 }
 
+static void vmx_update_intercept_xfd(struct kvm_vcpu *vcpu)
+{
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_R, false);
+}
+
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7181,6 +7186,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	if (cpu_feature_enabled(X86_FEATURE_XFD) && guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+		vmx_update_intercept_xfd(vcpu);
+
 	set_cr4_guest_host_mask(vmx);
 
 	vmx_write_encls_bitmap(vcpu, NULL);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91cc6f69a7ca..c83887cb55ee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1873,6 +1873,16 @@ static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
 {
 	u64 msr_reason = kvm_msr_reason(r);
 
+	/*
+	 * MSR emulation may need certain effect triggered in the
+	 * path transitioning to userspace (e.g. fpstate reallocation).
+	 * In this case the actual exit reason and completion
+	 * func should have been set by the emulation code before
+	 * this point.
+	 */
+	if (r == KVM_MSR_RET_USERSPACE)
+		return 1;
+
 	/* Check if the user wanted to know about this MSR fault */
 	if (!(vcpu->kvm->arch.user_space_msr_mask & msr_reason))
 		return 0;
@@ -3692,6 +3702,44 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vcpu->arch.msr_misc_features_enables = data;
 		break;
+#ifdef CONFIG_X86_64
+	case MSR_IA32_XFD:
+		if (!guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+			return 1;
+
+		/* Setting unsupported bits causes #GP */
+		if (~XFEATURE_MASK_USER_DYNAMIC & data) {
+			kvm_inject_gp(vcpu, 0);
+			break;
+		}
+
+		WARN_ON_ONCE(current->thread.fpu.fpstate !=
+			     vcpu->arch.guest_fpu.fpstate);
+
+		/*
+		 * Check if fpstate reallocation is required. If yes, then
+		 * let the fpu core do reallocation and update xfd;
+		 * otherwise, update xfd here.
+		 */
+		if (kvm_check_guest_realloc_fpstate(vcpu, data)) {
+			vcpu->run->exit_reason = KVM_EXIT_FPU_REALLOC;
+			vcpu->arch.complete_userspace_io =
+				kvm_skip_emulated_instruction;
+			return KVM_MSR_RET_USERSPACE;
+		}
+
+		/*
+		 * Update IA32_XFD to the guest value so #NM can be
+		 * raised properly in the guest. Instead of directly
+		 * writing the MSR, call a helper to avoid breaking
+		 * per-cpu cached value in fpu core.
+		 */
+		fpregs_lock();
+		current->thread.fpu.fpstate->xfd = data;
+		xfd_update_state(current->thread.fpu.fpstate);
+		fpregs_unlock();
+		break;
+#endif
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 24a323980146..446ffa8c7804 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -460,6 +460,7 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
  */
 #define  KVM_MSR_RET_INVALID	2	/* in-kernel MSR emulation #GP condition */
 #define  KVM_MSR_RET_FILTERED	3	/* #GP due to userspace MSR filter */
+#define  KVM_MSR_RET_USERSPACE	4	/* Userspace handling */
 
 #define __cr4_reserved_bits(__cpu_has, __c)	\
 ({						\

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1daa45268de2..0c7b301c7254 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -270,6 +270,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
 #define KVM_EXIT_RISCV_SBI        35
+#define KVM_EXIT_FPU_REALLOC      36
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */

From patchwork Wed Dec 8 00:03:51 2021
X-Patchwork-Submitter: Yang Zhong
X-Patchwork-Id: 12662085
Subject: [PATCH 11/19] kvm: x86: Check fpstate reallocation in XSETBV emulation
From: Yang Zhong
Date: Tue, 7 Dec 2021 19:03:51 -0500
Message-Id: <20211208000359.2853257-12-yang.zhong@intel.com>
kvm@vger.kernel.org XSETBV allows software to write the extended control register XCR0, thus its emulation handler also needs to check fpstate reallocation when the changed XCR0 value enables certain dynamically-enabled features.
Signed-off-by: Jing Liu Signed-off-by: Yang Zhong --- arch/x86/kvm/x86.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c83887cb55ee..b195f4fa888f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1028,6 +1028,15 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu) return 1; } + if (guest_cpuid_has(vcpu, X86_FEATURE_XFD)) { + if (kvm_check_guest_realloc_fpstate(vcpu, vcpu->arch.guest_fpu.fpstate->xfd)) { + vcpu->run->exit_reason = KVM_EXIT_FPU_REALLOC; + vcpu->arch.complete_userspace_io = + kvm_skip_emulated_instruction; + return 0; + } + } + return kvm_skip_emulated_instruction(vcpu); } EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
From patchwork Wed Dec 8 00:03:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Zhong X-Patchwork-Id: 12662087 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C0ECC43217 for ; Tue, 7 Dec 2021 15:10:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238733AbhLGPNe (ORCPT ); Tue, 7 Dec 2021 10:13:34 -0500 Received: from mga14.intel.com ([192.55.52.115]:5593 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238668AbhLGPNY (ORCPT ); Tue, 7 Dec 2021 10:13:24 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10190"; a="237821219" X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="237821219" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 07:09:53 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="461289971" Received: from icx.bj.intel.com ([10.240.192.117]) by orsmga003.jf.intel.com with ESMTP; 07 Dec 2021 07:09:50 -0800 From: Yang Zhong To: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com, jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com Subject: [PATCH 12/19] x86/fpu: Prepare KVM for bringing XFD state back in-sync Date: Tue, 7 Dec 2021 19:03:52 -0500 Message-Id: <20211208000359.2853257-13-yang.zhong@intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com> References: <20211208000359.2853257-1-yang.zhong@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Thomas Gleixner The guest may toggle IA32_XFD at high frequency, as it is part of the fpstate information (features, sizes, xfd) and is swapped on task context switch. To minimize the trap overhead of writes to this MSR, one optimization is to allow the guest to write it directly, thus eliminating traps. However, MSR passthrough implies that guest_fpstate::xfd and the per-cpu xfd cache might be out of sync with the current IA32_XFD value written by the guest. This suggests KVM needs to re-sync guest_fpstate::xfd and the per-cpu cache with IA32_XFD before the vCPU thread might be preempted or interrupted; a toy model of the copies involved is sketched below.
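To make that invariant concrete, here is a stand-alone user-space C model (illustration only, not kernel code; the variable names merely mirror the kernel's) of the three copies involved: the hardware MSR, the per-task software copy (fpstate->xfd) and the per-cpu cache. A direct guest WRMSR leaves the two software copies stale until a resync in the style of xfd_sync_state() runs:

#include <stdint.h>
#include <stdio.h>

static uint64_t hw_msr_xfd;       /* stands in for the IA32_XFD MSR */
static uint64_t fpstate_xfd;      /* software copy in the task's fpstate */
static uint64_t percpu_xfd_cache; /* per-cpu cached value */

/* Guest WRMSR with interception disabled: only the MSR changes. */
static void guest_direct_wrmsr(uint64_t val)
{
	hw_msr_xfd = val;         /* both software copies are now stale */
}

/* Models xfd_sync_state(): RDMSR and refresh the software state. */
static void xfd_resync(void)
{
	uint64_t cur = hw_msr_xfd; /* rdmsrl(MSR_IA32_XFD, cur) */

	fpstate_xfd = cur;
	percpu_xfd_cache = cur;
}

int main(void)
{
	guest_direct_wrmsr(1ULL << 18); /* guest toggles XTILEDATA */
	printf("stale:  fpstate=%#llx cache=%#llx\n",
	       (unsigned long long)fpstate_xfd,
	       (unsigned long long)percpu_xfd_cache);
	xfd_resync();                   /* must run before sched-out */
	printf("synced: fpstate=%#llx cache=%#llx\n",
	       (unsigned long long)fpstate_xfd,
	       (unsigned long long)percpu_xfd_cache);
	return 0;
}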
This patch provides a helper function for the re-sync purpose.
Signed-off-by: Thomas Gleixner Signed-off-by: Jing Liu Signed-off-by: Yang Zhong --- (To Thomas): the original name kvm_update_guest_xfd_state() in your sample code is renamed to xfd_sync_state() in this patch. In concept it is a general helper to bring software values in sync with the MSR value after they become out-of-sync. KVM is just the first out-of-sync user of this helper, so a neutral name may make more sense. But if you prefer the original name we can also change it back. arch/x86/include/asm/fpu/xstate.h | 2 ++ arch/x86/kernel/fpu/xstate.c | 14 ++++++++++++++ 2 files changed, 16 insertions(+)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index cd3dd170e23a..c8b51d34daab 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -129,4 +129,6 @@ static __always_inline __pure bool fpu_state_size_dynamic(void) } #endif +extern void xfd_sync_state(void); + #endif
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 3c39789deeb9..a5656237a763 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -1762,11 +1762,25 @@ void xfd_update_state(struct fpstate *fpstate) } } EXPORT_SYMBOL_GPL(xfd_update_state); + +/* Bring software state in sync with the current MSR value */ +void xfd_sync_state(void) +{ + if (fpu_state_size_dynamic()) { + u64 xfd; + + rdmsrl(MSR_IA32_XFD, xfd); + current->thread.fpu.fpstate->xfd = xfd; + __this_cpu_write(xfd_state, xfd); + } +} +EXPORT_SYMBOL_GPL(xfd_sync_state); #else /* CONFIG_X86_64 */ static inline int xstate_request_perm(unsigned long idx, bool guest) { return -EPERM; } +void xfd_sync_state(void) {} #endif /* !CONFIG_X86_64 */ inline u64 xstate_get_guest_group_perm(void)
From patchwork Wed Dec 8 00:03:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Zhong X-Patchwork-Id: 12662089 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0E28C4332F for ; Tue, 7 Dec 2021 15:10:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238795AbhLGPNm (ORCPT ); Tue, 7 Dec 2021 10:13:42 -0500 Received: from mga14.intel.com ([192.55.52.115]:5602 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230446AbhLGPN3 (ORCPT ); Tue, 7 Dec 2021 10:13:29 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10190"; a="237821237" X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="237821237" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 07:09:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="461289995" Received: from icx.bj.intel.com ([10.240.192.117]) by orsmga003.jf.intel.com with ESMTP; 07 Dec 2021 07:09:53 -0800 From: Yang Zhong To: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com, jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com Subject: [PATCH 13/19] kvm: x86: Disable WRMSR interception for IA32_XFD on demand Date: Tue, 7 Dec 2021 19:03:53 -0500
Message-Id: <20211208000359.2853257-14-yang.zhong@intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com> References: <20211208000359.2853257-1-yang.zhong@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Jing Liu Always intercepting IA32_XFD causes non-negligible overhead when this register is updated frequently in the guest. Disable WRMSR interception to IA32_XFD after fpstate reallocation is completed. There are three options for when to disable the interception: 1) When emulating the 1st WRMSR which requires reallocation, disable interception before exiting to userspace, with the assumption that the userspace VMM should not bounce back to the kernel if reallocation fails. However, it is not good to design the kernel based on assumed application behavior. If, due to a bug, the vCPU thread comes back to the kernel after reallocation fails, XFD passthrough may lead to host memory corruption when doing XSAVES for guest fpstate which has a smaller size than what the guest XFD allows. 2) Disable interception when coming back from the userspace VMM (for the 1st WRMSR which triggers reallocation). Re-check whether the fpstate size can serve the new guest XFD value. Disable interception only when the check succeeds. This requires KVM to store the guest XFD value in some place and then compare it to guest_fpu::user_xfeatures in the completion handler. 3) Disable interception at the 2nd WRMSR which enables dynamic XSTATE features. If guest_fpu::user_xfeatures already includes bits for the dynamic features set in the guest XFD value, disable interception. Currently 3) is implemented, with a flow like below:
(G) WRMSR(IA32_XFD) which enables AMX for the FIRST time
    --trap to host--
(HK) Emulate WRMSR and find fpstate size too small
(HK) Reallocate fpstate
    --exit to userspace--
(HU) do nothing
    --back to kernel via kvm_run--
(HK) complete WRMSR emulation
    --enter guest--
(G) do something
(G) WRMSR(IA32_XFD) which disables AMX
    --trap to host--
(HK) Emulate WRMSR and disable AMX in IA32_XFD
    --enter guest--
(G) do something
(G) WRMSR(IA32_XFD) which enables AMX for the SECOND time
    --trap to host--
(HK) Emulate WRMSR and find fpstate size sufficient for AMX
(HK) Disable WRMSR interception for IA32_XFD
    --enter guest--
(G) WRMSR(IA32_XFD)
(G) WRMSR(IA32_XFD)
(G) WRMSR(IA32_XFD)
...
After disabling WRMSR interception, the guest directly updates IA32_XFD, which becomes out of sync with the host-side software state (guest_fpstate::xfd and the per-cpu xfd cache). This requires KVM to call xfd_sync_state() to bring the software state in sync with the IA32_XFD register after VM-exit (before preemption happens or before exiting to userspace). p.s. We have confirmed that the SDM is being revised to say that when setting IA32_XFD[18] the AMX register state is not guaranteed to be preserved. This clarification avoids adding complexity for a creative guest which sets IA32_XFD[18]=1 before saving active AMX state to its own storage. A stand-alone sketch of the option-3 size check follows.
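The sketch below compiles as ordinary user-space C. xfd_needs_realloc() is a name invented for this sketch (the patch itself uses kvm_check_guest_realloc_fpstate()), and only XTILEDATA is assumed to be dynamic:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XTILEDATA_BIT              (1ULL << 18)
#define XFEATURE_MASK_USER_DYNAMIC XTILEDATA_BIT

/*
 * In XFD semantics a clear bit means the feature is enabled. The
 * question is whether the newly enabled dynamic features are already
 * covered by the current fpstate buffer (tracked by user_xfeatures).
 */
static bool xfd_needs_realloc(uint64_t user_xfeatures, uint64_t xfd)
{
	uint64_t enabled = XFEATURE_MASK_USER_DYNAMIC & ~xfd;

	return (enabled & ~user_xfeatures) != 0;
}

int main(void)
{
	uint64_t user_xfeatures = 0;     /* no AMX-sized buffer yet */

	/* 1st enabling WRMSR (bit 18 cleared): must reallocate */
	assert(xfd_needs_realloc(user_xfeatures, 0));

	user_xfeatures |= XTILEDATA_BIT; /* reallocation completed */

	/* 2nd enabling WRMSR: buffer suffices, safe to go passthrough */
	assert(!xfd_needs_realloc(user_xfeatures, 0));
	return 0;
}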
Signed-off-by: Jing Liu Signed-off-by: Kevin Tian Signed-off-by: Yang Zhong --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/vmx/vmx.c | 10 ++++++++++ arch/x86/kvm/vmx/vmx.h | 2 +- arch/x86/kvm/x86.c | 7 +++++++ 5 files changed, 21 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index cefe1d81e2e8..60c27f9990e9 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -30,6 +30,7 @@ KVM_X86_OP(update_exception_bitmap) KVM_X86_OP(get_msr) KVM_X86_OP(set_msr) KVM_X86_OP(get_segment_base) +KVM_X86_OP_NULL(set_xfd_passthrough) KVM_X86_OP(get_segment) KVM_X86_OP(get_cpl) KVM_X86_OP(set_segment) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 6ac61f85e07b..7c97cc1fea89 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -640,6 +640,7 @@ struct kvm_vcpu_arch { u64 smi_count; bool tpr_access_reporting; bool xsaves_enabled; + bool xfd_out_of_sync; u64 ia32_xss; u64 microcode_version; u64 arch_capabilities; @@ -1328,6 +1329,7 @@ struct kvm_x86_ops { void (*update_exception_bitmap)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); + void (*set_xfd_passthrough)(struct kvm_vcpu *vcpu); u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg); void (*get_segment)(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 971d60980d5b..6198b13c4846 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -160,6 +160,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = { MSR_FS_BASE, MSR_GS_BASE, MSR_KERNEL_GS_BASE, + MSR_IA32_XFD, #endif MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, @@ -1924,6 +1925,14 @@ static u64 vcpu_supported_debugctl(struct kvm_vcpu *vcpu) return debugctl; } +#ifdef CONFIG_X86_64 +static void vmx_set_xfd_passthrough(struct kvm_vcpu *vcpu) +{ + vmx_disable_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_RW); + vcpu->arch.xfd_out_of_sync = true; +} +#endif + /* * Writes msr value into the appropriate "register". * Returns 0 on success, non-0 otherwise. 
@@ -7657,6 +7666,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { #ifdef CONFIG_X86_64 .set_hv_timer = vmx_set_hv_timer, .cancel_hv_timer = vmx_cancel_hv_timer, + .set_xfd_passthrough = vmx_set_xfd_passthrough, #endif .setup_mce = vmx_setup_mce, diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 4df2ac24ffc1..bf9d3051cd6c 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -340,7 +340,7 @@ struct vcpu_vmx { struct lbr_desc lbr_desc; /* Save desired MSR intercept (read: pass-through) state */ -#define MAX_POSSIBLE_PASSTHROUGH_MSRS 13 +#define MAX_POSSIBLE_PASSTHROUGH_MSRS 14 struct { DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS); DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b195f4fa888f..d127b229dd29 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -974,6 +974,10 @@ bool kvm_check_guest_realloc_fpstate(struct kvm_vcpu *vcpu, u64 xfd) vcpu->arch.guest_fpu.realloc_request = request; return true; } + + /* Disable WRMSR interception if possible */ + if (kvm_x86_ops.set_xfd_passthrough) + static_call(kvm_x86_set_xfd_passthrough)(vcpu); } return false; @@ -10002,6 +10006,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (hw_breakpoint_active()) hw_breakpoint_restore(); + if (vcpu->arch.xfd_out_of_sync) + xfd_sync_state(); + vcpu->arch.last_vmentry_cpu = vcpu->cpu; vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc()); From patchwork Wed Dec 8 00:03:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Zhong X-Patchwork-Id: 12662091 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 675CAC433EF for ; Tue, 7 Dec 2021 15:10:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238654AbhLGPNt (ORCPT ); Tue, 7 Dec 2021 10:13:49 -0500 Received: from mga14.intel.com ([192.55.52.115]:5593 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238201AbhLGPNd (ORCPT ); Tue, 7 Dec 2021 10:13:33 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10190"; a="237821257" X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="237821257" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 07:10:02 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="461290058" Received: from icx.bj.intel.com ([10.240.192.117]) by orsmga003.jf.intel.com with ESMTP; 07 Dec 2021 07:09:58 -0800 From: Yang Zhong To: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com, jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com Subject: [PATCH 14/19] x86/fpu: Prepare for KVM XFD_ERR handling Date: Tue, 7 Dec 2021 19:03:54 -0500 Message-Id: <20211208000359.2853257-15-yang.zhong@intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com> References: <20211208000359.2853257-1-yang.zhong@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Jing Liu When XFD causes an instruction to generate #NM, 
IA32_XFD_ERR contains information about which disabled state components are being accessed. The #NM handler is expected to check this information and then enable the state components by clearing IA32_XFD for the faulting task (if it has permission). A problem arises if the XFD_ERR value generated in the guest is consumed/clobbered by the host before the guest itself does so. This may lead to a non-XFD-related #NM being treated as an XFD #NM in the host (due to a non-zero value in XFD_ERR), or an XFD-related #NM being treated as a non-XFD #NM in the guest (XFD_ERR cleared by the host #NM handler). This patch provides two helpers to swap the guest XFD_ERR and host XFD_ERR. Where to call them in KVM will be discussed thoroughly in the next patch. The guest XFD_ERR value is saved in fpu_guest::xfd_err. There is no need to save the host XFD_ERR because it is always cleared to ZERO by the host #NM handler (which cannot be preempted by a vCPU thread to observe a non-zero value). The lower two bits in fpu_guest::xfd_err are borrowed for special purposes. The state components (FP and SSE) covered by the two bits are not XSAVE-enabled features, thus not XFD-enabled either. It's impossible to see hardware setting them in XFD_ERR: - XFD_ERR_GUEST_DISABLED (bit 0) Indicates that the XFD extension is not exposed to the guest, thus no need to save/restore it. - XFD_ERR_GUEST_SAVED (bit 1) Indicates that fpu_guest::xfd_err already contains a saved value, thus no need for duplicated saving (e.g. when the vCPU thread is preempted multiple times before re-entering the guest).
Signed-off-by: Jing Liu Signed-off-by: Kevin Tian Signed-off-by: Yang Zhong --- arch/x86/include/asm/fpu/api.h | 8 ++++++ arch/x86/include/asm/fpu/types.h | 24 ++++++++++++++++ arch/x86/kernel/fpu/core.c | 49 ++++++++++++++++++++++++++++++++ 3 files changed, 81 insertions(+)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index 999d89026be9..c2e8f2172994 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -147,6 +147,14 @@ extern bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu); extern void fpu_free_guest_fpstate(struct fpu_guest *gfpu); extern int fpu_swap_kvm_fpstate(struct fpu_guest *gfpu, bool enter_guest); +#ifdef CONFIG_X86_64 +extern void fpu_save_guest_xfd_err(struct fpu_guest *guest_fpu); +extern void fpu_restore_guest_xfd_err(struct fpu_guest *guest_fpu); +#else +static inline void fpu_save_guest_xfd_err(struct fpu_guest *guest_fpu) { } +static inline void fpu_restore_guest_xfd_err(struct fpu_guest *guest_fpu) { } +#endif + extern void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf, unsigned int size, u32 pkru); extern int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, u64 xcr0, u32 *vpkru);
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index 861cffca3209..5ee98222c103 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -500,6 +500,22 @@ struct fpu { */ }; +/* + * Use @xfd_err:bit0 to indicate whether guest XFD_ERR should be + * saved/restored. The x87 state covered by bit 0 is not an + * XSAVE-enabled feature, thus is not XFD-enabled either (won't + * occur in XFD_ERR). + */ +#define XFD_ERR_GUEST_DISABLED (1 << XFEATURE_FP) + /* + * Use @xfd_err:bit1 to indicate the validity of @xfd_err. Used to + * avoid duplicated saves in case the vCPU is preempted multiple + * times before it re-enters the guest. The SSE state covered by + * bit 1 is neither XSAVE-enabled nor XFD-enabled.
+ */ +#define XFD_ERR_GUEST_SAVED (1 << XFEATURE_SSE) + /* * Guest pseudo FPU container */ @@ -527,6 +543,14 @@ struct fpu_guest { */ u64 realloc_request; + /* + * @xfd_err: save the guest value. bit 0 and bit1 + * have special meaning to indicate the + * requirement of saving and the validity + * of the saved value. + */ + u64 xfd_err; + /* * @fpstate: Pointer to the allocated guest fpstate */ diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 7a0436a0cb2c..5089f2e7dc22 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -322,6 +322,55 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest) } EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpstate); +#ifdef CONFIG_X86_64 +void fpu_save_guest_xfd_err(struct fpu_guest *guest_fpu) +{ + if (guest_fpu->xfd_err & XFD_ERR_GUEST_DISABLED) + return; + + /* A non-zero value indicates guest XFD_ERR already saved */ + if (guest_fpu->xfd_err) + return; + + /* Guest XFD_ERR must be saved before switching to host fpstate */ + WARN_ON_ONCE(!current->thread.fpu.fpstate->is_guest); + + rdmsrl(MSR_IA32_XFD_ERR, guest_fpu->xfd_err); + + /* + * Restore to the host value if guest xfd_err is non-zero. + * Except in #NM handler, all other places in the kernel + * should just see xfd_err=0. So just restore to 0. + */ + if (guest_fpu->xfd_err) + wrmsrl(MSR_IA32_XFD_ERR, 0); + + guest_fpu->xfd_err |= XFD_ERR_GUEST_SAVED; +} +EXPORT_SYMBOL_GPL(fpu_save_guest_xfd_err); + +void fpu_restore_guest_xfd_err(struct fpu_guest *guest_fpu) +{ + u64 xfd_err = guest_fpu->xfd_err; + + if (xfd_err & XFD_ERR_GUEST_DISABLED) + return; + + xfd_err &= ~XFD_ERR_GUEST_SAVED; + + /* + * No need to restore a zero value since XFD_ERR + * is always zero outside of #NM handler in the host. + */ + if (!xfd_err) + return; + + wrmsrl(MSR_IA32_XFD_ERR, xfd_err); + guest_fpu->xfd_err = 0; +} +EXPORT_SYMBOL_GPL(fpu_restore_guest_xfd_err); +#endif + void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf, unsigned int size, u32 pkru) { From patchwork Wed Dec 8 00:03:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Zhong X-Patchwork-Id: 12662093 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 200B9C433FE for ; Tue, 7 Dec 2021 15:10:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238791AbhLGPNz (ORCPT ); Tue, 7 Dec 2021 10:13:55 -0500 Received: from mga14.intel.com ([192.55.52.115]:5619 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238765AbhLGPNh (ORCPT ); Tue, 7 Dec 2021 10:13:37 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10190"; a="237821291" X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="237821291" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 07:10:06 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="461290108" Received: from icx.bj.intel.com ([10.240.192.117]) by orsmga003.jf.intel.com with ESMTP; 07 Dec 2021 07:10:02 -0800 From: Yang Zhong To: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com Cc: seanjc@google.com, jun.nakajima@intel.com, 
kevin.tian@intel.com, jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com Subject: [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly Date: Tue, 7 Dec 2021 19:03:55 -0500 Message-Id: <20211208000359.2853257-16-yang.zhong@intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com> References: <20211208000359.2853257-1-yang.zhong@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org KVM needs to save the guest XFD_ERR value before this register might be accessed by the host and restore it before entering the guest. This implementation saves guest XFD_ERR at two transition points: - When the vCPU thread exits to the userspace VMM; - When the vCPU thread is preempted; XFD_ERR is cleared to ZERO right after saving the previous guest value. Otherwise a stale guest value may confuse the host #NM handler into misinterpreting a non-XFD-related #NM as XFD related. There is no need to save the host XFD_ERR value because the only place where XFD_ERR is consumed outside of KVM is in the #NM handler (which cannot be preempted by a vCPU thread). XFD_ERR should always be observed as ZERO outside of the #NM handler, thus clearing XFD_ERR meets the host expectation here. The saved guest value is restored to XFD_ERR right before entering the guest (with preemption disabled). The current implementation still has two open questions on which we would like to hear suggestions: 1) Will #NM be triggered in the host kernel? Now the code is written assuming the above is true, and it's the only reason for saving guest XFD_ERR at preemption time. Otherwise the save is only required when the CPU enters ring-3 (either from the vCPU itself or other threads), by leveraging the "user-return notifier" machinery as suggested by Paolo. 2) When to enable XFD_ERR save/restore? There are four options on the table: a) As long as the guest cpuid has xfd enabled XFD_ERR save/restore is enabled on every VM-exit (if preemption or ret-to-userspace happens) b) When the guest sets IA32_XFD to 1 for the first time Indicates that the guest OS supports XFD features. Because the guest OS usually initializes IA32_XFD at boot time, XFD_ERR save/restore is enabled for almost every VM-exit (if preemption or ret-to-userspace happens). No save/restore for a legacy guest OS which doesn't support XFD features at all (thus won't touch IA32_XFD). c) When the guest sets IA32_XFD to 0 for the first time Lazily enable XFD_ERR save/restore until XFD features are used inside the guest. However, this option doesn't work because XFD_ERR is set when #NM is raised. A VM-exit could happen between the CPU raising #NM and the guest #NM handler reading XFD_ERR (before setting XFD to 0). The very first XFD_ERR might already be clobbered by the host due to no save/restore in that small window. d) When the 1st guest #NM with non-zero XFD_ERR occurs Lazily enable XFD_ERR save/restore until XFD features are used inside the guest. This requires intercepting guest #NM until non-zero XFD_ERR occurs. If a guest with XFD in cpuid never launches an AMX application, it implies that #NM is always trapped, thus adding a constant overhead which may be even higher than doing RDMSR in the preemption path in a) and b): #preempts < #VMEXITS (no #NM trap) < #VMEXITS (#NM trap) The number of preemptions and ret-to-userspaces should be a small portion of total #VMEXITs in a healthy virtualization environment. A toy model of the save/restore protocol itself is sketched below.
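The model is plain user-space C for illustration only; the MSR is modeled by a global variable and nothing here is the actual kernel implementation. The flag semantics follow the previous patch: bit 0 (XFEATURE_FP) marks "XFD not exposed, skip entirely" and bit 1 (XFEATURE_SSE) marks "already saved", making the save side idempotent under repeated preemption:

#include <assert.h>
#include <stdint.h>

#define XFD_ERR_GUEST_DISABLED (1ULL << 0) /* XFEATURE_FP */
#define XFD_ERR_GUEST_SAVED    (1ULL << 1) /* XFEATURE_SSE */

static uint64_t hw_xfd_err; /* stands in for MSR_IA32_XFD_ERR */

static void save_guest_xfd_err(uint64_t *xfd_err)
{
	if (*xfd_err & XFD_ERR_GUEST_DISABLED)
		return;
	if (*xfd_err)                  /* non-zero: already saved */
		return;

	*xfd_err = hw_xfd_err;         /* rdmsrl(MSR_IA32_XFD_ERR, ...) */
	if (*xfd_err)
		hw_xfd_err = 0;        /* host expects XFD_ERR == 0 */
	*xfd_err |= XFD_ERR_GUEST_SAVED;
}

static void restore_guest_xfd_err(uint64_t *xfd_err)
{
	uint64_t val = *xfd_err;

	if (val & XFD_ERR_GUEST_DISABLED)
		return;
	val &= ~XFD_ERR_GUEST_SAVED;
	if (!val)
		return;                /* nothing to write back */
	hw_xfd_err = val;
	*xfd_err = 0;
}

int main(void)
{
	uint64_t guest_xfd_err = 0;    /* XFD exposed, nothing saved yet */

	hw_xfd_err = 1ULL << 18;       /* guest took #NM on XTILEDATA */
	save_guest_xfd_err(&guest_xfd_err);
	save_guest_xfd_err(&guest_xfd_err); /* 2nd preemption: no-op */
	assert(hw_xfd_err == 0);
	restore_guest_xfd_err(&guest_xfd_err);
	assert(hw_xfd_err == (1ULL << 18));
	return 0;
}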
Our gut feeling is that adding at most one MSR read and one MSR write to the preempt/user-ret paths is likely more efficient than increasing #VMEXITs by trapping #NM. Based on the above analysis we plan to go with option b), although this version currently implements a). But we would like to hear other suggestions before making this change.
Signed-off-by: Jing Liu Signed-off-by: Kevin Tian Signed-off-by: Yang Zhong --- arch/x86/kernel/fpu/core.c | 2 ++ arch/x86/kvm/cpuid.c | 5 +++++ arch/x86/kvm/vmx/vmx.c | 2 ++ arch/x86/kvm/vmx/vmx.h | 2 +- arch/x86/kvm/x86.c | 5 +++++ 5 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 5089f2e7dc22..9811dc98d550 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu) fpstate->is_guest = true; gfpu->fpstate = fpstate; + gfpu->xfd_err = XFD_ERR_GUEST_DISABLED; gfpu->user_xfeatures = fpu_user_cfg.default_features; gfpu->user_perm = fpu_user_cfg.default_features; fpu_init_guest_permissions(gfpu); @@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest) fpu->fpstate = guest_fps; guest_fps->in_use = true; } else { + fpu_save_guest_xfd_err(guest_fpu); guest_fps->in_use = false; fpu->fpstate = fpu->__task_fpstate; fpu->__task_fpstate = NULL;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index f3c61205bbf4..ea51b986ee67 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -219,6 +219,11 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) kvm_apic_set_version(vcpu); } + /* Enable saving guest XFD_ERR */ + best = kvm_find_cpuid_entry(vcpu, 7, 0); + if (best && cpuid_entry_has(best, X86_FEATURE_AMX_TILE)) + vcpu->arch.guest_fpu.xfd_err = 0; + best = kvm_find_cpuid_entry(vcpu, 0xD, 0); if (!best) vcpu->arch.guest_supported_xcr0 = 0;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6198b13c4846..0db8bdf273e2 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -161,6 +161,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = { MSR_GS_BASE, MSR_KERNEL_GS_BASE, MSR_IA32_XFD, + MSR_IA32_XFD_ERR, #endif MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, @@ -7153,6 +7154,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) static void vmx_update_intercept_xfd(struct kvm_vcpu *vcpu) { vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_R, false); + vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_RW, false); } static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index bf9d3051cd6c..0a00242a91e7 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -340,7 +340,7 @@ struct vcpu_vmx { struct lbr_desc lbr_desc; /* Save desired MSR intercept (read: pass-through) state */ -#define MAX_POSSIBLE_PASSTHROUGH_MSRS 14 +#define MAX_POSSIBLE_PASSTHROUGH_MSRS 15 struct { DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS); DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d127b229dd29..8b033c9241d6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) kvm_steal_time_set_preempted(vcpu); srcu_read_unlock(&vcpu->kvm->srcu, idx); + if (vcpu->preempted) + fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu); + static_call(kvm_x86_vcpu_put)(vcpu); vcpu->arch.last_host_tsc = rdtsc(); } @@ -9951,6
+9954,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (test_thread_flag(TIF_NEED_FPU_LOAD)) switch_fpu_return(); + fpu_restore_guest_xfd_err(&vcpu->arch.guest_fpu); + if (unlikely(vcpu->arch.switch_db_regs)) { set_debugreg(0, 7); set_debugreg(vcpu->arch.eff_db[0], 0); From patchwork Wed Dec 8 00:03:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Zhong X-Patchwork-Id: 12662097 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80117C433F5 for ; Tue, 7 Dec 2021 15:10:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238831AbhLGPOV (ORCPT ); Tue, 7 Dec 2021 10:14:21 -0500 Received: from mga03.intel.com ([134.134.136.65]:53842 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238876AbhLGPNt (ORCPT ); Tue, 7 Dec 2021 10:13:49 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10190"; a="237536538" X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="237536538" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 07:10:15 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="461290155" Received: from icx.bj.intel.com ([10.240.192.117]) by orsmga003.jf.intel.com with ESMTP; 07 Dec 2021 07:10:06 -0800 From: Yang Zhong To: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com, jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com Subject: [PATCH 16/19] kvm: x86: Introduce KVM_{G|S}ET_XSAVE2 ioctl Date: Tue, 7 Dec 2021 19:03:56 -0500 Message-Id: <20211208000359.2853257-17-yang.zhong@intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com> References: <20211208000359.2853257-1-yang.zhong@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Jing Liu When dynamic XSTATE features are supported, the xsave states are beyond 4KB. The current kvm_xsave structure and related KVM_{G, S}ET_XSAVE only allows 4KB which is not enough for full states. Introduce a new kvm_xsave2 structure and the corresponding KVM_GET_XSAVE2 and KVM_SET_XSAVE2 ioctls so that userspace VMM can get and set the full xsave states. 
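As a usage sketch, the following user-space fragment drives the proposed KVM_GET_XSAVE2 on an open vcpu fd. How userspace learns the required buffer size is not spelled out by this patch; the xsave_size parameter below is assumed to come from elsewhere (e.g. CPUID leaf 0xD enumeration), which is an assumption of this example, not part of the patch:

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

struct kvm_xsave2 {
	uint32_t size;
	uint8_t state[0];
};

/* ioctl encoding as defined by this series */
#define KVMIO          0xAE
#define KVM_GET_XSAVE2 _IOR(KVMIO, 0xcf, struct kvm_xsave2)

/* On success *out points to a kvm_xsave2 holding the full state;
 * the caller frees it. xsave_size is assumed known (see above). */
static int get_xsave2(int vcpu_fd, uint32_t xsave_size,
		      struct kvm_xsave2 **out)
{
	struct kvm_xsave2 *buf = calloc(1, sizeof(*buf) + xsave_size);

	if (!buf)
		return -1;

	buf->size = xsave_size;  /* kernel copies this many state bytes */
	if (ioctl(vcpu_fd, KVM_GET_XSAVE2, buf) < 0) {
		free(buf);
		return -1;
	}
	*out = buf;
	return 0;
}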
Signed-off-by: Jing Liu Signed-off-by: Yang Zhong --- arch/x86/include/uapi/asm/kvm.h | 6 ++++ arch/x86/kvm/x86.c | 62 +++++++++++++++++++++++++++++++++ include/uapi/linux/kvm.h | 7 +++- 3 files changed, 74 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 5a776a08f78c..de42a51e20c3 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -47,6 +47,7 @@ #define __KVM_HAVE_VCPU_EVENTS #define __KVM_HAVE_DEBUGREGS #define __KVM_HAVE_XSAVE +#define __KVM_HAVE_XSAVE2 #define __KVM_HAVE_XCRS #define __KVM_HAVE_READONLY_MEM @@ -378,6 +379,11 @@ struct kvm_xsave { __u32 region[1024]; }; +struct kvm_xsave2 { + __u32 size; + __u8 state[0]; +}; + #define KVM_MAX_XCRS 16 struct kvm_xcr { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8b033c9241d6..d212f6d2d39a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4216,6 +4216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_DEBUGREGS: case KVM_CAP_X86_ROBUST_SINGLESTEP: case KVM_CAP_XSAVE: + case KVM_CAP_XSAVE2: case KVM_CAP_ASYNC_PF: case KVM_CAP_ASYNC_PF_INT: case KVM_CAP_GET_TSC_KHZ: @@ -4940,6 +4941,17 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, vcpu->arch.pkru); } +static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu, + u8 *state, u32 size) +{ + if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + return; + + fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, + state, size, + vcpu->arch.pkru); +} + static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, struct kvm_xsave *guest_xsave) { @@ -4951,6 +4963,15 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, supported_xcr0, &vcpu->arch.pkru); } +static int kvm_vcpu_ioctl_x86_set_xsave2(struct kvm_vcpu *vcpu, u8 *state) +{ + if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + return 0; + + return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu, state, + supported_xcr0, &vcpu->arch.pkru); +} + static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu, struct kvm_xcrs *guest_xcrs) { @@ -5416,6 +5437,47 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = kvm_vcpu_ioctl_x86_set_xsave(vcpu, u.xsave); break; } + case KVM_GET_XSAVE2: { + struct kvm_xsave2 __user *xsave2_arg = argp; + struct kvm_xsave2 xsave2; + + r = -EFAULT; + if (copy_from_user(&xsave2, xsave2_arg, sizeof(struct kvm_xsave2))) + break; + + u.buffer = kzalloc(xsave2.size, GFP_KERNEL_ACCOUNT); + + r = -ENOMEM; + if (!u.buffer) + break; + + kvm_vcpu_ioctl_x86_get_xsave2(vcpu, u.buffer, xsave2.size); + + r = -EFAULT; + if (copy_to_user(xsave2_arg->state, u.buffer, xsave2.size)) + break; + + r = 0; + break; + } + case KVM_SET_XSAVE2: { + struct kvm_xsave2 __user *xsave2_arg = argp; + struct kvm_xsave2 xsave2; + + r = -EFAULT; + if (copy_from_user(&xsave2, xsave2_arg, sizeof(struct kvm_xsave2))) + break; + + u.buffer = memdup_user(xsave2_arg->state, xsave2.size); + + if (IS_ERR(u.buffer)) { + r = PTR_ERR(u.buffer); + goto out_nofree; + } + + r = kvm_vcpu_ioctl_x86_set_xsave2(vcpu, u.buffer); + break; + } case KVM_GET_XCRS: { u.xcrs = kzalloc(sizeof(struct kvm_xcrs), GFP_KERNEL_ACCOUNT); r = -ENOMEM; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 0c7b301c7254..603e1ca9ba09 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1132,7 +1132,9 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204 #define KVM_CAP_ARM_MTE 205 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206 - 
+#ifdef __KVM_HAVE_XSAVE2 +#define KVM_CAP_XSAVE2 207 +#endif #ifdef KVM_CAP_IRQ_ROUTING struct kvm_irq_routing_irqchip { @@ -1679,6 +1681,9 @@ struct kvm_xen_hvm_attr { #define KVM_GET_SREGS2 _IOR(KVMIO, 0xcc, struct kvm_sregs2) #define KVM_SET_SREGS2 _IOW(KVMIO, 0xcd, struct kvm_sregs2) +#define KVM_GET_XSAVE2 _IOR(KVMIO, 0xcf, struct kvm_xsave2) +#define KVM_SET_XSAVE2 _IOW(KVMIO, 0xd0, struct kvm_xsave2) + struct kvm_xen_vcpu_attr { __u16 type; __u16 pad[3]; From patchwork Wed Dec 8 00:03:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Zhong X-Patchwork-Id: 12662095 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8519DC433F5 for ; Tue, 7 Dec 2021 15:10:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238582AbhLGPOP (ORCPT ); Tue, 7 Dec 2021 10:14:15 -0500 Received: from mga03.intel.com ([134.134.136.65]:53832 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238832AbhLGPNq (ORCPT ); Tue, 7 Dec 2021 10:13:46 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10190"; a="237536539" X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="237536539" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 07:10:15 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="461290184" Received: from icx.bj.intel.com ([10.240.192.117]) by orsmga003.jf.intel.com with ESMTP; 07 Dec 2021 07:10:11 -0800 From: Yang Zhong To: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com, jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com Subject: [PATCH 17/19] docs: virt: api.rst: Document the new KVM_{G, S}ET_XSAVE2 ioctls Date: Tue, 7 Dec 2021 19:03:57 -0500 Message-Id: <20211208000359.2853257-18-yang.zhong@intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com> References: <20211208000359.2853257-1-yang.zhong@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Jing Liu Document the detailed information of the new KVM_{G, S}ET_XSAVE2 ioctls. Signed-off-by: Jing Liu Signed-off-by: Yang Zhong --- Documentation/virt/kvm/api.rst | 47 ++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index aeeb071c7688..39dfd867e429 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -1569,6 +1569,8 @@ otherwise it will return EBUSY error. }; This ioctl would copy current vcpu's xsave struct to the userspace. +Application should use KVM_GET_XSAVE2 if xsave states are larger than +4KB. 4.43 KVM_SET_XSAVE @@ -1588,6 +1590,8 @@ This ioctl would copy current vcpu's xsave struct to the userspace. }; This ioctl would copy userspace's xsave struct to the kernel. +Application should use KVM_SET_XSAVE2 if xsave states are larger than +4KB. 4.44 KVM_GET_XCRS @@ -7484,3 +7488,46 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset of the result of KVM_CHECK_EXTENSION. 
KVM will forward to userspace the hypercalls whose corresponding bit is in the argument, and return ENOSYS for the others. + +8.35 KVM_GET_XSAVE2 +------------------- + +:Capability: KVM_CAP_XSAVE2 +:Architectures: x86 +:Type: vcpu ioctl +:Parameters: struct kvm_xsave2 (in/out) +:Returns: 0 on success, -1 on error + + +:: + + struct kvm_xsave2 { + __u32 size; + __u8 state[0]; + }; + +This ioctl is used for copying the current vcpu's xsave struct to +userspace when the xsave state size is larger than 4KB. Application code +should set the 'size' member, which indicates the size of the xsave state, +and KVM copies the xsave state into the 'state' region. + +8.36 KVM_SET_XSAVE2 +------------------- + +:Capability: KVM_CAP_XSAVE2 +:Architectures: x86 +:Type: vcpu ioctl +:Parameters: struct kvm_xsave2 (in) +:Returns: 0 on success, -1 on error + + +:: + + struct kvm_xsave2 { + __u32 size; + __u8 state[0]; + }; + +This ioctl is used for copying userspace's xsave struct to the kernel +when the xsave size is larger than 4KB. Application code should set the +'size' member, which indicates the size of the xsave state.
From patchwork Wed Dec 8 00:03:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Zhong X-Patchwork-Id: 12662099 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E71DC433F5 for ; Tue, 7 Dec 2021 15:10:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238858AbhLGPOY (ORCPT ); Tue, 7 Dec 2021 10:14:24 -0500 Received: from mga03.intel.com ([134.134.136.65]:53856 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238728AbhLGPNx (ORCPT ); Tue, 7 Dec 2021 10:13:53 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10190"; a="237536558" X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="237536558" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 07:10:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="461290211" Received: from icx.bj.intel.com ([10.240.192.117]) by orsmga003.jf.intel.com with ESMTP; 07 Dec 2021 07:10:15 -0800 From: Yang Zhong To: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com, jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com Subject: [PATCH 18/19] kvm: x86: AMX XCR0 support for guest Date: Tue, 7 Dec 2021 19:03:58 -0500 Message-Id: <20211208000359.2853257-19-yang.zhong@intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com> References: <20211208000359.2853257-1-yang.zhong@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Jing Liu Two XCR0 bits are defined for AMX to support the XSAVE mechanism. Bit 17 is for tilecfg and bit 18 is for tiledata. The value of XCR0[18:17] is always either 00b or 11b. Also, the SDM recommends that only 64-bit operating systems enable Intel AMX by setting XCR0[18:17].
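The pairing rule is easy to state as a predicate. The stand-alone C sketch below mirrors the check this patch adds to __kvm_set_xcr() (illustration only; the bit positions are taken from the commit message, and xcr0_xtile_valid() is a name invented for this sketch):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_XTILE_CFG  (1ULL << 17)
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)
#define XFEATURE_MASK_XTILE \
	(XFEATURE_MASK_XTILE_CFG | XFEATURE_MASK_XTILE_DATA)

/* The two tile bits are only valid as a pair: 00b or 11b. */
static bool xcr0_xtile_valid(uint64_t xcr0)
{
	uint64_t xtile = xcr0 & XFEATURE_MASK_XTILE;

	return xtile == 0 || xtile == XFEATURE_MASK_XTILE;
}

int main(void)
{
	assert(xcr0_xtile_valid(0x3));                          /* FP|SSE only */
	assert(xcr0_xtile_valid(0x3 | XFEATURE_MASK_XTILE));    /* both tiles */
	assert(!xcr0_xtile_valid(0x3 | XFEATURE_MASK_XTILE_DATA)); /* 10b: #GP */
	return 0;
}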
Signed-off-by: Jing Liu Signed-off-by: Yang Zhong --- arch/x86/kvm/x86.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d212f6d2d39a..a9a608c8fa50 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -210,7 +210,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs; #define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \ | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \ | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \ - | XFEATURE_MASK_PKRU) + | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE) u64 __read_mostly host_efer; EXPORT_SYMBOL_GPL(host_efer); @@ -1017,6 +1017,23 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512) return 1; } + +#ifdef CONFIG_X86_64 + if ((xcr0 & XFEATURE_MASK_XTILE) && + ((xcr0 & XFEATURE_MASK_XTILE) != XFEATURE_MASK_XTILE)) + return 1; +#else + /* + * Intel AMX instructions can be executed only in 64-bit mode but + * XSAVE can operate on XTILECFG and XTILEDATA in any mode. + * Since the FPU core follows SDM recommendation to set + * XCR[18:17] only in 64-bit environment, here also prevent any + * guest OS from setting the two bits when host is 32-bit. + * + * XFEATURE_MASK_XTILE cannot be used since it is 0 in this case. + */ + xcr0 &= ~(XFEATURE_MASK_XTILE_DATA | XFEATURE_MASK_XTILE_CFG); +#endif vcpu->arch.xcr0 = xcr0; if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND) From patchwork Wed Dec 8 00:03:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Zhong X-Patchwork-Id: 12662101 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C288FC433F5 for ; Tue, 7 Dec 2021 15:11:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238706AbhLGPOf (ORCPT ); Tue, 7 Dec 2021 10:14:35 -0500 Received: from mga03.intel.com ([134.134.136.65]:53832 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238673AbhLGPOO (ORCPT ); Tue, 7 Dec 2021 10:14:14 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10190"; a="237536580" X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="237536580" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Dec 2021 07:10:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,293,1631602800"; d="scan'208";a="461290237" Received: from icx.bj.intel.com ([10.240.192.117]) by orsmga003.jf.intel.com with ESMTP; 07 Dec 2021 07:10:19 -0800 From: Yang Zhong To: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, pbonzini@redhat.com Cc: seanjc@google.com, jun.nakajima@intel.com, kevin.tian@intel.com, jing2.liu@linux.intel.com, jing2.liu@intel.com, yang.zhong@intel.com Subject: [PATCH 19/19] kvm: x86: Add AMX CPUIDs support Date: Tue, 7 Dec 2021 19:03:59 -0500 Message-Id: <20211208000359.2853257-20-yang.zhong@intel.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20211208000359.2853257-1-yang.zhong@intel.com> References: <20211208000359.2853257-1-yang.zhong@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Jing Liu Extend CPUID emulation to support XFD, 
AMX_TILE, AMX_INT8 and AMX_BF16. Adding those bits into kvm_cpu_caps finally activates all previous logic in this series.
Signed-off-by: Jing Liu Signed-off-by: Yang Zhong --- arch/x86/include/asm/cpufeatures.h | 2 ++ arch/x86/kvm/cpuid.c | 16 +++++++++++++--- 2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index d5b5f2ab87a0..da872b6f8d8b 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -299,7 +299,9 @@ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */ +#define X86_FEATURE_AMX_BF16 (18*32+22) /* AMX bf16 Support */ #define X86_FEATURE_AMX_TILE (18*32+24) /* AMX tile Support */ +#define X86_FEATURE_AMX_INT8 (18*32+25) /* AMX int8 Support */ /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */ #define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index ea51b986ee67..7bb56cc89aa7 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -510,7 +510,8 @@ void kvm_set_cpu_caps(void) F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) | F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) | - F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) + F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) | + F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) ); /* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */ @@ -529,7 +530,7 @@ void kvm_set_cpu_caps(void) ); kvm_cpu_cap_mask(CPUID_D_1_EAX, - F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) + F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | F(XFD) ); kvm_cpu_cap_init_scattered(CPUID_12_EAX, @@ -655,6 +656,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array, case 0x14: case 0x17: case 0x18: + case 0x1d: + case 0x1e: case 0x1f: case 0x8000001d: entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX; @@ -779,6 +782,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) } break; case 9: + case 0x1e: /* TMUL information */ break; case 0xa: { /* Architectural Performance Monitoring */ struct x86_pmu_capability cap; @@ -914,7 +918,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) break; /* Intel PT */ case 0x14: - if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT)) { + if ((function == 0x14 && !kvm_cpu_cap_has(X86_FEATURE_INTEL_PT)) || + (function == 0x1d && !kvm_cpu_cap_has(X86_FEATURE_AMX_TILE))) { entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; } @@ -924,6 +929,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) goto out; } break; + /* Intel AMX TILE */ + case 0x1d: + if (!kvm_cpu_cap_has(X86_FEATURE_AMX_TILE)) + entry->eax = entry->ebx = entry->ecx = entry->edx = 0; + break; case KVM_CPUID_SIGNATURE: { const u32 *sigptr = (const u32 *)KVM_SIGNATURE; entry->eax = KVM_CPUID_FEATURES;