From patchwork Wed Oct 14 02:11:50 2020
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 11836631
From: Chenyi Qiang
To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Xiaoyao Li
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC v2 1/7] KVM: VMX: Introduce PKS VMCS fields
Date: Wed, 14 Oct 2020 10:11:50 +0800
Message-Id: <20201014021157.18022-2-chenyi.qiang@intel.com>
In-Reply-To: <20201014021157.18022-1-chenyi.qiang@intel.com>
References: <20201014021157.18022-1-chenyi.qiang@intel.com>

PKS (Protection Keys for Supervisor Pages) is a feature that extends the
Protection Key architecture to support thread-specific permission
restrictions on supervisor pages. A new PKS MSR (PKRS) is defined in the
kernel to support PKS; it holds a set of permissions associated with each
protection domain.

Two VMCS fields, {HOST,GUEST}_IA32_PKRS, are introduced in the
{host,guest}-state areas to store the value of PKRS. Every VM exit saves
PKRS into the guest-state area. If VM_EXIT_LOAD_IA32_PKRS = 1, VM exit
loads PKRS from the host-state area. If VM_ENTRY_LOAD_IA32_PKRS = 1, VM
entry loads PKRS from the guest-state area.
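For reference, PKRS uses the same layout as PKRU: 16 protection-key
domains, each with an Access-Disable (AD) and a Write-Disable (WD) bit in
the low 32 bits of the register. A minimal sketch of that layout follows;
it is illustrative only and not part of this patch, and the helper names
(pkr_get() and friends) are made up for the example:

#include <stdbool.h>
#include <stdint.h>

#define PKR_BITS_PER_PKEY	2
#define PKR_AD_BIT		0x1u	/* Access-Disable */
#define PKR_WD_BIT		0x2u	/* Write-Disable  */

/* Extract the two permission bits for protection key 'pkey' (0-15). */
static inline uint32_t pkr_get(uint32_t pkr, int pkey)
{
	return (pkr >> (pkey * PKR_BITS_PER_PKEY)) & 0x3u;
}

/* True if any data access to pages tagged with 'pkey' is disallowed. */
static inline bool pkr_access_disabled(uint32_t pkr, int pkey)
{
	return pkr_get(pkr, pkey) & PKR_AD_BIT;
}

/* True if writes to pages tagged with 'pkey' are disallowed. */
static inline bool pkr_write_disabled(uint32_t pkr, int pkey)
{
	return pkr_get(pkr, pkey) & PKR_WD_BIT;
}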
Signed-off-by: Chenyi Qiang
Reviewed-by: Jim Mattson
---
 arch/x86/include/asm/vmx.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cd7de4b401fe..425cf81dd722 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -94,6 +94,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS			0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_LOAD_IA32_PKRS			0x20000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -107,6 +108,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS			0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_IA32_PKRS			0x00400000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
@@ -243,12 +245,16 @@ enum vmcs_field {
 	GUEST_BNDCFGS_HIGH		= 0x00002813,
 	GUEST_IA32_RTIT_CTL		= 0x00002814,
 	GUEST_IA32_RTIT_CTL_HIGH	= 0x00002815,
+	GUEST_IA32_PKRS			= 0x00002818,
+	GUEST_IA32_PKRS_HIGH		= 0x00002819,
 	HOST_IA32_PAT			= 0x00002c00,
 	HOST_IA32_PAT_HIGH		= 0x00002c01,
 	HOST_IA32_EFER			= 0x00002c02,
 	HOST_IA32_EFER_HIGH		= 0x00002c03,
 	HOST_IA32_PERF_GLOBAL_CTRL	= 0x00002c04,
 	HOST_IA32_PERF_GLOBAL_CTRL_HIGH	= 0x00002c05,
+	HOST_IA32_PKRS			= 0x00002c06,
+	HOST_IA32_PKRS_HIGH		= 0x00002c07,
 	PIN_BASED_VM_EXEC_CONTROL	= 0x00004000,
 	CPU_BASED_VM_EXEC_CONTROL	= 0x00004002,
 	EXCEPTION_BITMAP		= 0x00004004,

From patchwork Wed Oct 14 02:11:51 2020
X-Patchwork-Submitter: Chenyi Qiang
X-Patchwork-Id: 11836647
From: Chenyi Qiang
To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, Xiaoyao Li
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC v2 2/7] KVM: VMX: Expose IA32_PKRS MSR
Date: Wed, 14 Oct 2020 10:11:51 +0800
Message-Id: <20201014021157.18022-3-chenyi.qiang@intel.com>
In-Reply-To: <20201014021157.18022-1-chenyi.qiang@intel.com>
References:
<20201014021157.18022-1-chenyi.qiang@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Protection Keys for Supervisor Pages (PKS) uses IA32_PKRS MSR (PKRS) at index 0x6E1 to allow software to manage supervisor protection key rights. For performance consideration, PKRS intercept will be disabled so that the guest can access the PKRS without VM exits. PKS introduces dedicated control fields in VMCS to switch PKRS, which only does the retore part. In addition, every VM exit saves PKRS into the guest-state area in VMCS, while VM enter won't save the host value due to the expectation that the host won't change the MSR often. Update the host's value in VMCS manually if the MSR has been changed by the kernel since the last time the VMCS was run. The function get_current_pkrs() in arch/x86/mm/pkeys.c exports the per-cpu variable pkrs_cache to avoid frequent rdmsr of PKRS. Signed-off-by: Chenyi Qiang --- arch/x86/include/asm/pkeys.h | 1 + arch/x86/kvm/vmx/capabilities.h | 6 +++ arch/x86/kvm/vmx/nested.c | 1 + arch/x86/kvm/vmx/vmcs.h | 1 + arch/x86/kvm/vmx/vmx.c | 66 ++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.h | 6 +++ arch/x86/mm/pkeys.c | 6 +++ include/linux/pkeys.h | 4 ++ 8 files changed, 89 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index cae0153a5480..2e666dd2ea31 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -142,6 +142,7 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags); #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS int pks_key_alloc(const char *const pkey_user); void pks_key_free(int pkey); +u32 get_current_pkrs(void); void pks_mknoaccess(int pkey, bool global); void pks_mkread(int pkey, bool global); diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index 4bbd8b448d22..7099e3105f48 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -103,6 +103,12 @@ static inline bool cpu_has_load_perf_global_ctrl(void) (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL); } +static inline bool cpu_has_load_ia32_pkrs(void) +{ + return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PKRS) && + (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PKRS); +} + static inline bool cpu_has_vmx_mpx(void) { return (vmcs_config.vmexit_ctrl & VM_EXIT_CLEAR_BNDCFGS) && diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 1bb6b31eb646..14f56e8dd060 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -294,6 +294,7 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx, dest->ds_sel = src->ds_sel; dest->es_sel = src->es_sel; #endif + dest->pkrs = src->pkrs; } static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs) diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h index 7a3675fddec2..39ec3d0c844b 100644 --- a/arch/x86/kvm/vmx/vmcs.h +++ b/arch/x86/kvm/vmx/vmcs.h @@ -40,6 +40,7 @@ struct vmcs_host_state { #ifdef CONFIG_X86_64 u16 ds_sel, es_sel; #endif + u32 pkrs; }; struct vmcs_controls_shadow { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 96979c09ebd1..e5da5dbe19d4 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -1147,6 +1147,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) #endif unsigned long fs_base, gs_base; u16 fs_sel, gs_sel; + u32 host_pkrs; int i; vmx->req_immediate_exit = false; @@ -1179,6 +1180,20 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) */ host_state->ldt_sel = 
kvm_read_ldt(); + /* + * Update the host pkrs vmcs field before vcpu runs. + * The setting of VM_EXIT_LOAD_IA32_PKRS can ensure + * kvm_cpu_cap_has(X86_FEATURE_PKS) && + * guest_cpuid_has(vcpu, X86_FEATURE_VMX). + */ + if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS) { + host_pkrs = get_current_pkrs(); + if (unlikely(host_pkrs != host_state->pkrs)) { + vmcs_write64(HOST_IA32_PKRS, host_pkrs); + host_state->pkrs = host_pkrs; + } + } + #ifdef CONFIG_X86_64 savesegment(ds, host_state->ds_sel); savesegment(es, host_state->es_sel); @@ -1967,6 +1982,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) else msr_info->data = vmx->pt_desc.guest.addr_a[index / 2]; break; + case MSR_IA32_PKRS: + if (!kvm_cpu_cap_has(X86_FEATURE_PKS) || + (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_PKS))) + return 1; + msr_info->data = vmcs_read64(GUEST_IA32_PKRS); + break; case MSR_TSC_AUX: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP)) @@ -2237,6 +2259,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) else vmx->pt_desc.guest.addr_a[index / 2] = data; break; + case MSR_IA32_PKRS: + if (!kvm_pkrs_valid(data)) + return 1; + if (!kvm_cpu_cap_has(X86_FEATURE_PKS) || + (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_PKS))) + return 1; + vmcs_write64(GUEST_IA32_PKRS, data); + break; case MSR_TSC_AUX: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP)) @@ -2526,7 +2557,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, VM_EXIT_LOAD_IA32_EFER | VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_PT_CONCEAL_PIP | - VM_EXIT_CLEAR_IA32_RTIT_CTL; + VM_EXIT_CLEAR_IA32_RTIT_CTL | + VM_EXIT_LOAD_IA32_PKRS; if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS, &_vmexit_control) < 0) return -EIO; @@ -2550,7 +2582,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, VM_ENTRY_LOAD_IA32_EFER | VM_ENTRY_LOAD_BNDCFGS | VM_ENTRY_PT_CONCEAL_PIP | - VM_ENTRY_LOAD_IA32_RTIT_CTL; + VM_ENTRY_LOAD_IA32_RTIT_CTL | + VM_ENTRY_LOAD_IA32_PKRS; if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS, &_vmentry_control) < 0) return -EIO; @@ -5898,6 +5931,8 @@ void dump_vmcs(void) vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL)); if (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS) pr_err("BndCfgS = 0x%016llx\n", vmcs_read64(GUEST_BNDCFGS)); + if (vmentry_ctl & VM_ENTRY_LOAD_IA32_PKRS) + pr_err("PKRS = 0x%016llx\n", vmcs_read64(GUEST_IA32_PKRS)); pr_err("Interruptibility = %08x ActivityState = %08x\n", vmcs_read32(GUEST_INTERRUPTIBILITY_INFO), vmcs_read32(GUEST_ACTIVITY_STATE)); @@ -5933,6 +5968,8 @@ void dump_vmcs(void) vmexit_ctl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL) pr_err("PerfGlobCtl = 0x%016llx\n", vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL)); + if (vmexit_ctl & VM_EXIT_LOAD_IA32_PKRS) + pr_err("PKRS = 0x%016llx\n", vmcs_read64(HOST_IA32_PKRS)); pr_err("*** Control State ***\n"); pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n", @@ -7312,6 +7349,26 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu) vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); } +static void vmx_update_pkrs_cfg(struct kvm_vcpu *vcpu) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap; + bool pks_supported = guest_cpuid_has(vcpu, X86_FEATURE_PKS); + + /* + * set intercept for PKRS when the guest doesn't support pks + */ + vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_PKRS, MSR_TYPE_RW, !pks_supported); + + if (pks_supported) { + 
vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_PKRS); + vm_exit_controls_setbit(vmx, VM_EXIT_LOAD_IA32_PKRS); + } else { + vm_entry_controls_clearbit(vmx, VM_ENTRY_LOAD_IA32_PKRS); + vm_exit_controls_clearbit(vmx, VM_EXIT_LOAD_IA32_PKRS); + } +} + static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -7333,6 +7390,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) ~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX | FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX); + if (kvm_cpu_cap_has(X86_FEATURE_PKS)) + vmx_update_pkrs_cfg(vcpu); + else + guest_cpuid_clear(vcpu, X86_FEATURE_PKS); + if (nested_vmx_allowed(vcpu)) { nested_vmx_cr_fixed1_bits_update(vcpu); nested_vmx_entry_exit_ctls_update(vcpu); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 995ab696dcf0..193fdb514db5 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -367,6 +367,12 @@ static inline bool kvm_dr6_valid(u64 data) return !(data >> 32); } +static inline bool kvm_pkrs_valid(u64 data) +{ + /* bit[63,32] must be zero */ + return !(data >> 32); +} + void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu); void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu); int kvm_spec_ctrl_test_value(u64 value); diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index aeb17a28719d..2921a835bd3c 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -441,6 +441,12 @@ void pks_key_free(int pkey) } EXPORT_SYMBOL_GPL(pks_key_free); +u32 get_current_pkrs(void) +{ + return this_cpu_read(pkrs_cache); +} +EXPORT_SYMBOL_GPL(get_current_pkrs); + static int pks_keys_allocated_show(struct seq_file *m, void *p) { int i; diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 8f3bfec83949..239ba84bdf9a 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -69,6 +69,10 @@ static inline void pks_mkrdwr(int pkey, bool global) { WARN_ON_ONCE(1); } +static inline u32 get_current_pkrs(void) +{ + return 0; +} #endif /* ! 
CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ #endif /* _LINUX_PKEYS_H */ From patchwork Wed Oct 14 02:11:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 11836645 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D120921 for ; Wed, 14 Oct 2020 02:10:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0BA4421D81 for ; Wed, 14 Oct 2020 02:10:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727189AbgJNCJ6 (ORCPT ); Tue, 13 Oct 2020 22:09:58 -0400 Received: from mga06.intel.com ([134.134.136.31]:51162 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728898AbgJNCJx (ORCPT ); Tue, 13 Oct 2020 22:09:53 -0400 IronPort-SDR: woRzmbBuu2aT9F/2rBKY/kzdZL3bAuCxMi2Q0jSji4FLnlwChtGg4cDNXchTyd8DSYim637IHJ 9X/5fD2+WJRQ== X-IronPort-AV: E=McAfee;i="6000,8403,9773"; a="227659795" X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="227659795" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:09:52 -0700 IronPort-SDR: IBLp8IaPyiGX8GWsrNk00jPxnyj0k6NE5+/Z4Bl/wYoa2c8NaSVH6gO/kwVXAzsy08c4jXL876 2+mZyi5GNSoQ== X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="530645129" Received: from chenyi-pc.sh.intel.com ([10.239.159.72]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:09:50 -0700 From: Chenyi Qiang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Xiaoyao Li Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC v2 3/7] KVM: MMU: Rename the pkru to pkr Date: Wed, 14 Oct 2020 10:11:52 +0800 Message-Id: <20201014021157.18022-4-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201014021157.18022-1-chenyi.qiang@intel.com> References: <20201014021157.18022-1-chenyi.qiang@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org PKRU represents the PKU register utilized in the protection key rights check for user pages. Protection Keys for Superviosr Pages (PKS) extends the protection key architecture to cover supervisor pages. Rename the *pkru* related variables and functions to *pkr* which stands for both of the PKRU and PKRS. It makes sense because both registers have the same format. PKS and PKU can also share the same bitmap to cache the conditions where protection key checks are needed. Signed-off-by: Chenyi Qiang --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu.h | 12 ++++++------ arch/x86/kvm/mmu/mmu.c | 18 +++++++++--------- 3 files changed, 16 insertions(+), 16 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 5303dbc5c9bc..dd3af15e109f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -381,7 +381,7 @@ struct kvm_mmu { * with PFEC.RSVD replaced by ACC_USER_MASK from the page tables. * Each domain has 2 bits which are ANDed with AD and WD from PKRU. 
*/ - u32 pkru_mask; + u32 pkr_mask; u64 *pae_root; u64 *lm_root; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 5efc6081ca13..306608248594 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -195,8 +195,8 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, u32 errcode = PFERR_PRESENT_MASK; WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK)); - if (unlikely(mmu->pkru_mask)) { - u32 pkru_bits, offset; + if (unlikely(mmu->pkr_mask)) { + u32 pkr_bits, offset; /* * PKRU defines 32 bits, there are 16 domains and 2 @@ -204,15 +204,15 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, * index of the protection domain, so pte_pkey * 2 is * is the index of the first bit for the domain. */ - pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; + pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; /* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */ offset = (pfec & ~1) + ((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT)); - pkru_bits &= mmu->pkru_mask >> offset; - errcode |= -pkru_bits & PFERR_PK_MASK; - fault |= (pkru_bits != 0); + pkr_bits &= mmu->pkr_mask >> offset; + errcode |= -pkr_bits & PFERR_PK_MASK; + fault |= (pkr_bits != 0); } return -(u32)fault & errcode; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 71aa3da2a0b7..834a95cf49fa 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4695,20 +4695,20 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu, * away both AD and WD. For all reads or if the last condition holds, WD * only will be masked away. */ -static void update_pkru_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, +static void update_pkr_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, bool ept) { unsigned bit; bool wp; if (ept) { - mmu->pkru_mask = 0; + mmu->pkr_mask = 0; return; } /* PKEY is enabled only if CR4.PKE and EFER.LMA are both set. */ if (!kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || !is_long_mode(vcpu)) { - mmu->pkru_mask = 0; + mmu->pkr_mask = 0; return; } @@ -4742,7 +4742,7 @@ static void update_pkru_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, /* PKRU.WD stops write access. 
*/ pkey_bits |= (!!check_write) << 1; - mmu->pkru_mask |= (pkey_bits & 3) << pfec; + mmu->pkr_mask |= (pkey_bits & 3) << pfec; } } @@ -4764,7 +4764,7 @@ static void paging64_init_context_common(struct kvm_vcpu *vcpu, reset_rsvds_bits_mask(vcpu, context); update_permission_bitmask(vcpu, context, false); - update_pkru_bitmask(vcpu, context, false); + update_pkr_bitmask(vcpu, context, false); update_last_nonleaf_level(vcpu, context); MMU_WARN_ON(!is_pae(vcpu)); @@ -4794,7 +4794,7 @@ static void paging32_init_context(struct kvm_vcpu *vcpu, reset_rsvds_bits_mask(vcpu, context); update_permission_bitmask(vcpu, context, false); - update_pkru_bitmask(vcpu, context, false); + update_pkr_bitmask(vcpu, context, false); update_last_nonleaf_level(vcpu, context); context->page_fault = paging32_page_fault; @@ -4913,7 +4913,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu) } update_permission_bitmask(vcpu, context, false); - update_pkru_bitmask(vcpu, context, false); + update_pkr_bitmask(vcpu, context, false); update_last_nonleaf_level(vcpu, context); reset_tdp_shadow_zero_bits_mask(vcpu, context); } @@ -5061,7 +5061,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, context->mmu_role.as_u64 = new_role.as_u64; update_permission_bitmask(vcpu, context, true); - update_pkru_bitmask(vcpu, context, true); + update_pkr_bitmask(vcpu, context, true); update_last_nonleaf_level(vcpu, context); reset_rsvds_bits_mask_ept(vcpu, context, execonly); reset_ept_shadow_zero_bits_mask(vcpu, context, execonly); @@ -5132,7 +5132,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu) } update_permission_bitmask(vcpu, g_context, false); - update_pkru_bitmask(vcpu, g_context, false); + update_pkr_bitmask(vcpu, g_context, false); update_last_nonleaf_level(vcpu, g_context); } From patchwork Wed Oct 14 02:11:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 11836643 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DFBF514B4 for ; Wed, 14 Oct 2020 02:10:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CD70122227 for ; Wed, 14 Oct 2020 02:10:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729079AbgJNCJ6 (ORCPT ); Tue, 13 Oct 2020 22:09:58 -0400 Received: from mga06.intel.com ([134.134.136.31]:51170 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727278AbgJNCJ4 (ORCPT ); Tue, 13 Oct 2020 22:09:56 -0400 IronPort-SDR: Q8QMPy2lC9QZei9sWYU58m6kN02htWsHqrqFNegPXRLHdsIqVjghbr8q21riYLK6DRy1YHG6W5 v18j4B0bDmJg== X-IronPort-AV: E=McAfee;i="6000,8403,9773"; a="227659799" X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="227659799" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:09:55 -0700 IronPort-SDR: V3aq9B9jgPf4LWjECQ64jqbZbbzod5sbsERtSd10x+HH7jntfVtiI1j/y1uEqQjLB2oU4N6AmP HPVPBA6kPJZA== X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="530645144" Received: from chenyi-pc.sh.intel.com ([10.239.159.72]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:09:53 -0700 From: Chenyi Qiang To: Paolo Bonzini , Sean 
Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Xiaoyao Li Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC v2 4/7] KVM: MMU: Refactor pkr_mask to cache condition Date: Wed, 14 Oct 2020 10:11:53 +0800 Message-Id: <20201014021157.18022-5-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201014021157.18022-1-chenyi.qiang@intel.com> References: <20201014021157.18022-1-chenyi.qiang@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org pkr_mask bitmap indicates if protection key checks are needed for user pages currently. It is indexed by page fault error code bits [4:1] with PFEC.RSVD replaced by the ACC_USER_MASK from the page tables. Refactor it by reverting to the use of PFEC.RSVD. After that, PKS and PKU can share the same bitmap. Signed-off-by: Chenyi Qiang --- arch/x86/kvm/mmu.h | 10 ++++++---- arch/x86/kvm/mmu/mmu.c | 16 ++++++++++------ 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 306608248594..597b9159c10b 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -204,11 +204,13 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, * index of the protection domain, so pte_pkey * 2 is * is the index of the first bit for the domain. */ - pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; + if (pte_access & PT_USER_MASK) + pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; + else + pkr_bits = 0; - /* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */ - offset = (pfec & ~1) + - ((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT)); + /* clear present bit */ + offset = (pfec & ~1); pkr_bits &= mmu->pkr_mask >> offset; errcode |= -pkr_bits & PFERR_PK_MASK; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 834a95cf49fa..f9814ab0596d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4716,21 +4716,25 @@ static void update_pkr_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, for (bit = 0; bit < ARRAY_SIZE(mmu->permissions); ++bit) { unsigned pfec, pkey_bits; - bool check_pkey, check_write, ff, uf, wf, pte_user; + bool check_pkey, check_write, ff, uf, wf, rsvdf; pfec = bit << 1; ff = pfec & PFERR_FETCH_MASK; uf = pfec & PFERR_USER_MASK; wf = pfec & PFERR_WRITE_MASK; - /* PFEC.RSVD is replaced by ACC_USER_MASK. */ - pte_user = pfec & PFERR_RSVD_MASK; + /* + * PFERR_RSVD_MASK bit is not set if the + * access is subject to PK restrictions. + */ + rsvdf = pfec & PFERR_RSVD_MASK; /* - * Only need to check the access which is not an - * instruction fetch and is to a user page. + * need to check the access which is not an + * instruction fetch and is not a rsvd fault. */ - check_pkey = (!ff && pte_user); + check_pkey = (!ff && !rsvdf); + /* * write access is controlled by PKRU if it is a * user access or CR0.WP = 1. 
From patchwork Wed Oct 14 02:11:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 11836641 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 26CFE921 for ; Wed, 14 Oct 2020 02:10:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 04C78221FF for ; Wed, 14 Oct 2020 02:10:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729126AbgJNCKA (ORCPT ); Tue, 13 Oct 2020 22:10:00 -0400 Received: from mga06.intel.com ([134.134.136.31]:51174 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727861AbgJNCJ7 (ORCPT ); Tue, 13 Oct 2020 22:09:59 -0400 IronPort-SDR: bJtSGSnzBANAvPReItA2zLv/PRFbkENm0bmrE02fYFcmDOZQlwFMnoXTmOO4GFujJrIpu9spQL MOIUbAi41CXg== X-IronPort-AV: E=McAfee;i="6000,8403,9773"; a="227659803" X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="227659803" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:09:57 -0700 IronPort-SDR: LHaXqkG810sSCh0Kv4rOAqcSdBJ3r2DFLCnNfsdC34wPISXOg7Mf2Nv5kAy79i6/rlEGaO/P9c nP3kzG/2DKHw== X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="530645159" Received: from chenyi-pc.sh.intel.com ([10.239.159.72]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:09:55 -0700 From: Chenyi Qiang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Xiaoyao Li Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC v2 5/7] KVM: MMU: Add support for PKS emulation Date: Wed, 14 Oct 2020 10:11:54 +0800 Message-Id: <20201014021157.18022-6-chenyi.qiang@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201014021157.18022-1-chenyi.qiang@intel.com> References: <20201014021157.18022-1-chenyi.qiang@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Advertise pkr_mask to cache the conditions where pretection key checks for supervisor pages are needed. When the accessed pages are those with a translation for which the U/S flag is 0 in at least one paging-structure entry controlling the translation, they are the supervisor pages and PKRS enforces the access rights check. Signed-off-by: Chenyi Qiang --- arch/x86/include/asm/kvm_host.h | 8 +++--- arch/x86/kvm/mmu.h | 12 ++++++--- arch/x86/kvm/mmu/mmu.c | 44 +++++++++++++++++---------------- 3 files changed, 35 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index dd3af15e109f..d5f0c3a71a41 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -376,10 +376,10 @@ struct kvm_mmu { u8 permissions[16]; /* - * The pkru_mask indicates if protection key checks are needed. It - * consists of 16 domains indexed by page fault error code bits [4:1], - * with PFEC.RSVD replaced by ACC_USER_MASK from the page tables. - * Each domain has 2 bits which are ANDed with AD and WD from PKRU. + * The pkr_mask indicates if protection key checks are needed. + * It consists of 16 domains indexed by page fault error code + * bits[4:1]. 
Each domain has 2 bits which are ANDed with AD + * and WD from PKRU/PKRS. */ u32 pkr_mask; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 597b9159c10b..aca1fc7f1ad7 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -197,15 +197,19 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK)); if (unlikely(mmu->pkr_mask)) { u32 pkr_bits, offset; + u64 pkrs; /* - * PKRU defines 32 bits, there are 16 domains and 2 - * attribute bits per domain in pkru. pte_pkey is the - * index of the protection domain, so pte_pkey * 2 is - * is the index of the first bit for the domain. + * PKRU and PKRS both define 32 bits. There are 16 domains + * and 2 attribute bits per domain in them. pte_key is the + * index of the protection domain, so pte_pkey * 2 is the + * index of the first bit for the domain. The choice of + * PKRU and PKRS is determined by the accessed pages. */ if (pte_access & PT_USER_MASK) pkr_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3; + else if (!kvm_get_msr(vcpu, MSR_IA32_PKRS, &pkrs)) + pkr_bits = (pkrs >> (pte_pkey * 2)) & 3; else pkr_bits = 0; diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f9814ab0596d..3614952a8c7e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4672,28 +4672,29 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu, } /* -* PKU is an additional mechanism by which the paging controls access to -* user-mode addresses based on the value in the PKRU register. Protection -* key violations are reported through a bit in the page fault error code. +* Protection Keys (PKEY) is an additional mechanism by which +* the paging controls access to user-mode/supervisor-mode address +* based on the values in PKEY registers (PKRU/PKRS). Protection key +* violations are reported through a bit in the page fault error code. * Unlike other bits of the error code, the PK bit is not known at the * call site of e.g. gva_to_gpa; it must be computed directly in -* permission_fault based on two bits of PKRU, on some machine state (CR4, -* CR0, EFER, CPL), and on other bits of the error code and the page tables. +* permission_fault based on two bits of PKRU/PKRS, on some machine +* state (CR4, CR0, EFER, CPL), and on other bits of the error code +* and the page tables. * * In particular the following conditions come from the error code, the * page tables and the machine state: -* - PK is always zero unless CR4.PKE=1 and EFER.LMA=1 +* - PK is always zero unless CR4.PKE=1/CR4.PKS=1 and EFER.LMA=1 * - PK is always zero if RSVD=1 (reserved bit set) or F=1 (instruction fetch) -* - PK is always zero if U=0 in the page tables -* - PKRU.WD is ignored if CR0.WP=0 and the access is a supervisor access. +* - (PKRU/PKRS).WD is ignored if CR0.WP=0 and the access is a supervisor access. * -* The PKRU bitmask caches the result of these four conditions. The error -* code (minus the P bit) and the page table's U bit form an index into the -* PKRU bitmask. Two bits of the PKRU bitmask are then extracted and ANDed -* with the two bits of the PKRU register corresponding to the protection key. -* For the first three conditions above the bits will be 00, thus masking -* away both AD and WD. For all reads or if the last condition holds, WD -* only will be masked away. +* The pkr_mask caches the result of these three conditions. The error +* code (minus the P bit) forms an index into the pkr_mask. Both PKU and +* PKS shares the same bitmask. 
Two bits of the pkr_mask are then extracted +* and ANDed with the two bits of the PKEY register corresponding to +* the protection key. For the first two conditions above the bits will be 00, +* thus masking away both AD and WD. For all reads or if the last condition +* holds, WD only will be masked away. */ static void update_pkr_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, bool ept) @@ -4706,8 +4707,9 @@ static void update_pkr_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, return; } - /* PKEY is enabled only if CR4.PKE and EFER.LMA are both set. */ - if (!kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || !is_long_mode(vcpu)) { + /* PKEY is enabled only if CR4.PKE/CR4.PKS and EFER.LMA are both set. */ + if ((!kvm_read_cr4_bits(vcpu, X86_CR4_PKE) && + !kvm_read_cr4_bits(vcpu, X86_CR4_PKS)) || !is_long_mode(vcpu)) { mmu->pkr_mask = 0; return; } @@ -4736,14 +4738,14 @@ static void update_pkr_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, check_pkey = (!ff && !rsvdf); /* - * write access is controlled by PKRU if it is a - * user access or CR0.WP = 1. + * write access is controlled by PKRU/PKRS if + * it is a user access or CR0.WP = 1. */ check_write = check_pkey && wf && (uf || wp); - /* PKRU.AD stops both read and write access. */ + /* PKRU/PKRS.AD stops both read and write access. */ pkey_bits = !!check_pkey; - /* PKRU.WD stops write access. */ + /* PKRU/PKRS.WD stops write access. */ pkey_bits |= (!!check_write) << 1; mmu->pkr_mask |= (pkey_bits & 3) << pfec; From patchwork Wed Oct 14 02:11:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 11836635 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B062D14B2 for ; Wed, 14 Oct 2020 02:10:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9AE1721D81 for ; Wed, 14 Oct 2020 02:10:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729213AbgJNCKC (ORCPT ); Tue, 13 Oct 2020 22:10:02 -0400 Received: from mga06.intel.com ([134.134.136.31]:51174 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729131AbgJNCKB (ORCPT ); Tue, 13 Oct 2020 22:10:01 -0400 IronPort-SDR: BZblanSwME/7jTFl6Tz/igQzS20lgiAUFEJvRipXk7uKteZjRaTnQ1fyGeAkwIbmhEHtlNkwNB mEK4dbpIPzCQ== X-IronPort-AV: E=McAfee;i="6000,8403,9773"; a="227659807" X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="227659807" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:10:00 -0700 IronPort-SDR: +aJTM1QzuTVvNMJY54TL+nNVgfHrDtg4oMMsDW6FFLVP9/QQzMUBgtcAuaeWU50YfdvIObwFgV pMy2Q3u4yWbg== X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="530645177" Received: from chenyi-pc.sh.intel.com ([10.239.159.72]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:09:58 -0700 From: Chenyi Qiang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Xiaoyao Li Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC v2 6/7] KVM: X86: Expose PKS to guest and userspace Date: Wed, 14 Oct 2020 10:11:55 +0800 Message-Id: <20201014021157.18022-7-chenyi.qiang@intel.com> 
X-Mailer: git-send-email 2.17.1 In-Reply-To: <20201014021157.18022-1-chenyi.qiang@intel.com> References: <20201014021157.18022-1-chenyi.qiang@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Existence of PKS is enumerated via CPUID.(EAX=7H,ECX=0):ECX[31]. It is enabled by setting CR4.PKS when long mode is active. PKS is only implemented when EPT is enabled and requires the support of VM_{ENTRY, EXIT}_LOAD_IA32_PKRS currently. Signed-off-by: Chenyi Qiang --- arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/vmx/vmx.c | 15 ++++++++++++--- arch/x86/kvm/x86.c | 9 +++++++-- 4 files changed, 23 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d5f0c3a71a41..d798433a2117 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -99,7 +99,8 @@ | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \ | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \ | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \ - | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP)) + | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \ + | X86_CR4_PKS)) #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 3fd6eec202d7..6b725a3e84ec 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -354,7 +354,8 @@ void kvm_set_cpu_caps(void) F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) | F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) | F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) | - F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ + F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ | + 0 /*PKS*/ ); /* Set LA57 based on hardware capability. */ if (cpuid_ecx(7) & F(LA57)) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e5da5dbe19d4..ce24226e1aa3 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3228,7 +3228,7 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) } /* - * SMEP/SMAP/PKU is disabled if CPU is in non-paging mode in + * SMEP/SMAP/PKU/PKS is disabled if CPU is in non-paging mode in * hardware. To emulate this behavior, SMEP/SMAP/PKU needs * to be manually disabled when guest switches to non-paging * mode. @@ -3236,10 +3236,11 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) * If !enable_unrestricted_guest, the CPU is always running * with CR0.PG=1 and CR4 needs to be modified. * If enable_unrestricted_guest, the CPU automatically - * disables SMEP/SMAP/PKU when the guest sets CR0.PG=0. + * disables SMEP/SMAP/PKU/PKS when the guest sets CR0.PG=0. */ if (!is_paging(vcpu)) - hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE); + hw_cr4 &= ~(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE | + X86_CR4_PKS); } vmcs_writel(CR4_READ_SHADOW, cr4); @@ -7430,6 +7431,14 @@ static __init void vmx_set_cpu_caps(void) if (vmx_pt_mode_is_host_guest()) kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT); + /* + * PKS is not yet implemented for shadow paging. + * If not support VM_{ENTRY, EXIT}_LOAD_IA32_PKRS, + * don't expose the PKS as well. 
+ */ + if (enable_ept && cpu_has_load_ia32_pkrs()) + kvm_cpu_cap_check_and_set(X86_FEATURE_PKS); + if (vmx_umip_emulated()) kvm_cpu_cap_set(X86_FEATURE_UMIP); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ce856e0ece84..93ac708e951d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -976,7 +976,8 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) unsigned long old_cr4 = kvm_read_cr4(vcpu); unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_SMEP; - unsigned long mmu_role_bits = pdptr_bits | X86_CR4_SMAP | X86_CR4_PKE; + unsigned long mmu_role_bits = pdptr_bits | X86_CR4_SMAP | X86_CR4_PKE | + X86_CR4_PKS; if (kvm_valid_cr4(vcpu, cr4)) return 1; @@ -1207,7 +1208,7 @@ static const u32 msrs_to_save_all[] = { MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B, MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B, MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B, - MSR_IA32_UMWAIT_CONTROL, + MSR_IA32_UMWAIT_CONTROL, MSR_IA32_PKRS, MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1, MSR_ARCH_PERFMON_FIXED_CTR0 + 2, MSR_ARCH_PERFMON_FIXED_CTR0 + 3, @@ -5426,6 +5427,10 @@ static void kvm_init_msr_list(void) intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2) continue; break; + case MSR_IA32_PKRS: + if (!kvm_cpu_cap_has(X86_FEATURE_PKS)) + continue; + break; case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 17: if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_PERFCTR0 >= min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp)) From patchwork Wed Oct 14 02:11:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chenyi Qiang X-Patchwork-Id: 11836639 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 894CE14B4 for ; Wed, 14 Oct 2020 02:10:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6DE7222201 for ; Wed, 14 Oct 2020 02:10:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727676AbgJNCK0 (ORCPT ); Tue, 13 Oct 2020 22:10:26 -0400 Received: from mga06.intel.com ([134.134.136.31]:51174 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729139AbgJNCKE (ORCPT ); Tue, 13 Oct 2020 22:10:04 -0400 IronPort-SDR: LleEyfq8LqNwDOkNdnjPZphyS+DlhJ1XYuhA96iW2xCbCycgDG5fZ1ZQKZJ3iDu+4ck3OHaCoa w+K3dCy6ODWw== X-IronPort-AV: E=McAfee;i="6000,8403,9773"; a="227659812" X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="227659812" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:10:03 -0700 IronPort-SDR: 4qlyAGtgqajStBUdFiabpLCl/KPnIe8lVmVhEZPd5rL6CphlVnvrUWLJXwUxmL3c9DXA+uQfYt FNf9hdk3n1iQ== X-IronPort-AV: E=Sophos;i="5.77,373,1596524400"; d="scan'208";a="530645240" Received: from chenyi-pc.sh.intel.com ([10.239.159.72]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Oct 2020 19:10:00 -0700 From: Chenyi Qiang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Xiaoyao Li Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC v2 7/7] KVM: VMX: Enable PKS for nested VM Date: Wed, 14 Oct 2020 10:11:56 +0800 Message-Id: <20201014021157.18022-8-chenyi.qiang@intel.com> X-Mailer: 
git-send-email 2.17.1 In-Reply-To: <20201014021157.18022-1-chenyi.qiang@intel.com> References: <20201014021157.18022-1-chenyi.qiang@intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org PKS MSR passes through guest directly. Configure the MSR to match the L0/L1 settings so that nested VM runs PKS properly. Signed-off-by: Chenyi Qiang --- arch/x86/kvm/vmx/nested.c | 37 +++++++++++++++++++++++++++++++++++-- arch/x86/kvm/vmx/vmcs12.c | 2 ++ arch/x86/kvm/vmx/vmcs12.h | 6 +++++- arch/x86/kvm/vmx/vmx.c | 10 ++++++++++ arch/x86/kvm/vmx/vmx.h | 1 + 5 files changed, 53 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 14f56e8dd060..66c74d10dda5 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -652,6 +652,12 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W); + if (!msr_write_intercepted_l01(vcpu, MSR_IA32_PKRS)) + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + MSR_IA32_PKRS, + MSR_TYPE_R | MSR_TYPE_W); + kvm_vcpu_unmap(vcpu, &to_vmx(vcpu)->nested.msr_bitmap_map, false); return true; @@ -2433,6 +2439,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12) if (kvm_mpx_supported() && vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)) vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs); + + if (vmx->nested.nested_run_pending && + (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)) + vmcs_write64(GUEST_IA32_PKRS, vmcs12->guest_ia32_pkrs); } if (nested_cpu_has_xsaves(vmcs12)) @@ -2521,6 +2531,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs); + + if (kvm_cpu_cap_has(X86_FEATURE_PKS) && + (!vmx->nested.nested_run_pending || + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))) + vmcs_write64(GUEST_IA32_PKRS, vmx->nested.vmcs01_guest_pkrs); vmx_set_rflags(vcpu, vmcs12->guest_rflags); /* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the @@ -2861,6 +2876,10 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu, vmcs12->host_ia32_perf_global_ctrl))) return -EINVAL; + if ((vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) && + CC(!kvm_pkrs_valid(vmcs12->host_ia32_pkrs))) + return -EINVAL; + #ifdef CONFIG_X86_64 ia32e = !!(vcpu->arch.efer & EFER_LMA); #else @@ -3010,6 +3029,10 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu, if (nested_check_guest_non_reg_state(vmcs12)) return -EINVAL; + if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS) && + CC(!kvm_pkrs_valid(vmcs12->guest_ia32_pkrs))) + return -EINVAL; + return 0; } @@ -3320,6 +3343,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, if (kvm_mpx_supported() && !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)) vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); + if (kvm_cpu_cap_has(X86_FEATURE_PKS) && + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS)) + vmx->nested.vmcs01_guest_pkrs = vmcs_read64(GUEST_IA32_PKRS); /* * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and* @@ -3913,6 +3939,7 @@ static bool is_vmcs12_ext_field(unsigned long field) case GUEST_IDTR_BASE: case GUEST_PENDING_DBG_EXCEPTIONS: case GUEST_BNDCFGS: + case GUEST_IA32_PKRS: return true; default: break; @@ -3964,6 +3991,8 @@ 
static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu, vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); if (kvm_mpx_supported()) vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); + if (guest_cpuid_has(vcpu, X86_FEATURE_PKS)) + vmcs12->guest_ia32_pkrs = vmcs_read64(GUEST_IA32_PKRS); vmx->nested.need_sync_vmcs02_to_vmcs12_rare = false; } @@ -4199,6 +4228,9 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, vmcs12->host_ia32_perf_global_ctrl)); + if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PKRS) + vmcs_write64(GUEST_IA32_PKRS, vmcs12->host_ia32_pkrs); + /* Set L1 segment info according to Intel SDM 27.5.2 Loading Host Segment and Descriptor-Table Registers */ seg = (struct kvm_segment) { @@ -6319,7 +6351,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_LOAD_IA32_PKRS; msrs->exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER | @@ -6338,7 +6371,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps) #ifdef CONFIG_X86_64 VM_ENTRY_IA32E_MODE | #endif - VM_ENTRY_LOAD_IA32_PAT; + VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_IA32_PKRS; msrs->entry_ctls_high |= (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER); diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c index c8e51c004f78..df7b2143b807 100644 --- a/arch/x86/kvm/vmx/vmcs12.c +++ b/arch/x86/kvm/vmx/vmcs12.c @@ -61,9 +61,11 @@ const unsigned short vmcs_field_to_offset_table[] = { FIELD64(GUEST_PDPTR2, guest_pdptr2), FIELD64(GUEST_PDPTR3, guest_pdptr3), FIELD64(GUEST_BNDCFGS, guest_bndcfgs), + FIELD64(GUEST_IA32_PKRS, guest_ia32_pkrs), FIELD64(HOST_IA32_PAT, host_ia32_pat), FIELD64(HOST_IA32_EFER, host_ia32_efer), FIELD64(HOST_IA32_PERF_GLOBAL_CTRL, host_ia32_perf_global_ctrl), + FIELD64(HOST_IA32_PKRS, host_ia32_pkrs), FIELD(PIN_BASED_VM_EXEC_CONTROL, pin_based_vm_exec_control), FIELD(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control), FIELD(EXCEPTION_BITMAP, exception_bitmap), diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h index 80232daf00ff..009b4c317375 100644 --- a/arch/x86/kvm/vmx/vmcs12.h +++ b/arch/x86/kvm/vmx/vmcs12.h @@ -69,7 +69,9 @@ struct __packed vmcs12 { u64 vm_function_control; u64 eptp_list_address; u64 pml_address; - u64 padding64[3]; /* room for future expansion */ + u64 guest_ia32_pkrs; + u64 host_ia32_pkrs; + u64 padding64[1]; /* room for future expansion */ /* * To allow migration of L1 (complete with its L2 guests) between * machines of different natural widths (32 or 64 bit), we cannot have @@ -256,6 +258,8 @@ static inline void vmx_check_vmcs12_offsets(void) CHECK_OFFSET(vm_function_control, 296); CHECK_OFFSET(eptp_list_address, 304); CHECK_OFFSET(pml_address, 312); + CHECK_OFFSET(guest_ia32_pkrs, 320); + CHECK_OFFSET(host_ia32_pkrs, 328); CHECK_OFFSET(cr0_guest_host_mask, 344); CHECK_OFFSET(cr4_guest_host_mask, 352); CHECK_OFFSET(cr0_read_shadow, 360); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index ce24226e1aa3..7635104a4eb7 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7260,6 +7260,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu) cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU)); cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP)); 
 	cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57));
+	cr4_fixed1_update(X86_CR4_PKS, ecx, feature_bit(PKS));
 
 #undef cr4_fixed1_update
 }
 
@@ -7279,6 +7280,15 @@ static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
 			vmx->nested.msrs.exit_ctls_high &= ~VM_EXIT_CLEAR_BNDCFGS;
 		}
 	}
+
+	if (kvm_cpu_cap_has(X86_FEATURE_PKS) &&
+	    guest_cpuid_has(vcpu, X86_FEATURE_PKS)) {
+		vmx->nested.msrs.entry_ctls_high |= VM_ENTRY_LOAD_IA32_PKRS;
+		vmx->nested.msrs.exit_ctls_high |= VM_EXIT_LOAD_IA32_PKRS;
+	} else {
+		vmx->nested.msrs.entry_ctls_high &= ~VM_ENTRY_LOAD_IA32_PKRS;
+		vmx->nested.msrs.exit_ctls_high &= ~VM_EXIT_LOAD_IA32_PKRS;
+	}
 }
 
 static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a0e47720f60c..6b21b3529afb 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -177,6 +177,7 @@ struct nested_vmx {
 	/* to migrate it to L2 if VM_ENTRY_LOAD_DEBUG_CONTROLS is off */
 	u64 vmcs01_debugctl;
 	u64 vmcs01_guest_bndcfgs;
+	u64 vmcs01_guest_pkrs;
 
 	/* to migrate it to L1 if L2 writes to L1's CR8 directly */
 	int l1_tpr_threshold;
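As a usage illustration (not part of the series): once KVM advertises PKS
via CPUID.(EAX=7,ECX=0):ECX[31] and reports MSR_IA32_PKRS (index 0x6E1, per
the patch 2 commit message) through KVM_GET_MSR_INDEX_LIST, a VMM can save
and restore the guest's PKRS with the ordinary MSR ioctls, e.g. across live
migration. A minimal sketch, assuming an already-open vCPU fd; the helper
name get_guest_pkrs() is made up for the example:

#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#define MSR_IA32_PKRS	0x6E1	/* index taken from the patch 2 commit message */

/* Read the guest's IA32_PKRS through KVM_GET_MSRS; returns 0 on success. */
static int get_guest_pkrs(int vcpu_fd, uint64_t *pkrs)
{
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} msrs;

	memset(&msrs, 0, sizeof(msrs));
	msrs.hdr.nmsrs = 1;
	msrs.entry.index = MSR_IA32_PKRS;

	/* KVM_GET_MSRS returns the number of MSRs successfully read. */
	if (ioctl(vcpu_fd, KVM_GET_MSRS, &msrs) != 1)
		return -1;

	*pkrs = msrs.entry.data;
	return 0;
}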