Message ID | 20220424101557.134102-4-lei4.wang@intel.com (mailing list archive)
---|---
State | New, archived
Series | KVM: PKS Virtualization support
Nit, something like:

        KVM: X86: Virtualize and pass-through IA32_PKRS MSR when supported

because "expose" doesn't precisely cover the pass-through behavior.

On Sun, Apr 24, 2022, Lei Wang wrote:
> @@ -1111,6 +1113,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>  #endif
>          unsigned long fs_base, gs_base;
>          u16 fs_sel, gs_sel;
> +        u32 host_pkrs;
>          int i;
>
>          vmx->req_immediate_exit = false;
>
> @@ -1146,6 +1149,17 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
>           */
>          host_state->ldt_sel = kvm_read_ldt();
>
> +        /*
> +         * Update the host pkrs vmcs field before vcpu runs.
> +         * The setting of VM_EXIT_LOAD_IA32_PKRS can ensure
> +         * kvm_cpu_cap_has(X86_FEATURE_PKS) &&
> +         * guest_cpuid_has(vcpu, X86_FEATURE_PKS)
> +         */

Eh, I don't think this comment adds anything. And practically speaking, whether
or not X86_FEATURE_PKS is reported by kvm_cpu_cap_has() and/or guest_cpuid_has()
is irrelevant. If KVM is loading PKRS on exit, then the VMCS needs to hold an
up-to-date value. E.g. if for some reason KVM enables the control even if PKS
isn't exposed to the guest (see below), this code still needs to refresh the value.

> +        if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS) {
> +                host_pkrs = get_current_pkrs();

No need for an intermediate host_pkrs. I.e. this can simply be:

        if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS)
                vmx_set_host_pkrs(host_state, get_current_pkrs());

> +                vmx_set_host_pkrs(host_state, host_pkrs);
> +        }
> +
>  #ifdef CONFIG_X86_64
>          savesegment(ds, host_state->ds_sel);
>          savesegment(es, host_state->es_sel);
> @@ -1901,6 +1915,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>          case MSR_IA32_DEBUGCTLMSR:
>                  msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
>                  break;
> +        case MSR_IA32_PKRS:
> +                if (!kvm_cpu_cap_has(X86_FEATURE_PKS) ||
> +                        (!msr_info->host_initiated &&
> +                        !guest_cpuid_has(vcpu, X86_FEATURE_PKS)))

Nit, please align the lines that are inside a pair of parentheses, i.e.

                if (!kvm_cpu_cap_has(X86_FEATURE_PKS) ||
                    (!msr_info->host_initiated &&
                     !guest_cpuid_has(vcpu, X86_FEATURE_PKS)))
                        return 1;

> +                        return 1;
> +                msr_info->data = kvm_read_pkrs(vcpu);

A comment on the caching patch, please use kvm_pkrs_read() to follow the GPR, RIP,
PDPTR, etc... terminology (CR0/3/4 got grandfathered in).

> +                break;
>          default:
>          find_uret_msr:
>                  msr = vmx_find_uret_msr(vmx, msr_info->index);
> @@ -2242,7 +2263,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                  }
>                  ret = kvm_set_msr_common(vcpu, msr_info);
>                  break;
> -
> +        case MSR_IA32_PKRS:
> +                if (!kvm_pkrs_valid(data))
> +                        return 1;

Nit, please move this to after the capability checks. Does not affect functionality
at all, but logically it doesn't make sense to check for a valid value of a register
that doesn't exist.

> +                if (!kvm_cpu_cap_has(X86_FEATURE_PKS) ||
> +                        (!msr_info->host_initiated &&
> +                        !guest_cpuid_has(vcpu, X86_FEATURE_PKS)))
> +                        return 1;
> +                vcpu->arch.pkrs = data;
> +                kvm_register_mark_available(vcpu, VCPU_EXREG_PKRS);
> +                vmcs_write64(GUEST_IA32_PKRS, data);

The caching patch should add kvm_pkrs_write(). And in there, I think I'd vote for
a WARN_ON_ONCE() that the incoming value doesn't set bits 63:32. It's redundant
with kvm_pkrs_valid().

> +                break;
>          default:
>          find_uret_msr:
>                  msr = vmx_find_uret_msr(vmx, msr_index);

...

> @@ -7406,6 +7445,20 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>
>          /* Refresh #PF interception to account for MAXPHYADDR changes. */
>          vmx_update_exception_bitmap(vcpu);
> +
> +        if (kvm_cpu_cap_has(X86_FEATURE_PKS)) {
> +                if (guest_cpuid_has(vcpu, X86_FEATURE_PKS)) {
> +                        vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW);
> +
> +                        vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_PKRS);
> +                        vm_exit_controls_setbit(vmx, VM_EXIT_LOAD_IA32_PKRS);

Ugh, toggling the entry/exit controls here won't do the correct thing if L2 is
active. The MSR intercept logic works because it always operates on vmcs01's bitmap.

Let's keep this as simple as possible and set the controls if they're supported.
KVM will waste a few cycles on entry/exit if PKS is supported in the host but not
exposed to the guest, but that's not the end of the world. If we want to optimize
that, then we can do that in the future and in a more generic way.

So this becomes:

        if (kvm_cpu_cap_has(X86_FEATURE_PKS))
                vmx_set_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW,
                                          !guest_cpuid_has(vcpu, X86_FEATURE_PKS));

And please hoist that up to be next to the handling of X86_FEATURE_XFD to try and
bunch together code that does similar things.

The last edge case to deal with is if only one of the controls is supported, e.g. if
an L0 hypervisor is being evil. Big surprise, BNDCFGS doesn't get this right and
needs a bug fix patch, e.g. it could retain the guest's value after exit, or host's
value after entry. It's a silly case because it basically requires broken host
"hardware", but it'd be nice to get it right in KVM. And that'd also be a good
opportunity to handle all of the pairs, i.e. clear the bits during setup_vmcs_config()
instead of checking both the entry and exit flags in cpu_has_load_*().

So for this series, optimistically assume my idea will pan out and an
adjust_vm_entry_exit_pair() helper will exist by the time this is fully baked.

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6927f6e8ec31..53e12e6006af 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2434,6 +2434,16 @@ static bool cpu_has_sgx(void)
         return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
 }

+static __init void adjust_vm_entry_exit_pair(u32 *entry_controls, u32 entry_bit,
+                                             u32 *exit_controls, u32 exit_bit)
+{
+        if ((*entry_controls & entry_bit) && (*exit_controls & exit_bit))
+                return;
+
+        *entry_controls &= ~entry_bit;
+        *exit_controls &= ~exit_bit;
+}
+
 static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
                                       u32 msr, u32 *result)
 {
@@ -2614,6 +2624,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
                                 &_vmentry_control) < 0)
                 return -EIO;

+        adjust_vm_entry_exit_pair(&_vmentry_control, VM_ENTRY_LOAD_IA32_PKRS,
+                                  &_vmexit_control, VM_EXIT_LOAD_IA32_PKRS);
+
         /*
          * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
          * can't be used due to an errata where VM Exit may incorrectly clear
@@ -7536,6 +7549,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
         vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
                                   !guest_cpuid_has(vcpu, X86_FEATURE_XFD));

+        if (kvm_cpu_cap_has(X86_FEATURE_PKS))
+                vmx_set_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW,
+                                          !guest_cpuid_has(vcpu, X86_FEATURE_PKS));

         set_cr4_guest_host_mask(vmx);

> +                } else {
> +                        vmx_enable_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW);
> +
> +                        vm_entry_controls_clearbit(vmx, VM_ENTRY_LOAD_IA32_PKRS);
> +                        vm_exit_controls_clearbit(vmx, VM_EXIT_LOAD_IA32_PKRS);
> +                }
> +        }
>  }

...

> @@ -11410,6 +11414,9 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>          kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
>          kvm_rip_write(vcpu, 0xfff0);
>
> +        if (!init_event && kvm_cpu_cap_has(X86_FEATURE_PKS))
> +                __kvm_set_msr(vcpu, MSR_IA32_PKRS, 0, true);

Please put this with the other !init_event stuff, e.g. look for MSR_IA32_XSS.

>          vcpu->arch.cr3 = 0;
>          kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>
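For illustration, a minimal sketch of the kvm_pkrs_write() helper requested above.
The name and the WARN come from the review; the exact body is an assumption about
what the series' caching patch would contain (vcpu->arch.pkrs and VCPU_EXREG_PKRS
are the fields the patch below already adds):

        /* Hypothetical sketch; the real helper belongs to the series' caching patch. */
        static inline void kvm_pkrs_write(struct kvm_vcpu *vcpu, u64 data)
        {
                /*
                 * Bits 63:32 of PKRS are reserved.  Redundant with
                 * kvm_pkrs_valid(), but the WARN is cheap and catches
                 * callers that skip the validity check.
                 */
                WARN_ON_ONCE(data >> 32);

                vcpu->arch.pkrs = data;
                kvm_register_mark_available(vcpu, VCPU_EXREG_PKRS);
        }

vmx_set_msr() would then pair this with its existing vmcs_write64(GUEST_IA32_PKRS, data).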
On 5/25/2022 6:11 AM, Sean Christopherson wrote:
> Nit, something like:
>
>         KVM: X86: Virtualize and pass-through IA32_PKRS MSR when supported
>
> because "expose" doesn't precisely cover the pass-through behavior.

Will change it.

> Eh, I don't think this comment adds anything. And practically speaking, whether
> or not X86_FEATURE_PKS is reported by kvm_cpu_cap_has() and/or guest_cpuid_has()
> is irrelevant. If KVM is loading PKRS on exit, then the VMCS needs to hold an
> up-to-date value. E.g. if for some reason KVM enables the control even if PKS
> isn't exposed to the guest (see below), this code still needs to refresh the value.

Will remove the irrelevant statement and keep just "Update the host pkrs vmcs
field before vcpu runs."

> No need for an intermediate host_pkrs. I.e. this can simply be:
>
>         if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS)
>                 vmx_set_host_pkrs(host_state, get_current_pkrs());

OK, will write the code in one line.

> Nit, please align the lines that are inside a pair of parentheses, i.e.

My mistake, will align the lines.

> A comment on the caching patch, please use kvm_pkrs_read() to follow the GPR, RIP,
> PDPTR, etc... terminology (CR0/3/4 got grandfathered in).

Will rename the function.

> Nit, please move this to after the capability checks. Does not affect functionality
> at all, but logically it doesn't make sense to check for a valid value of a register
> that doesn't exist.

Makes sense, will move it.

> The caching patch should add kvm_pkrs_write(). And in there, I think I'd vote for
> a WARN_ON_ONCE() that the incoming value doesn't set bits 63:32. It's redundant
> with kvm_pkrs_valid().

Will add a new function in the caching patch.

> Ugh, toggling the entry/exit controls here won't do the correct thing if L2 is
> active. The MSR intercept logic works because it always operates on vmcs01's bitmap.
>
> Let's keep this as simple as possible and set the controls if they're supported.
> KVM will waste a few cycles on entry/exit if PKS is supported in the host but not
> exposed to the guest, but that's not the end of the world. If we want to optimize
> that, then we can do that in the future and in a more generic way.
>
> So this becomes:
>
>         if (kvm_cpu_cap_has(X86_FEATURE_PKS))
>                 vmx_set_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW,
>                                           !guest_cpuid_has(vcpu, X86_FEATURE_PKS));
>
> And please hoist that up to be next to the handling of X86_FEATURE_XFD to try and
> bunch together code that does similar things.
>
> The last edge case to deal with is if only one of the controls is supported, e.g. if
> an L0 hypervisor is being evil. Big surprise, BNDCFGS doesn't get this right and
> needs a bug fix patch, e.g. it could retain the guest's value after exit, or host's
> value after entry. It's a silly case because it basically requires broken host
> "hardware", but it'd be nice to get it right in KVM. And that'd also be a good
> opportunity to handle all of the pairs, i.e. clear the bits during setup_vmcs_config()
> instead of checking both the entry and exit flags in cpu_has_load_*().
>
> So for this series, optimistically assume my idea will pan out and an
> adjust_vm_entry_exit_pair() helper will exist by the time this is fully baked.
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 6927f6e8ec31..53e12e6006af 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2434,6 +2434,16 @@ static bool cpu_has_sgx(void)
>          return cpuid_eax(0) >= 0x12 && (cpuid_eax(0x12) & BIT(0));
>  }
>
> +static __init void adjust_vm_entry_exit_pair(u32 *entry_controls, u32 entry_bit,
> +                                             u32 *exit_controls, u32 exit_bit)
> +{
> +        if ((*entry_controls & entry_bit) && (*exit_controls & exit_bit))
> +                return;
> +
> +        *entry_controls &= ~entry_bit;
> +        *exit_controls &= ~exit_bit;
> +}
> +
>  static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
>                                        u32 msr, u32 *result)
>  {
> @@ -2614,6 +2624,9 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>                                  &_vmentry_control) < 0)
>                  return -EIO;
>
> +        adjust_vm_entry_exit_pair(&_vmentry_control, VM_ENTRY_LOAD_IA32_PKRS,
> +                                  &_vmexit_control, VM_EXIT_LOAD_IA32_PKRS);
> +
>          /*
>           * Some cpus support VM_{ENTRY,EXIT}_IA32_PERF_GLOBAL_CTRL but they
>           * can't be used due to an errata where VM Exit may incorrectly clear
> @@ -7536,6 +7549,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>          vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
>                                    !guest_cpuid_has(vcpu, X86_FEATURE_XFD));
>
> +        if (kvm_cpu_cap_has(X86_FEATURE_PKS))
> +                vmx_set_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW,
> +                                          !guest_cpuid_has(vcpu, X86_FEATURE_PKS));
>
>          set_cr4_guest_host_mask(vmx);

Thanks for your suggestions, will try your helper.

> Please put this with the other !init_event stuff, e.g. look for MSR_IA32_XSS.

Will do it.
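Pulling the acks above together, the vmx_set_msr() case would plausibly end up as
in the sketch below: capability checks first, continuation lines aligned to the
opening parenthesis, and the (assumed) kvm_pkrs_write() helper from the caching
patch doing the cache update. This illustrates the agreed direction, not a posted
revision:

        case MSR_IA32_PKRS:
                if (!kvm_cpu_cap_has(X86_FEATURE_PKS) ||
                    (!msr_info->host_initiated &&
                     !guest_cpuid_has(vcpu, X86_FEATURE_PKS)))
                        return 1;
                if (!kvm_pkrs_valid(data))
                        return 1;
                kvm_pkrs_write(vcpu, data);     /* assumed caching helper */
                vmcs_write64(GUEST_IA32_PKRS, data);
                break;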
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index e325c290a816..ee37741b2b9d 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -42,6 +42,7 @@ struct vmcs_host_state {
 #ifdef CONFIG_X86_64
         u16 ds_sel, es_sel;
 #endif
+        u32 pkrs;
 };

 struct vmcs_controls_shadow {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 395b2deb76aa..9d0588e85410 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -28,6 +28,7 @@
 #include <linux/tboot.h>
 #include <linux/trace_events.h>
 #include <linux/entry-kvm.h>
+#include <linux/pkeys.h>

 #include <asm/apic.h>
 #include <asm/asm.h>
@@ -172,6 +173,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
         MSR_CORE_C3_RESIDENCY,
         MSR_CORE_C6_RESIDENCY,
         MSR_CORE_C7_RESIDENCY,
+        MSR_IA32_PKRS,
 };

 /*
@@ -1111,6 +1113,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 #endif
         unsigned long fs_base, gs_base;
         u16 fs_sel, gs_sel;
+        u32 host_pkrs;
         int i;

         vmx->req_immediate_exit = false;
@@ -1146,6 +1149,17 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
          */
         host_state->ldt_sel = kvm_read_ldt();

+        /*
+         * Update the host pkrs vmcs field before vcpu runs.
+         * The setting of VM_EXIT_LOAD_IA32_PKRS can ensure
+         * kvm_cpu_cap_has(X86_FEATURE_PKS) &&
+         * guest_cpuid_has(vcpu, X86_FEATURE_PKS)
+         */
+        if (vm_exit_controls_get(vmx) & VM_EXIT_LOAD_IA32_PKRS) {
+                host_pkrs = get_current_pkrs();
+                vmx_set_host_pkrs(host_state, host_pkrs);
+        }
+
 #ifdef CONFIG_X86_64
         savesegment(ds, host_state->ds_sel);
         savesegment(es, host_state->es_sel);
@@ -1901,6 +1915,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
         case MSR_IA32_DEBUGCTLMSR:
                 msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
                 break;
+        case MSR_IA32_PKRS:
+                if (!kvm_cpu_cap_has(X86_FEATURE_PKS) ||
+                        (!msr_info->host_initiated &&
+                        !guest_cpuid_has(vcpu, X86_FEATURE_PKS)))
+                        return 1;
+                msr_info->data = kvm_read_pkrs(vcpu);
+                break;
         default:
         find_uret_msr:
                 msr = vmx_find_uret_msr(vmx, msr_info->index);
@@ -2242,7 +2263,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
                 }
                 ret = kvm_set_msr_common(vcpu, msr_info);
                 break;
-
+        case MSR_IA32_PKRS:
+                if (!kvm_pkrs_valid(data))
+                        return 1;
+                if (!kvm_cpu_cap_has(X86_FEATURE_PKS) ||
+                        (!msr_info->host_initiated &&
+                        !guest_cpuid_has(vcpu, X86_FEATURE_PKS)))
+                        return 1;
+                vcpu->arch.pkrs = data;
+                kvm_register_mark_available(vcpu, VCPU_EXREG_PKRS);
+                vmcs_write64(GUEST_IA32_PKRS, data);
+                break;
         default:
         find_uret_msr:
                 msr = vmx_find_uret_msr(vmx, msr_index);
@@ -2533,7 +2564,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
               VM_EXIT_LOAD_IA32_EFER |
               VM_EXIT_CLEAR_BNDCFGS |
               VM_EXIT_PT_CONCEAL_PIP |
-              VM_EXIT_CLEAR_IA32_RTIT_CTL;
+              VM_EXIT_CLEAR_IA32_RTIT_CTL |
+              VM_EXIT_LOAD_IA32_PKRS;
         if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
                                 &_vmexit_control) < 0)
                 return -EIO;
@@ -2557,7 +2589,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
               VM_ENTRY_LOAD_IA32_EFER |
               VM_ENTRY_LOAD_BNDCFGS |
               VM_ENTRY_PT_CONCEAL_PIP |
-              VM_ENTRY_LOAD_IA32_RTIT_CTL;
+              VM_ENTRY_LOAD_IA32_RTIT_CTL |
+              VM_ENTRY_LOAD_IA32_PKRS;
         if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
                                 &_vmentry_control) < 0)
                 return -EIO;
@@ -4166,7 +4199,8 @@ static u32 vmx_vmentry_ctrl(void)
                                   VM_ENTRY_LOAD_IA32_RTIT_CTL);
         /* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
         return vmentry_ctrl &
-                ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER);
+                ~(VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL | VM_ENTRY_LOAD_IA32_EFER |
+                  VM_ENTRY_LOAD_IA32_PKRS);
 }

 static u32 vmx_vmexit_ctrl(void)
@@ -4178,7 +4212,8 @@
                 VM_EXIT_CLEAR_IA32_RTIT_CTL);
         /* Loading of EFER and PERF_GLOBAL_CTRL are toggled dynamically */
         return vmexit_ctrl &
-                ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER);
+                ~(VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL | VM_EXIT_LOAD_IA32_EFER |
+                  VM_EXIT_LOAD_IA32_PKRS);
 }

 static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
@@ -5923,6 +5958,8 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
                        vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL));
         if (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS)
                 pr_err("BndCfgS = 0x%016llx\n", vmcs_read64(GUEST_BNDCFGS));
+        if (vmentry_ctl & VM_ENTRY_LOAD_IA32_PKRS)
+                pr_err("PKRS = 0x%016llx\n", vmcs_read64(GUEST_IA32_PKRS));
         pr_err("Interruptibility = %08x ActivityState = %08x\n",
                vmcs_read32(GUEST_INTERRUPTIBILITY_INFO),
                vmcs_read32(GUEST_ACTIVITY_STATE));
@@ -5964,6 +6001,8 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
                        vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
         if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0)
                 vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
+        if (vmexit_ctl & VM_EXIT_LOAD_IA32_PKRS)
+                pr_err("PKRS = 0x%016llx\n", vmcs_read64(HOST_IA32_PKRS));

         pr_err("*** Control State ***\n");
         pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n",
@@ -7406,6 +7445,20 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)

         /* Refresh #PF interception to account for MAXPHYADDR changes. */
         vmx_update_exception_bitmap(vcpu);
+
+        if (kvm_cpu_cap_has(X86_FEATURE_PKS)) {
+                if (guest_cpuid_has(vcpu, X86_FEATURE_PKS)) {
+                        vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW);
+
+                        vm_entry_controls_setbit(vmx, VM_ENTRY_LOAD_IA32_PKRS);
+                        vm_exit_controls_setbit(vmx, VM_EXIT_LOAD_IA32_PKRS);
+                } else {
+                        vmx_enable_intercept_for_msr(vcpu, MSR_IA32_PKRS, MSR_TYPE_RW);
+
+                        vm_entry_controls_clearbit(vmx, VM_ENTRY_LOAD_IA32_PKRS);
+                        vm_exit_controls_clearbit(vmx, VM_EXIT_LOAD_IA32_PKRS);
+                }
+        }
 }

 static __init void vmx_set_cpu_caps(void)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 661df9584b12..91723a226bf3 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -352,7 +352,7 @@ struct vcpu_vmx {
         struct lbr_desc lbr_desc;

         /* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS 15
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS 16
         struct {
                 DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
                 DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
@@ -580,4 +580,11 @@ static inline int vmx_get_instr_info_reg2(u32 vmx_instr_info)
         return (vmx_instr_info >> 28) & 0xf;
 }

+static inline void vmx_set_host_pkrs(struct vmcs_host_state *host, u32 pkrs)
+{
+        if (unlikely(pkrs != host->pkrs)) {
+                vmcs_write64(HOST_IA32_PKRS, pkrs);
+                host->pkrs = pkrs;
+        }
+}
+
 #endif /* __KVM_X86_VMX_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 547ba00ef64f..d784bf3a4b3e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1396,7 +1396,7 @@ static const u32 msrs_to_save_all[] = {
         MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
         MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
         MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
-        MSR_IA32_UMWAIT_CONTROL,
+        MSR_IA32_UMWAIT_CONTROL, MSR_IA32_PKRS,

         MSR_ARCH_PERFMON_FIXED_CTR0, MSR_ARCH_PERFMON_FIXED_CTR1,
         MSR_ARCH_PERFMON_FIXED_CTR0 + 2,
@@ -6638,6 +6638,10 @@ static void kvm_init_msr_list(void)
                             intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2)
                                 continue;
                         break;
+                case MSR_IA32_PKRS:
+                        if (!kvm_cpu_cap_has(X86_FEATURE_PKS))
+                                continue;
+                        break;
                 case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 17:
                         if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_PERFCTR0 >=
                             min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
@@ -11410,6 +11414,9 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
         kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
         kvm_rip_write(vcpu, 0xfff0);

+        if (!init_event && kvm_cpu_cap_has(X86_FEATURE_PKS))
+                __kvm_set_msr(vcpu, MSR_IA32_PKRS, 0, true);
+
         vcpu->arch.cr3 = 0;
         kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 588792f00334..7610f0d40b0f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -404,6 +404,12 @@ static inline void kvm_machine_check(void)
 #endif
 }

+static inline bool kvm_pkrs_valid(u64 data)
+{
+        /* bit[63,32] must be zero */
+        return !(data >> 32);
+}
+
 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
 int kvm_spec_ctrl_test_value(u64 value);
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 74ba51b9853b..bd75af62b685 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -495,4 +495,10 @@ void pks_update_exception(struct pt_regs *regs, u8 pkey, u8 protection)
 }
 EXPORT_SYMBOL_GPL(pks_update_exception);

+u32 get_current_pkrs(void)
+{
+        return this_cpu_read(pkrs_cache);
+}
+EXPORT_SYMBOL_GPL(get_current_pkrs);
+
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
diff --git a/include/linux/pks.h b/include/linux/pks.h
index ce8eea81f208..0a71f8f4055d 100644
--- a/include/linux/pks.h
+++ b/include/linux/pks.h
@@ -53,6 +55,8 @@ static inline void pks_set_readwrite(u8 pkey)
 typedef bool (*pks_key_callback)(struct pt_regs *regs, unsigned long address,
                                  bool write);

+u32 get_current_pkrs(void);
+
 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */

 static inline bool pks_available(void)
@@ -68,6 +70,11 @@ static inline void pks_update_exception(struct pt_regs *regs,
                                         u8 protection)
 {
 }

+static inline u32 get_current_pkrs(void)
+{
+        return 0;
+}
+
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */

 #ifdef CONFIG_PKS_TEST
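On the reset placement, moving the PKRS write next to the other !init_event
handling in kvm_vcpu_reset() would look roughly like the sketch below
(surrounding lines abridged; the MSR_IA32_XSS reset is the existing anchor Sean
points to):

        if (!init_event) {
                /* ... existing !init_event handling ... */
                __kvm_set_msr(vcpu, MSR_IA32_XSS, 0, true);

                if (kvm_cpu_cap_has(X86_FEATURE_PKS))
                        __kvm_set_msr(vcpu, MSR_IA32_PKRS, 0, true);
        }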