Message ID | 20240206182032.1596-1-xin3.li@intel.com |
---|---|
State | New, archived |
Series | [v5,1/2] KVM: VMX: Cleanup VMX basic information defines and usages |
Please send cover letters for series with more than one patch, even if there are only two patches. At the very least, cover letters are a convenient location to provide feedback/communication for the series as a whole.

Instead, I need to put it here:

I'll send a v6 with all of my suggestions incorporated. I like the cleanups, but there are too many process issues to fixup when applying, a few things that I straight up disagree with, and more aggressive memtype related changes that can be done in the context of this series.

On Tue, Feb 06, 2024, Xin Li wrote:
> Define VMX basic information fields with BIT_ULL()/GENMASK_ULL(), and replace hardcoded VMX basic numbers with these field macros.
>
> Save the full/raw value of MSR_IA32_VMX_BASIC in the global vmcs_config as type u64 to get rid of the hi/lo crud, and then use VMX_BASIC helpers to extract info as needed.
>
> VMX_EPTP_MT_{WB,UC} values 0x6 and 0x0 are generic x86 memory type values, no need to prefix them with VMX_EPTP_.

*sigh*

This obviously, like super duper obviously, should be at least three distinct patches. The changelog has three paragraphs that have *zero* relation to each other, and the changelog doesn't even cover all of the opportunistic cleanups that are being done.

> +/* x86 memory types, explicitly used in VMX only */
> +#define MEM_TYPE_WB				0x6ULL
> +#define MEM_TYPE_UC				0x0ULL

No, this is ridiculous. These values are architectural, there's no reason for KVM to have yet another copy. The MTRRs #defines have goofy names, and are incomplete, but it's trivial to move the enums from pat/memtype.c to msr-index.h.

> @@ -505,8 +521,6 @@ enum vmcs_field {
>  #define VMX_EPTP_PWL_5				0x20ull
>  #define VMX_EPTP_AD_ENABLE_BIT			(1ull << 6)
>  #define VMX_EPTP_MT_MASK			0x7ull
> -#define VMX_EPTP_MT_WB				0x6ull
> -#define VMX_EPTP_MT_UC				0x0ull

I would strongly prefer to keep the VMX_EPTP_MT_WB and VMX_EPTP_MT_UC defines, at least so long as KVM is open coding reads and writes to the EPTP. E.g. if someone wants to do a follow-up series that adds wrappers to decode/encode the memtype (and other fields) from/to EPTP values, then I'd be fine dropping these.

But this:

	/* Check for memory type validity */
	switch (new_eptp & VMX_EPTP_MT_MASK) {
	case MEM_TYPE_UC:
		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPTP_UC_BIT)))
			return false;
		break;
	case MEM_TYPE_WB:
		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPTP_WB_BIT)))
			return false;
		break;
	default:
		return false;
	}

looks wrong and is actively confusing, especially when the code below it does:

	/* Page-walk levels validity. */
	switch (new_eptp & VMX_EPTP_PWL_MASK) {
	case VMX_EPTP_PWL_5:
		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPT_PAGE_WALK_5_BIT)))
			return false;
		break;
	case VMX_EPTP_PWL_4:
		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPT_PAGE_WALK_4_BIT)))
			return false;
		break;
	default:
		return false;
	}

>  static inline bool cpu_has_virtual_nmis(void)
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 994e014f8a50..80fea1875948 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1226,23 +1226,29 @@ static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
>  	return (superset | subset) == superset;
>  }
>
> +#define VMX_BASIC_FEATURES_MASK			\
> +	(VMX_BASIC_DUAL_MONITOR_TREATMENT |	\
> +	 VMX_BASIC_INOUT |			\
> +	 VMX_BASIC_TRUE_CTLS)
> +
> +#define VMX_BASIC_RESERVED_BITS			\
> +	(GENMASK_ULL(63, 56) | GENMASK_ULL(47, 45) | BIT_ULL(31))

Looking at this with fresh eyes, I think #defines are overkill. There is zero chance anything other than vmx_restore_vmx_basic() will use these, and the feature bits mask is rather weird. It's not a mask of features that KVM supports, it's a mask of feature *bits* that KVM knows about.

So rather than add #defines, I think we can keep "const u64" variables, but split into feature_bits and reserved_bits (the latter will have open coded GENMASK_ULL() usage, whereas the former will not).

BUILD_BUG_ON() is fancy enough that it can detect overlap.
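A minimal userspace sketch of the split suggested above, assuming C11 and local stand-ins for the kernel's BIT_ULL() and GENMASK_ULL(): feature_bits and reserved_bits are plain const u64 locals, and is_bitwise_subset() follows the nested.c helper of the same name. restore_vmx_basic_sketch() is a hypothetical name, not KVM code. In the kernel, BUILD_BUG_ON(feature_bits & reserved_bits) can catch an overlap at build time because the compiler folds the const locals; standard C11 static_assert() cannot see through const variables, so this sketch falls back to a runtime assert().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Userspace stand-ins for the kernel's BIT_ULL() and GENMASK_ULL(). */
#define BIT_ULL(n)		(1ULL << (n))
#define GENMASK_ULL(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Every bit of @subset that lies within @mask must also be set in @superset. */
static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
{
	superset &= mask;
	subset &= mask;

	return (superset | subset) == superset;
}

/* Hypothetical sketch of the feature/reserved split for vmx_restore_vmx_basic(). */
static int restore_vmx_basic_sketch(u64 host_basic, u64 data)
{
	/* Feature bits KVM knows about; bit 48 is rejected separately below. */
	const u64 feature_bits = BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55);
	/* Reserved bits, open coded with GENMASK_ULL(). */
	const u64 reserved_bits = GENMASK_ULL(63, 56) | GENMASK_ULL(47, 45) |
				  BIT_ULL(31);

	/* Kernel code would use BUILD_BUG_ON(feature_bits & reserved_bits). */
	assert(!(feature_bits & reserved_bits));

	if (!is_bitwise_subset(host_basic, data, feature_bits | reserved_bits))
		return -EINVAL;

	/* VMX structures limited to 32-bit physical addresses are not emulated. */
	if (data & BIT_ULL(48))
		return -EINVAL;

	return 0;
}
```

Bits outside both masks (revision ID, VMCS size, memory type) pass through this subset test untouched; in the patch, vmx_restore_vmx_basic() goes on to compare the revision ID separately.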
> @@ -6994,6 +7000,9 @@ static void nested_vmx_setup_misc_data(struct vmcs_config *vmcs_conf,
>  	msrs->misc_high = 0;
>  }
>
> +#define VMX_BSAIC_VMCS12_SIZE	((u64)VMCS12_SIZE << 32)

Typo.

> +#define VMX_BASIC_MEM_TYPE_WB	(MEM_TYPE_WB << 50)

I don't see any value in either of these. In fact, I find them both to be far more confusing, and much more likely to be incorrectly used.

Back in v1, when I said "don't bother with shift #defines", I was very specifically talking about feature bits where defining the bit shift is an extra, pointless layer. I even (tried) to clarify that.
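One way to drop both macros, sketched here as an assumption rather than as what v6 does: keep the shifts in a single encode helper that mirrors the patch's GENMASK_ULL()-based decoders, so callers such as nested_vmx_setup_basic() pass plain values and never write << 32 or << 50 themselves. vmx_basic_encode_vmcs_info() is a hypothetical name; the revision ID occupies bits 30:0 per the SDM, and the VMCS size (bits 44:32) and memory type (bits 53:50) positions are the ones the patch's helpers use.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;
typedef uint16_t u16;
typedef uint8_t u8;

/* Userspace stand-in for the kernel's GENMASK_ULL(). */
#define GENMASK_ULL(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* The memtype field must match the 0x003c000000000000 mask removed from msr-index.h. */
static_assert(GENMASK_ULL(53, 50) == 0x003c000000000000ULL, "memtype is bits 53:50");

/* Hypothetical encoder: the only place that knows the field shifts.
 * Inputs are not range-checked in this sketch. */
static inline u64 vmx_basic_encode_vmcs_info(u32 revision, u16 size, u8 memtype)
{
	return revision | ((u64)size << 32) | ((u64)memtype << 50);
}

/* Decoders in the style of the patch's vmx_basic_vmcs_size(). */
static inline u32 vmx_basic_vmcs_revision_id(u64 basic)
{
	return basic & GENMASK_ULL(30, 0);
}

static inline u32 vmx_basic_vmcs_size(u64 basic)
{
	return (basic & GENMASK_ULL(44, 32)) >> 32;
}

static inline u32 vmx_basic_vmcs_mem_type(u64 basic)
{
	return (basic & GENMASK_ULL(53, 50)) >> 50;
}
```

With this shape the nested setup would pass VMCS12_REVISION, VMCS12_SIZE and the WB memory type as arguments to one call, and each field round-trips through the matching decoder.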
> Please send cover letters for series with more than one patch, even if there are only two patches. At the very least, cover letters are a convenient location to provide feedback/communication for the series as a whole.

Kai also said so... I'll take it as standard practice.

> Instead, I need to put it here:
>
> I'll send a v6 with all of my suggestions incorporated.

Perfect!

> I like the cleanups, but there are too many process issues to fixup when applying, a few things that I straight up disagree with, and more aggressive memtype related changes that can be done in the context of this series.
>
> > @@ -505,8 +521,6 @@ enum vmcs_field {
> >  #define VMX_EPTP_PWL_5				0x20ull
> >  #define VMX_EPTP_AD_ENABLE_BIT			(1ull << 6)
> >  #define VMX_EPTP_MT_MASK			0x7ull
> > -#define VMX_EPTP_MT_WB				0x6ull
> > -#define VMX_EPTP_MT_UC				0x0ull
>
> I would strongly prefer to keep the VMX_EPTP_MT_WB and VMX_EPTP_MT_UC defines, at least so long as KVM is open coding reads and writes to the EPTP. E.g. if someone wants to do a follow-up series that adds wrappers to decode/encode the memtype (and other fields) from/to EPTP values, then I'd be fine dropping these.
>
> But this:
>
> 	/* Check for memory type validity */
> 	switch (new_eptp & VMX_EPTP_MT_MASK) {
> 	case MEM_TYPE_UC:
> 		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPTP_UC_BIT)))
> 			return false;
> 		break;
> 	case MEM_TYPE_WB:
> 		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPTP_WB_BIT)))
> 			return false;
> 		break;
> 	default:
> 		return false;
> 	}
>
> looks wrong and is actively confusing, especially when the code below it does:
>
> 	/* Page-walk levels validity. */
> 	switch (new_eptp & VMX_EPTP_PWL_MASK) {
> 	case VMX_EPTP_PWL_5:
> 		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPT_PAGE_WALK_5_BIT)))
> 			return false;
> 		break;
> 	case VMX_EPTP_PWL_4:
> 		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPT_PAGE_WALK_4_BIT)))
> 			return false;
> 		break;
> 	default:
> 		return false;
> 	}

I see your point here. But "#define VMX_EPTP_MT_WB 0x6ull" seems to define its own memory type 0x6. I think what we want is:

	/* in a pat/mtrr header */
	#define MEM_TYPE_WB	0x6

	/* vmx.h */
	#define VMX_EPTP_MT_WB	MEM_TYPE_WB

if it's not regarded as another layer of indirection.

> >  static inline bool cpu_has_virtual_nmis(void)
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 994e014f8a50..80fea1875948 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -1226,23 +1226,29 @@ static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
> >  	return (superset | subset) == superset;
> >  }
> >
> > +#define VMX_BASIC_FEATURES_MASK			\
> > +	(VMX_BASIC_DUAL_MONITOR_TREATMENT |	\
> > +	 VMX_BASIC_INOUT |			\
> > +	 VMX_BASIC_TRUE_CTLS)
> > +
> > +#define VMX_BASIC_RESERVED_BITS			\
> > +	(GENMASK_ULL(63, 56) | GENMASK_ULL(47, 45) | BIT_ULL(31))
>
> Looking at this with fresh eyes, I think #defines are overkill. There is zero chance anything other than vmx_restore_vmx_basic() will use these, and the feature bits mask is rather weird. It's not a mask of features that KVM supports, it's a mask of feature *bits* that KVM knows about.
>
> So rather than add #defines, I think we can keep "const u64" variables, but split into feature_bits and reserved_bits (the latter will have open coded GENMASK_ULL() usage, whereas the former will not).
>
> BUILD_BUG_ON() is fancy enough that it can detect overlap.

Sounds reasonable to me.

> > +#define VMX_BSAIC_VMCS12_SIZE	((u64)VMCS12_SIZE << 32)
>
> Typo.

Sigh!

> > +#define VMX_BASIC_MEM_TYPE_WB	(MEM_TYPE_WB << 50)
>
> I don't see any value in either of these. In fact, I find them both to be far more confusing, and much more likely to be incorrectly used.
>
> Back in v1, when I said "don't bother with shift #defines", I was very specifically talking about feature bits where defining the bit shift is an extra, pointless layer. I even (tried) to clarify that.

Another review comment is what confused me here: https://lore.kernel.org/kvm/2158ef3c5ce2de96c970b49802b7e1dba8b704d6.camel@intel.com/
On Wed, Feb 14, 2024, Xin3 Li wrote:
> > 				VMX_EPT_PAGE_WALK_4_BIT)))
> > 			return false;
> > 		break;
> > 	default:
> > 		return false;
> > 	}
> >
>
> I see your point here. But "#define VMX_EPTP_MT_WB 0x6ull" seems to define its own memory type 0x6. I think what we want is:
>
> 	/* in a pat/mtrr header */
> 	#define MEM_TYPE_WB	0x6
>
> 	/* vmx.h */
> 	#define VMX_EPTP_MT_WB	MEM_TYPE_WB
>
> if it's not regarded as another layer of indirection.

Heh, yep, I already had this:

	/* The EPTP memtype is encoded in bits 2:0, i.e. doesn't need to be shifted. */
	#define VMX_EPTP_MT_MASK		0x7ull
	#define VMX_EPTP_MT_WB			X86_MEMTYPE_WB
	#define VMX_EPTP_MT_UC			X86_MEMTYPE_UC
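A self-contained sketch of where the thread lands: the VMX_EPTP_MT_* names stay but alias the architectural memory types, so both switches in nested_vmx_check_eptp() compare EPTP fields against VMX_EPTP_* constants. X86_MEMTYPE_UC and X86_MEMTYPE_WB reuse the names from the snippet above with the architectural values 0 and 6; the PWL and capability-bit values are assumed from KVM's vmx.h and should be checked against the tree. check_eptp_sketch() is hypothetical and omits KVM's CC() tracing and the remaining EPTP checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Architectural x86 memory type encodings (shared by PAT, MTRRs and EPT). */
#define X86_MEMTYPE_UC		0ULL
#define X86_MEMTYPE_WB		6ULL

/* The EPTP memtype is encoded in bits 2:0, i.e. doesn't need to be shifted. */
#define VMX_EPTP_MT_MASK	0x7ull
#define VMX_EPTP_MT_WB		X86_MEMTYPE_WB
#define VMX_EPTP_MT_UC		X86_MEMTYPE_UC

/* Page-walk length minus one, encoded in EPTP bits 5:3. */
#define VMX_EPTP_PWL_MASK	0x38ull
#define VMX_EPTP_PWL_4		0x18ull
#define VMX_EPTP_PWL_5		0x20ull

/* EPT capability bits (assumed values, mirroring KVM's vmx.h). */
#define VMX_EPT_PAGE_WALK_4_BIT	(1ull << 6)
#define VMX_EPT_PAGE_WALK_5_BIT	(1ull << 7)
#define VMX_EPTP_UC_BIT		(1ull << 8)
#define VMX_EPTP_WB_BIT		(1ull << 14)

static_assert(VMX_EPTP_MT_WB == 6, "EPT WB must equal the architectural WB value");

/* Hypothetical, trimmed-down nested_vmx_check_eptp(): both switches read alike. */
static bool check_eptp_sketch(u64 ept_caps, u64 eptp)
{
	switch (eptp & VMX_EPTP_MT_MASK) {
	case VMX_EPTP_MT_UC:
		if (!(ept_caps & VMX_EPTP_UC_BIT))
			return false;
		break;
	case VMX_EPTP_MT_WB:
		if (!(ept_caps & VMX_EPTP_WB_BIT))
			return false;
		break;
	default:
		return false;
	}

	switch (eptp & VMX_EPTP_PWL_MASK) {
	case VMX_EPTP_PWL_5:
		if (!(ept_caps & VMX_EPT_PAGE_WALK_5_BIT))
			return false;
		break;
	case VMX_EPTP_PWL_4:
		if (!(ept_caps & VMX_EPT_PAGE_WALK_4_BIT))
			return false;
		break;
	default:
		return false;
	}

	return true;
}
```

The alias keeps a single source of truth for the value 6 while the call sites still say "this is an EPTP field", which is the readability point raised about the original switch.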
> > I see your point here. But "#define VMX_EPTP_MT_WB 0x6ull" seems to define its own memory type 0x6. I think what we want is:
> >
> > 	/* in a pat/mtrr header */
> > 	#define MEM_TYPE_WB	0x6
> >
> > 	/* vmx.h */
> > 	#define VMX_EPTP_MT_WB	MEM_TYPE_WB
> >
> > if it's not regarded as another layer of indirection.
>
> Heh, yep, I already had this:
>
> 	/* The EPTP memtype is encoded in bits 2:0, i.e. doesn't need to be shifted. */
> 	#define VMX_EPTP_MT_MASK		0x7ull
> 	#define VMX_EPTP_MT_WB			X86_MEMTYPE_WB
> 	#define VMX_EPTP_MT_UC			X86_MEMTYPE_UC

Perfect!
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index f1bd7b91b3c6..63cd50bfdc6d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1102,15 +1102,6 @@
 #define MSR_IA32_VMX_VMFUNC             0x00000491
 #define MSR_IA32_VMX_PROCBASED_CTLS3	0x00000492
 
-/* VMX_BASIC bits and bitmasks */
-#define VMX_BASIC_VMCS_SIZE_SHIFT	32
-#define VMX_BASIC_TRUE_CTLS		(1ULL << 55)
-#define VMX_BASIC_64		0x0001000000000000LLU
-#define VMX_BASIC_MEM_TYPE_SHIFT	50
-#define VMX_BASIC_MEM_TYPE_MASK	0x003c000000000000LLU
-#define VMX_BASIC_MEM_TYPE_WB	6LLU
-#define VMX_BASIC_INOUT		0x0040000000000000LLU
-
 /* Resctrl MSRs: */
 /* - Intel: */
 #define MSR_IA32_L3_QOS_CFG		0xc81
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0e73616b82f3..4fa8012980f6 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -120,6 +120,17 @@
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
+/* x86 memory types, explicitly used in VMX only */
+#define MEM_TYPE_WB				0x6ULL
+#define MEM_TYPE_UC				0x0ULL
+
+/* VMX_BASIC bits */
+#define VMX_BASIC_32BIT_PHYS_ADDR_ONLY		BIT_ULL(48)
+#define VMX_BASIC_DUAL_MONITOR_TREATMENT	BIT_ULL(49)
+#define VMX_BASIC_INOUT				BIT_ULL(54)
+#define VMX_BASIC_TRUE_CTLS			BIT_ULL(55)
+
+
 #define VMX_MISC_PREEMPTION_TIMER_RATE_MASK	0x0000001f
 #define VMX_MISC_SAVE_EFER_LMA			0x00000020
 #define VMX_MISC_ACTIVITY_HLT			0x00000040
@@ -143,6 +154,11 @@ static inline u32 vmx_basic_vmcs_size(u64 vmx_basic)
 	return (vmx_basic & GENMASK_ULL(44, 32)) >> 32;
 }
 
+static inline u32 vmx_basic_vmcs_mem_type(u64 vmx_basic)
+{
+	return (vmx_basic & GENMASK_ULL(53, 50)) >> 50;
+}
+
 static inline int vmx_misc_preemption_timer_rate(u64 vmx_misc)
 {
 	return vmx_misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
@@ -505,8 +521,6 @@ enum vmcs_field {
 #define VMX_EPTP_PWL_5				0x20ull
 #define VMX_EPTP_AD_ENABLE_BIT			(1ull << 6)
 #define VMX_EPTP_MT_MASK			0x7ull
-#define VMX_EPTP_MT_WB				0x6ull
-#define VMX_EPTP_MT_UC				0x0ull
 #define VMX_EPT_READABLE_MASK			0x1ull
 #define VMX_EPT_WRITABLE_MASK			0x2ull
 #define VMX_EPT_EXECUTABLE_MASK			0x4ull
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 41a4533f9989..86ce8bb96bed 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -54,9 +54,7 @@ struct nested_vmx_msrs {
 };
 
 struct vmcs_config {
-	int size;
-	u32 basic_cap;
-	u32 revision_id;
+	u64 basic;
 	u32 pin_based_exec_ctrl;
 	u32 cpu_based_exec_ctrl;
 	u32 cpu_based_2nd_exec_ctrl;
@@ -76,7 +74,7 @@ extern struct vmx_capability vmx_capability __ro_after_init;
 
 static inline bool cpu_has_vmx_basic_inout(void)
 {
-	return	(((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
+	return	vmcs_config.basic & VMX_BASIC_INOUT;
 }
 
 static inline bool cpu_has_virtual_nmis(void)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 994e014f8a50..80fea1875948 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1226,23 +1226,29 @@ static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
 	return (superset | subset) == superset;
 }
 
+#define VMX_BASIC_FEATURES_MASK			\
+	(VMX_BASIC_DUAL_MONITOR_TREATMENT |	\
+	 VMX_BASIC_INOUT |			\
+	 VMX_BASIC_TRUE_CTLS)
+
+#define VMX_BASIC_RESERVED_BITS			\
+	(GENMASK_ULL(63, 56) | GENMASK_ULL(47, 45) | BIT_ULL(31))
+
 static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
 {
-	const u64 feature_and_reserved =
-		/* feature (except bit 48; see below) */
-		BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
-		/* reserved */
-		BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
 	u64 vmx_basic = vmcs_config.nested.basic;
 
-	if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
+	static_assert(!(VMX_BASIC_FEATURES_MASK & VMX_BASIC_RESERVED_BITS));
+
+	if (!is_bitwise_subset(vmx_basic, data,
+			       VMX_BASIC_FEATURES_MASK | VMX_BASIC_RESERVED_BITS))
 		return -EINVAL;
 
 	/*
 	 * KVM does not emulate a version of VMX that constrains physical
 	 * addresses of VMX structures (e.g. VMCS) to 32-bits.
 	 */
-	if (data & BIT_ULL(48))
+	if (data & VMX_BASIC_32BIT_PHYS_ADDR_ONLY)
 		return -EINVAL;
 
 	if (vmx_basic_vmcs_revision_id(vmx_basic) !=
@@ -2726,11 +2732,11 @@ static bool nested_vmx_check_eptp(struct kvm_vcpu *vcpu, u64 new_eptp)
 
 	/* Check for memory type validity */
 	switch (new_eptp & VMX_EPTP_MT_MASK) {
-	case VMX_EPTP_MT_UC:
+	case MEM_TYPE_UC:
 		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPTP_UC_BIT)))
 			return false;
 		break;
-	case VMX_EPTP_MT_WB:
+	case MEM_TYPE_WB:
 		if (CC(!(vmx->nested.msrs.ept_caps & VMX_EPTP_WB_BIT)))
 			return false;
 		break;
@@ -6994,6 +7000,9 @@ static void nested_vmx_setup_misc_data(struct vmcs_config *vmcs_conf,
 	msrs->misc_high = 0;
 }
 
+#define VMX_BSAIC_VMCS12_SIZE	((u64)VMCS12_SIZE << 32)
+#define VMX_BASIC_MEM_TYPE_WB	(MEM_TYPE_WB << 50)
+
 static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 {
 	/*
@@ -7005,8 +7014,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
 	msrs->basic =
 		VMCS12_REVISION |
 		VMX_BASIC_TRUE_CTLS |
-		((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) |
-		(VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT);
+		VMX_BSAIC_VMCS12_SIZE |
+		VMX_BASIC_MEM_TYPE_WB;
 
 	if (cpu_has_vmx_basic_inout())
 		msrs->basic |= VMX_BASIC_INOUT;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e262bc2ba4e5..dc163a580f98 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2563,13 +2563,13 @@ static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
 static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 			     struct vmx_capability *vmx_cap)
 {
-	u32 vmx_msr_low, vmx_msr_high;
 	u32 _pin_based_exec_control = 0;
 	u32 _cpu_based_exec_control = 0;
 	u32 _cpu_based_2nd_exec_control = 0;
 	u64 _cpu_based_3rd_exec_control = 0;
 	u32 _vmexit_control = 0;
 	u32 _vmentry_control = 0;
+	u64 basic_msr;
 	u64 misc_msr;
 	int i;
 
@@ -2688,29 +2688,25 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 		_vmexit_control &= ~x_ctrl;
 	}
 
-	rdmsr(MSR_IA32_VMX_BASIC, vmx_msr_low, vmx_msr_high);
+	rdmsrl(MSR_IA32_VMX_BASIC, basic_msr);
 
 	/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
-	if ((vmx_msr_high & 0x1fff) > PAGE_SIZE)
+	if ((vmx_basic_vmcs_size(basic_msr) > PAGE_SIZE))
 		return -EIO;
 
 #ifdef CONFIG_X86_64
 	/* IA-32 SDM Vol 3B: 64-bit CPUs always have VMX_BASIC_MSR[48]==0. */
-	if (vmx_msr_high & (1u<<16))
+	if (basic_msr & VMX_BASIC_32BIT_PHYS_ADDR_ONLY)
 		return -EIO;
 #endif
 
 	/* Require Write-Back (WB) memory type for VMCS accesses. */
-	if (((vmx_msr_high >> 18) & 15) != 6)
+	if (vmx_basic_vmcs_mem_type(basic_msr) != MEM_TYPE_WB)
 		return -EIO;
 
 	rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
 
-	vmcs_conf->size = vmx_msr_high & 0x1fff;
-	vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
-
-	vmcs_conf->revision_id = vmx_msr_low;
-
+	vmcs_conf->basic = basic_msr;
 	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
 	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
 	vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
@@ -2860,13 +2856,13 @@ struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags)
 	if (!pages)
 		return NULL;
 	vmcs = page_address(pages);
-	memset(vmcs, 0, vmcs_config.size);
+	memset(vmcs, 0, vmx_basic_vmcs_size(vmcs_config.basic));
 
 	/* KVM supports Enlightened VMCS v1 only */
 	if (kvm_is_using_evmcs())
 		vmcs->hdr.revision_id = KVM_EVMCS_VERSION;
 	else
-		vmcs->hdr.revision_id = vmcs_config.revision_id;
+		vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic);
 
 	if (shadow)
 		vmcs->hdr.shadow_vmcs = 1;
@@ -2959,7 +2955,7 @@ static __init int alloc_kvm_area(void)
 		 * physical CPU.
 		 */
 		if (kvm_is_using_evmcs())
-			vmcs->hdr.revision_id = vmcs_config.revision_id;
+			vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic);
 
 		per_cpu(vmxarea, cpu) = vmcs;
 	}
@@ -3361,7 +3357,7 @@ static int vmx_get_max_ept_level(void)
 
 u64 construct_eptp(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level)
 {
-	u64 eptp = VMX_EPTP_MT_WB;
+	u64 eptp = MEM_TYPE_WB;
 
 	eptp |= (root_level == 5) ? VMX_EPTP_PWL_5 : VMX_EPTP_PWL_4;
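To show the raw-u64 approach in isolation, here is a userspace sketch of the three IA32_VMX_BASIC sanity checks that setup_vmcs_config() performs after this patch. check_vmx_basic_sketch() is hypothetical, a runtime is_64bit flag stands in for #ifdef CONFIG_X86_64, and a 4 KiB page size is assumed; the static_assert() confirms that GENMASK_ULL(44, 32) selects the same bits as the old 0x1fff mask applied to the high half of the MSR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

/* Userspace stand-ins for the kernel's BIT_ULL() and GENMASK_ULL(). */
#define BIT_ULL(n)		(1ULL << (n))
#define GENMASK_ULL(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))

#define SKETCH_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */
#define X86_MEMTYPE_WB			6u
#define VMX_BASIC_32BIT_PHYS_ADDR_ONLY	BIT_ULL(48)

/* Old code tested (vmx_msr_high & 0x1fff); the same bits in the full u64. */
static_assert(GENMASK_ULL(44, 32) == 0x00001fff00000000ULL, "VMCS size is bits 44:32");

static inline u32 vmx_basic_vmcs_size(u64 basic)
{
	return (basic & GENMASK_ULL(44, 32)) >> 32;
}

static inline u32 vmx_basic_vmcs_mem_type(u64 basic)
{
	return (basic & GENMASK_ULL(53, 50)) >> 50;
}

/* Hypothetical helper: the IA32_VMX_BASIC checks from setup_vmcs_config(). */
static int check_vmx_basic_sketch(u64 basic, bool is_64bit)
{
	/* VMCS size must fit in one page. */
	if (vmx_basic_vmcs_size(basic) > SKETCH_PAGE_SIZE)
		return -EIO;
	/* 64-bit CPUs always report bit 48 as 0. */
	if (is_64bit && (basic & VMX_BASIC_32BIT_PHYS_ADDR_ONLY))
		return -EIO;
	/* Require write-back memory type for VMCS accesses. */
	if (vmx_basic_vmcs_mem_type(basic) != X86_MEMTYPE_WB)
		return -EIO;
	return 0;
}
```

Because the whole MSR is kept as one u64, each check reads a named field instead of re-deriving offsets into the high 32-bit half.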