Message ID | 20220221162243.683208-13-pbonzini@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM MMU refactoring part 2: role changes |
On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> Inline kvm_calc_mmu_role_common into its sole caller, and simplify it
> by removing the computation of unnecessary bits.
>
> Extended bits are unnecessary because page walking uses the CPU mode,
> and EFER.NX/CR0.WP can be set to one unconditionally---matching the
> format of shadow pages rather than the format of guest pages.

But they don't match the format of shadow pages. EPT has an equivalent to NX in
that KVM can always clear X, but KVM explicitly supports running with EPT and
EFER.NX=0 in the host (32-bit non-PAE kernels).

CR0.WP is equally confusing. Yes, both EPT and NPT enforce write protection at
all times, but EPT has no concept of user vs. supervisor in the EPT tables
themselves, at least with respect to writes (thanks, mode-based execution, for
the qualifier...). NPT is even worse, as the APM explicitly states:

  The host hCR0.WP bit is ignored under nested paging.

Unless there's some hidden dependency I'm missing, I'd prefer we arbitrarily
leave them zero.

> The MMU role for two dimensional paging does still depend on the CPU mode,

Heh, I don't think it's necessary to spell out TDP, and I think it would be
helpful to write it as "non-nested TDP" since the surrounding patches deal
with both.

> even if only barely so, due to SMM and guest mode; for consistency,
> pass it down to kvm_calc_tdp_mmu_root_page_role instead of querying
> the vcpu with is_smm or is_guest_mode.

The changelog should call out that this is a _significant_ change in behavior
for KVM, as it allows reusing shadow pages with different guest MMU "role
bits". E.g. if this lands after the changes to not unload MMUs on cr0/cr4
emulation, it will be quite the functional change.
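[A quick aside for readers following the EPT/NPT comparison above: the bit
positions below (taken from the Intel SDM and AMD APM; the macro names are
illustrative, not KVM's) show why NX has a natural analogue in the EPT format
while CR0.WP does not.]

/* EPT entries: read/write/execute, with no user/supervisor bit for writes. */
#define EPT_READ	(1ULL << 0)
#define EPT_WRITE	(1ULL << 1)
#define EPT_EXEC	(1ULL << 2)	/* clearing X emulates NX, even with host EFER.NX=0 */

/* x86 PTEs: user vs. supervisor is a first-class concept. */
#define PT_WRITABLE	(1ULL << 1)
#define PT_USER		(1ULL << 2)	/* CR0.WP only matters because this bit exists */
#define PT_NX		(1ULL << 63)	/* decoded only when EFER.NX=1 */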
On 3/8/22 19:11, Sean Christopherson wrote:
> On Mon, Feb 21, 2022, Paolo Bonzini wrote:
>> Extended bits are unnecessary because page walking uses the CPU mode,
>> and EFER.NX/CR0.WP can be set to one unconditionally---matching the
>> format of shadow pages rather than the format of guest pages.
>
> But they don't match the format of shadow pages. EPT has an equivalent to NX in
> that KVM can always clear X, but KVM explicitly supports running with EPT and
> EFER.NX=0 in the host (32-bit non-PAE kernels).

In which case bit 2 of EPTs doesn't change meaning, does it?

> CR0.WP is equally confusing. Yes, both EPT and NPT enforce write protection at
> all times, but EPT has no concept of user vs. supervisor in the EPT tables
> themselves, at least with respect to writes (thanks, mode-based execution, for
> the qualifier...). NPT is even worse, as the APM explicitly states:
>
>   The host hCR0.WP bit is ignored under nested paging.
>
> Unless there's some hidden dependency I'm missing, I'd prefer we arbitrarily
> leave them zero.

Setting EFER.NX=0 might be okay for EPT/NPT, but I'd prefer to set it
respectively to 1 (X bit always present) and host EFER.NX (NX bit present
depending on host EFER).

For CR0.WP it should really be 1 in my opinion, because CR0.WP=0 implies
having a concept of user vs. supervisor access: CR0.WP=1 is the "default",
while CR0.WP=0 is "always allow *supervisor* writes".

>> even if only barely so, due to SMM and guest mode; for consistency,
>> pass it down to kvm_calc_tdp_mmu_root_page_role instead of querying
>> the vcpu with is_smm or is_guest_mode.
>
> The changelog should call out that this is a _significant_ change in behavior
> for KVM, as it allows reusing shadow pages with different guest MMU "role bits".

Good point! It's safe and arguably clea{n,r}er, but it's still a pretty large
change.

> E.g. if this lands after the changes to not unload MMUs on cr0/cr4
> emulation, it will be quite the functional change.

I expect this to land first, so that the part where we don't really agree on
the implementation comes last and benefits from a more understandable core.

Paolo
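[To make Paolo's framing concrete, here is a tiny model of the architectural
write-permission check; plain C with a made-up function name, not KVM code,
and ignoring SMAP/SMEP and friends. With cr0_wp pinned to true, the supervisor
special case simply disappears, which is why it is the natural constant for
shadow pages.]

#include <stdbool.h>

/* Simplified x86 write check illustrating only the CR0.WP rule. */
static bool write_allowed(bool pte_writable, bool user_access, bool cr0_wp)
{
	if (user_access)
		return pte_writable;	/* user writes always honor the W bit */

	/* Supervisor writes may bypass a read-only PTE only when CR0.WP=0. */
	return pte_writable || !cr0_wp;
}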
On Tue, Mar 08, 2022, Sean Christopherson wrote:
> On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> > Inline kvm_calc_mmu_role_common into its sole caller, and simplify it
> > by removing the computation of unnecessary bits.
> >
> > Extended bits are unnecessary because page walking uses the CPU mode,
> > and EFER.NX/CR0.WP can be set to one unconditionally---matching the
> > format of shadow pages rather than the format of guest pages.
>
> But they don't match the format of shadow pages. EPT has an equivalent to NX in
> that KVM can always clear X, but KVM explicitly supports running with EPT and
> EFER.NX=0 in the host (32-bit non-PAE kernels).

Oof, digging into this made me realize we probably have yet another bug in the
handling of the NX hugepages workaround. Unless I'm overlooking something, NX
is treated as reserved for NPT shadow pages and would cause WARN spam if
someone enabled the mitigation on AMD CPUs. Untested...

If this is needed, it also means I didn't properly test commit b26a71a1a5b9
("KVM: SVM: Refuse to load kvm_amd if NX support is not available"). :-(

From: Sean Christopherson <seanjc@google.com>
Date: Tue, 8 Mar 2022 10:31:27 -0800
Subject: [PATCH] KVM: x86/mmu: Don't treat NX as reserved for NPT MMUs

Mark the NX bit as allowed for NPT shadow pages. KVM doesn't use NX in
"normal" operation as KVM doesn't honor userspace execute-protection
settings, but KVM allows userspace to enable the NX hugepages mitigation
on AMD CPUs, despite no known AMD CPUs being affected by the iTLB
multi-hit erratum.

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 47a2c5f3044d..9c79a0927a48 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4461,13 +4461,17 @@ static inline bool boot_cpu_is_amd(void)
 	return shadow_x_mask == 0;
 }
 
-/*
- * the direct page table on host, use as much mmu features as
- * possible, however, kvm currently does not do execution-protection.
- */
 static void
 reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
 {
+	/*
+	 * KVM doesn't honor execute-protection from the host page tables, but
+	 * NX is required and potentially used at any time by KVM for NPT, as
+	 * the NX hugepages iTLB multi-hit mitigation is supported for any CPU
+	 * despite no known AMD (and derivative) CPUs being affected by erratum.
+	 */
+	bool efer_nx = true;
+
 	struct rsvd_bits_validate *shadow_zero_check;
 	int i;
 
@@ -4475,7 +4479,7 @@ reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
 
 	if (boot_cpu_is_amd())
 		__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
-					context->shadow_root_level, false,
+					context->shadow_root_level, efer_nx,
 					boot_cpu_has(X86_FEATURE_GBPAGES),
 					false, true);
 	else

base-commit: 2a809a65d88e7e071dc6ef4a6be8416d25885efe
--
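[The failure mode Sean describes can be sketched in a few lines. This is a
simplified model with hypothetical names, not the actual
__reset_rsvds_bits_mask()/WARN logic.]

#include <stdbool.h>
#include <stdint.h>

#define PT64_NX		(1ULL << 63)

/* If EFER.NX is modeled as off, bit 63 lands in the reserved-bit mask... */
static uint64_t shadow_rsvd_mask(bool efer_nx)
{
	return efer_nx ? 0 : PT64_NX;
}

/* ...so an NX-hugepages-mitigation SPTE with NX set would trip the WARN. */
static bool spte_has_rsvd_bits(uint64_t spte, bool efer_nx)
{
	return (spte & shadow_rsvd_mask(efer_nx)) != 0;
}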
On Tue, Mar 08, 2022, Paolo Bonzini wrote:
> On 3/8/22 19:11, Sean Christopherson wrote:
> > On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> > > Extended bits are unnecessary because page walking uses the CPU mode,
> > > and EFER.NX/CR0.WP can be set to one unconditionally---matching the
> > > format of shadow pages rather than the format of guest pages.
> >
> > But they don't match the format of shadow pages. EPT has an equivalent to NX in
> > that KVM can always clear X, but KVM explicitly supports running with EPT and
> > EFER.NX=0 in the host (32-bit non-PAE kernels).
>
> In which case bit 2 of EPTs doesn't change meaning, does it?
>
> > CR0.WP is equally confusing. Yes, both EPT and NPT enforce write protection at
> > all times, but EPT has no concept of user vs. supervisor in the EPT tables
> > themselves, at least with respect to writes (thanks, mode-based execution, for
> > the qualifier...). NPT is even worse, as the APM explicitly states:
> >
> >   The host hCR0.WP bit is ignored under nested paging.
> >
> > Unless there's some hidden dependency I'm missing, I'd prefer we arbitrarily
> > leave them zero.
>
> Setting EFER.NX=0 might be okay for EPT/NPT, but I'd prefer to set it
> respectively to 1 (X bit always present) and host EFER.NX (NX bit present
> depending on host EFER).
>
> For CR0.WP it should really be 1 in my opinion, because CR0.WP=0 implies
> having a concept of user vs. supervisor access: CR0.WP=1 is the "default",
> while CR0.WP=0 is "always allow *supervisor* writes".

Yeah, I think we generally agree, we just came to different conclusions :-)

I'm totally fine setting them to '1', especially given the patch I just
"posted", but please add comments (suggested NX comment below). The explicit
"WP is ignored" blurb for hCR0 on NPT will be especially confusing at some
point.

With efer_nx forced to '1', we can do this somewhere in this series. I really,
really despise "context" :-).

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9c79a0927a48..657df7fd74bf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4461,25 +4461,15 @@ static inline bool boot_cpu_is_amd(void)
 	return shadow_x_mask == 0;
 }
 
-static void
-reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
+static void reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *mmu)
 {
-	/*
-	 * KVM doesn't honor execute-protection from the host page tables, but
-	 * NX is required and potentially used at any time by KVM for NPT, as
-	 * the NX hugepages iTLB multi-hit mitigation is supported for any CPU
-	 * despite no known AMD (and derivative) CPUs being affected by erratum.
-	 */
-	bool efer_nx = true;
-
 	struct rsvd_bits_validate *shadow_zero_check;
 	int i;
 
-	shadow_zero_check = &context->shadow_zero_check;
+	shadow_zero_check = &mmu->shadow_zero_check;
 
 	if (boot_cpu_is_amd())
 		__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
-					context->shadow_root_level, efer_nx,
+					mmu->shadow_root_level, is_efer_nx(mmu),
 					boot_cpu_has(X86_FEATURE_GBPAGES),
 					false, true);
 	else
@@ -4490,7 +4480,7 @@ reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
 	if (!shadow_me_mask)
 		return;
 
-	for (i = context->shadow_root_level; --i >= 0;) {
+	for (i = mmu->shadow_root_level; --i >= 0;) {
 		shadow_zero_check->rsvd_bits_mask[0][i] &= ~shadow_me_mask;
 		shadow_zero_check->rsvd_bits_mask[1][i] &= ~shadow_me_mask;
 	}
@@ -4751,6 +4741,16 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 
 	role.base.access = ACC_ALL;
 	role.base.cr0_wp = true;
+
+	/*
+	 * KVM doesn't honor execute-protection from the host page tables, but
+	 * NX is required and potentially used at any time by KVM for NPT, as
+	 * the NX hugepages iTLB multi-hit mitigation is supported for any CPU
+	 * despite no known AMD (and derivative) CPUs being affected by erratum.
+	 *
+	 * This is functionally accurate for EPT, if technically wrong, as KVM
+	 * can always clear the X bit on EPT.
+	 */
 	role.base.efer_nx = true;
 	role.base.smm = cpu_mode.base.smm;
 	role.base.guest_mode = cpu_mode.base.guest_mode;
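[Recapping where the thread lands: a self-contained sketch, with a simplified
role layout rather than KVM's actual unions, of the split between the CPU
mode, which tracks the guest's real EFER.NX/CR0.WP, and the TDP root role,
which pins them to 1.]

struct tiny_role {
	unsigned cr0_wp     : 1;
	unsigned efer_nx    : 1;
	unsigned smm        : 1;
	unsigned guest_mode : 1;
};

static struct tiny_role tdp_root_role(struct tiny_role cpu_mode)
{
	struct tiny_role role = {0};

	role.cr0_wp = 1;	/* shadow pages never honor guest CR0.WP */
	role.efer_nx = 1;	/* X always clearable on EPT; NX needed for the NPT mitigation */
	role.smm = cpu_mode.smm;		/* SMM and guest mode still */
	role.guest_mode = cpu_mode.guest_mode;	/* force distinct roots */
	return role;
}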
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 31874fad12fb..0a08ab8e2e4e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4706,34 +4706,6 @@ kvm_calc_cpu_mode(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
 	return role;
 }
 
-static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
-						   const struct kvm_mmu_role_regs *regs)
-{
-	union kvm_mmu_role role = {0};
-
-	role.base.access = ACC_ALL;
-	if (____is_cr0_pg(regs)) {
-		role.ext.cr0_pg = 1;
-		role.base.efer_nx = ____is_efer_nx(regs);
-		role.base.cr0_wp = ____is_cr0_wp(regs);
-
-		role.ext.cr4_pae = ____is_cr4_pae(regs);
-		role.ext.cr4_smep = ____is_cr4_smep(regs);
-		role.ext.cr4_smap = ____is_cr4_smap(regs);
-		role.ext.cr4_pse = ____is_cr4_pse(regs);
-
-		/* PKEY and LA57 are active iff long mode is active. */
-		role.ext.cr4_pke = ____is_efer_lma(regs) && ____is_cr4_pke(regs);
-		role.ext.cr4_la57 = ____is_efer_lma(regs) && ____is_cr4_la57(regs);
-		role.ext.efer_lma = ____is_efer_lma(regs);
-	}
-	role.base.smm = is_smm(vcpu);
-	role.base.guest_mode = is_guest_mode(vcpu);
-	role.ext.valid = 1;
-
-	return role;
-}
-
 static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 {
 	/* tdp_root_level is architecture forced level, use it if nonzero */
@@ -4749,14 +4721,20 @@ static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 
 static union kvm_mmu_role
 kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
-				const struct kvm_mmu_role_regs *regs)
+				union kvm_mmu_role cpu_mode)
 {
-	union kvm_mmu_role role = kvm_calc_mmu_role_common(vcpu, regs);
+	union kvm_mmu_role role = {0};
 
+	role.base.access = ACC_ALL;
+	role.base.cr0_wp = true;
+	role.base.efer_nx = true;
+	role.base.smm = cpu_mode.base.smm;
+	role.base.guest_mode = cpu_mode.base.guest_mode;
 	role.base.ad_disabled = (shadow_accessed_mask == 0);
 	role.base.level = kvm_mmu_get_tdp_level(vcpu);
 	role.base.direct = true;
 	role.base.has_4_byte_gpte = false;
+	role.ext.valid = true;
 
 	return role;
 }
@@ -4766,8 +4744,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
 {
 	struct kvm_mmu *context = &vcpu->arch.root_mmu;
 	union kvm_mmu_role cpu_mode = kvm_calc_cpu_mode(vcpu, regs);
-	union kvm_mmu_role mmu_role =
-		kvm_calc_tdp_mmu_root_page_role(vcpu, regs);
+	union kvm_mmu_role mmu_role = kvm_calc_tdp_mmu_root_page_role(vcpu, cpu_mode);
 
 	if (cpu_mode.as_u64 == context->cpu_mode.as_u64 &&
 	    mmu_role.as_u64 == context->mmu_role.as_u64)
Inline kvm_calc_mmu_role_common into its sole caller, and simplify it
by removing the computation of unnecessary bits.

Extended bits are unnecessary because page walking uses the CPU mode,
and EFER.NX/CR0.WP can be set to one unconditionally---matching the
format of shadow pages rather than the format of guest pages.

The MMU role for two dimensional paging does still depend on the CPU
mode, even if only barely so, due to SMM and guest mode; for
consistency, pass it down to kvm_calc_tdp_mmu_root_page_role instead
of querying the vcpu with is_smm or is_guest_mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 41 +++++++++--------------------------------
 1 file changed, 9 insertions(+), 32 deletions(-)