
[v2,12/25] KVM: x86/mmu: cleanup computation of MMU roles for two-dimensional paging

Message ID 20220221162243.683208-13-pbonzini@redhat.com (mailing list archive)
State New, archived
Series KVM MMU refactoring part 2: role changes

Commit Message

Paolo Bonzini Feb. 21, 2022, 4:22 p.m. UTC
Inline kvm_calc_mmu_role_common into its sole caller, and simplify it
by removing the computation of unnecessary bits.

Extended bits are unnecessary because page walking uses the CPU mode,
and EFER.NX/CR0.WP can be set to one unconditionally---matching the
format of shadow pages rather than the format of guest pages.
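
(For context, the "extended bits" are the role.ext fields of union
kvm_mmu_role, which only matter for walking guest page tables; the base
bits are what shadow pages are keyed on.  A simplified, illustrative
sketch of the split; the real definition in
arch/x86/include/asm/kvm_host.h has more fields:

union kvm_mmu_role {
        u64 as_u64;
        struct {
                /* Keys shadow pages: level, direct, efer_nx, cr0_wp,
                 * smm, guest_mode, ... */
                union kvm_mmu_page_role base;
                /* Guest-walk state only: cr0_pg, cr4_pae, cr4_smep,
                 * cr4_smap, cr4_pke, efer_lma, ... */
                union kvm_mmu_extended_role ext;
        };
};

Shadow pages carry only the base half in sp->role, and guest walks use
the separately computed CPU mode, so computing the ext bits here was
redundant.)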

The MMU role for two-dimensional paging does still depend on the CPU mode,
even if only barely so, due to SMM and guest mode; for consistency,
pass it down to kvm_calc_tdp_mmu_root_page_role instead of querying
the vcpu with is_smm or is_guest_mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 41 +++++++++--------------------------------
 1 file changed, 9 insertions(+), 32 deletions(-)

Comments

Sean Christopherson March 8, 2022, 6:11 p.m. UTC | #1
On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> Inline kvm_calc_mmu_role_common into its sole caller, and simplify it
> by removing the computation of unnecessary bits.
> 
> Extended bits are unnecessary because page walking uses the CPU mode,
> and EFER.NX/CR0.WP can be set to one unconditionally---matching the
> format of shadow pages rather than the format of guest pages.

But they don't match the format of shadow pages.  EPT has an equivalent to NX in
that KVM can always clear X, but KVM explicitly supports running with EPT and
EFER.NX=0 in the host (32-bit non-PAE kernels).

CR0.WP is equally confusing.  Yes, both EPT and NPT enforce write protection at all
times, but EPT has no concept of user vs. supervisor in the EPT tables themselves,
at least with respect to writes (thanks mode-based execution for the qualifier...).
NPT is even worse as the APM explicitly states:

  The host hCR0.WP bit is ignored under nested paging.

Unless there's some hidden dependency I'm missing, I'd prefer we arbitrarily leave
them zero.

> The MMU role for two-dimensional paging does still depend on the CPU mode,

Heh, don't think it's necessary to spell out TDP, and I think it would be helpful
to write it as "non-nested TDP" since the surrounding patches deal with both.

> even if only barely so, due to SMM and guest mode; for consistency,
> pass it down to kvm_calc_tdp_mmu_root_page_role instead of querying
> the vcpu with is_smm or is_guest_mode.

The changelog should call out this is a _significant_ change in behavior for KVM,
as it allows reusing shadow pages with different guest MMU "role bits".  E.g. if
this lands after the changes to not unload MMUs on cr0/cr4 emulation, it will be
quite the functional change.
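
To make the impact concrete: shadow pages are matched on (gfn, role), so
every bit that no longer feeds into the TDP root role widens the set of
guest configurations that can share a page.  A simplified, non-literal
sketch of the lookup (abbreviated from kvm_mmu_get_page(); the find_sp()
wrapper is hypothetical):

struct kvm_mmu_page *find_sp(struct kvm *kvm, gfn_t gfn,
                             union kvm_mmu_page_role role)
{
        struct kvm_mmu_page *sp;

        /* The hash bucket is keyed on gfn; the role is part of the match. */
        for_each_valid_sp(kvm, sp,
                          &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]) {
                if (sp->gfn == gfn && sp->role.word == role.word)
                        return sp;      /* existing shadow page is reused */
        }
        return NULL;                    /* caller allocates a new one */
}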
Paolo Bonzini March 8, 2022, 6:24 p.m. UTC | #2
On 3/8/22 19:11, Sean Christopherson wrote:
> On Mon, Feb 21, 2022, Paolo Bonzini wrote:
>> Extended bits are unnecessary because page walking uses the CPU mode,
>> and EFER.NX/CR0.WP can be set to one unconditionally---matching the
>> format of shadow pages rather than the format of guest pages.
> 
> But they don't match the format of shadow pages.  EPT has an equivalent to NX in
> that KVM can always clear X, but KVM explicitly supports running with EPT and
> EFER.NX=0 in the host (32-bit non-PAE kernels).

In which case bit 2 of EPTs doesn't change meaning, does it?
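
(For reference, "bit 2" is the EPT execute-access permission; illustrative
names below, not KVM's actual macros:

#define EPT_PERM_READ   (1ULL << 0)
#define EPT_PERM_WRITE  (1ULL << 1)
/* Execute access; with mode-based execute control enabled this covers
 * supervisor-mode accesses only, and bit 10 covers user-mode accesses. */
#define EPT_PERM_EXEC   (1ULL << 2)

i.e. the X permission exists in the EPT format regardless of host EFER.NX.)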

> CR0.WP is equally confusing.  Yes, both EPT and NPT enforce write protection at all
> times, but EPT has no concept of user vs. supervisor in the EPT tables themselves,
> at least with respect to writes (thanks mode-based execution for the qualifier...).
> NPT is even worse as the APM explicitly states:
> 
>    The host hCR0.WP bit is ignored under nested paging.
> 
> Unless there's some hidden dependency I'm missing, I'd prefer we arbitrarily leave
> them zero.

Setting EFER.NX=0 might be okay for EPT/NPT, but I'd prefer to set it 
respectively to 1 (X bit always present) and host EFER.NX (NX bit 
present depending on host EFER).

For CR0.WP it should really be 1 in my opinion, because CR0.WP=0 implies 
having a concept of user vs. supervisor access: CR0.WP=1 is the 
"default", while CR0.WP=0 is "always allow *supervisor* writes".

>> even if only barely so, due to SMM and guest mode; for consistency,
>> pass it down to kvm_calc_tdp_mmu_root_page_role instead of querying
>> the vcpu with is_smm or is_guest_mode.
> 
> The changelog should call out this is a _significant_ change in behavior for KVM,
> as it allows reusing shadow pages with different guest MMU "role bits".

Good point!  It's safe and arguably clea{n,r}er, but it's still a pretty 
large change.

> E.g. if this lands after the changes to not unload MMUs on cr0/cr4
> emulation, it will be quite the functional change.

I expect this to land first, so that the part where we don't really
agree on the implementation comes last and benefits from a more 
understandable core.

Paolo
Sean Christopherson March 8, 2022, 6:38 p.m. UTC | #3
On Tue, Mar 08, 2022, Sean Christopherson wrote:
> On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> > Inline kvm_calc_mmu_role_common into its sole caller, and simplify it
> > by removing the computation of unnecessary bits.
> > 
> > Extended bits are unnecessary because page walking uses the CPU mode,
> > and EFER.NX/CR0.WP can be set to one unconditionally---matching the
> > format of shadow pages rather than the format of guest pages.
> 
> But they don't match the format of shadow pages.  EPT has an equivalent to NX in
> that KVM can always clear X, but KVM explicitly supports running with EPT and
> EFER.NX=0 in the host (32-bit non-PAE kernels).

Oof, digging into this made me realize we probably have yet another bug in the
handling of the NX hugepages workaround.  Unless I'm overlooking something, NX is
treated as reserved for NPT shadow pages and would cause WARN spam if someone
enabled the mitigation on AMD CPUs.

Untested...  If this is needed, it also means I didn't properly test commit
b26a71a1a5b9 ("KVM: SVM: Refuse to load kvm_amd if NX support is not available").

:-(

From: Sean Christopherson <seanjc@google.com>
Date: Tue, 8 Mar 2022 10:31:27 -0800
Subject: [PATCH] KVM: x86/mmu: Don't treat NX as reserved for NPT MMUs

Mark the NX bit as allowed for NPT shadow pages.  KVM doesn't use NX in
"normal" operation as KVM doesn't honor userspace execute-protection
settings, but KVM allows userspace to enable the NX hugepages mitigation
on AMD CPUs, despite no known AMD CPUs being affected by the iTLB
multi-hit erratum.

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 47a2c5f3044d..9c79a0927a48 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4461,13 +4461,17 @@ static inline bool boot_cpu_is_amd(void)
 	return shadow_x_mask == 0;
 }

-/*
- * the direct page table on host, use as much mmu features as
- * possible, however, kvm currently does not do execution-protection.
- */
 static void
 reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
 {
+	/*
+	 * KVM doesn't honor execute-protection from the host page tables, but
+	 * NX is required and potentially used at any time by KVM for NPT, as
+	 * the NX hugepages iTLB multi-hit mitigation is supported for any CPU
+	 * despite no known AMD (and derivative) CPUs being affected by erratum.
+	 */
+	bool efer_nx = true;
+
 	struct rsvd_bits_validate *shadow_zero_check;
 	int i;

@@ -4475,7 +4479,7 @@ reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)

 	if (boot_cpu_is_amd())
 		__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
-					context->shadow_root_level, false,
+					context->shadow_root_level, efer_nx,
 					boot_cpu_has(X86_FEATURE_GBPAGES),
 					false, true);
 	else

base-commit: 2a809a65d88e7e071dc6ef4a6be8416d25885efe
--
Sean Christopherson March 8, 2022, 6:44 p.m. UTC | #4
On Tue, Mar 08, 2022, Paolo Bonzini wrote:
> On 3/8/22 19:11, Sean Christopherson wrote:
> > On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> > > Extended bits are unnecessary because page walking uses the CPU mode,
> > > and EFER.NX/CR0.WP can be set to one unconditionally---matching the
> > > format of shadow pages rather than the format of guest pages.
> > 
> > But they don't match the format of shadow pages.  EPT has an equivalent to NX in
> > that KVM can always clear X, but KVM explicitly supports running with EPT and
> > EFER.NX=0 in the host (32-bit non-PAE kernels).
> 
> In which case bit 2 of EPTs doesn't change meaning, does it?
> 
> > CR0.WP is equally confusing.  Yes, both EPT and NPT enforce write protection at all
> > times, but EPT has no concept of user vs. supervisor in the EPT tables themselves,
> > at least with respect to writes (thanks mode-based execution for the qualifier...).
> > NPT is even worse as the APM explicitly states:
> > 
> >    The host hCR0.WP bit is ignored under nested paging.
> > 
> > Unless there's some hidden dependency I'm missing, I'd prefer we arbitrarily leave
> > them zero.
> 
> Setting EFER.NX=0 might be okay for EPT/NPT, but I'd prefer to set it
> respectively to 1 (X bit always present) and host EFER.NX (NX bit present
> depending on host EFER).
> 
> For CR0.WP it should really be 1 in my opinion, because CR0.WP=0 implies
> having a concept of user vs. supervisor access: CR0.WP=1 is the "default",
> while CR0.WP=0 is "always allow *supervisor* writes".

Yeah, I think we generally agree, just came to different conclusions :-)  I'm
totally fine setting them to '1', especially given the patch I just "posted",
but please add comments (suggested NX comment below).  The explicit "WP is ignored"
blurb for hCR0 on NPT will be especially confusing at some point.
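
E.g. something along these lines for the WP side (wording is purely
illustrative, not a proposed final comment):

        /*
         * CR0.WP is forced to '1': EPT and NPT always write-protect
         * (hCR0.WP is ignored under NPT, and EPT has no supervisor-write
         * override), so WP=1 is the only value that describes the TDP
         * page tables themselves.
         */
        role.base.cr0_wp = true;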

With efer_nx forced to '1', we can do this somewhere in this series.  I really,
really despise "context" :-).

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9c79a0927a48..657df7fd74bf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4461,25 +4461,15 @@ static inline bool boot_cpu_is_amd(void)
        return shadow_x_mask == 0;
 }
 
-static void
-reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
+static void reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *mmu)
 {
-       /*
-        * KVM doesn't honor execute-protection from the host page tables, but
-        * NX is required and potentially used at any time by KVM for NPT, as
-        * the NX hugepages iTLB multi-hit mitigation is supported for any CPU
-        * despite no known AMD (and derivative) CPUs being affected by erratum.
-        */
-       bool efer_nx = true;
-
        struct rsvd_bits_validate *shadow_zero_check;
        int i;
 
-       shadow_zero_check = &context->shadow_zero_check;
+       shadow_zero_check = &mmu->shadow_zero_check;
 
        if (boot_cpu_is_amd())
                __reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
-                                       context->shadow_root_level, efer_nx,
+                                       mmu->shadow_root_level, is_efer_nx(mmu),
                                        boot_cpu_has(X86_FEATURE_GBPAGES),
                                        false, true);
        else
@@ -4490,7 +4480,7 @@ reset_tdp_shadow_zero_bits_mask(struct kvm_mmu *context)
        if (!shadow_me_mask)
                return;
 
-       for (i = context->shadow_root_level; --i >= 0;) {
+       for (i = mmu->shadow_root_level; --i >= 0;) {
                shadow_zero_check->rsvd_bits_mask[0][i] &= ~shadow_me_mask;
                shadow_zero_check->rsvd_bits_mask[1][i] &= ~shadow_me_mask;
        }
@@ -4751,6 +4741,16 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 
        role.base.access = ACC_ALL;
        role.base.cr0_wp = true;
+
+       /*
+        * KVM doesn't honor execute-protection from the host page tables, but
+        * NX is required and potentially used at any time by KVM for NPT, as
+        * the NX hugepages iTLB multi-hit mitigation is supported for any CPU
+        * despite no known AMD (and derivative) CPUs being affected by erratum.
+        *
+        * This is functionally accurate for EPT, if technically wrong, as KVM
+        * can always clear the X bit on EPT.
+        */
        role.base.efer_nx = true;
        role.base.smm = cpu_mode.base.smm;
        role.base.guest_mode = cpu_mode.base.guest_mode;

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 31874fad12fb..0a08ab8e2e4e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4706,34 +4706,6 @@  kvm_calc_cpu_mode(struct kvm_vcpu *vcpu, const struct kvm_mmu_role_regs *regs)
 	return role;
 }
 
-static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu,
-						   const struct kvm_mmu_role_regs *regs)
-{
-	union kvm_mmu_role role = {0};
-
-	role.base.access = ACC_ALL;
-	if (____is_cr0_pg(regs)) {
-		role.ext.cr0_pg = 1;
-		role.base.efer_nx = ____is_efer_nx(regs);
-		role.base.cr0_wp = ____is_cr0_wp(regs);
-
-		role.ext.cr4_pae = ____is_cr4_pae(regs);
-		role.ext.cr4_smep = ____is_cr4_smep(regs);
-		role.ext.cr4_smap = ____is_cr4_smap(regs);
-		role.ext.cr4_pse = ____is_cr4_pse(regs);
-
-		/* PKEY and LA57 are active iff long mode is active. */
-		role.ext.cr4_pke = ____is_efer_lma(regs) && ____is_cr4_pke(regs);
-		role.ext.cr4_la57 = ____is_efer_lma(regs) && ____is_cr4_la57(regs);
-		role.ext.efer_lma = ____is_efer_lma(regs);
-	}
-	role.base.smm = is_smm(vcpu);
-	role.base.guest_mode = is_guest_mode(vcpu);
-	role.ext.valid = 1;
-
-	return role;
-}
-
 static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 {
 	/* tdp_root_level is architecture forced level, use it if nonzero */
@@ -4749,14 +4721,20 @@  static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
 
 static union kvm_mmu_role
 kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
-				const struct kvm_mmu_role_regs *regs)
+				union kvm_mmu_role cpu_mode)
 {
-	union kvm_mmu_role role = kvm_calc_mmu_role_common(vcpu, regs);
+	union kvm_mmu_role role = {0};
 
+	role.base.access = ACC_ALL;
+	role.base.cr0_wp = true;
+	role.base.efer_nx = true;
+	role.base.smm = cpu_mode.base.smm;
+	role.base.guest_mode = cpu_mode.base.guest_mode;
 	role.base.ad_disabled = (shadow_accessed_mask == 0);
 	role.base.level = kvm_mmu_get_tdp_level(vcpu);
 	role.base.direct = true;
 	role.base.has_4_byte_gpte = false;
+	role.ext.valid = true;
 
 	return role;
 }
@@ -4766,8 +4744,7 @@  static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
 {
 	struct kvm_mmu *context = &vcpu->arch.root_mmu;
 	union kvm_mmu_role cpu_mode = kvm_calc_cpu_mode(vcpu, regs);
-	union kvm_mmu_role mmu_role =
-		kvm_calc_tdp_mmu_root_page_role(vcpu, regs);
+	union kvm_mmu_role mmu_role = kvm_calc_tdp_mmu_root_page_role(vcpu, cpu_mode);
 
 	if (cpu_mode.as_u64 == context->cpu_mode.as_u64 &&
 	    mmu_role.as_u64 == context->mmu_role.as_u64)