Message ID: 20220204115718.14934-1-pbonzini@redhat.com (mailing list archive)
Series: KVM: MMU: MMU role refactoring
On Fri, Feb 04, 2022 at 06:56:55AM -0500, Paolo Bonzini wrote:
> The TDP MMU has a performance regression compared to the legacy
> MMU when CR0 changes often. This was reported for the grsecurity
> kernel, which uses CR0.WP to implement kernel W^X. In that case,
> each change to CR0.WP unloads the MMU and causes a lot of unnecessary
> work. When running nested, this can even cause the L1 to hardly
> make progress, as the L0 hypervisor is overwhelmed by the amount
> of MMU work that is needed.
>
> The root cause of the issue is that the "MMU role" in KVM is a mess
> that mixes the CPU setup (CR0/CR4/EFER, SMM, guest mode, etc.)
> and the shadow page table format. Whenever something is different
> between the MMU and the CPU, it is stored as an extra field in struct
> kvm_mmu---and for extra bonus complication, sometimes the same thing
> is stored in both the role and an extra field.
>
> So, this is the "no functional change intended" part of the changes
> required to fix the performance regression. It separates neatly
> the shadow page table format ("MMU role") from the guest page table
> format ("CPU role"), and removes the duplicate fields.

What do you think about calling this the guest_role instead of cpu_role?
There is a bit of a precedent for using "guest" instead of "cpu" already
for this type of concept (e.g. guest_walker), and I find it more
intuitive.

> The next
> step then is to avoid unloading the MMU as long as the MMU role
> stays the same.
>
> Please review!
> 
> Paolo
> 
> Paolo Bonzini (23):
>   KVM: MMU: pass uses_nx directly to reset_shadow_zero_bits_mask
>   KVM: MMU: nested EPT cannot be used in SMM
>   KVM: MMU: remove valid from extended role
>   KVM: MMU: constify uses of struct kvm_mmu_role_regs
>   KVM: MMU: pull computation of kvm_mmu_role_regs to kvm_init_mmu
>   KVM: MMU: load new PGD once nested two-dimensional paging is
>     initialized
>   KVM: MMU: remove kvm_mmu_calc_root_page_role
>   KVM: MMU: rephrase unclear comment
>   KVM: MMU: remove "bool base_only" arguments
>   KVM: MMU: split cpu_role from mmu_role
>   KVM: MMU: do not recompute root level from kvm_mmu_role_regs
>   KVM: MMU: remove ept_ad field
>   KVM: MMU: remove kvm_calc_shadow_root_page_role_common
>   KVM: MMU: cleanup computation of MMU roles for two-dimensional paging
>   KVM: MMU: cleanup computation of MMU roles for shadow paging
>   KVM: MMU: remove extended bits from mmu_role
>   KVM: MMU: remove redundant bits from extended role
>   KVM: MMU: fetch shadow EFER.NX from MMU role
>   KVM: MMU: simplify and/or inline computation of shadow MMU roles
>   KVM: MMU: pull CPU role computation to kvm_init_mmu
>   KVM: MMU: store shadow_root_level into mmu_role
>   KVM: MMU: use cpu_role for root_level
>   KVM: MMU: replace direct_map with mmu_role.direct
> 
>  arch/x86/include/asm/kvm_host.h |  13 +-
>  arch/x86/kvm/mmu.h              |   2 +-
>  arch/x86/kvm/mmu/mmu.c          | 408 ++++++++++++--------------------
>  arch/x86/kvm/mmu/mmu_audit.c    |   6 +-
>  arch/x86/kvm/mmu/paging_tmpl.h  |  12 +-
>  arch/x86/kvm/mmu/tdp_mmu.c      |   4 +-
>  arch/x86/kvm/svm/svm.c          |   2 +-
>  arch/x86/kvm/vmx/vmx.c          |   2 +-
>  arch/x86/kvm/x86.c              |  12 +-
>  10 files changed, 178 insertions(+), 284 deletions(-)
> 
> -- 
> 2.31.1
On Mon, Feb 07, 2022, David Matlack wrote:
> On Fri, Feb 04, 2022 at 06:56:55AM -0500, Paolo Bonzini wrote:
> > The TDP MMU has a performance regression compared to the legacy
> > MMU when CR0 changes often. This was reported for the grsecurity
> > kernel, which uses CR0.WP to implement kernel W^X. In that case,
> > each change to CR0.WP unloads the MMU and causes a lot of unnecessary
> > work. When running nested, this can even cause the L1 to hardly
> > make progress, as the L0 hypervisor is overwhelmed by the amount
> > of MMU work that is needed.
> >
> > The root cause of the issue is that the "MMU role" in KVM is a mess
> > that mixes the CPU setup (CR0/CR4/EFER, SMM, guest mode, etc.)
> > and the shadow page table format. Whenever something is different
> > between the MMU and the CPU, it is stored as an extra field in struct
> > kvm_mmu---and for extra bonus complication, sometimes the same thing
> > is stored in both the role and an extra field.
> >
> > So, this is the "no functional change intended" part of the changes
> > required to fix the performance regression. It separates neatly
> > the shadow page table format ("MMU role") from the guest page table
> > format ("CPU role"), and removes the duplicate fields.
>
> What do you think about calling this the guest_role instead of cpu_role?
> There is a bit of a precedent for using "guest" instead of "cpu" already
> for this type of concept (e.g. guest_walker), and I find it more
> intuitive.

Haven't looked at the series yet, but I'd prefer not to use guest_role, it's
too similar to is_guest_mode() and kvm_mmu_role.guest_mode.  E.g. we'd end
up with

  static union kvm_mmu_role kvm_calc_guest_role(struct kvm_vcpu *vcpu,
                                                const struct kvm_mmu_role_regs *regs)
  {
          union kvm_mmu_role role = {0};

          role.base.access = ACC_ALL;
          role.base.smm = is_smm(vcpu);
          role.base.guest_mode = is_guest_mode(vcpu);
          role.base.direct = !____is_cr0_pg(regs);

          ...
  }

and possibly

  if (guest_role.guest_mode)
          ...

which would be quite messy.

Maybe vcpu_role if cpu_role isn't intuitive?
On Mon, Feb 7, 2022 at 3:27 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Feb 07, 2022, David Matlack wrote:
> > On Fri, Feb 04, 2022 at 06:56:55AM -0500, Paolo Bonzini wrote:
> > > The TDP MMU has a performance regression compared to the legacy
> > > MMU when CR0 changes often. This was reported for the grsecurity
> > > kernel, which uses CR0.WP to implement kernel W^X. In that case,
> > > each change to CR0.WP unloads the MMU and causes a lot of unnecessary
> > > work. When running nested, this can even cause the L1 to hardly
> > > make progress, as the L0 hypervisor is overwhelmed by the amount
> > > of MMU work that is needed.
> > >
> > > The root cause of the issue is that the "MMU role" in KVM is a mess
> > > that mixes the CPU setup (CR0/CR4/EFER, SMM, guest mode, etc.)
> > > and the shadow page table format. Whenever something is different
> > > between the MMU and the CPU, it is stored as an extra field in struct
> > > kvm_mmu---and for extra bonus complication, sometimes the same thing
> > > is stored in both the role and an extra field.
> > >
> > > So, this is the "no functional change intended" part of the changes
> > > required to fix the performance regression. It separates neatly
> > > the shadow page table format ("MMU role") from the guest page table
> > > format ("CPU role"), and removes the duplicate fields.
> >
> > What do you think about calling this the guest_role instead of cpu_role?
> > There is a bit of a precedent for using "guest" instead of "cpu" already
> > for this type of concept (e.g. guest_walker), and I find it more
> > intuitive.
>
> Haven't looked at the series yet, but I'd prefer not to use guest_role, it's
> too similar to is_guest_mode() and kvm_mmu_role.guest_mode.  E.g. we'd end
> up with
>
>   static union kvm_mmu_role kvm_calc_guest_role(struct kvm_vcpu *vcpu,
>                                                 const struct kvm_mmu_role_regs *regs)
>   {
>           union kvm_mmu_role role = {0};
>
>           role.base.access = ACC_ALL;
>           role.base.smm = is_smm(vcpu);
>           role.base.guest_mode = is_guest_mode(vcpu);
>           role.base.direct = !____is_cr0_pg(regs);
>
>           ...
>   }
>
> and possibly
>
>   if (guest_role.guest_mode)
>           ...
>
> which would be quite messy.  Maybe vcpu_role if cpu_role isn't intuitive?

I agree it's a little odd. But actually it's somewhat intuitive (the
guest is in guest-mode, i.e. we're running a nested guest).

Ok I'm stretching a little bit :). But if the trade-off is just
"guest_role.guest_mode" requires a clarifying comment, but the rest of
the code gets more readable (cpu_role is used a lot more than
role.guest_mode), it still might be worth it.
On Fri, Feb 04, 2022, Paolo Bonzini wrote:
> Paolo Bonzini (23):
>   KVM: MMU: pass uses_nx directly to reset_shadow_zero_bits_mask
>   KVM: MMU: nested EPT cannot be used in SMM
>   KVM: MMU: remove valid from extended role
>   KVM: MMU: constify uses of struct kvm_mmu_role_regs
>   KVM: MMU: pull computation of kvm_mmu_role_regs to kvm_init_mmu
>   KVM: MMU: load new PGD once nested two-dimensional paging is
>     initialized
>   KVM: MMU: remove kvm_mmu_calc_root_page_role
>   KVM: MMU: rephrase unclear comment
>   KVM: MMU: remove "bool base_only" arguments
>   KVM: MMU: split cpu_role from mmu_role
>   KVM: MMU: do not recompute root level from kvm_mmu_role_regs
>   KVM: MMU: remove ept_ad field
>   KVM: MMU: remove kvm_calc_shadow_root_page_role_common
>   KVM: MMU: cleanup computation of MMU roles for two-dimensional paging
>   KVM: MMU: cleanup computation of MMU roles for shadow paging
>   KVM: MMU: remove extended bits from mmu_role
>   KVM: MMU: remove redundant bits from extended role
>   KVM: MMU: fetch shadow EFER.NX from MMU role
>   KVM: MMU: simplify and/or inline computation of shadow MMU roles
>   KVM: MMU: pull CPU role computation to kvm_init_mmu
>   KVM: MMU: store shadow_root_level into mmu_role
>   KVM: MMU: use cpu_role for root_level
>   KVM: MMU: replace direct_map with mmu_role.direct

Heresy! Everyone knows the one true way is "KVM: x86/mmu:"

  $ glo | grep "KVM: MMU:" | wc -l
  740
  $ glo | grep "KVM: x86/mmu:" | wc -l
  403

Dammit, I'm the heathen...

I do think we should use x86/mmu though. VMX and SVM (and nVMX and nSVM) are ok
because they're unlikely to collide with other architectures, but every arch has
an MMU...
On Mon, Feb 07, 2022, David Matlack wrote:
> On Mon, Feb 7, 2022 at 3:27 PM Sean Christopherson <seanjc@google.com> wrote:
> > > What do you think about calling this the guest_role instead of cpu_role?
> > > There is a bit of a precedent for using "guest" instead of "cpu" already
> > > for this type of concept (e.g. guest_walker), and I find it more
> > > intuitive.
> >
> > Haven't looked at the series yet, but I'd prefer not to use guest_role, it's
> > too similar to is_guest_mode() and kvm_mmu_role.guest_mode.  E.g. we'd end
> > up with
> >
> >   static union kvm_mmu_role kvm_calc_guest_role(struct kvm_vcpu *vcpu,
> >                                                 const struct kvm_mmu_role_regs *regs)
> >   {
> >           union kvm_mmu_role role = {0};
> >
> >           role.base.access = ACC_ALL;
> >           role.base.smm = is_smm(vcpu);
> >           role.base.guest_mode = is_guest_mode(vcpu);
> >           role.base.direct = !____is_cr0_pg(regs);
> >
> >           ...
> >   }
> >
> > and possibly
> >
> >   if (guest_role.guest_mode)
> >           ...
> >
> > which would be quite messy.  Maybe vcpu_role if cpu_role isn't intuitive?
>
> I agree it's a little odd. But actually it's somewhat intuitive (the
> guest is in guest-mode, i.e. we're running a nested guest).
>
> Ok I'm stretching a little bit :). But if the trade-off is just
> "guest_role.guest_mode" requires a clarifying comment, but the rest of
> the code gets more readable (cpu_role is used a lot more than
> role.guest_mode), it still might be worth it.

It's not just guest_mode, we also have guest_mmu, e.g. we'd end up with

  vcpu->arch.root_mmu.guest_role.base.level
  vcpu->arch.guest_mmu.guest_role.base.level
  vcpu->arch.nested_mmu.guest_role.base.level

In a vacuum, I 100% agree that guest_role is better than cpu_role or vcpu_role,
but the term "guest" has already been claimed for "L2" in far too many places.

While we're behind the bikeshed... the resulting:

  union kvm_mmu_role cpu_role;
  union kvm_mmu_page_role mmu_role;

is a mess.  Again, I really like "mmu_role" in a vacuum, but juxtaposed with

  union kvm_mmu_role cpu_role;

it's super confusing, e.g. I expected

  union kvm_mmu_role mmu_role;

Nested EPT is a good example of complete confusion, because we compute a
kvm_mmu_role, compare it to cpu_role, then shove it into both cpu_role and
mmu_role.  It makes sense once you reason about what it's doing, but on the
surface it's confusing.

  struct kvm_mmu *context = &vcpu->arch.guest_mmu;
  u8 level = vmx_eptp_page_walk_level(new_eptp);
  union kvm_mmu_role new_role =
          kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
                                             execonly, level);

  if (new_role.as_u64 != context->cpu_role.as_u64) {
          /* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */
          context->cpu_role.as_u64 = new_role.as_u64;
          context->mmu_role.word = new_role.base.word;

Maybe this?

  union kvm_mmu_vcpu_role vcpu_role;
  union kvm_mmu_page_role mmu_role;

and some sample usage?

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d25f8cb2e62b..9f9b97c88738 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4836,13 +4836,16 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 {
 	struct kvm_mmu *context = &vcpu->arch.guest_mmu;
 	u8 level = vmx_eptp_page_walk_level(new_eptp);
-	union kvm_mmu_role new_role =
+	union kvm_mmu_vcpu_role new_role =
 		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
 						   execonly, level);
 
-	if (new_role.as_u64 != context->cpu_role.as_u64) {
-		/* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */
-		context->cpu_role.as_u64 = new_role.as_u64;
+	if (new_role.as_u64 != context->vcpu_role.as_u64) {
+		/*
+		 * EPT, and thus nested EPT, does not consume CR0, CR4, nor
+		 * EFER, so the mmu_role is a strict subset of the vcpu_role.
+		 */
+		context->vcpu_role.as_u64 = new_role.as_u64;
 		context->mmu_role.word = new_role.base.word;
 
 		context->page_fault = ept_page_fault;

And while I'm on a soapbox.... am I the only one that absolutely detests the
use of "context" and "g_context"?  I'd be all in favor of renaming those to
"mmu" throughout the code as a prep to this series.

I also think we should move the initializing of guest_mmu => mmu into the MMU
helpers.  Pulling the mmu from guest_mmu but then relying on the caller to
wire up guest_mmu => mmu so that e.g. kvm_mmu_new_pgd() works is gross and
confused the heck out of me.  E.g.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d25f8cb2e62b..4e7fe9758ce8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4794,7 +4794,7 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
 void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 			     unsigned long cr4, u64 efer, gpa_t nested_cr3)
 {
-	struct kvm_mmu *context = &vcpu->arch.guest_mmu;
+	struct kvm_mmu *mmu = &vcpu->arch.guest_mmu;
 	struct kvm_mmu_role_regs regs = {
 		.cr0 = cr0,
 		.cr4 = cr4 & ~X86_CR4_PKE,
@@ -4806,6 +4806,8 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 	mmu_role = cpu_role.base;
 	mmu_role.level = kvm_mmu_get_tdp_level(vcpu);
 
+	vcpu->arch.mmu = &vcpu->arch.guest_mmu;
+
 	shadow_mmu_init_context(vcpu, context, cpu_role, mmu_role);
 	kvm_mmu_new_pgd(vcpu, nested_cr3);
 }
@@ -4834,12 +4836,14 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 			     int huge_page_level, bool accessed_dirty,
 			     gpa_t new_eptp)
 {
-	struct kvm_mmu *context = &vcpu->arch.guest_mmu;
+	struct kvm_mmu *mmu = &vcpu->arch.guest_mmu;
 	u8 level = vmx_eptp_page_walk_level(new_eptp);
 	union kvm_mmu_role new_role =
 		kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
 						   execonly, level);
 
+	vcpu->arch.mmu = mmu;
+
 	if (new_role.as_u64 != context->cpu_role.as_u64) {
 		/* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */
 		context->cpu_role.as_u64 = new_role.as_u64;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1218b5a342fc..d0f8eddb32be 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -98,8 +98,6 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
 
 	WARN_ON(mmu_is_nested(vcpu));
 
-	vcpu->arch.mmu = &vcpu->arch.guest_mmu;
-
 	/*
 	 * The NPT format depends on L1's CR4 and EFER, which is in vmcb01.  Note,
 	 * when called via KVM_SET_NESTED_STATE, that state may _not_ match current
On 2/9/22 23:31, Sean Christopherson wrote:
> Heresy! Everyone knows the one true way is "KVM: x86/mmu:"
>
>   $ glo | grep "KVM: MMU:" | wc -l
>   740
>   $ glo | grep "KVM: x86/mmu:" | wc -l
>   403
>
> Dammit, I'm the heathen...
>
> I do think we should use x86/mmu though. VMX and SVM (and nVMX and nSVM) are ok
> because they're unlikely to collide with other architectures, but every arch has
> an MMU...

Sure, I can adjust my habits.

Paolo
On 2/10/22 02:11, Sean Christopherson wrote:
> In a vacuum, I 100% agree that guest_role is better than cpu_role or vcpu_role,
> but the term "guest" has already been claimed for "L2" in far too many places.
>
> While we're behind the bikeshed... the resulting:
>
>   union kvm_mmu_role cpu_role;
>   union kvm_mmu_page_role mmu_role;
>
> is a mess.  Again, I really like "mmu_role" in a vacuum, but juxtaposed with
>
>   union kvm_mmu_role cpu_role;
>
> it's super confusing, e.g. I expected
>
>   union kvm_mmu_role mmu_role;

What about

  union kvm_mmu_page_role root_role;
  union kvm_mmu_paging_mode cpu_mode;

?  I already have to remove ".base" from all accesses to mmu_role, so it's
not much extra churn.

Paolo
On Thu, Feb 10, 2022, Paolo Bonzini wrote:
> On 2/10/22 02:11, Sean Christopherson wrote:
> > In a vacuum, I 100% agree that guest_role is better than cpu_role or vcpu_role,
> > but the term "guest" has already been claimed for "L2" in far too many places.
> >
> > While we're behind the bikeshed... the resulting:
> >
> >   union kvm_mmu_role cpu_role;
> >   union kvm_mmu_page_role mmu_role;
> >
> > is a mess.  Again, I really like "mmu_role" in a vacuum, but juxtaposed with
> >
> >   union kvm_mmu_role cpu_role;
> >
> > it's super confusing, e.g. I expected
> >
> >   union kvm_mmu_role mmu_role;
>
> What about
>
>   union kvm_mmu_page_role root_role;
>   union kvm_mmu_paging_mode cpu_mode;
>
> ?  I already have to remove ".base" from all accesses to mmu_role, so it's
> not much extra churn.

I'd prefer not to use "paging mode", the SDM uses that terminology to refer to
the four paging modes.  My expectation given the name is that the union would
track only CR0.PG, EFER.LME, CR4.PAE, and CR4.PSE[*].

I'm out of ideas at the moment, I'll keep chewing on this while reviewing...

[*] Someone at Intel rewrote the SDM and eliminated Mode B, a.k.a. PSE 36-bit
physical paging, it's now just part of "32-bit paging".  But 5-level paging is
considered its own paging mode?!?!  Lame.  I guess they really want to have
exactly four paging modes...
On 2/10/22 17:55, Sean Christopherson wrote:
> > union kvm_mmu_page_role root_role;
> > union kvm_mmu_paging_mode cpu_mode;
>
> I'd prefer not to use "paging mode", the SDM uses that terminology to refer to
> the four paging modes.  My expectation given the name is that the union would
> track only CR0.PG, EFER.LME, CR4.PAE, and CR4.PSE[*].

Yeah, I had started with kvm_mmu_paging_flags, but cpu_flags was an even
worse name than kvm_mmu_paging_mode.

Anyway, now that I have done _some_ replacement, it's a matter of sed -i on
the patch files once you or someone else come up with a good moniker.

I take it that "root_role" passed your filter successfully.

Paolo

> I'm out of ideas at the moment, I'll keep chewing on this while reviewing...
>
> [*] Someone at Intel rewrote the SDM and eliminated Mode B, a.k.a. PSE 36-bit
> physical paging, it's now just part of "32-bit paging".  But 5-level paging is
> considered its own paging mode?!?!  Lame.  I guess they really want to have
> exactly four paging modes...
On Thu, Feb 10, 2022, Paolo Bonzini wrote:
> On 2/10/22 17:55, Sean Christopherson wrote:
> > > union kvm_mmu_page_role root_role;
> > > union kvm_mmu_paging_mode cpu_mode;
> >
> > I'd prefer not to use "paging mode", the SDM uses that terminology to refer to
> > the four paging modes.  My expectation given the name is that the union would
> > track only CR0.PG, EFER.LME, CR4.PAE, and CR4.PSE[*].
>
> Yeah, I had started with kvm_mmu_paging_flags, but cpu_flags was an even
> worse name than kvm_mmu_paging_mode.

We could always do s/is_guest_mode/is_nested_mode or something to that effect.
It would take some retraining, but I feel like we've been fighting the whole
"guest mode" thing over and over.

> Anyway, now that I have done _some_ replacement, it's a matter of sed -i on
> the patch files once you or someone else come up with a good moniker.
>
> I take it that "root_role" passed your filter successfully.

Yep, works for me.  I almost suggested it, too, but decided I liked mmu_role
marginally better.  I like root_role because it ties in with root_hpa and
root_pgd.
On Wed, Feb 9, 2022 at 2:31 PM Sean Christopherson <seanjc@google.com> wrote:
> On Fri, Feb 04, 2022, Paolo Bonzini wrote:
> > KVM: MMU: replace direct_map with mmu_role.direct
>
> Heresy! Everyone knows the one true way is "KVM: x86/mmu:"
>
>   $ glo | grep "KVM: MMU:" | wc -l
>   740
>   $ glo | grep "KVM: x86/mmu:" | wc -l
>   403
>
> Dammit, I'm the heathen...
>
> I do think we should use x86/mmu though. VMX and SVM (and nVMX and nSVM) are ok
> because they're unlikely to collide with other architectures, but every arch has
> an MMU...

Can you document these rules/preferences somewhere? Even better if we can
enforce them with checkpatch :)