diff mbox series

[05/54] Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack"

Message ID 20210622175739.3610207-6-seanjc@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86/mmu: Bug fixes and summer cleaning | expand

Commit Message

Sean Christopherson June 22, 2021, 5:56 p.m. UTC
Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested
virtualization.  When KVM (L0) is using TDP, CR4.LA57 is not reflected in
mmu_role.base.level because that tracks the shadow root level, i.e. TDP
level.  Normally, this is not an issue because LA57 can't be toggled
while long mode is active, i.e. the guest has to first disable paging,
then toggle LA57, then re-enable paging, thus ensuring an MMU
reinitialization.

But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
without having to bounce through an unpaged section.  L1 can also load a
new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a
single entry PML5 is sufficient.  Such shenanigans are only problematic
if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets
reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode.

Note, in the L2 case with nested TDP, even though L1 can switch between
L2s with different LA57 settings, thus bypassing the paging requirement,
in that case KVM's nested_mmu will track LA57 in base.level.

This reverts commit 8053f924cad30bf9f9a24e02b6c8ddfabf5202ea.

Fixes: 8053f924cad3 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/mmu/mmu.c          | 1 +
 2 files changed, 2 insertions(+)

Comments

Yu Zhang June 25, 2021, 8:47 a.m. UTC | #1
On Tue, Jun 22, 2021 at 10:56:50AM -0700, Sean Christopherson wrote:
> Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested
> virtualization.  When KVM (L0) is using TDP, CR4.LA57 is not reflected in
> mmu_role.base.level because that tracks the shadow root level, i.e. TDP
> level.  Normally, this is not an issue because LA57 can't be toggled
> while long mode is active, i.e. the guest has to first disable paging,
> then toggle LA57, then re-enable paging, thus ensuring an MMU
> reinitialization.
> 
> But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
> without having to bounce through an unpaged section.  L1 can also load a

May I ask how this is done by the guest? Thanks!

> new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a
> single entry PML5 is sufficient.  Such shenanigans are only problematic
> if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets
> reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode.
> 
> Note, in the L2 case with nested TDP, even though L1 can switch between
> L2s with different LA57 settings, thus bypassing the paging requirement,
> in that case KVM's nested_mmu will track LA57 in base.level.
> 
> This reverts commit 8053f924cad30bf9f9a24e02b6c8ddfabf5202ea.
> 
> Fixes: 8053f924cad3 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/mmu/mmu.c          | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e11d64aa0bcd..916e0f89fdfc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -320,6 +320,7 @@ union kvm_mmu_extended_role {
>  		unsigned int cr4_pke:1;
>  		unsigned int cr4_smap:1;
>  		unsigned int cr4_smep:1;
> +		unsigned int cr4_la57:1;
>  		unsigned int maxphyaddr:6;
>  	};
>  };
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 0db12f461c9d..5024318dec45 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4537,6 +4537,7 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
>  	ext.cr4_smap = !!kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
>  	ext.cr4_pse = !!is_pse(vcpu);
>  	ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE);
> +	ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
>  	ext.maxphyaddr = cpuid_maxphyaddr(vcpu);
>  
>  	ext.valid = 1;
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 

B.R.
Yu
Paolo Bonzini June 25, 2021, 8:57 a.m. UTC | #2
On 25/06/21 10:47, Yu Zhang wrote:
>> But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
>> without having to bounce through an unpaged section.  L1 can also load a
>
> May I ask how this is done by the guest? Thanks!

It can set HOST_CR3 and HOST_CR4 to a value that is different from the 
one on vmentry.

Paolo
Yu Zhang June 25, 2021, 9:29 a.m. UTC | #3
On Fri, Jun 25, 2021 at 10:57:51AM +0200, Paolo Bonzini wrote:
> On 25/06/21 10:47, Yu Zhang wrote:
> > > But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
> > > without having to bounce through an unpaged section.  L1 can also load a
> > 
> > May I ask how this is done by the guest? Thanks!
> 
> It can set HOST_CR3 and HOST_CR4 to a value that is different from the one
> on vmentry.

Thanks, Paolo.

Do you mean the L1 can modify its paging mode by setting HOST_CR3 as root of
a PML5 table in VMCS12 and HOST_CR4 with LA57 flipped in VMCS12, causing the
GUEST_CR3/4 being changed in VMCS01, and eventually updating the CR3/4 when 
L0 is injecting a VM Exit from L2? 

B.R.
Yu

  

> Paolo
>
Paolo Bonzini June 25, 2021, 10:25 a.m. UTC | #4
On 25/06/21 11:29, Yu Zhang wrote:
> Thanks, Paolo.
> 
> Do you mean the L1 can modify its paging mode by setting HOST_CR3 as root of
> a PML5 table in VMCS12 and HOST_CR4 with LA57 flipped in VMCS12, causing the
> GUEST_CR3/4 being changed in VMCS01, and eventually updating the CR3/4 when
> L0 is injecting a VM Exit from L2?

Yes, you can even do that without a "full" vmentry by setting invalid 
guest state in vmcs12. :)

Paolo
Yu Zhang June 25, 2021, 11:23 a.m. UTC | #5
On Fri, Jun 25, 2021 at 12:25:46PM +0200, Paolo Bonzini wrote:
> On 25/06/21 11:29, Yu Zhang wrote:
> > Thanks, Paolo.
> > 
> > Do you mean the L1 can modify its paging mode by setting HOST_CR3 as root of
> > a PML5 table in VMCS12 and HOST_CR4 with LA57 flipped in VMCS12, causing the
> > GUEST_CR3/4 being changed in VMCS01, and eventually updating the CR3/4 when
> > L0 is injecting a VM Exit from L2?
> 
> Yes, you can even do that without a "full" vmentry by setting invalid guest
> state in vmcs12. :)

Hah.. Interesting. :) I think this is what load_vmcs12_host_state() does. Anyway,
thanks a lot for the explanation!

Also my reviewed-by for this one.

Rviewed-by: Yu Zhang <yu.c.zhang@linux.intel.com>

B.R.
Yu
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e11d64aa0bcd..916e0f89fdfc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -320,6 +320,7 @@  union kvm_mmu_extended_role {
 		unsigned int cr4_pke:1;
 		unsigned int cr4_smap:1;
 		unsigned int cr4_smep:1;
+		unsigned int cr4_la57:1;
 		unsigned int maxphyaddr:6;
 	};
 };
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0db12f461c9d..5024318dec45 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4537,6 +4537,7 @@  static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
 	ext.cr4_smap = !!kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
 	ext.cr4_pse = !!is_pse(vcpu);
 	ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE);
+	ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
 	ext.maxphyaddr = cpuid_maxphyaddr(vcpu);
 
 	ext.valid = 1;