[03/54] KVM: x86: Properly reset MMU context at vCPU RESET/INIT

Message ID 20210622175739.3610207-4-seanjc@google.com (mailing list archive)
State New, archived
Series KVM: x86/mmu: Bug fixes and summer cleaning

Commit Message

Sean Christopherson June 22, 2021, 5:56 p.m. UTC
Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG
was set prior to INIT.  Simply re-initializing the current MMU is not
sufficient as the current root HPA may not be usable in the new context.
E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode,
KVM will fail to switch to the 32-bit pae_root and bomb on the next
VM-Enter due to running with a 64-bit CR3 in 32-bit mode.

This bug was papered over in both VMX and SVM, but still managed to rear
its head in the MMU role on VMX.  Because EFER.LMA=1 requires CR0.PG=1,
kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first
checking CR0.PG.  VMX's RESET/INIT flow writes CR0 before EFER, and so
an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to
generate the wrong MMU role.

In VMX, the INIT issue is specific to running without unrestricted guest
since unrestricted guest is available if and only if EPT is enabled.
Commit 8668a3c468ed ("KVM: VMX: Reset mmu context when entering real
mode") resolved the issue by forcing a reset when entering emulated real
mode.

In SVM, commit ebae871a509d ("kvm: svm: reset mmu on VCPU reset") forced
an MMU reset on every INIT to work around the flaw in common x86.  Note, at
the time the bug was fixed, the SVM problem was exacerbated by a complete
lack of a CR4 update.

The vendor resets will be reverted in future patches, primarily to aid
bisection in case there are non-INIT flows that rely on the existing VMX
logic.

Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and
all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that
CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU
context reset.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
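
The reasoning in the last paragraph of the commit message can be illustrated
with a small standalone model.  This is only a sketch, not KVM code:
struct regs, effective_paging_state(), needs_mmu_reset(), after_init() and
check_all_states() are hypothetical names, and only the bit positions
(CR0.PG, CR0.WP, CR4.PAE, EFER.LME) follow the architecture.  It assumes that
the state the MMU cares about collapses to a single "paging off" value
whenever CR0.PG=0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions; the macro names are local to this sketch. */
#define CR0_WP   (UINT64_C(1) << 16)
#define CR0_PG   (UINT64_C(1) << 31)
#define CR4_PAE  (UINT64_C(1) << 5)
#define EFER_LME (UINT64_C(1) << 8)

/* Hypothetical container for the registers that influence paging. */
struct regs {
	uint64_t cr0, cr4, efer;
};

/*
 * Hypothetical summary of the paging state the MMU cares about.  With
 * CR0.PG=0 the other bits are ignored, so every such state maps to 0.
 */
static unsigned int effective_paging_state(struct regs r)
{
	if (!(r.cr0 & CR0_PG))
		return 0;

	return 1u |
	       ((r.cr0 & CR0_WP) ? 2u : 0u) |
	       ((r.cr4 & CR4_PAE) ? 4u : 0u) |
	       ((r.efer & EFER_LME) ? 8u : 0u);
}

/* The check from the patch, applied to the pre-INIT register values. */
static bool needs_mmu_reset(struct regs before)
{
	return (before.cr0 & CR0_PG) != 0;
}

/* Model of INIT for this sketch: only the clearing of CR0.PG matters. */
static struct regs after_init(struct regs before)
{
	before.cr0 &= ~CR0_PG;
	return before;
}

/* Walk all 16 combinations of the four modeled bits. */
static void check_all_states(void)
{
	for (unsigned int i = 0; i < 16; i++) {
		struct regs before = {
			.cr0  = ((i & 1) ? CR0_PG : 0) | ((i & 2) ? CR0_WP : 0),
			.cr4  = (i & 4) ? CR4_PAE : 0,
			.efer = (i & 8) ? EFER_LME : 0,
		};
		bool changed = effective_paging_state(before) !=
			       effective_paging_state(after_init(before));

		/* The effective state changes exactly when CR0.PG was set. */
		assert(changed == needs_mmu_reset(before));
	}
}
```

Calling check_all_states() from a test program passes for every combination
in this model: stale CR0.WP, CR4 or EFER bits never change the outcome on
their own when CR0.PG was already clear.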

Comments

Paolo Bonzini June 23, 2021, 1:59 p.m. UTC | #1
On 22/06/21 19:56, Sean Christopherson wrote:
> +	/*
> +	 * Reset the MMU context if paging was enabled prior to INIT (which is
> +	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
> +	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
> +	 * checked because it is unconditionally cleared on INIT and all other
> +	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
> +	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
> +	 */
> +	if (old_cr0 & X86_CR0_PG)
> +		kvm_mmu_reset_context(vcpu);

Why not just check "if (init_event)", with a simple comment like

	/*
	 * Reset the MMU context in case paging was enabled prior to INIT (CR0
	 * will be '0' prior to RESET).
	 */

?

Paolo
Paolo Bonzini June 23, 2021, 2:01 p.m. UTC | #2
On 22/06/21 19:56, Sean Christopherson wrote:
> +	/*
> +	 * Reset the MMU context if paging was enabled prior to INIT (which is
> +	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
> +	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
> +	 * checked because it is unconditionally cleared on INIT and all other
> +	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
> +	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
> +	 */
> +	if (old_cr0 & X86_CR0_PG)
> +		kvm_mmu_reset_context(vcpu);

Hmm, I'll answer myself, is it because of the plan to add a vCPU reset 
ioctl?

Paolo
Sean Christopherson June 23, 2021, 2:50 p.m. UTC | #3
On Wed, Jun 23, 2021, Paolo Bonzini wrote:
> On 22/06/21 19:56, Sean Christopherson wrote:
> > +	/*
> > +	 * Reset the MMU context if paging was enabled prior to INIT (which is
> > +	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
> > +	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
> > +	 * checked because it is unconditionally cleared on INIT and all other
> > +	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
> > +	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
> > +	 */
> > +	if (old_cr0 & X86_CR0_PG)
> > +		kvm_mmu_reset_context(vcpu);
> 
> Hmm, I'll answer myself, is it because of the plan to add a vCPU reset
> ioctl?

Heh, no, I'm not thinking that far ahead at the moment.

Using "if (init_event)" also resets the MMU when paging was disabled prior to
INIT, which is unnecessary.  "if (init_event && (old_cr0 & X86_CR0_PG))" would
obviously work, but I guess I was feeling clever.

As for why I don't want to unnecessarily reset the MMU, my preference for the MMU
role/context logic is to be as precise as possible to help document "why".  Doing
an MMU reset on any INIT obviously won't break anything, but it doesn't highlight
that the true motivation is CR0.PG being cleared, not simply that INIT occurred.
I.e. the MMU context is a KVM construct, there is no architectural model that
we're trying to follow.
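
The three candidate conditions discussed in this thread can be compared side
by side in a standalone sketch.  The function names below are hypothetical and
X86_CR0_PG is redefined locally rather than taken from kernel headers; nothing
here is KVM code.  The sketch relies on the statement in the patch comment
that CR0 is '0' prior to RESET.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the kernel's X86_CR0_PG (CR0 bit 31). */
#define X86_CR0_PG (1UL << 31)

/* Suggested alternative: reset the MMU on every INIT. */
static bool reset_on_any_init(bool init_event, unsigned long old_cr0)
{
	(void)old_cr0;
	return init_event;
}

/* Condition used by the patch: paging was enabled beforehand. */
static bool reset_if_paging_was_on(bool init_event, unsigned long old_cr0)
{
	(void)init_event;
	return (old_cr0 & X86_CR0_PG) != 0;
}

/* Combined form mentioned in the reply. */
static bool reset_on_init_with_paging(bool init_event, unsigned long old_cr0)
{
	return init_event && (old_cr0 & X86_CR0_PG) != 0;
}

/* Compare the conditions over the three states considered in the thread. */
static void compare_conditions(void)
{
	/* RESET: CR0 is 0 beforehand, so no condition asks for a reset. */
	assert(!reset_on_any_init(false, 0));
	assert(!reset_if_paging_was_on(false, 0));
	assert(!reset_on_init_with_paging(false, 0));

	/* INIT with paging disabled: only the blanket check fires. */
	assert(reset_on_any_init(true, 0));
	assert(!reset_if_paging_was_on(true, 0));
	assert(!reset_on_init_with_paging(true, 0));

	/* INIT with paging enabled: all three conditions fire. */
	assert(reset_on_any_init(true, X86_CR0_PG));
	assert(reset_if_paging_was_on(true, X86_CR0_PG));
	assert(reset_on_init_with_paging(true, X86_CR0_PG));
}
```

In this model the patch's check and the combined check agree in every case
listed; "if (init_event)" alone differs only for an INIT that arrives with
paging disabled, which is the reset described above as unnecessary.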

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76dae88cf524..42608b515ce4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10735,6 +10735,8 @@  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
+	unsigned long old_cr0 = kvm_read_cr0(vcpu);
+
 	kvm_lapic_reset(vcpu, init_event);
 
 	vcpu->arch.hflags = 0;
@@ -10803,6 +10805,17 @@  void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->arch.ia32_xss = 0;
 
 	static_call(kvm_x86_vcpu_reset)(vcpu, init_event);
+
+	/*
+	 * Reset the MMU context if paging was enabled prior to INIT (which is
+	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
+	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
+	 * checked because it is unconditionally cleared on INIT and all other
+	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
+	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
+	 */
+	if (old_cr0 & X86_CR0_PG)
+		kvm_mmu_reset_context(vcpu);
 }
 
 void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)