diff mbox series

[29/43] KVM: SVM: Tweak order of cr0/cr4/efer writes at RESET/INIT

Message ID 20210424004645.3950558-30-seanjc@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86: vCPU RESET/INIT fixes and consolidation | expand

Commit Message

Sean Christopherson April 24, 2021, 12:46 a.m. UTC
Hoist svm_set_cr0() up in the sequence of register initialization during
vCPU RESET/INIT, purely to match VMX so that a future patch can move the
sequences to common x86.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/svm.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

Comments

Reiji Watanabe May 19, 2021, 6:16 p.m. UTC | #1
> @@ -1204,18 +1204,13 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>         init_sys_seg(&save->ldtr, SEG_TYPE_LDT);
>         init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16);
>
> +       svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET);
>         svm_set_cr4(vcpu, 0);
>         svm_set_efer(vcpu, 0);
>         save->dr6 = 0xffff0ff0;
>         kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
>         vcpu->arch.regs[VCPU_REGS_RIP] = 0x0000fff0;
>
> -       /*
> -        * svm_set_cr0() sets PG and WP and clears NW and CD on save->cr0.
> -        * It also updates the guest-visible cr0 value.
> -        */
> -       svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET);

AMD's APM Vol2 (Table 14-1 in Revision 3.37) says CR0 After INIT will be:

   CD and NW are unchanged
   Bit 4 (reserved) = 1
   All others = 0

(CR0 will be 0x60000010 after RESET)

So, it looks the CR0 value that init_vmcb() sets could be
different from what is indicated in the APM for INIT.

BTW, Intel's SDM (April 2021 version) says CR0 for Power up/Reset/INIT
will be 0x60000010 with the following note.
-------------------------------------------------
The CD and NW flags are unchanged,
bit 4 is set to 1, all other bits are cleared.
-------------------------------------------------
The note is attached as '2' to all Power up/Reset/INIT cases
looking at the SDM.  I would guess it is erroneous that
the note is attached to Power up/Reset though.

Thanks,
Reiji
Sean Christopherson May 19, 2021, 7:58 p.m. UTC | #2
On Wed, May 19, 2021, Reiji Watanabe wrote:
> > @@ -1204,18 +1204,13 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
> >         init_sys_seg(&save->ldtr, SEG_TYPE_LDT);
> >         init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16);
> >
> > +       svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET);
> >         svm_set_cr4(vcpu, 0);
> >         svm_set_efer(vcpu, 0);
> >         save->dr6 = 0xffff0ff0;
> >         kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
> >         vcpu->arch.regs[VCPU_REGS_RIP] = 0x0000fff0;
> >
> > -       /*
> > -        * svm_set_cr0() sets PG and WP and clears NW and CD on save->cr0.
> > -        * It also updates the guest-visible cr0 value.
> > -        */
> > -       svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET);
> 
> AMD's APM Vol2 (Table 14-1 in Revision 3.37) says CR0 After INIT will be:
> 
>    CD and NW are unchanged
>    Bit 4 (reserved) = 1
>    All others = 0
> 
> (CR0 will be 0x60000010 after RESET)
> 
> So, it looks the CR0 value that init_vmcb() sets could be
> different from what is indicated in the APM for INIT.
> 
> BTW, Intel's SDM (April 2021 version) says CR0 for Power up/Reset/INIT
> will be 0x60000010 with the following note.
> -------------------------------------------------
> The CD and NW flags are unchanged,
> bit 4 is set to 1, all other bits are cleared.
> -------------------------------------------------
> The note is attached as '2' to all Power up/Reset/INIT cases
> looking at the SDM.  I would guess it is erroneous that
> the note is attached to Power up/Reset though.

Agreed.  I'll double check that CD and NW are preserved by hardware on INIT,
and will also ping Intel folks to fix the POWER-UP and RESET footnote.

Hah!  Reading through that section yet again, there's another SDM bug.  It
contradicts itself with respect to the TLBs after INIT.

  9.1 INITIALIZATION OVERVIEW: 
    The major difference is that during an INIT, the internal caches, MSRs,
    MTRRs, and x87 FPU state are left unchanged (although, the TLBs and BTB
    are invalidated as with a hardware reset)

while Table 9-1 says:

  Register                    Power up    Reset      INIT
  Data and Code Cache, TLBs:  Invalid[6]  Invalid[6] Unchanged

I'm pretty sure that Intel CPUs are supposed to flush the TLB, i.e. Tabel 9-1 is
wrong.  Back in my Intel validation days, I remember being involved in a Core2
bug that manifested as a triple fault after INIT due to global TLB entries not
being flushed.  Looks like that wasn't fixed:

https://www.intel.com/content/dam/support/us/en/documents/processors/mobile/celeron/sb/320121.pdf

  AZ28. INIT Does Not Clear Global Entries in the TLB
  Problem: INIT may not flush a TLB entry when:
    • The processor is in protected mode with paging enabled and the page global enable
      flag is set (PGE bit of CR4 register)
    • G bit for the page table entry is set
    • TLB entry is present in TLB when INIT occurs
    • Software may encounter unexpected page fault or incorrect address translation due
      to a TLB entry erroneously left in TLB after INIT.

  Workaround: Write to CR3, CR4 (setting bits PSE, PGE or PAE) or CR0 (setting
              bits PG or PE) registers before writing to memory early in BIOS
	      code to clear all the global entries from TLB.
	      
  Status: For the steppings affected, see the Summary Tables of Changes.

AMD's APM also appears to contradict itself, though that depends on one's
interpretation of "external intialization".  Like the SDM, its table states that
the TLBs are not flushed on INIT:

  Table 14-1. Initial Processor State

  Processor Resource         Value after RESET      Value after INIT
  Instruction and Data TLBs  Invalidated            Unchanged

but a blurb later on says:

  5.5.3 TLB Management

  Implicit Invalidations. The following operations cause the entire TLB to be
  invalidated, including global pages:

    • External initialization of the processor.


All in all, that means KVM also has a bug in the form of a missing guest TLB
flush on INIT, at least for VMX and probably for SVM.  I'll add a patch to flush
the guest TLBs on INIT irrespective of vendor.  Even if AMD CPUs don't flush the
TLB, I see no reason to bank on all guests being paranoid enough to flush the
TLB immediately after INIT.
Reiji Watanabe May 23, 2021, 11:04 p.m. UTC | #3
> > AMD's APM Vol2 (Table 14-1 in Revision 3.37) says CR0 After INIT will be:
> >
> >    CD and NW are unchanged
> >    Bit 4 (reserved) = 1
> >    All others = 0
> >
> > (CR0 will be 0x60000010 after RESET)
> >
> > So, it looks the CR0 value that init_vmcb() sets could be
> > different from what is indicated in the APM for INIT.
> >
> > BTW, Intel's SDM (April 2021 version) says CR0 for Power up/Reset/INIT
> > will be 0x60000010 with the following note.
> > -------------------------------------------------
> > The CD and NW flags are unchanged,
> > bit 4 is set to 1, all other bits are cleared.
> > -------------------------------------------------
> > The note is attached as '2' to all Power up/Reset/INIT cases
> > looking at the SDM.  I would guess it is erroneous that
> > the note is attached to Power up/Reset though.
>
> Agreed.  I'll double check that CD and NW are preserved by hardware on INIT,
> and will also ping Intel folks to fix the POWER-UP and RESET footnote.
>
> Hah!  Reading through that section yet again, there's another SDM bug.  It
> contradicts itself with respect to the TLBs after INIT.
>
>   9.1 INITIALIZATION OVERVIEW:
>     The major difference is that during an INIT, the internal caches, MSRs,
>     MTRRs, and x87 FPU state are left unchanged (although, the TLBs and BTB
>     are invalidated as with a hardware reset)
>
> while Table 9-1 says:
>
>   Register                    Power up    Reset      INIT
>   Data and Code Cache, TLBs:  Invalid[6]  Invalid[6] Unchanged
>
> I'm pretty sure that Intel CPUs are supposed to flush the TLB, i.e. Tabel 9-1 is
> wrong.  Back in my Intel validation days, I remember being involved in a Core2
> bug that manifested as a triple fault after INIT due to global TLB entries not
> being flushed.  Looks like that wasn't fixed:
>
> https://www.intel.com/content/dam/support/us/en/documents/processors/mobile/celeron/sb/320121.pdf
>
>   AZ28. INIT Does Not Clear Global Entries in the TLB
>   Problem: INIT may not flush a TLB entry when:
>     • The processor is in protected mode with paging enabled and the page global enable
>       flag is set (PGE bit of CR4 register)
>     • G bit for the page table entry is set
>     • TLB entry is present in TLB when INIT occurs
>     • Software may encounter unexpected page fault or incorrect address translation due
>       to a TLB entry erroneously left in TLB after INIT.
>
>   Workaround: Write to CR3, CR4 (setting bits PSE, PGE or PAE) or CR0 (setting
>               bits PG or PE) registers before writing to memory early in BIOS
>               code to clear all the global entries from TLB.
>
>   Status: For the steppings affected, see the Summary Tables of Changes.
>
> AMD's APM also appears to contradict itself, though that depends on one's
> interpretation of "external intialization".  Like the SDM, its table states that
> the TLBs are not flushed on INIT:
>
>   Table 14-1. Initial Processor State
>
>   Processor Resource         Value after RESET      Value after INIT
>   Instruction and Data TLBs  Invalidated            Unchanged
>
> but a blurb later on says:
>
>   5.5.3 TLB Management
>
>   Implicit Invalidations. The following operations cause the entire TLB to be
>   invalidated, including global pages:
>
>     • External initialization of the processor.

"Table 8-9. Simultaneous Interrupt Priorities" of AMD's APM has
the words "External Processor Initialization (INIT)", which make
me guess "the External initialization of the processor" in 5.5.3
TLB Management means INIT.


> All in all, that means KVM also has a bug in the form of a missing guest TLB
> flush on INIT, at least for VMX and probably for SVM.  I'll add a patch to flush
> the guest TLBs on INIT irrespective of vendor.  Even if AMD CPUs don't flush the
> TLB, I see no reason to bank on all guests being paranoid enough to flush the
> TLB immediately after INIT.

Yes, I agree that would be better.
Thank you so much for all the helpful information !

Regards,
Reiji
diff mbox series

Patch

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4ea100c08cb3..88d34fa93d8b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1204,18 +1204,13 @@  static void init_vmcb(struct kvm_vcpu *vcpu)
 	init_sys_seg(&save->ldtr, SEG_TYPE_LDT);
 	init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16);
 
+	svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET);
 	svm_set_cr4(vcpu, 0);
 	svm_set_efer(vcpu, 0);
 	save->dr6 = 0xffff0ff0;
 	kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
 	vcpu->arch.regs[VCPU_REGS_RIP] = 0x0000fff0;
 
-	/*
-	 * svm_set_cr0() sets PG and WP and clears NW and CD on save->cr0.
-	 * It also updates the guest-visible cr0 value.
-	 */
-	svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET);
-
 	save->cr4 = X86_CR4_PAE;
 
 	if (npt_enabled) {