Message ID | 20210424004645.3950558-36-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86: vCPU RESET/INIT fixes and consolidation | expand |
> --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -1204,12 +1204,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu) > init_sys_seg(&save->ldtr, SEG_TYPE_LDT); > init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16); > > - svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET); > - svm_set_cr4(vcpu, 0); > - svm_set_efer(vcpu, 0); > - kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); > - vcpu->arch.regs[VCPU_REGS_RIP] = 0x0000fff0; Reviewed-by: Reiji Watanabe <reijiw@google.com> Those your vCPU RESET/INIT changes look great. I think the change in init_vmcb() basically assumes that the function is called from kvm_vcpu_reset(via svm_vcpu_reset()). Although shutdown_interception() directly calls init_mcb(), I would think the change doesn't matter for the shutdown interception case. IMHO it would be a bit misleading that a function named 'init_vmcb', which is called from other than kvm_vcpu_reset (svm_vcpu_reset()), only partially resets the vmcb (probably just to me though). So, I personally think it would be better if its name or comment can give some more specific information about the assumption. BTW, it looks like two lines of "vcpu->arch.hflags = 0;" can be also removed from the init_vmcb() as well. Thanks, Reiji
On Mon, May 17, 2021, Reiji Watanabe wrote: > > --- a/arch/x86/kvm/svm/svm.c > > +++ b/arch/x86/kvm/svm/svm.c > > @@ -1204,12 +1204,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu) > > init_sys_seg(&save->ldtr, SEG_TYPE_LDT); > > init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16); > > > > - svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET); > > - svm_set_cr4(vcpu, 0); > > - svm_set_efer(vcpu, 0); > > - kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); > > - vcpu->arch.regs[VCPU_REGS_RIP] = 0x0000fff0; > > Reviewed-by: Reiji Watanabe <reijiw@google.com> > > Those your vCPU RESET/INIT changes look great. > > I think the change in init_vmcb() basically assumes that the > function is called from kvm_vcpu_reset(via svm_vcpu_reset()). > Although shutdown_interception() directly calls init_mcb(), > I would think the change doesn't matter for the shutdown > interception case. > > IMHO it would be a bit misleading that a function named 'init_vmcb', > which is called from other than kvm_vcpu_reset (svm_vcpu_reset()), > only partially resets the vmcb (probably just to me though). It's not just you, that code is funky. If I could go back in time, I would lobby to not automatically init the VMCB and instead put the vCPU into KVM_MP_STATE_UNINITIALIZED and force userspace to explicitly INIT or RESET the vCPU. Even better would be to add KVM_MP_STATE_SHUTDOWN, since technically NMI can break shutdown (and SMI on Intel CPUs). Anyways, that ship has sailed, but we might be able to get away with replacing init_vmcb() with kvm_vcpu_reset(vcpu, true), i.e. effecting a full INIT. That would ensure the VMCB is consistent with KVM's software model, which I'm guessing is not true with the direct init_vmcb() call. It would also have some connection to reality since there exist bare metal platforms that automatically INIT the CPU if it hits shutdown (maybe only for the BSP?). Side topic, the NMI thing got me looking through init_vmcb() to see how it handles the IDT and GDT, and surprise, surprise, it fails to zero IDTR.base and GDTR.base. I'll add a patch to fix that, and maybe try to consolidate the VMX and SVM segmentation logic. > So, I personally think it would be better if its name or comment > can give some more specific information about the assumption. > > BTW, it looks like two lines of "vcpu->arch.hflags = 0;" > can be also removed from the init_vmcb() as well. Ya, I'll add that to the pile. Thanks!
On Tue, May 18, 2021 at 2:45 PM Sean Christopherson <seanjc@google.com> wrote: > > On Mon, May 17, 2021, Reiji Watanabe wrote: > > > --- a/arch/x86/kvm/svm/svm.c > > > +++ b/arch/x86/kvm/svm/svm.c > > > @@ -1204,12 +1204,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu) > > > init_sys_seg(&save->ldtr, SEG_TYPE_LDT); > > > init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16); > > > > > > - svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET); > > > - svm_set_cr4(vcpu, 0); > > > - svm_set_efer(vcpu, 0); > > > - kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); > > > - vcpu->arch.regs[VCPU_REGS_RIP] = 0x0000fff0; > > > > Reviewed-by: Reiji Watanabe <reijiw@google.com> > > > > Those your vCPU RESET/INIT changes look great. > > > > I think the change in init_vmcb() basically assumes that the > > function is called from kvm_vcpu_reset(via svm_vcpu_reset()). > > Although shutdown_interception() directly calls init_mcb(), > > I would think the change doesn't matter for the shutdown > > interception case. > > > > IMHO it would be a bit misleading that a function named 'init_vmcb', > > which is called from other than kvm_vcpu_reset (svm_vcpu_reset()), > > only partially resets the vmcb (probably just to me though). > > It's not just you, that code is funky. If I could go back in time, I would lobby > to not automatically init the VMCB and instead put the vCPU into > KVM_MP_STATE_UNINITIALIZED and force userspace to explicitly INIT or RESET the > vCPU. Even better would be to add KVM_MP_STATE_SHUTDOWN, since technically NMI > can break shutdown (and SMI on Intel CPUs). I see. Adding KVM_MP_STATE_SHUTDOWN sounds right to me given the real CPU's behavior. > Anyways, that ship has sailed, but we might be able to get away with replacing > init_vmcb() with kvm_vcpu_reset(vcpu, true), i.e. effecting a full INIT. That > would ensure the VMCB is consistent with KVM's software model, which I'm guessing > is not true with the direct init_vmcb() call. It would also have some connection > to reality since there exist bare metal platforms that automatically INIT the CPU > if it hits shutdown (maybe only for the BSP?). Yes, calling kvm_vcpu_reset(vcpu, true) sounds better than the direct init_vmcb() call. > Side topic, the NMI thing got me looking through init_vmcb() to see how it > handles the IDT and GDT, and surprise, surprise, it fails to zero IDTR.base and > GDTR.base. I'll add a patch to fix that, and maybe try to consolidate the VMX > and SVM segmentation logic. That's surprising... It seems init_vmcb() was used only for RESET when the function was originally introduced and the entire vmcb was zero-cleared before init_vmcb() was called. So, I would suspect init_vmcb() was originally implemented assuming that all the vmcb fields were zero. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6aa8b732ca01c3d7a54e93f4d701b8aabbe60fb7 Thanks, Reiji
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 996a6b03e338..23f880268ff5 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1204,12 +1204,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu) init_sys_seg(&save->ldtr, SEG_TYPE_LDT); init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16); - svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET); - svm_set_cr4(vcpu, 0); - svm_set_efer(vcpu, 0); - kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); - vcpu->arch.regs[VCPU_REGS_RIP] = 0x0000fff0; - if (npt_enabled) { /* Setup VMCB for Nested Paging */ control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 8c982e049cbb..d8afca144e11 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4557,9 +4557,6 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmcs_write64(GUEST_IA32_DEBUGCTL, 0); } - kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); - kvm_rip_write(vcpu, 0xfff0); - vmcs_writel(GUEST_GDTR_BASE, 0); vmcs_write32(GUEST_GDTR_LIMIT, 0xffff); @@ -4587,12 +4584,6 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); - vmx_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET); - vmx_set_cr4(vcpu, 0); - vmx_set_efer(vcpu, 0); - - vmx_update_exception_bitmap(vcpu); - vpid_sync_context(vmx->vpid); if (init_event) vmx_clear_hlt(vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 167c650d1187..97d8e3e74bab 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10499,6 +10499,14 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) static_call(kvm_x86_vcpu_reset)(vcpu, init_event); + kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); + kvm_rip_write(vcpu, 0xfff0); + + static_call(kvm_x86_set_cr0)(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET); + static_call(kvm_x86_set_cr4)(vcpu, 0); + static_call(kvm_x86_set_efer)(vcpu, 0); + static_call(kvm_x86_update_exception_bitmap)(vcpu); + if (kvm_cr0_mmu_role_changed(old_cr0, kvm_read_cr0(vcpu)) || kvm_cr4_mmu_role_changed(old_cr4, kvm_read_cr4(vcpu))) kvm_mmu_reset_context(vcpu);
Move the setting of CR0, CR4, EFER, RFLAGS, and RIP from vendor code to common x86. VMX and SVM now have near-identical sequences, the only difference between that VMX updates the exception bitmap. Updating the bitmap on SVM is unnecessary, but benign. Unfortunately it can't be left behind in VMX due to the need to update exception intercepts after the control registers are set. Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/x86/kvm/svm/svm.c | 6 ------ arch/x86/kvm/vmx/vmx.c | 9 --------- arch/x86/kvm/x86.c | 8 ++++++++ 3 files changed, 8 insertions(+), 15 deletions(-)