Message ID | 20190725104645.30642-2-vkuznets@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: nVMX: guest reset fixes |
On Thu, Jul 25, 2019 at 3:29 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> [ Upstream commit 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 ]
>
> If a KVM guest is reset while running a nested guest, free_nested will
> disable the shadow VMCS execution control in the vmcs01. However,
> on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
> the VMCS12 to the shadow VMCS which has since been freed.
>
> This causes a vmptrld of a NULL pointer on my machine, but Jan reports
> that the host hangs altogether. Let's see how much this trivial patch
> fixes.
>
> Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
> Cc: Liran Alon <liran.alon@oracle.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Hi all,

Do we need to backport the fix to stable 4.14 as well? It applies
cleanly and compiles fine.

Regards,
Jack Wang
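For readers reconstructing the bug from the description above: the failure is a classic stale-flag pattern, where a teardown path frees a resource but leaves a "use this resource on the next run" flag set. Below is a minimal, self-contained C sketch of that pattern; the names vcpu_state, free_nested_sketch and vcpu_run_sketch are hypothetical stand-ins for the real KVM structures and paths (vmx->vmcs01.shadow_vmcs, vmx->nested.sync_shadow_vmcs, free_nested and vmx_vcpu_run), not the actual kernel code.

#include <stdio.h>
#include <stddef.h>

struct vcpu_state {
	void *shadow_vmcs;     /* stands in for vmx->vmcs01.shadow_vmcs */
	int sync_shadow_vmcs;  /* stands in for vmx->nested.sync_shadow_vmcs */
};

/* Guest-reset path: frees the shadow VMCS. Before the fix, the
 * sync flag was left set here. */
static void free_nested_sketch(struct vcpu_state *v)
{
	v->shadow_vmcs = NULL;
	/* pre-fix bug: v->sync_shadow_vmcs stays set */
}

/* Next KVM_RUN: consults the stale flag and, without a guard, would
 * dereference the freed shadow VMCS. */
static void vcpu_run_sketch(struct vcpu_state *v)
{
	if (v->sync_shadow_vmcs) {
		if (!v->shadow_vmcs) {  /* mirrors the WARN_ON the patch adds */
			fprintf(stderr, "would vmptrld (null) here\n");
			return;
		}
		/* ... sync the VMCS12 fields into *v->shadow_vmcs ... */
	}
}

int main(void)
{
	struct vcpu_state v = { .shadow_vmcs = &v, .sync_shadow_vmcs = 1 };

	free_nested_sketch(&v); /* guest reset while a nested guest ran */
	vcpu_run_sketch(&v);    /* pre-fix: NULL deref; with guard: bails out */
	return 0;
}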
On 29/07/19 10:58, Jack Wang wrote:
> On Thu, Jul 25, 2019 at 3:29 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> [ Upstream commit 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 ]
>>
>> If a KVM guest is reset while running a nested guest, free_nested will
>> disable the shadow VMCS execution control in the vmcs01. However,
>> on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
>> the VMCS12 to the shadow VMCS which has since been freed.
>>
>> This causes a vmptrld of a NULL pointer on my machine, but Jan reports
>> that the host hangs altogether. Let's see how much this trivial patch
>> fixes.
>>
>> Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
>> Cc: Liran Alon <liran.alon@oracle.com>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Hi all,
>
> Do we need to backport the fix to stable 4.14 as well? It applies
> cleanly and compiles fine.

The reproducer requires newer kernels that support KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE, so it would be hard to test. However, the
patch itself should be safe.

Paolo
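Paolo's point about testability can be made concrete: the reproducer saves and restores nested state across the reset, and the required ioctls exist only on kernels that advertise KVM_CAP_NESTED_STATE, which 4.14 does not. A rough sketch of that save/restore step, assuming a kvm fd and vCPU fd already set up via the usual KVM_CREATE_VM/KVM_CREATE_VCPU sequence (setup and error reporting elided):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int save_restore_nested_state(int kvm_fd, int vcpu_fd)
{
	char buf[16384];
	struct kvm_nested_state *state = (struct kvm_nested_state *)buf;

	/* The capability is absent on 4.14, which is why the
	 * reproducer cannot run there. */
	if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NESTED_STATE) <= 0)
		return -1;

	memset(buf, 0, sizeof(buf));
	state->size = sizeof(buf); /* tell KVM how big the buffer is */

	/* Snapshot the nested (L1/L2) state... */
	if (ioctl(vcpu_fd, KVM_GET_NESTED_STATE, state) < 0)
		return -1;

	/* ...and restore it, e.g. around a simulated guest reset. */
	if (ioctl(vcpu_fd, KVM_SET_NESTED_STATE, state) < 0)
		return -1;

	return 0;
}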
On Mon, Jul 29, 2019 at 11:10 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 29/07/19 10:58, Jack Wang wrote:
> > On Thu, Jul 25, 2019 at 3:29 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >>
> >> From: Paolo Bonzini <pbonzini@redhat.com>
> >>
> >> [ Upstream commit 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 ]
> >>
> >> If a KVM guest is reset while running a nested guest, free_nested will
> >> disable the shadow VMCS execution control in the vmcs01. However,
> >> on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
> >> the VMCS12 to the shadow VMCS which has since been freed.
> >>
> >> This causes a vmptrld of a NULL pointer on my machine, but Jan reports
> >> that the host hangs altogether. Let's see how much this trivial patch
> >> fixes.
> >>
> >> Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> Cc: Liran Alon <liran.alon@oracle.com>
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >
> > Hi all,
> >
> > Do we need to backport the fix to stable 4.14 as well? It applies
> > cleanly and compiles fine.
>
> The reproducer requires newer kernels that support KVM_GET_NESTED_STATE
> and KVM_SET_NESTED_STATE, so it would be hard to test. However, the
> patch itself should be safe.
>
> Paolo

Thanks, Paolo, for the confirmation. I am asking because we had one
incident in our production with a 4.14.129 kernel. The system has a
Skylake Gold CPU; first there were kvm errors, then the host hung:

kernel: [1186161.091160] kvm: vmptrld (null)/6bfc00000000 failed
kernel: [1186161.091537] kvm: vmclear fail: (null)/6bfc00000000
kernel: [1186186.490300] watchdog: BUG: soft lockup - CPU#54 stuck for
23s! [qemu:16639]

Hi Sasha, hi Greg,

It would be great if you could also pick this patch up for the 4.14
kernel.

Best regards,
Jack Wang
On 29/07/19 11:29, Jack Wang wrote:
> Thanks, Paolo, for the confirmation. I am asking because we had one
> incident in our production with a 4.14.129 kernel. The system has a
> Skylake Gold CPU; first there were kvm errors, then the host hung:
>
> kernel: [1186161.091160] kvm: vmptrld (null)/6bfc00000000 failed
> kernel: [1186161.091537] kvm: vmclear fail: (null)/6bfc00000000
> kernel: [1186186.490300] watchdog: BUG: soft lockup - CPU#54 stuck for
> 23s! [qemu:16639]
>
> Hi Sasha, hi Greg,
>
> It would be great if you could also pick this patch up for the 4.14
> kernel.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
On Mon, Jul 29, 2019 at 11:30:19AM +0200, Paolo Bonzini wrote:
> On 29/07/19 11:29, Jack Wang wrote:
> > Thanks, Paolo, for the confirmation. I am asking because we had one
> > incident in our production with a 4.14.129 kernel. The system has a
> > Skylake Gold CPU; first there were kvm errors, then the host hung:
> >
> > kernel: [1186161.091160] kvm: vmptrld (null)/6bfc00000000 failed
> > kernel: [1186161.091537] kvm: vmclear fail: (null)/6bfc00000000
> > kernel: [1186186.490300] watchdog: BUG: soft lockup - CPU#54 stuck for
> > 23s! [qemu:16639]
> >
> > Hi Sasha, hi Greg,
> >
> > It would be great if you could also pick this patch up for the 4.14
> > kernel.
>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Ok, now queued up, thanks.

greg k-h
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 73d6d585dd66..880bc36a0d5d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8457,6 +8457,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
 {
 	vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL, SECONDARY_EXEC_SHADOW_VMCS);
 	vmcs_write64(VMCS_LINK_POINTER, -1ull);
+	vmx->nested.sync_shadow_vmcs = false;
 }
 
 static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
@@ -8468,7 +8469,6 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 		/* copy to memory all shadowed fields in case
 		   they were modified */
 		copy_shadow_to_vmcs12(vmx);
-		vmx->nested.sync_shadow_vmcs = false;
 		vmx_disable_shadow_vmcs(vmx);
 	}
 	vmx->nested.posted_intr_nv = -1;
@@ -8668,6 +8668,9 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
 	u64 field_value;
 	struct vmcs *shadow_vmcs = vmx->vmcs01.shadow_vmcs;
 
+	if (WARN_ON(!shadow_vmcs))
+		return;
+
 	preempt_disable();
 
 	vmcs_load(shadow_vmcs);
@@ -8706,6 +8709,9 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
 	u64 field_value = 0;
 	struct vmcs *shadow_vmcs = vmx->vmcs01.shadow_vmcs;
 
+	if (WARN_ON(!shadow_vmcs))
+		return;
+
 	vmcs_load(shadow_vmcs);
 
 	for (q = 0; q < ARRAY_SIZE(fields); q++) {
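Two aspects of the fix are worth noting. First, clearing sync_shadow_vmcs moves from nested_release_vmcs12 into vmx_disable_shadow_vmcs itself, so every path that disables VMCS shadowing (including free_nested on guest reset) also clears the flag that would otherwise send vmx_vcpu_run after the freed shadow VMCS. Second, the WARN_ON checks in copy_shadow_to_vmcs12 and copy_vmcs12_to_shadow are defense in depth: if a stale flag ever slips through again, the sync is skipped with a loud warning instead of a NULL vmptrld or a hung host.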