kvm: x86: move tracepoints outside extended quiescent state

Message ID 1449769137-8668-1-git-send-email-pbonzini@redhat.com
State New

Commit Message

Paolo Bonzini Dec. 10, 2015, 5:38 p.m. UTC
Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a
lockdep splat.

Cc: stable@vger.kernel.org
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm.c | 4 ++--
 arch/x86/kvm/vmx.c | 3 ++-
 arch/x86/kvm/x86.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

Comments

Borislav Petkov Dec. 10, 2015, 6:09 p.m. UTC | #1
On Thu, Dec 10, 2015 at 06:38:57PM +0100, Paolo Bonzini wrote:
> Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a
> lockdep splat.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/svm.c | 4 ++--
>  arch/x86/kvm/vmx.c | 3 ++-
>  arch/x86/kvm/x86.c | 2 +-
>  3 files changed, 5 insertions(+), 4 deletions(-)

Looks like you missed some...

[  144.296364] kvm: zapping shadow pages for mmio generation wraparound
[  164.699053] kvm: zapping shadow pages for mmio generation wraparound
[  312.115767] kvm: zapping shadow pages for mmio generation wraparound
[  432.277585] kvm: zapping shadow pages for mmio generation wraparound

[  434.547820] ===============================
[  434.552020] [ INFO: suspicious RCU usage. ]
[  434.556223] 4.4.0-rc4+ #1 Not tainted
[  434.559886] -------------------------------
[  434.564072] arch/x86/kvm/trace.h:971 suspicious rcu_dereference_check() usage!
[  434.571303] 
               other info that might help us debug this:

[  434.579324] 
               RCU used illegally from idle CPU!
               rcu_scheduler_active = 1, debug_locks = 0
[  434.590209] RCU used illegally from extended quiescent state!
[  434.595971] 1 lock held by qemu-system-x86/2402:
[  434.600596]  #0:  (&vcpu->mutex){+.+.+.}, at: [<ffffffffa020872c>] vcpu_load+0x1c/0x80 [kvm]
[  434.609146] 
               stack backtrace:
[  434.613526] CPU: 4 PID: 2402 Comm: qemu-system-x86 Not tainted 4.4.0-rc4+ #1
[  434.620583] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
[  434.627987]  0000000000000001 ffff88042f79fcf0 ffffffff813c2cfc ffff880435aa0000
[  434.635443]  ffff88042f79fd20 ffffffff810c5157 ffff88042fd48000 0000000295a85563
[  434.642886]  0000000295de483f 0000000000000000 ffff88042f79fd58 ffffffffa023ec6e
[  434.650334] Call Trace:
[  434.652804]  [<ffffffff813c2cfc>] dump_stack+0x4e/0x82
[  434.657950]  [<ffffffff810c5157>] lockdep_rcu_suspicious+0xe7/0x120
[  434.664239]  [<ffffffffa023ec6e>] wait_lapic_expire+0xfe/0x1e0 [kvm]
[  434.670606]  [<ffffffffa022183e>] kvm_arch_vcpu_ioctl_run+0x76e/0x19c0 [kvm]
[  434.677674]  [<ffffffffa0221982>] ? kvm_arch_vcpu_ioctl_run+0x8b2/0x19c0 [kvm]
[  434.684905]  [<ffffffff81858acc>] ? mutex_lock_killable_nested+0x29c/0x4c0
[  434.691792]  [<ffffffffa0208b32>] kvm_vcpu_ioctl+0x342/0x700 [kvm]
[  434.697984]  [<ffffffff810c4a7d>] ? __lock_is_held+0x4d/0x70
[  434.703655]  [<ffffffff812351ae>] ? __fget+0xfe/0x200
[  434.708719]  [<ffffffff812291f1>] do_vfs_ioctl+0x301/0x550
[  434.714208]  [<ffffffff8123531a>] ? __fget_light+0x2a/0x90
[  434.719700]  [<ffffffff81229481>] SyS_ioctl+0x41/0x70
[  434.724754]  [<ffffffff8185cb36>] entry_SYSCALL_64_fastpath+0x16/0x7a
[  437.411818] kvm [2400]: vcpu0 unhandled rdmsr: 0x606
[  437.416898] kvm [2400]: vcpu0 unhandled rdmsr: 0x34
Paolo Bonzini Dec. 10, 2015, 6:15 p.m. UTC | #2
On 10/12/2015 19:09, Borislav Petkov wrote:
> On Thu, Dec 10, 2015 at 06:38:57PM +0100, Paolo Bonzini wrote:
>> Invoking tracepoints within kvm_guest_enter/kvm_guest_exit causes a
>> lockdep splat.
>>
>> Cc: stable@vger.kernel.org
>> Reported-by: Borislav Petkov <bp@alien8.de>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/kvm/svm.c | 4 ++--
>>  arch/x86/kvm/vmx.c | 3 ++-
>>  arch/x86/kvm/x86.c | 2 +-
>>  3 files changed, 5 insertions(+), 4 deletions(-)
> 
> Looks like you missed some...

Yeah, wait_lapic_expire also has to be moved before __kvm_guest_enter.

Paolo
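
[Editor's note: v2 is not shown in this thread, but from Paolo's remark the x86.c hunk presumably ends up pulling wait_lapic_expire() — whose tracepoint is what trips the check at trace.h:971 in the splat above — out of the quiescent state along with trace_kvm_entry, roughly:]

```diff
 	trace_kvm_entry(vcpu->vcpu_id);
+	wait_lapic_expire(vcpu);
 	__kvm_guest_enter();
 ...
-	wait_lapic_expire(vcpu);
 	kvm_x86_ops->run(vcpu);
```

[This fragment is a sketch reconstructed from the discussion, not the actual v2 patch.]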

Borislav Petkov Dec. 11, 2015, 10:22 a.m. UTC | #3
On Thu, Dec 10, 2015 at 07:15:19PM +0100, Paolo Bonzini wrote:
> Yeah, wait_lapic_expire also has to be moved before __kvm_guest_enter.

Yeah, v2 doesn't splat on the Intel box anymore, but the AMD box still
does, and it is a different problem. With v2 applied, it still
explodes, see below.

And I'm willing to bet good money it's that shadow-pages fun. The

[  959.466549] kernel tried to execute NX-protected page - exploit attempt? (uid: 1000)

line basically says that we're pagefaulting while trying to fetch
instructions, i.e., we're trying to execute something from a page: rIP
points to 0xffff8800b9f9bdf0, and that is most likely a page belonging to
kvm which, however, is for some reason not executable (anymore?).

Could it have anything to do with that zapping of shadow pages, per
chance?

Can I disable the zapping and see if it still triggers? Or should I try
modprobing kvm with "npt=0" or so?

/me goes and tries it...

Nope, that doesn't help - it still splats.

Hmmm...

[  849.272337] kvm: zapping shadow pages for mmio generation wraparound
[  933.813871] kvm: zapping shadow pages for mmio generation wraparound
[  959.466549] kernel tried to execute NX-protected page - exploit attempt? (uid: 1000)
[  959.474369] BUG: unable to handle kernel paging request at ffff8800b9f9bdf0
[  959.481407] IP: [<ffff8800b9f9bdf0>] 0xffff8800b9f9bdf0
[  959.486677] PGD 2d7e067 PUD 43efff067 PMD 80000000b9e001e3 
[  959.492338] Oops: 0011 [#1] PREEMPT SMP 
[  959.496340] Modules linked in: tun sha256_ssse3 sha256_generic drbg binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm irqbypass crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd amd64_edac_mod fam15h_power k10temp edac_mce_amd amdkfd amd_iommu_v2 radeon acpi_cpufreq
[  959.524023] CPU: 3 PID: 3798 Comm: qemu-system-x86 Not tainted 4.4.0-rc4+ #8
[  959.531127] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[  959.541113] task: ffff8800b7ca5e00 ti: ffff8800b9f98000 task.ti: ffff8800b9f98000
[  959.548625] RIP: 0010:[<ffff8800b9f9bdf0>]  [<ffff8800b9f9bdf0>] 0xffff8800b9f9bdf0
[  959.556338] RSP: 0018:ffff8800b9f9bde0  EFLAGS: 00010206
[  959.561676] RAX: 000003993d0f82ee RBX: ffff8800b7d48000 RCX: 0000000000000001
[  959.568844] RDX: 0000039900000000 RSI: ffffffffa02bdc7b RDI: ffff8800b7d48000
[  959.576010] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[  959.583177] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[  959.590346] R13: ffff8800b7d48000 R14: 0000000000000000 R15: 0000000000000000
[  959.597513] FS:  00007f7fae580700(0000) GS:ffff88042cc00000(0000) knlGS:0000000000000000
[  959.605643] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  959.611414] CR2: ffff8800b9f9bdf0 CR3: 000000041b5fe000 CR4: 00000000000406e0
[  959.618579] Stack:
[  959.620607]  ffffffffa02d5e17 ffff8800b7d48000 ffff8800b9f9be08 ffffffffa02bdb1f
[  959.628104]  0000000000000000 ffff8800b9f9be98 ffffffffa02bdc7b ffff8804242a4400
[  959.635601]  0000000000000070 0000000000004000 ffffffff81a3c1e0 ffff8800b7ca5e00
[  959.643114] Call Trace:
[  959.645599]  [<ffffffffa02d5e17>] ? kvm_arch_vcpu_put+0x17/0x40 [kvm]
[  959.652081]  [<ffffffffa02bdb1f>] ? vcpu_put+0x1f/0x60 [kvm]
[  959.657782]  [<ffffffffa02bdc7b>] ? kvm_vcpu_ioctl+0x11b/0x6f0 [kvm]
[  959.664169]  [<ffffffff811a0930>] ? do_vfs_ioctl+0x2e0/0x540
[  959.669855]  [<ffffffff811ac8e9>] ? __fget_light+0x29/0x90
[  959.675364]  [<ffffffff811a0bdc>] ? SyS_ioctl+0x4c/0x90
[  959.680618]  [<ffffffff816e2d5b>] ? entry_SYSCALL_64_fastpath+0x16/0x6f
[  959.687263] Code: 00 00 00 06 02 01 00 00 00 00 00 e0 bd f9 b9 00 88 ff ff 18 00 00 00 00 00 00 00 17 5e 2d a0 ff ff ff ff 00 80 d4 b7 00 88 ff ff <08> be f9 b9 00 88 ff ff 1f db 2b a0 ff ff ff ff 00 00 00 00 00 
[  959.707506] RIP  [<ffff8800b9f9bdf0>] 0xffff8800b9f9bdf0
[  959.712862]  RSP <ffff8800b9f9bde0>
[  959.716373] CR2: ffff8800b9f9bdf0
[  959.735764] ---[ end trace 6826bd13f6e235cd ]---
[  959.740465] note: qemu-system-x86[3798] exited with preempt_count 1
[  979.163010] kvm: zapping shadow pages for mmio generation wraparound

Patch

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 83a1c643f9a5..899c40f826dd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3422,6 +3422,8 @@  static int handle_exit(struct kvm_vcpu *vcpu)
 	struct kvm_run *kvm_run = vcpu->run;
 	u32 exit_code = svm->vmcb->control.exit_code;
 
+	trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
+
 	if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
 		vcpu->arch.cr0 = svm->vmcb->save.cr0;
 	if (npt_enabled)
@@ -3892,8 +3894,6 @@  static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
 	vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
 
-	trace_kvm_exit(svm->vmcb->control.exit_code, vcpu, KVM_ISA_SVM);
-
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
 		kvm_before_handle_nmi(&svm->vcpu);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index af823a388c19..6b5605607849 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8042,6 +8042,8 @@  static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	u32 exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
 
+	trace_kvm_exit(exit_reason, vcpu, KVM_ISA_VMX);
+
 	/*
 	 * Flush logged GPAs PML buffer, this will make dirty_bitmap more
 	 * updated. Another good is, in kvm_vm_ioctl_get_dirty_log, before
@@ -8668,7 +8670,6 @@  static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx->loaded_vmcs->launched = 1;
 
 	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
-	trace_kvm_exit(vmx->exit_reason, vcpu, KVM_ISA_VMX);
 
 	/*
 	 * the KVM_REQ_EVENT optimization bit is only on for one entry, and if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eed32283d22c..2a8c035ec7fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6515,6 +6515,7 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (req_immediate_exit)
 		smp_send_reschedule(vcpu->cpu);
 
+	trace_kvm_entry(vcpu->vcpu_id);
 	__kvm_guest_enter();
 
 	if (unlikely(vcpu->arch.switch_db_regs)) {
@@ -6527,7 +6528,6 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD;
 	}
 
-	trace_kvm_entry(vcpu->vcpu_id);
 	wait_lapic_expire(vcpu);
 	kvm_x86_ops->run(vcpu);