[v7,2/2] x86/apic: x2apic write eoi msr notrace

Message ID 1477877822-4437-3-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived

Commit Message

Wanpeng Li Oct. 31, 2016, 1:37 a.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>

| RCU used illegally from idle CPU!
| rcu_scheduler_active = 1, debug_locks = 0
| RCU used illegally from extended quiescent state!
| no locks held by swapper/1/0.
| 
|  [<ffffffff9d492b95>] do_trace_write_msr+0x135/0x140
|  [<ffffffff9d06f860>] native_write_msr+0x20/0x30
|  [<ffffffff9d065fad>] native_apic_msr_eoi_write+0x1d/0x30
|  [<ffffffff9d05bd1d>] smp_reschedule_interrupt+0x1d/0x30
|  [<ffffffff9d8daec6>] reschedule_interrupt+0x96/0xa0

The reschedule interrupt may be raised while the CPU is idle. This causes
the lockdep warning above.

As Peterz pointed out:

| So now we're making a very frequent interrupt slower because of debug 
| code.
|
| The thing is, many many smp_reschedule_interrupt() invocations don't
| actually execute anything much at all and are only send to tickle the
| return to user path (which does the actual preemption).
| 
| Having to do the whole irq_enter/irq_exit dance just for this unlikely
| debug case totally blows.

This patch makes the x2apic EOI MSR write notrace to avoid the debug
splat, and reverts the irq_enter/irq_exit dance so that a very frequent
interrupt is not made slower because of debug code.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/asm/apic.h | 3 ++-
 arch/x86/kernel/apic/apic.c | 1 +
 arch/x86/kernel/kvm.c       | 4 ++--
 arch/x86/kernel/smp.c       | 2 --
 4 files changed, 5 insertions(+), 5 deletions(-)
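
The override path described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: every `model_*` name, the `MODEL_*` constants, and the counters are hypothetical stand-ins invented for illustration. It shows the two ideas in the patch: the native EOI write goes through an untraced MSR write, and a paravirtualized override can fall back to the saved native callback instead of a generic, traced write path.

```c
#include <stddef.h>

/* Illustrative values only; stand-ins for the EOI register and ack value. */
#define MODEL_EOI_REG 0xB0u
#define MODEL_EOI_ACK 0x0u

typedef void (*eoi_write_fn)(unsigned int reg, unsigned int v);

/* Hypothetical miniature of struct apic with the callback the patch adds. */
struct apic_model {
	eoi_write_fn eoi_write;        /* may be overridden by a guest */
	eoi_write_fn native_eoi_write; /* saved original callback */
};

int trace_events;   /* how often the modeled tracepoint fired */
int hw_eoi_writes;  /* how often the modeled register was written */
int pv_eoi_pending; /* models the paravirtual EOI bit being set */

/* Untraced write: touches the "hardware" and nothing else. */
void model_wrmsr_notrace(unsigned int reg, unsigned int v)
{
	(void)reg;
	(void)v;
	hw_eoi_writes++;
}

/* Traced write: fires the modeled tracepoint, then writes. */
void model_wrmsr(unsigned int reg, unsigned int v)
{
	trace_events++;
	model_wrmsr_notrace(reg, v);
}

/* Native EOI write uses the untraced variant, as in the patch. */
void model_native_eoi_write(unsigned int reg, unsigned int v)
{
	model_wrmsr_notrace(reg, v);
}

struct apic_model apic = { model_native_eoi_write, NULL };

/* Paravirtual override: skip the write if the pending bit was set,
 * otherwise fall back to the saved native callback. */
void model_pv_eoi_write(unsigned int reg, unsigned int v)
{
	if (pv_eoi_pending) {
		pv_eoi_pending = 0;
		return;
	}
	apic.native_eoi_write(reg, v);
}

/* Install an override while remembering the original callback. */
void model_set_eoi_write(struct apic_model *drv, eoi_write_fn fn)
{
	drv->native_eoi_write = drv->eoi_write;
	drv->eoi_write = fn;
}
```

In this model, calling `model_set_eoi_write(&apic, model_pv_eoi_write)` and then `apic.eoi_write(MODEL_EOI_REG, MODEL_EOI_ACK)` never increments `trace_events`, whether or not the pending bit was set. Only a direct `model_wrmsr()` call does.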

Comments

Borislav Petkov Nov. 2, 2016, 9:10 p.m. UTC | #1
On Mon, Oct 31, 2016 at 09:37:02AM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> | RCU used illegally from idle CPU!
> | rcu_scheduler_active = 1, debug_locks = 0
> | RCU used illegally from extended quiescent state!
> | no locks held by swapper/1/0.
> | 
> |  [<ffffffff9d492b95>] do_trace_write_msr+0x135/0x140
> |  [<ffffffff9d06f860>] native_write_msr+0x20/0x30
> |  [<ffffffff9d065fad>] native_apic_msr_eoi_write+0x1d/0x30
> |  [<ffffffff9d05bd1d>] smp_reschedule_interrupt+0x1d/0x30
> |  [<ffffffff9d8daec6>] reschedule_interrupt+0x96/0xa0

Please remove the text between [] and the offsets after "+..." - those
are useless in a commit message.

> Reschedule interrupt may be called in cpu idle state. This causes lockdep

s/cpu/CPU/

> check warning above.
> 
> As Peterz pointed out:
> 
> | So now we're making a very frequent interrupt slower because of debug 
> | code.
> |
> | The thing is, many many smp_reschedule_interrupt() invocations don't
> | actually execute anything much at all and are only send to tickle the

s/send/sent/

> | return to user path (which does the actual preemption).
> | 
> | Having to do the whole irq_enter/irq_exit dance just for this unlikely
> | debug case totally blows.
> 
> This patch converts x2apic write eoi msr to notrace to avoid the debug 
> codes splash and reverts irq_enter/irq_exit dance to avoid to make a very 
> frequent interrupt slower because of debug code.
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---

Wanpeng Li Nov. 7, 2016, 2:52 a.m. UTC | #2
2016-11-03 5:10 GMT+08:00 Borislav Petkov <bp@alien8.de>:
> On Mon, Oct 31, 2016 at 09:37:02AM +0800, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> | RCU used illegally from idle CPU!
>> | rcu_scheduler_active = 1, debug_locks = 0
>> | RCU used illegally from extended quiescent state!
>> | no locks held by swapper/1/0.
>> |
>> |  [<ffffffff9d492b95>] do_trace_write_msr+0x135/0x140
>> |  [<ffffffff9d06f860>] native_write_msr+0x20/0x30
>> |  [<ffffffff9d065fad>] native_apic_msr_eoi_write+0x1d/0x30
>> |  [<ffffffff9d05bd1d>] smp_reschedule_interrupt+0x1d/0x30
>> |  [<ffffffff9d8daec6>] reschedule_interrupt+0x96/0xa0
>
> Please remove the text between [] and the offsets after "+..." - those
> are useless in a commit message.
>
>> Reschedule interrupt may be called in cpu idle state. This causes lockdep
>
> s/cpu/CPU/
>
>> check warning above.
>>
>> As Peterz pointed out:
>>
>> | So now we're making a very frequent interrupt slower because of debug
>> | code.
>> |
>> | The thing is, many many smp_reschedule_interrupt() invocations don't
>> | actually execute anything much at all and are only send to tickle the
>
> s/send/sent/

I will send a new version to clean it up.

Regards,
Wanpeng Li
Patch

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index f5aaf6c..a5a0bcf 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -196,7 +196,7 @@  static inline void native_apic_msr_write(u32 reg, u32 v)
 
 static inline void native_apic_msr_eoi_write(u32 reg, u32 v)
 {
-	wrmsr(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
+	wrmsr_notrace(APIC_BASE_MSR + (APIC_EOI >> 4), APIC_EOI_ACK, 0);
 }
 
 static inline u32 native_apic_msr_read(u32 reg)
@@ -332,6 +332,7 @@  struct apic {
 	 * on write for EOI.
 	 */
 	void (*eoi_write)(u32 reg, u32 v);
+	void (*native_eoi_write)(u32 reg, u32 v);
 	u64 (*icr_read)(void);
 	void (*icr_write)(u32 low, u32 high);
 	void (*wait_icr_idle)(void);
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 88c657b..2686894 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2263,6 +2263,7 @@  void __init apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v))
 	for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
 		/* Should happen once for each apic */
 		WARN_ON((*drv)->eoi_write == eoi_write);
+		(*drv)->native_eoi_write = (*drv)->eoi_write;
 		(*drv)->eoi_write = eoi_write;
 	}
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index edbbfc8..d230513 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -308,7 +308,7 @@  static void kvm_register_steal_time(void)
 
 static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
 
-static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
+static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 {
 	/**
 	 * This relies on __test_and_clear_bit to modify the memory
@@ -319,7 +319,7 @@  static void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 	 */
 	if (__test_and_clear_bit(KVM_PV_EOI_BIT, this_cpu_ptr(&kvm_apic_eoi)))
 		return;
-	apic_write(APIC_EOI, APIC_EOI_ACK);
+	apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
 }
 
 static void kvm_guest_cpu_init(void)
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index c00cb64..68f8cc2 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -261,10 +261,8 @@  static inline void __smp_reschedule_interrupt(void)
 
 __visible void smp_reschedule_interrupt(struct pt_regs *regs)
 {
-	irq_enter();
 	ack_APIC_irq();
 	__smp_reschedule_interrupt();
-	irq_exit();
 	/*
 	 * KVM uses this interrupt to force a cpu out of guest mode
 	 */