[v2] KVM: LAPIC: Fix cancel preemption timer repeatedly due to preemption

Message ID 1500886678-5417-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive)
State New, archived
Commit Message

Wanpeng Li July 24, 2017, 8:57 a.m. UTC
From: Wanpeng Li <wanpeng.li@hotmail.com>

Preemption can occur in the preemption timer expiration handler:

          CPU0                    CPU1

  preemption timer vmexit
  handle_preemption_timer(vCPU0)
    kvm_lapic_expired_hv_timer
      hv_timer_in_use == true
  sched_out
                           sched_in
                           kvm_arch_vcpu_load
                             kvm_lapic_restart_hv_timer
                               restart_apic_timer
                                 start_hv_timer
                                   already-expired timer or sw timer triggered in the window
                                 start_sw_timer
                                   cancel_hv_timer
                           /* back in kvm_lapic_expired_hv_timer */
                           cancel_hv_timer
                             WARN_ON(!apic->lapic_timer.hv_timer_in_use);  ==> Oops

This can be reproduced if CONFIG_PREEMPT is enabled.

------------[ cut here ]------------
 WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1563 kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
 CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G           OE   4.13.0-rc2+ #16
 RIP: 0010:kvm_lapic_expired_hv_timer+0x9e/0xb0 [kvm]
Call Trace:
  handle_preemption_timer+0xe/0x20 [kvm_intel]
  vmx_handle_exit+0xb8/0xd70 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
  ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x81/0x220
  entry_SYSCALL64_slow_path+0x25/0x25
 ------------[ cut here ]------------
 WARNING: CPU: 4 PID: 2972 at /home/kernel/linux/arch/x86/kvm//lapic.c:1498 cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
 CPU: 4 PID: 2972 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc2+ #16
 RIP: 0010:cancel_hv_timer.isra.40+0x4f/0x60 [kvm]
Call Trace:
  kvm_lapic_expired_hv_timer+0x3e/0xb0 [kvm]
  handle_preemption_timer+0xe/0x20 [kvm_intel]
  vmx_handle_exit+0xb8/0xd70 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0xdd1/0x1be0 [kvm]
  ? kvm_arch_vcpu_load+0x47/0x230 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x81/0x220
  entry_SYSCALL64_slow_path+0x25/0x25
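
The two warnings above line up with the sequence in the diagram. As a rough
illustration only, the following userspace C model replays the pre-patch
handler with and without a preemption before the cancel. It is not kernel
code: every model_* name is hypothetical, and the state is reduced to the
hv_timer_in_use flag plus a warning counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant part of the LAPIC timer state. */
struct model_timer {
	bool hv_timer_in_use;	/* mirrors lapic_timer.hv_timer_in_use */
	int warnings;		/* counts WARN_ON hits instead of printing a splat */
};

/* Counts a warning when cond is true, like WARN_ON() without the backtrace. */
static void model_warn_on(struct model_timer *t, bool cond, const char *where)
{
	assert(t != NULL);
	if (cond) {
		t->warnings++;
		fprintf(stderr, "WARN_ON(!hv_timer_in_use) in %s\n", where);
	}
}

/* Warns if the timer is not in use, then clears the flag. */
static void model_cancel_hv_timer(struct model_timer *t)
{
	model_warn_on(t, !t->hv_timer_in_use, "cancel_hv_timer");
	t->hv_timer_in_use = false;
}

/* Assumed effect of the sched_in path in the diagram: the hv timer is cancelled. */
static void model_sched_in(struct model_timer *t)
{
	if (t->hv_timer_in_use)
		model_cancel_hv_timer(t);
}

/*
 * Pre-patch expiration handler: checks the flag, then cancels
 * unconditionally. Returns the number of warnings that fired.
 */
int model_expired_unguarded(bool preempted)
{
	struct model_timer t = { .hv_timer_in_use = true };

	if (preempted)
		model_sched_in(&t);	/* sched_out/sched_in before the cancel */

	model_warn_on(&t, !t.hv_timer_in_use, "kvm_lapic_expired_hv_timer");
	model_cancel_hv_timer(&t);
	return t.warnings;
}
```

Under these assumptions, model_expired_unguarded(true) counts two warnings,
one for each check, and model_expired_unguarded(false) counts none.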

This patch fixes it by not cancelling the preemption timer again if it
has already been cancelled during preemption, which happens when an
already-expired timer or the sw timer triggered in the window.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/lapic.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Paolo Bonzini July 24, 2017, 2:45 p.m. UTC | #1
On 24/07/2017 10:57, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> Preemption can occur in the preemption timer expiration handler:
> 
>           CPU0                    CPU1
> 
>   preemption timer vmexit
>   handle_preemption_timer(vCPU0)
>     kvm_lapic_expired_hv_timer
>       hv_timer_in_use == true
>   sched_out
>                            sched_in
>                            kvm_arch_vcpu_load
>                              kvm_lapic_restart_hv_timer
>                                restart_apic_timer
>                                  start_hv_timer
>                                    already-expired timer or sw timer triggered in the window
>                                  start_sw_timer
>                                    cancel_hv_timer

At this point, the timer interrupt is injected, right?

If this is correct, kvm_lapic_expired_hv_timer can just do nothing if
the timer is not in use, with a comment explaining that the preemption
notifier has run start_sw_timer and thus injected the timer interrupt.

>                            /* back in kvm_lapic_expired_hv_timer */
>                            cancel_hv_timer
>                              WARN_ON(!apic->lapic_timer.hv_timer_in_use);  ==> Oops
> 
> This can be reproduced if CONFIG_PREEMPT is enabled.
> 
> This patch fixes it by not cancelling the preemption timer again if it
> has already been cancelled during preemption, which happens when an
> already-expired timer or the sw timer triggered in the window.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  arch/x86/kvm/lapic.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 2819d4c..8341b40 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1560,9 +1560,13 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  
> -	WARN_ON(!apic->lapic_timer.hv_timer_in_use);
> -	WARN_ON(swait_active(&vcpu->wq));
> -	cancel_hv_timer(apic);
> +	preempt_disable();
> +	if (!(!apic_lvtt_period(apic) && atomic_read(&apic->lapic_timer.pending))) {

Why is the "if" necessary?

Maybe all of kvm_lapic_expired_hv_timer and start_sw_timer should be in
preemption-disabled regions, which trivially avoids any reentrancy issue
with the preempt notifier.  Then, cancel_hv_timer can assert that it's
called with preemption disabled.

Paolo

> +		WARN_ON(!apic->lapic_timer.hv_timer_in_use);
> +		WARN_ON(swait_active(&vcpu->wq));
> +		cancel_hv_timer(apic);
> +	}
> +	preempt_enable();
>  	apic_timer_expired(apic);
>  
>  	if (apic_lvtt_period(apic) && apic->lapic_timer.period) {
>
Wanpeng Li July 24, 2017, 3:08 p.m. UTC | #2
2017-07-24 22:45 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 24/07/2017 10:57, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> Preemption can occur in the preemption timer expiration handler:
>>
>>           CPU0                    CPU1
>>
>>   preemption timer vmexit
>>   handle_preemption_timer(vCPU0)
>>     kvm_lapic_expired_hv_timer
>>       hv_timer_in_use == true
>>   sched_out
>>                            sched_in
>>                            kvm_arch_vcpu_load
>>                              kvm_lapic_restart_hv_timer
>>                                restart_apic_timer
>>                                  start_hv_timer
>>                                    already-expired timer or sw timer triggered in the window
>>                                  start_sw_timer
>>                                    cancel_hv_timer
>
> At this point, the timer interrupt is injected, right?

Do you mean the new one on CPU1? I think we only set the pending
timer there, and then return to kvm_lapic_expired_hv_timer() after the
preempt notifier's sched_in.

>
> If this is correct, kvm_lapic_expired_hv_timer can just do nothing if
> the timer is not in use, with a comment explaining that the preemption
> notifier has run start_sw_timer and thus injected the timer interrupt.
>
>>                            /* back in kvm_lapic_expired_hv_timer */
>>                            cancel_hv_timer
>>                              WARN_ON(!apic->lapic_timer.hv_timer_in_use);  ==> Oops
>>
>> This can be reproduced if CONFIG_PREEMPT is enabled.
>>
>> This patch fixes it by not cancelling the preemption timer again if it
>> has already been cancelled during preemption, which happens when an
>> already-expired timer or the sw timer triggered in the window.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>  arch/x86/kvm/lapic.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 2819d4c..8341b40 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1560,9 +1560,13 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
>>  {
>>       struct kvm_lapic *apic = vcpu->arch.apic;
>>
>> -     WARN_ON(!apic->lapic_timer.hv_timer_in_use);
>> -     WARN_ON(swait_active(&vcpu->wq));
>> -     cancel_hv_timer(apic);
>> +     preempt_disable();
>> +     if (!(!apic_lvtt_period(apic) && atomic_read(&apic->lapic_timer.pending))) {
>
> Why is the "if" necessary?
>
> Maybe all of kvm_lapic_expired_hv_timer and start_sw_timer should be in
> preemption-disabled regions, which trivially avoids any reentrancy issue
> with the preempt notifier.  Then, cancel_hv_timer can assert that it's
> called with preemption disabled.

For example:

static int handle_preemption_timer(struct kvm_vcpu *vcpu)
{
    /*
     * We can still be preempted here, and the preempt notifier then
     * does one cancel_hv_timer().
     */
    preempt_disable();
    /*
     * This still hits the WARN_ON in cancel_hv_timer(), even if we
     * remove the WARN_ON in kvm_lapic_expired_hv_timer() as you
     * mentioned above.
     */
    kvm_lapic_expired_hv_timer(vcpu);
    preempt_enable();
    return 1;
}

Regards,
Wanpeng Li
Paolo Bonzini July 24, 2017, 3:38 p.m. UTC | #3
On 24/07/2017 17:08, Wanpeng Li wrote:
> 2017-07-24 22:45 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> On 24/07/2017 10:57, Wanpeng Li wrote:
>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>> Preemption can occur in the preemption timer expiration handler:
>>>
>>>           CPU0                    CPU1
>>>
>>>   preemption timer vmexit
>>>   handle_preemption_timer(vCPU0)
>>>     kvm_lapic_expired_hv_timer
>>>       hv_timer_in_use == true
>>>   sched_out
>>>                            sched_in
>>>                            kvm_arch_vcpu_load
>>>                              kvm_lapic_restart_hv_timer
>>>                                restart_apic_timer
>>>                                  start_hv_timer
>>>                                    already-expired timer or sw timer triggered in the window
>>>                                  start_sw_timer
>>>                                    cancel_hv_timer
>>
>> At this point, the timer interrupt is injected, right?
> 
> Do you mean the new one on CPU1? I think we only set the pending
> timer there, and then return to kvm_lapic_expired_hv_timer() after the
> preempt notifier's sched_in.

start_sw_timer calls apic_timer_expired after cancel_hv_timer.


>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 2819d4c..8341b40 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -1560,9 +1560,13 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
>>>  {
>>>       struct kvm_lapic *apic = vcpu->arch.apic;
>>>
>>> -     WARN_ON(!apic->lapic_timer.hv_timer_in_use);
>>> -     WARN_ON(swait_active(&vcpu->wq));
>>> -     cancel_hv_timer(apic);
>>> +     preempt_disable();
>>> +     if (!(!apic_lvtt_period(apic) && atomic_read(&apic->lapic_timer.pending))) {
>>
>> Why is the "if" necessary?
>>
>> Maybe all of kvm_lapic_expired_hv_timer and start_sw_timer should be in
>> preemption-disabled regions, which trivially avoids any reentrancy issue
>> with the preempt notifier.  Then, cancel_hv_timer can assert that it's
>> called with preemption disabled.
> 
> For example:
> 
> static int handle_preemption_timer(struct kvm_vcpu *vcpu)
> {
>      /* We can still be preempted here, and do one cancel_hv_timer(). */

Yes, so that just means you can do


	preempt_disable();
	/* The preempt notifier has called apic_timer_expired already.  */
	if (!apic->lapic_timer.hv_timer_in_use)
		goto out;

Thinking more about it, even _the caller_ of start_hv_timer and
start_sw_timer should be in a preemption-disabled region for simplicity.
That means that ultimately restart_apic_timer, kvm_lapic_expired_hv_timer
and kvm_lapic_switch_to_sw_timer should call preempt_disable/preempt_enable.

Paolo
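
The early-out suggested above can be sketched in userspace as follows. This
is a simplified model under stated assumptions, not the kernel
implementation: the model_* names are hypothetical, preempt_disable() and
preempt_enable() appear only as comments because they have no userspace
equivalent, and the sched_in path is assumed to mark the timer expired,
following the statement in this thread that start_sw_timer calls
apic_timer_expired after cancel_hv_timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of the LAPIC timer state. */
struct model_timer {
	bool hv_timer_in_use;	/* mirrors lapic_timer.hv_timer_in_use */
	int pending;		/* bumped where apic_timer_expired() would run */
	int warnings;		/* counts WARN_ON hits */
};

/* Counts a warning if the timer is not in use, then clears the flag. */
static void model_cancel_hv_timer(struct model_timer *t)
{
	assert(t != NULL);
	if (!t->hv_timer_in_use)
		t->warnings++;	/* stands in for WARN_ON(!hv_timer_in_use) */
	t->hv_timer_in_use = false;
}

/* Assumed sched_in path: cancel the hv timer, then mark it expired. */
static void model_sched_in(struct model_timer *t)
{
	if (t->hv_timer_in_use) {
		model_cancel_hv_timer(t);
		t->pending++;
	}
}

/*
 * Expiration handler with the early-out. preempted_early models a
 * preemption that happens before the preemption-disabled region starts.
 */
void model_expired_guarded(struct model_timer *t, bool preempted_early)
{
	if (preempted_early)
		model_sched_in(t);

	/* preempt_disable() would go here in the kernel. */
	if (!t->hv_timer_in_use)
		return;	/* the notifier has already marked the timer expired */

	model_cancel_hv_timer(t);
	t->pending++;
	/* preempt_enable() would go here in the kernel. */
}
```

In this model the timer ends up cancelled once and marked expired once,
whether or not the preemption happens, and no warning is counted.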
Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2819d4c..8341b40 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1560,9 +1560,13 @@  void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	WARN_ON(!apic->lapic_timer.hv_timer_in_use);
-	WARN_ON(swait_active(&vcpu->wq));
-	cancel_hv_timer(apic);
+	preempt_disable();
+	if (!(!apic_lvtt_period(apic) && atomic_read(&apic->lapic_timer.pending))) {
+		WARN_ON(!apic->lapic_timer.hv_timer_in_use);
+		WARN_ON(swait_active(&vcpu->wq));
+		cancel_hv_timer(apic);
+	}
+	preempt_enable();
 	apic_timer_expired(apic);
 
 	if (apic_lvtt_period(apic) && apic->lapic_timer.period) {