exception vs SIGALRM race on core2 CPUs (with fix!)

Message ID f09d851d-bda1-7a99-41cb-a14ea51e1237@yandex.ru (mailing list archive)
State New, archived
Series exception vs SIGALRM race on core2 CPUs (with fix!)

Commit Message

Stas Sergeev June 25, 2021, 11:35 p.m. UTC
OK, I've finally found that this
fixes the race:

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-26 02:28:37.082919492 +0300
@@ -9176,8 +9176,10 @@
                 if (__xfer_to_guest_mode_work_pending()) {
                         srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
                         r = xfer_to_guest_mode_handle_work(vcpu);
-                       if (r)
+                       if (r) {
+                               kvm_clear_exception_queue(vcpu);
                                 return r;
+                       }
                         vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
                 }
         }

This is where it returns to user
with the PF exception still pending.
So... any ideas?

Comments

Jim Mattson June 26, 2021, 12:15 a.m. UTC | #1
On Fri, Jun 25, 2021 at 4:35 PM stsp <stsp2@yandex.ru> wrote:
>
> OK, I've finally found that this
> fixes the race:
>
> --- x86.c.old   2021-03-20 12:51:14.000000000 +0300
> +++ x86.c       2021-06-26 02:28:37.082919492 +0300
> @@ -9176,8 +9176,10 @@
>                  if (__xfer_to_guest_mode_work_pending()) {
>                          srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>                          r = xfer_to_guest_mode_handle_work(vcpu);
> -                       if (r)
> +                       if (r) {
> +                               kvm_clear_exception_queue(vcpu);
>                                  return r;
> +                       }
>                          vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>                  }
>          }
>
>
>
> This is where it returns to user
> with the PF exception still pending.
> So... any ideas?

If the squashed exception was a trap, it's now lost.
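
(For reference, kvm_clear_exception_queue() -- a static inline in
arch/x86/kvm/x86.h -- drops both the pending and the injected state,
roughly:

static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
{
        vcpu->arch.exception.pending = false;
        vcpu->arch.exception.injected = false;
}

so a queued trap-like exception such as a #DB would simply vanish.)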
Stas Sergeev June 26, 2021, 12:35 a.m. UTC | #2
On 26.06.2021 03:15, Jim Mattson wrote:
> On Fri, Jun 25, 2021 at 4:35 PM stsp <stsp2@yandex.ru> wrote:
>> OK, I've finally found that this
>> fixes the race:
>>
>> --- x86.c.old   2021-03-20 12:51:14.000000000 +0300
>> +++ x86.c       2021-06-26 02:28:37.082919492 +0300
>> @@ -9176,8 +9176,10 @@
>>                   if (__xfer_to_guest_mode_work_pending()) {
>>                           srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>>                           r = xfer_to_guest_mode_handle_work(vcpu);
>> -                       if (r)
>> +                       if (r) {
>> +                               kvm_clear_exception_queue(vcpu);
>>                                   return r;
>> +                       }
>>                           vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>>                   }
>>           }
>>
>>
>>
>> This is where it returns to user
>> with the PF exception still pending.
>> So... any ideas?
> If the squashed exception was a trap, it's now lost.

I am not saying this patch is
correct or should be applied.

The more interesting question is:
why doesn't KVM _inject_ the PF,
but instead sets it pending
and then exits to user-space?

#0  kvm_multiple_exception (vcpu=vcpu@entry=0xffff888005934000, nr=nr@entry=14,
     has_error=has_error@entry=true, error_code=6, has_payload=has_payload@entry=true,
     payload=35979264, reinject=false) at ./include/linux/kvm_host.h:1280
#1  0xffffffff8103a13c in kvm_queue_exception_e_p (payload=<optimized out>,
     error_code=<optimized out>, nr=14, vcpu=0xffff888005934000) at arch/x86/kvm/x86.c:641
#2  kvm_inject_page_fault (vcpu=0xffff888005934000, fault=<optimized out>)
     at arch/x86/kvm/x86.c:641
#3  0xffffffff81031454 in kvm_inject_emulated_page_fault (vcpu=vcpu@entry=0xffff888005934000,
     fault=fault@entry=0xffffc9000031fc60) at arch/x86/kvm/x86.c:665
#4  0xffffffff8106df86 in paging32_page_fault (vcpu=0xffff888005934000, addr=35979264,
     error_code=6, prefault=<optimized out>) at arch/x86/kvm/mmu/paging_tmpl.h:816
#5  0xffffffff8106cdb4 in kvm_mmu_do_page_fault (prefault=false, err=6, cr2_or_gpa=35979264,
     vcpu=0xffff888005934000) at arch/x86/kvm/mmu.h:119
#6  kvm_mmu_page_fault (vcpu=vcpu@entry=0xffff888005934000, cr2_or_gpa=cr2_or_gpa@entry=35979264,
     error_code=error_code@entry=6, insn=0x0 <fixed_percpu_data>, insn_len=0)
     at arch/x86/kvm/mmu/mmu.c:5076
#7  0xffffffff8106d090 in kvm_handle_page_fault (insn_len=<optimized out>, insn=<optimized out>,
     fault_address=35979264, error_code=6, vcpu=0xffff888005934000)
     at arch/x86/kvm/mmu/mmu.c:3775
#8  kvm_handle_page_fault (vcpu=0xffff888005934000, error_code=6, fault_address=35979264,
     insn=<optimized out>, insn_len=<optimized out>) at arch/x86/kvm/mmu/mmu.c:3757
#9  0xffffffff810443e0 in vcpu_enter_guest (vcpu=0xffff888005934000) at arch/x86/kvm/x86.c:9090
#10 vcpu_run (vcpu=0xffff888005934000) at arch/x86/kvm/x86.c:9156
#11 kvm_arch_vcpu_ioctl_run (vcpu=vcpu@entry=0xffff888005934000) at arch/x86/kvm/x86.c:9385
#12 0xffffffff81020fec in kvm_vcpu_ioctl (filp=<optimized out>, ioctl=44672, arg=0)
     at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3292


This is the stack trace at the point
where the exception is set pending.
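
For context, the state that kvm_multiple_exception() toggles here lives
in vcpu->arch.exception; roughly (from arch/x86/include/asm/kvm_host.h
of that era, the exact layout may differ between versions):

        struct kvm_queued_exception {
                bool pending;           /* queued, not yet programmed for VM-entry */
                bool injected;          /* already programmed for VM-entry */
                bool has_error_code;
                u8 nr;                  /* vector, 14 == #PF in the trace above */
                u32 error_code;
                unsigned long payload;  /* CR2 for a #PF */
                bool has_payload;
        } exception;

In the trace above the #PF only ever reaches "pending"; nothing flips it
to "injected" before KVM bails out to user-space.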
Stas Sergeev June 26, 2021, 9:50 p.m. UTC | #3
On 26.06.2021 03:15, Jim Mattson wrote:
> If the squashed exception was a trap, it's now lost.

OK, below are 2 more patches, each
of which alone fixes the problem.

---

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-27 00:38:45.547355116 +0300
@@ -9094,6 +9094,7 @@
         if (req_immediate_exit)
                 kvm_make_request(KVM_REQ_EVENT, vcpu);
         kvm_x86_ops.cancel_injection(vcpu);
+       kvm_clear_exception_queue(vcpu);
         if (unlikely(vcpu->arch.apic_attention))
                 kvm_lapic_sync_from_vapic(vcpu);
  out:
---


Or:

---

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-27 00:47:06.958618185 +0300
@@ -1783,8 +1783,7 @@
  bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu)
  {
         xfer_to_guest_mode_prepare();
-       return vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) ||
-               xfer_to_guest_mode_work_pending();
+       return vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu);
  }
  EXPORT_SYMBOL_GPL(kvm_vcpu_exit_request);

---
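
Background on the second patch: xfer_to_guest_mode_work_pending() boils
down to a thread-flags check, so the pending SIGALRM (TIF_SIGPENDING)
makes it return true and KVM heads back to user-space with the #PF
still queued.  Roughly, from include/linux/entry-kvm.h of that era:

#define XFER_TO_GUEST_MODE_WORK                                         \
        (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL |    \
         _TIF_NOTIFY_RESUME | ARCH_XFER_TO_GUEST_MODE_WORK)

static inline bool __xfer_to_guest_mode_work_pending(void)
{
        unsigned long ti_work = READ_ONCE(current_thread_info()->flags);

        return !!(ti_work & XFER_TO_GUEST_MODE_WORK);
}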


Still no ideas?
Stas Sergeev June 27, 2021, 12:13 p.m. UTC | #4
On 26.06.2021 03:15, Jim Mattson wrote:
> If the squashed exception was a trap, it's now lost.

I am pretty sure this will do it:

---

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-27 15:02:45.126161812 +0300
@@ -9093,7 +9093,11 @@
  cancel_injection:
         if (req_immediate_exit)
                 kvm_make_request(KVM_REQ_EVENT, vcpu);
-       kvm_x86_ops.cancel_injection(vcpu);
+       if (vcpu->arch.exception.injected) {
+               kvm_x86_ops.cancel_injection(vcpu);
+               vcpu->arch.exception.injected = false;
+               vcpu->arch.exception.pending = true;
+       }
         if (unlikely(vcpu->arch.apic_attention))
                 kvm_lapic_sync_from_vapic(vcpu);
  out:
@@ -9464,6 +9468,7 @@
         kvm_rip_write(vcpu, regs->rip);
         kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);

+       WARN_ON_ONCE(vcpu->arch.exception.injected);
         vcpu->arch.exception.pending = false;

         kvm_make_request(KVM_REQ_EVENT, vcpu);
---


In cancel_injection, the injected/pending
members were getting out of sync with
the vmcs.
We need to move the exception back to
pending, and if user-space then does
SET_REGS, the pending bit is cleared
(not sure why SET_SREGS doesn't clear
it as well).
But if the .injected member is left
stuck, it is not cleared by SET_REGS,
so I added a WARN_ON_ONCE() for that
case.
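
For reference, the flip this undoes happens when the exception is armed
for VM-entry; a simplified sketch of the relevant part of
inject_pending_event() in arch/x86/kvm/x86.c (not the exact code):

        if (vcpu->arch.exception.pending) {
                ...
                /* program the exception into the VMCS for the next VM-entry */
                kvm_x86_ops.queue_exception(vcpu);
                vcpu->arch.exception.pending = false;
                vcpu->arch.exception.injected = true;
        }

cancel_injection has to undo exactly that flip, otherwise .injected
stays set for an exception that never actually made it into the guest.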

Does this make sense?

Patch

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-26 02:28:37.082919492 +0300
@@ -9176,8 +9176,10 @@ 
                 if (__xfer_to_guest_mode_work_pending()) {
                         srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
                         r = xfer_to_guest_mode_handle_work(vcpu);
-                       if (r)
+                       if (r) {
+                               kvm_clear_exception_queue(vcpu);
                                 return r;
+                       }
                         vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
                 }
         }