
exception vs SIGALRM race on core2 CPUs (with fix!)

Message ID f09d851d-bda1-7a99-41cb-a14ea51e1237@yandex.ru (mailing list archive)
State New, archived
Series exception vs SIGALRM race on core2 CPUs (with fix!)

Commit Message

stsp June 25, 2021, 11:35 p.m. UTC
OK, I've finally found that this fixes the race:

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-26 02:28:37.082919492 +0300
@@ -9176,8 +9176,10 @@
                 if (__xfer_to_guest_mode_work_pending()) {
                         srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
                         r = xfer_to_guest_mode_handle_work(vcpu);
-                       if (r)
+                       if (r) {
+                               kvm_clear_exception_queue(vcpu);
                                 return r;
+                       }
                         vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
                 }
         }

This is where it returns to userspace with the #PF exception still pending.
So... any ideas?

Comments

Jim Mattson June 26, 2021, 12:15 a.m. UTC | #1
On Fri, Jun 25, 2021 at 4:35 PM stsp <stsp2@yandex.ru> wrote:
>
> OK, I've finally found that this
> fixes the race:
>
> --- x86.c.old   2021-03-20 12:51:14.000000000 +0300
> +++ x86.c       2021-06-26 02:28:37.082919492 +0300
> @@ -9176,8 +9176,10 @@
>                  if (__xfer_to_guest_mode_work_pending()) {
>                          srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>                          r = xfer_to_guest_mode_handle_work(vcpu);
> -                       if (r)
> +                       if (r) {
> +kvm_clear_exception_queue(vcpu);
>                                  return r;
> +}
>                          vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>                  }
>          }
>
>
>
> This is where it returns to user
> with the PF exception still pending.
> So... any ideas?

If the squashed exception was a trap, it's now lost.
stsp June 26, 2021, 12:35 a.m. UTC | #2
26.06.2021 03:15, Jim Mattson wrote:
> On Fri, Jun 25, 2021 at 4:35 PM stsp <stsp2@yandex.ru> wrote:
>> OK, I've finally found that this
>> fixes the race:
>>
>> --- x86.c.old   2021-03-20 12:51:14.000000000 +0300
>> +++ x86.c       2021-06-26 02:28:37.082919492 +0300
>> @@ -9176,8 +9176,10 @@
>>                   if (__xfer_to_guest_mode_work_pending()) {
>>                           srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>>                           r = xfer_to_guest_mode_handle_work(vcpu);
>> -                       if (r)
>> +                       if (r) {
>> +kvm_clear_exception_queue(vcpu);
>>                                   return r;
>> +}
>>                           vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>>                   }
>>           }
>>
>>
>>
>> This is where it returns to user
>> with the PF exception still pending.
>> So... any ideas?
> If the squashed exception was a trap, it's now lost.

I am not saying this patch is correct or should be applied.

The more interesting question is: why doesn't KVM _inject_ the #PF,
instead of setting it pending and then exiting to user-space?

#0  kvm_multiple_exception (vcpu=vcpu@entry=0xffff888005934000, nr=nr@entry=14, has_error=has_error@entry=true, error_code=6, has_payload=has_payload@entry=true, payload=35979264, reinject=false) at ./include/linux/kvm_host.h:1280
#1  0xffffffff8103a13c in kvm_queue_exception_e_p (payload=<optimized out>, error_code=<optimized out>, nr=14, vcpu=0xffff888005934000) at arch/x86/kvm/x86.c:641
#2  kvm_inject_page_fault (vcpu=0xffff888005934000, fault=<optimized out>) at arch/x86/kvm/x86.c:641
#3  0xffffffff81031454 in kvm_inject_emulated_page_fault (vcpu=vcpu@entry=0xffff888005934000, fault=fault@entry=0xffffc9000031fc60) at arch/x86/kvm/x86.c:665
#4  0xffffffff8106df86 in paging32_page_fault (vcpu=0xffff888005934000, addr=35979264, error_code=6, prefault=<optimized out>) at arch/x86/kvm/mmu/paging_tmpl.h:816
#5  0xffffffff8106cdb4 in kvm_mmu_do_page_fault (prefault=false, err=6, cr2_or_gpa=35979264, vcpu=0xffff888005934000) at arch/x86/kvm/mmu.h:119
#6  kvm_mmu_page_fault (vcpu=vcpu@entry=0xffff888005934000, cr2_or_gpa=cr2_or_gpa@entry=35979264, error_code=error_code@entry=6, insn=0x0 <fixed_percpu_data>, insn_len=0) at arch/x86/kvm/mmu/mmu.c:5076
#7  0xffffffff8106d090 in kvm_handle_page_fault (insn_len=<optimized out>, insn=<optimized out>, fault_address=35979264, error_code=6, vcpu=0xffff888005934000) at arch/x86/kvm/mmu/mmu.c:3775
#8  kvm_handle_page_fault (vcpu=0xffff888005934000, error_code=6, fault_address=35979264, insn=<optimized out>, insn_len=<optimized out>) at arch/x86/kvm/mmu/mmu.c:3757
#9  0xffffffff810443e0 in vcpu_enter_guest (vcpu=0xffff888005934000) at arch/x86/kvm/x86.c:9090
#10 vcpu_run (vcpu=0xffff888005934000) at arch/x86/kvm/x86.c:9156
#11 kvm_arch_vcpu_ioctl_run (vcpu=vcpu@entry=0xffff888005934000) at arch/x86/kvm/x86.c:9385
#12 0xffffffff81020fec in kvm_vcpu_ioctl (filp=<optimized out>, ioctl=44672, arg=0) at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3292


This is the stack trace from the point where the exception is set pending.
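
For reference, the "pending" vs "injected" distinction lives in the per-vCPU
exception queue. Below is a rough sketch of that bookkeeping (the real
definition is in arch/x86/include/asm/kvm_host.h; field names and layout are
approximate and vary between kernel versions, and userspace types are used
here for readability):

---

#include <stdbool.h>
#include <stdint.h>

/*
 * Rough sketch of vcpu->arch.exception.  kvm_queue_exception*() only marks
 * the exception pending; it is written into the VMCS/VMCB later, on the
 * injection path of vcpu_enter_guest(), at which point pending is cleared
 * and injected is set.  If the vCPU bails out to userspace first (e.g.
 * because SIGALRM made xfer_to_guest_mode_work_pending() true), the #PF is
 * still only pending -- the state captured in the backtrace above.
 */
struct queued_exception_sketch {
	bool pending;          /* queued, not yet written to the VMCS/VMCB */
	bool injected;         /* already programmed for the next VM entry */
	bool has_error_code;
	uint8_t nr;            /* vector; 14 == #PF in the trace above */
	uint32_t error_code;   /* 6 == user-mode write to a non-present page */
	uint64_t payload;      /* for #PF, the CR2 value (35979264 above) */
	bool has_payload;
};

---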
stsp June 26, 2021, 9:50 p.m. UTC | #3
26.06.2021 03:15, Jim Mattson wrote:
> If the squashed exception was a trap, it's now lost.

OK, below are two more patches, each of which fixes the problem on its own.

---

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-27 00:38:45.547355116 +0300
@@ -9094,6 +9094,7 @@
         if (req_immediate_exit)
                 kvm_make_request(KVM_REQ_EVENT, vcpu);
         kvm_x86_ops.cancel_injection(vcpu);
+       kvm_clear_exception_queue(vcpu);
         if (unlikely(vcpu->arch.apic_attention))
                 kvm_lapic_sync_from_vapic(vcpu);
  out:
---


Or:

---

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-27 00:47:06.958618185 +0300
@@ -1783,8 +1783,7 @@
  bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu)
  {
         xfer_to_guest_mode_prepare();
-       return vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) ||
-               xfer_to_guest_mode_work_pending();
+       return vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu);
  }
  EXPORT_SYMBOL_GPL(kvm_vcpu_exit_request);

---


Still no clues?
stsp June 27, 2021, 12:13 p.m. UTC | #4
26.06.2021 03:15, Jim Mattson wrote:
> If the squashed exception was a trap, it's now lost.

I am pretty sure this will do it:

---

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-27 15:02:45.126161812 +0300
@@ -9093,7 +9093,11 @@
  cancel_injection:
         if (req_immediate_exit)
                 kvm_make_request(KVM_REQ_EVENT, vcpu);
-       kvm_x86_ops.cancel_injection(vcpu);
+       if (vcpu->arch.exception.injected) {
+               kvm_x86_ops.cancel_injection(vcpu);
+               vcpu->arch.exception.injected = false;
+               vcpu->arch.exception.pending = true;
+       }
         if (unlikely(vcpu->arch.apic_attention))
                 kvm_lapic_sync_from_vapic(vcpu);
  out:
@@ -9464,6 +9468,7 @@
         kvm_rip_write(vcpu, regs->rip);
         kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);

+       WARN_ON_ONCE(vcpu->arch.exception.injected);
         vcpu->arch.exception.pending = false;

         kvm_make_request(KVM_REQ_EVENT, vcpu);
---


In cancel_injection, the injected/pending members were getting out of sync
with the VMCS. We need to move the exception back to pending, and if
user-space then does SET_REGS, the pending exception is cleared (not sure
why SET_SREGS doesn't clear it as well).
But if the .injected member is stuck, it is not cleared by SET_REGS, so I
added a WARN_ON_ONCE() for that case.

Does this make sense?
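
For context, the userspace half of the race might look roughly like the
hypothetical VMM loop below (illustrative only, not dosemu's actual code).
The relevant sequence: KVM_RUN returns -EINTR when the SIGALRM arrives,
possibly before a queued #PF has been injected, and the VMM then calls
KVM_SET_REGS to redirect the guest, which runs the code touched by the
second hunk above.

---

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical VMM run loop; vcpu_fd is a KVM vCPU fd, run is its
 * mmap()ed struct kvm_run, and a SIGALRM-based timer is active.
 */
static int run_vcpu(int vcpu_fd, struct kvm_run *run)
{
	for (;;) {
		int ret = ioctl(vcpu_fd, KVM_RUN, 0);

		if (ret < 0 && errno == EINTR) {
			/*
			 * SIGALRM interrupted KVM_RUN.  If the guest had
			 * just faulted, the #PF may still be sitting in the
			 * exception queue, pending and not yet injected.
			 */
			struct kvm_regs regs;

			if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0)
				return -1;
			/* ... point the guest at its signal handler ... */
			if (ioctl(vcpu_fd, KVM_SET_REGS, &regs) < 0)
				return -1;
			/*
			 * KVM_SET_REGS clears exception.pending but not
			 * exception.injected (hence the WARN_ON_ONCE hunk
			 * above), which is where the queued #PF and the
			 * VMCS state could get out of sync before this fix.
			 */
			continue;
		}
		if (ret < 0)
			return -1;	/* some other error */

		switch (run->exit_reason) {
		case KVM_EXIT_HLT:
			return 0;
		default:
			/* handle MMIO, pio, etc. */
			break;
		}
	}
}

---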

Patch

--- x86.c.old   2021-03-20 12:51:14.000000000 +0300
+++ x86.c       2021-06-26 02:28:37.082919492 +0300
@@ -9176,8 +9176,10 @@ 
                 if (__xfer_to_guest_mode_work_pending()) {
                         srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
                         r = xfer_to_guest_mode_handle_work(vcpu);
-                       if (r)
+                       if (r) {
+                               kvm_clear_exception_queue(vcpu);
                                 return r;
+                       }
                         vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
                 }
         }