diff mbox series

hax: Honor CPUState::halted

Message ID 20190610021939.13669-1-colin.xu@intel.com (mailing list archive)
State New, archived
Headers show
Series hax: Honor CPUState::halted | expand

Commit Message

Colin Xu June 10, 2019, 2:19 a.m. UTC
QEMU tracks whether a vcpu is halted using CPUState::halted. E.g.,
after initialization or reset, halted is 0 for the BSP (vcpu 0)
and 1 for the APs (vcpu 1, 2, ...). A halted vcpu should not be
handed to the hypervisor to run (e.g. hax_vcpu_run()).

Under HAXM, Android Emulator sometimes boots into a "vcpu shutdown
request" error while executing in SeaBIOS, with the HAXM driver
logging a guest triple fault in vcpu 1, 2, ... at RIP 0x3. That is
ultimately because the HAX accelerator asks HAXM to run those APs
when they are still in the halted state.

Normally, the vcpu thread for an AP will start by looping in
qemu_wait_io_event(), until the BSP kicks it via a pair of IPIs
(INIT followed by SIPI). But because the HAX accelerator does not
honor cpu->halted, it allows the AP vcpu thread to proceed to
hax_vcpu_run() as soon as it receives any kick, even if the kick
does not come from the BSP. It turns out that emulator has a
worker thread which periodically kicks every vcpu thread (possibly
to collect CPU usage data), and if one of these kicks comes before
those by the BSP, the AP will start execution from the wrong RIP,
resulting in the aforementioned SMP boot failure.

The solution is inspired by the KVM accelerator (credit to
Chuanxiao Dong <chuanxiao.dong@intel.com> for the pointer):

1. Get rid of questionable logic that unconditionally resets
   cpu->halted before hax_vcpu_run(). Instead, only reset it at the
   right moments (there are only a few "unhalt" events).
2. Add a check for cpu->halted before hax_vcpu_run().

Note that although the non-Unrestricted Guest (!ug_platform) code
path also forcibly resets cpu->halted, it is left untouched,
because only the UG code path supports SMP guests.

The patch is first merged to android emulator with Change-Id:
I9c5752cc737fd305d7eace1768ea12a07309d716

Cc: Yu Ning <yu.ning@intel.com>
Cc: Chuanxiao Dong <chuanxiao.dong@intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
---
 cpus.c                |  1 -
 target/i386/hax-all.c | 36 ++++++++++++++++++++++++++++++++++--
 2 files changed, 34 insertions(+), 3 deletions(-)

Comments

Philippe Mathieu-Daudé June 11, 2019, 8:38 a.m. UTC | #1
Cc'ing Paolo & Richard.

On 6/10/19 4:27 AM, Colin Xu wrote:
> cc more.
> 
> On 2019-06-10 10:19, Colin Xu wrote:
>> QEMU tracks whether a vcpu is halted using CPUState::halted. E.g.,
>> after initialization or reset, halted is 0 for the BSP (vcpu 0)
>> and 1 for the APs (vcpu 1, 2, ...). A halted vcpu should not be
>> handed to the hypervisor to run (e.g. hax_vcpu_run()).
>>
>> Under HAXM, Android Emulator sometimes boots into a "vcpu shutdown
>> request" error while executing in SeaBIOS, with the HAXM driver
>> logging a guest triple fault in vcpu 1, 2, ... at RIP 0x3. That is
>> ultimately because the HAX accelerator asks HAXM to run those APs
>> when they are still in the halted state.
>>
>> Normally, the vcpu thread for an AP will start by looping in
>> qemu_wait_io_event(), until the BSP kicks it via a pair of IPIs
>> (INIT followed by SIPI). But because the HAX accelerator does not
>> honor cpu->halted, it allows the AP vcpu thread to proceed to
>> hax_vcpu_run() as soon as it receives any kick, even if the kick
>> does not come from the BSP. It turns out that emulator has a
>> worker thread which periodically kicks every vcpu thread (possibly
>> to collect CPU usage data), and if one of these kicks comes before
>> those by the BSP, the AP will start execution from the wrong RIP,
>> resulting in the aforementioned SMP boot failure.
>>
>> The solution is inspired by the KVM accelerator (credit to
>> Chuanxiao Dong <chuanxiao.dong@intel.com> for the pointer):
>>
>> 1. Get rid of questionable logic that unconditionally resets
>>     cpu->halted before hax_vcpu_run(). Instead, only reset it at the
>>     right moments (there are only a few "unhalt" events).
>> 2. Add a check for cpu->halted before hax_vcpu_run().
>>
>> Note that although the non-Unrestricted Guest (!ug_platform) code
>> path also forcibly resets cpu->halted, it is left untouched,
>> because only the UG code path supports SMP guests.
>>
>> The patch is first merged to android emulator with Change-Id:
>> I9c5752cc737fd305d7eace1768ea12a07309d716
>>
>> Cc: Yu Ning <yu.ning@intel.com>
>> Cc: Chuanxiao Dong <chuanxiao.dong@intel.com>
>> Signed-off-by: Colin Xu <colin.xu@intel.com>
>> ---
>>   cpus.c                |  1 -
>>   target/i386/hax-all.c | 36 ++++++++++++++++++++++++++++++++++--
>>   2 files changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index ffc57119ca5e..c1a56cd9ab01 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1591,7 +1591,6 @@ static void *qemu_hax_cpu_thread_fn(void *arg)
>>         cpu->thread_id = qemu_get_thread_id();
>>       cpu->created = true;
>> -    cpu->halted = 0;
>>       current_cpu = cpu;
>>         hax_init_vcpu(cpu);
>> diff --git a/target/i386/hax-all.c b/target/i386/hax-all.c
>> index 44b89c1d74ae..58a27b475ec8 100644
>> --- a/target/i386/hax-all.c
>> +++ b/target/i386/hax-all.c
>> @@ -471,13 +471,35 @@ static int hax_vcpu_hax_exec(CPUArchState *env)
>>           return 0;
>>       }
>>   -    cpu->halted = 0;
>> -
>>       if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
>>           cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
>>           apic_poll_irq(x86_cpu->apic_state);
>>       }
>>   +    /* After a vcpu is halted (either because it is an AP and has
>> just been
>> +     * reset, or because it has executed the HLT instruction), it
>> will not be
>> +     * run (hax_vcpu_run()) until it is unhalted. The next few if
>> blocks check
>> +     * for events that may change the halted state of this vcpu:
>> +     *  a) Maskable interrupt, when RFLAGS.IF is 1;
>> +     *     Note: env->eflags may not reflect the current RFLAGS
>> state, because
>> +     *           it is not updated after each hax_vcpu_run(). We
>> cannot afford
>> +     *           to fail to recognize any
>> unhalt-by-maskable-interrupt event
>> +     *           (in which case the vcpu will halt forever), and yet
>> we cannot
>> +     *           afford the overhead of hax_vcpu_sync_state(). The
>> current
>> +     *           solution is to err on the side of caution and have
>> the HLT
>> +     *           handler (see case HAX_EXIT_HLT below)
>> unconditionally set the
>> +     *           IF_MASK bit in env->eflags, which, in effect,
>> disables the
>> +     *           RFLAGS.IF check.
>> +     *  b) NMI;
>> +     *  c) INIT signal;
>> +     *  d) SIPI signal.
>> +     */
>> +    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
>> +         (env->eflags & IF_MASK)) ||
>> +        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
>> +        cpu->halted = 0;
>> +    }
>> +
>>       if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
>>           DPRINTF("\nhax_vcpu_hax_exec: handling INIT for %d\n",
>>                   cpu->cpu_index);
>> @@ -493,6 +515,16 @@ static int hax_vcpu_hax_exec(CPUArchState *env)
>>           hax_vcpu_sync_state(env, 1);
>>       }
>>   +    if (cpu->halted) {
>> +        /* If this vcpu is halted, we must not ask HAXM to run it.
>> Instead, we
>> +         * break out of hax_smp_cpu_exec() as if this vcpu had
>> executed HLT.
>> +         * That way, this vcpu thread will be trapped in
>> qemu_wait_io_event(),
>> +         * until the vcpu is unhalted.
>> +         */
>> +        cpu->exception_index = EXCP_HLT;
>> +        return 0;
>> +    }
>> +
>>       do {
>>           int hax_ret;
>>   
>
Paolo Bonzini June 11, 2019, 10:34 a.m. UTC | #2
On 11/06/19 10:38, Philippe Mathieu-Daudé wrote:
> Cc'ing Paolo & Richard.
> 
> On 6/10/19 4:27 AM, Colin Xu wrote:
>> cc more.
>>
>> On 2019-06-10 10:19, Colin Xu wrote:
>>> QEMU tracks whether a vcpu is halted using CPUState::halted. E.g.,
>>> after initialization or reset, halted is 0 for the BSP (vcpu 0)
>>> and 1 for the APs (vcpu 1, 2, ...). A halted vcpu should not be
>>> handed to the hypervisor to run (e.g. hax_vcpu_run()).
>>>
>>> Under HAXM, Android Emulator sometimes boots into a "vcpu shutdown
>>> request" error while executing in SeaBIOS, with the HAXM driver
>>> logging a guest triple fault in vcpu 1, 2, ... at RIP 0x3. That is
>>> ultimately because the HAX accelerator asks HAXM to run those APs
>>> when they are still in the halted state.
>>>
>>> Normally, the vcpu thread for an AP will start by looping in
>>> qemu_wait_io_event(), until the BSP kicks it via a pair of IPIs
>>> (INIT followed by SIPI). But because the HAX accelerator does not
>>> honor cpu->halted, it allows the AP vcpu thread to proceed to
>>> hax_vcpu_run() as soon as it receives any kick, even if the kick
>>> does not come from the BSP. It turns out that emulator has a
>>> worker thread which periodically kicks every vcpu thread (possibly
>>> to collect CPU usage data), and if one of these kicks comes before
>>> those by the BSP, the AP will start execution from the wrong RIP,
>>> resulting in the aforementioned SMP boot failure.
>>>
>>> The solution is inspired by the KVM accelerator (credit to
>>> Chuanxiao Dong <chuanxiao.dong@intel.com> for the pointer):
>>>
>>> 1. Get rid of questionable logic that unconditionally resets
>>>     cpu->halted before hax_vcpu_run(). Instead, only reset it at the
>>>     right moments (there are only a few "unhalt" events).
>>> 2. Add a check for cpu->halted before hax_vcpu_run().
>>>
>>> Note that although the non-Unrestricted Guest (!ug_platform) code
>>> path also forcibly resets cpu->halted, it is left untouched,
>>> because only the UG code path supports SMP guests.
>>>
>>> The patch is first merged to android emulator with Change-Id:
>>> I9c5752cc737fd305d7eace1768ea12a07309d716
>>>
>>> Cc: Yu Ning <yu.ning@intel.com>
>>> Cc: Chuanxiao Dong <chuanxiao.dong@intel.com>
>>> Signed-off-by: Colin Xu <colin.xu@intel.com>
>>> ---
>>>   cpus.c                |  1 -
>>>   target/i386/hax-all.c | 36 ++++++++++++++++++++++++++++++++++--
>>>   2 files changed, 34 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index ffc57119ca5e..c1a56cd9ab01 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -1591,7 +1591,6 @@ static void *qemu_hax_cpu_thread_fn(void *arg)
>>>         cpu->thread_id = qemu_get_thread_id();
>>>       cpu->created = true;
>>> -    cpu->halted = 0;
>>>       current_cpu = cpu;
>>>         hax_init_vcpu(cpu);
>>> diff --git a/target/i386/hax-all.c b/target/i386/hax-all.c
>>> index 44b89c1d74ae..58a27b475ec8 100644
>>> --- a/target/i386/hax-all.c
>>> +++ b/target/i386/hax-all.c
>>> @@ -471,13 +471,35 @@ static int hax_vcpu_hax_exec(CPUArchState *env)
>>>           return 0;
>>>       }
>>>   -    cpu->halted = 0;
>>> -
>>>       if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
>>>           cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
>>>           apic_poll_irq(x86_cpu->apic_state);
>>>       }
>>>   +    /* After a vcpu is halted (either because it is an AP and has
>>> just been
>>> +     * reset, or because it has executed the HLT instruction), it
>>> will not be
>>> +     * run (hax_vcpu_run()) until it is unhalted. The next few if
>>> blocks check
>>> +     * for events that may change the halted state of this vcpu:
>>> +     *  a) Maskable interrupt, when RFLAGS.IF is 1;
>>> +     *     Note: env->eflags may not reflect the current RFLAGS
>>> state, because
>>> +     *           it is not updated after each hax_vcpu_run(). We
>>> cannot afford
>>> +     *           to fail to recognize any
>>> unhalt-by-maskable-interrupt event
>>> +     *           (in which case the vcpu will halt forever), and yet
>>> we cannot
>>> +     *           afford the overhead of hax_vcpu_sync_state(). The
>>> current
>>> +     *           solution is to err on the side of caution and have
>>> the HLT
>>> +     *           handler (see case HAX_EXIT_HLT below)
>>> unconditionally set the
>>> +     *           IF_MASK bit in env->eflags, which, in effect,
>>> disables the
>>> +     *           RFLAGS.IF check.
>>> +     *  b) NMI;
>>> +     *  c) INIT signal;
>>> +     *  d) SIPI signal.
>>> +     */
>>> +    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
>>> +         (env->eflags & IF_MASK)) ||
>>> +        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
>>> +        cpu->halted = 0;
>>> +    }
>>> +
>>>       if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
>>>           DPRINTF("\nhax_vcpu_hax_exec: handling INIT for %d\n",
>>>                   cpu->cpu_index);
>>> @@ -493,6 +515,16 @@ static int hax_vcpu_hax_exec(CPUArchState *env)
>>>           hax_vcpu_sync_state(env, 1);
>>>       }
>>>   +    if (cpu->halted) {
>>> +        /* If this vcpu is halted, we must not ask HAXM to run it.
>>> Instead, we
>>> +         * break out of hax_smp_cpu_exec() as if this vcpu had
>>> executed HLT.
>>> +         * That way, this vcpu thread will be trapped in
>>> qemu_wait_io_event(),
>>> +         * until the vcpu is unhalted.
>>> +         */
>>> +        cpu->exception_index = EXCP_HLT;
>>> +        return 0;
>>> +    }
>>> +
>>>       do {
>>>           int hax_ret;
>>>   
>>

Queued, thanks.

Paolo
diff mbox series

Patch

diff --git a/cpus.c b/cpus.c
index ffc57119ca5e..c1a56cd9ab01 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1591,7 +1591,6 @@  static void *qemu_hax_cpu_thread_fn(void *arg)
 
     cpu->thread_id = qemu_get_thread_id();
     cpu->created = true;
-    cpu->halted = 0;
     current_cpu = cpu;
 
     hax_init_vcpu(cpu);
diff --git a/target/i386/hax-all.c b/target/i386/hax-all.c
index 44b89c1d74ae..58a27b475ec8 100644
--- a/target/i386/hax-all.c
+++ b/target/i386/hax-all.c
@@ -471,13 +471,35 @@  static int hax_vcpu_hax_exec(CPUArchState *env)
         return 0;
     }
 
-    cpu->halted = 0;
-
     if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
         cpu->interrupt_request &= ~CPU_INTERRUPT_POLL;
         apic_poll_irq(x86_cpu->apic_state);
     }
 
+    /* After a vcpu is halted (either because it is an AP and has just been
+     * reset, or because it has executed the HLT instruction), it will not be
+     * run (hax_vcpu_run()) until it is unhalted. The next few if blocks check
+     * for events that may change the halted state of this vcpu:
+     *  a) Maskable interrupt, when RFLAGS.IF is 1;
+     *     Note: env->eflags may not reflect the current RFLAGS state, because
+     *           it is not updated after each hax_vcpu_run(). We cannot afford
+     *           to fail to recognize any unhalt-by-maskable-interrupt event
+     *           (in which case the vcpu will halt forever), and yet we cannot
+     *           afford the overhead of hax_vcpu_sync_state(). The current
+     *           solution is to err on the side of caution and have the HLT
+     *           handler (see case HAX_EXIT_HLT below) unconditionally set the
+     *           IF_MASK bit in env->eflags, which, in effect, disables the
+     *           RFLAGS.IF check.
+     *  b) NMI;
+     *  c) INIT signal;
+     *  d) SIPI signal.
+     */
+    if (((cpu->interrupt_request & CPU_INTERRUPT_HARD) &&
+         (env->eflags & IF_MASK)) ||
+        (cpu->interrupt_request & CPU_INTERRUPT_NMI)) {
+        cpu->halted = 0;
+    }
+
     if (cpu->interrupt_request & CPU_INTERRUPT_INIT) {
         DPRINTF("\nhax_vcpu_hax_exec: handling INIT for %d\n",
                 cpu->cpu_index);
@@ -493,6 +515,16 @@  static int hax_vcpu_hax_exec(CPUArchState *env)
         hax_vcpu_sync_state(env, 1);
     }
 
+    if (cpu->halted) {
+        /* If this vcpu is halted, we must not ask HAXM to run it. Instead, we
+         * break out of hax_smp_cpu_exec() as if this vcpu had executed HLT.
+         * That way, this vcpu thread will be trapped in qemu_wait_io_event(),
+         * until the vcpu is unhalted.
+         */
+        cpu->exception_index = EXCP_HLT;
+        return 0;
+    }
+
     do {
         int hax_ret;