diff mbox

[v2,2/6] reuse env stop and stopped states

Message ID 1248214392-12533-3-git-send-email-glommer@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Glauber Costa July 21, 2009, 10:13 p.m. UTC
qemu CPUState already provides "stop" and "stopped" states. And they
mean exactly that. There is no need for us to provide our own.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 cpu-defs.h |    2 --
 qemu-kvm.c |   30 ++++++++++++------------------
 vl.c       |    2 +-
 3 files changed, 13 insertions(+), 21 deletions(-)

Comments

Avi Kivity July 27, 2009, 3:43 p.m. UTC | #1
On 07/22/2009 01:13 AM, Glauber Costa wrote:
> qemu CPUState already provides "stop" and "stopped" states. And they
> mean exactly that. There is no need for us to provide our own.
>
>    

This patch (known as dd0e1c1a589 in qemu-kvm.git) breaks reboot.  My 
test case is FC6 i386 -smp 2, running the reboot command in rc.local.  
In about 15 minutes qemu hangs hard.  Please check what's gone wrong.
Glauber Costa July 28, 2009, 12:48 a.m. UTC | #2
On Mon, Jul 27, 2009 at 06:43:47PM +0300, Avi Kivity wrote:
> On 07/22/2009 01:13 AM, Glauber Costa wrote:
>> qemu CPUState already provides "stop" and "stopped" states. And they
>> mean exactly that. There is no need for us to provide our own.
>>
>>    
>
> This patch (known as dd0e1c1a589 in qemu-kvm.git) breaks reboot.  My  
> test case is FC6 i386 -smp 2, running the reboot command in rc.local.   
> In about 15 minutes qemu hangs hard.  Please check what's gone wrong.
I found out that doing kill -38 <your_pid> makes it run again, so we're likely
hanging somewhere while holding qemu_mutex. The state of the process is "D",
so we're holding qemu_mutex, and then calling something that can block.

It's hard for me to believe that this patch introduced it. At best, it might have
made it more likely. Also, I also verified that it sometimes takes a while until
it happen for the first time. Are you sure this is the first patch that makes it happen?


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity July 28, 2009, 6:17 a.m. UTC | #3
On 07/28/2009 03:48 AM, Glauber Costa wrote:
> On Mon, Jul 27, 2009 at 06:43:47PM +0300, Avi Kivity wrote:
>    
>> On 07/22/2009 01:13 AM, Glauber Costa wrote:
>>      
>>> qemu CPUState already provides "stop" and "stopped" states. And they
>>> mean exactly that. There is no need for us to provide our own.
>>>
>>>
>>>        
>> This patch (known as dd0e1c1a589 in qemu-kvm.git) breaks reboot.  My
>> test case is FC6 i386 -smp 2, running the reboot command in rc.local.
>> In about 15 minutes qemu hangs hard.  Please check what's gone wrong.
>>      
> I found out that doing kill -38<your_pid>  makes it run again, so we're likely
> hanging somewhere while holding qemu_mutex. The state of the process is "D",
> so we're holding qemu_mutex, and then calling something that can block.
>    

Sounds like we call a vcpu ioctl from the iothread (or from a different 
vcpu thread).

> It's hard for me to believe that this patch introduced it. At best, it might have
> made it more likely. Also, I also verified that it sometimes takes a while until
> it happen for the first time. Are you sure this is the first patch that makes it happen?
>    

I haven't been able to reproduce it before this patch.  Maybe this patch 
doesn't introduce it, only exposes it.
Gleb Natapov July 28, 2009, 6:24 a.m. UTC | #4
On Tue, Jul 28, 2009 at 09:17:05AM +0300, Avi Kivity wrote:
> On 07/28/2009 03:48 AM, Glauber Costa wrote:
>> On Mon, Jul 27, 2009 at 06:43:47PM +0300, Avi Kivity wrote:
>>    
>>> On 07/22/2009 01:13 AM, Glauber Costa wrote:
>>>      
>>>> qemu CPUState already provides "stop" and "stopped" states. And they
>>>> mean exactly that. There is no need for us to provide our own.
>>>>
>>>>
>>>>        
>>> This patch (known as dd0e1c1a589 in qemu-kvm.git) breaks reboot.  My
>>> test case is FC6 i386 -smp 2, running the reboot command in rc.local.
>>> In about 15 minutes qemu hangs hard.  Please check what's gone wrong.
>>>      
>> I found out that doing kill -38<your_pid>  makes it run again, so we're likely
>> hanging somewhere while holding qemu_mutex. The state of the process is "D",
>> so we're holding qemu_mutex, and then calling something that can block.
>>    
>
> Sounds like we call a vcpu ioctl from the iothread (or from a different  
> vcpu thread).
>
>> It's hard for me to believe that this patch introduced it. At best, it might have
>> made it more likely. Also, I also verified that it sometimes takes a while until
>> it happen for the first time. Are you sure this is the first patch that makes it happen?
>>    
>
> I haven't been able to reproduce it before this patch.  Maybe this patch  
> doesn't introduce it, only exposes it.
>
What are backtraces of all threads when it happens?

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity July 28, 2009, 6:28 a.m. UTC | #5
On 07/28/2009 09:24 AM, Gleb Natapov wrote:
>
> What are backtraces of all threads when it happens?
>    

I wasn't able to attach with gdb.  But I thought you reproduced it?
Gleb Natapov July 28, 2009, 6:29 a.m. UTC | #6
On Tue, Jul 28, 2009 at 09:28:26AM +0300, Avi Kivity wrote:
> On 07/28/2009 09:24 AM, Gleb Natapov wrote:
>>
>> What are backtraces of all threads when it happens?
>>    
>
> I wasn't able to attach with gdb.  But I thought you reproduced it?
>
Glauber may be yes. Me not even tried :)

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity July 28, 2009, 6:31 a.m. UTC | #7
On 07/28/2009 09:29 AM, Gleb Natapov wrote:
> On Tue, Jul 28, 2009 at 09:28:26AM +0300, Avi Kivity wrote:
>    
>> On 07/28/2009 09:24 AM, Gleb Natapov wrote:
>>      
>>> What are backtraces of all threads when it happens?
>>>
>>>        
>> I wasn't able to attach with gdb.  But I thought you reproduced it?
>>
>>      
> Glauber may be yes. Me not even tried :)
>
>    

Ah sorry.  I only read the first two letters of any name.
Avi Kivity July 28, 2009, 1:45 p.m. UTC | #8
On 07/28/2009 09:17 AM, Avi Kivity wrote:
>> I found out that doing kill -38<your_pid>  makes it run again, so 
>> we're likely
>> hanging somewhere while holding qemu_mutex. The state of the process 
>> is "D",
>> so we're holding qemu_mutex, and then calling something that can block.
>
> Sounds like we call a vcpu ioctl from the iothread (or from a 
> different vcpu thread).

That's indeed the case.  We reload the local apic state from the 
iothread instead of the vcpu thread.  Please write a patch to fix this.

>> It's hard for me to believe that this patch introduced it. At best, 
>> it might have
>> made it more likely. Also, I also verified that it sometimes takes a 
>> while until
>> it happen for the first time. Are you sure this is the first patch 
>> that makes it happen?
>
> I haven't been able to reproduce it before this patch.  Maybe this 
> patch doesn't introduce it, only exposes it.
>

It does.  The root problem is that env->stopped is cleared during reset, 
so pause_all_threads() doesn't work:

     uint32_t stop;   /* Stop request */                                 \
     uint32_t stopped; /* Artificially stopped */                        \
...
     /* from this point: preserved by CPU reset */                       \

This kind of bug is incredibly hard to find - you now owe Gleb a solar 
mass worth of beer.  IMO we shouldn't be coding like this, please patch 
upstream to explicitly clear what needs clearing.

I'm now testing the simple fix (moving the variables after the memset 
point).
diff mbox

Patch

diff --git a/cpu-defs.h b/cpu-defs.h
index 7570096..fce366f 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -142,8 +142,6 @@  struct qemu_work_item;
 struct KVMCPUState {
     pthread_t thread;
     int signalled;
-    int stop;
-    int stopped;
     int created;
     void *vcpu_ctx;
     struct qemu_work_item *queued_work_first, *queued_work_last;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 0f5f14f..8eeace4 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -91,7 +91,7 @@  static int kvm_debug(void *opaque, void *data,
 
     if (handle) {
 	kvm_debug_cpu_requested = env;
-	env->kvm_cpu_state.stopped = 1;
+	env->stopped = 1;
     }
     return handle;
 }
@@ -977,7 +977,7 @@  int handle_halt(kvm_vcpu_context_t vcpu)
 int handle_shutdown(kvm_context_t kvm, CPUState *env)
 {
     /* stop the current vcpu from going back to guest mode */
-    env->kvm_cpu_state.stopped = 1;
+    env->stopped = 1;
 
     qemu_system_reset_request();
     return 1;
@@ -1815,7 +1815,7 @@  int kvm_cpu_exec(CPUState *env)
 
 static int is_cpu_stopped(CPUState *env)
 {
-    return !vm_running || env->kvm_cpu_state.stopped;
+    return !vm_running || env->stopped;
 }
 
 static void flush_queued_work(CPUState *env)
@@ -1861,9 +1861,9 @@  static void kvm_main_loop_wait(CPUState *env, int timeout)
     cpu_single_env = env;
     flush_queued_work(env);
 
-    if (env->kvm_cpu_state.stop) {
-	env->kvm_cpu_state.stop = 0;
-	env->kvm_cpu_state.stopped = 1;
+    if (env->stop) {
+	env->stop = 0;
+	env->stopped = 1;
 	pthread_cond_signal(&qemu_pause_cond);
     }
 
@@ -1875,7 +1875,7 @@  static int all_threads_paused(void)
     CPUState *penv = first_cpu;
 
     while (penv) {
-        if (penv->kvm_cpu_state.stop)
+        if (penv->stop)
             return 0;
         penv = (CPUState *)penv->next_cpu;
     }
@@ -1889,11 +1889,11 @@  static void pause_all_threads(void)
 
     while (penv) {
         if (penv != cpu_single_env) {
-            penv->kvm_cpu_state.stop = 1;
+            penv->stop = 1;
             pthread_kill(penv->kvm_cpu_state.thread, SIG_IPI);
         } else {
-            penv->kvm_cpu_state.stop = 0;
-            penv->kvm_cpu_state.stopped = 1;
+            penv->stop = 0;
+            penv->stopped = 1;
             cpu_exit(penv);
         }
         penv = (CPUState *)penv->next_cpu;
@@ -1910,8 +1910,8 @@  static void resume_all_threads(void)
     assert(!cpu_single_env);
 
     while (penv) {
-        penv->kvm_cpu_state.stop = 0;
-        penv->kvm_cpu_state.stopped = 0;
+        penv->stop = 0;
+        penv->stopped = 0;
         pthread_kill(penv->kvm_cpu_state.thread, SIG_IPI);
         penv = (CPUState *)penv->next_cpu;
     }
@@ -2676,12 +2676,6 @@  int kvm_log_stop(target_phys_addr_t phys_addr, target_phys_addr_t len)
     return 0;
 }
 
-void qemu_kvm_cpu_stop(CPUState *env)
-{
-    if (kvm_enabled())
-        env->kvm_cpu_state.stopped = 1;
-}
-
 int kvm_set_boot_cpu_id(uint32_t id)
 {
 	return kvm_set_boot_vcpu_id(kvm_context, id);
diff --git a/vl.c b/vl.c
index b3df596..6ef7690 100644
--- a/vl.c
+++ b/vl.c
@@ -3553,7 +3553,7 @@  void qemu_system_reset_request(void)
         reset_requested = 1;
     }
     if (cpu_single_env) {
-        qemu_kvm_cpu_stop(cpu_single_env);
+        cpu_single_env->stopped = 1;
     }
     qemu_notify_event();
 }