
[PULL,2/5] migration: move bdrv_invalidate_cache_all out of coroutine context

Message ID 56DDCF55.6000203@openvz.org (mailing list archive)
State New, archived

Commit Message

Denis V. Lunev March 7, 2016, 6:58 p.m. UTC
On 03/07/2016 03:49 PM, Dr. David Alan Gilbert wrote:
> * Amit Shah (amit.shah@redhat.com) wrote:
>> From: "Denis V. Lunev" <den@openvz.org>
>>
>> There is a possibility to hit an assert in qcow2_get_specific_info because
>> s->qcow_version is undefined. This happens when a VM is starting from a
>> suspended state, i.e. it is processing an incoming migration, and
>> 'info block' is called at the same time.
>>
>> The problem is that qcow2_invalidate_cache() closes the image and
>> memset()s BDRVQcowState in the middle.
>>
>> The patch moves processing of bdrv_invalidate_cache_all out of
>> coroutine context for postcopy migration to avoid that. This function
>> is called with the following stack:
>>    process_incoming_migration_co
>>    qemu_loadvm_state
>>    qemu_loadvm_state_main
>>    loadvm_process_command
>>    loadvm_postcopy_handle_run
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> hmm; actually - this segs in a variety of different ways;
> there are two problems:
>
>     a) +    bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
>       That's the easy one; that NULL should be 'mis', because
>       the bh is expecting to use it as a MigrationIncomingState
>       so it segs fairly reliably in the qemu_bh_delete(mis->bh)
>
>     b) The harder problem is that there's a race where qemu_bh_delete
>        segs, and I'm not 100% sure why yet - it only does it sometimes
>        (i.e. run virt-test and leave it and it occasionally does it).
>        From the core it looks like qemu->bh is corrupt (0x10101010...)
>        so maybe mis has been freed at that point?
>        I'm suspecting this is the postcopy_ram_listen_thread freeing
>        mis at the end of it, but I don't know yet.
>
> Dave

Yes, this is exactly a use-after-free. I have looked into the code
and this seems correct.

Could you try this simple patch?

Den
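
As a reference for Dave's point (a) and for the coroutine concern in the
commit message: loadvm_postcopy_handle_run() schedules a bottom half so that
bdrv_invalidate_cache_all() runs from the main loop rather than inside the
incoming-migration coroutine. A minimal sketch of that shape, with the opaque
corrected from NULL to 'mis' as Dave suggests, follows; the function bodies
are paraphrased from the discussion, not copied from the pull request:

    /* In migration/savevm.c.  Sketch only: the bottom half runs from the
     * main loop, outside coroutine context, so the qcow2 close/reopen done
     * by the invalidate path cannot race with a monitor 'info block' issued
     * while the coroutine yields. */
    static void loadvm_postcopy_handle_run_bh(void *opaque)
    {
        MigrationIncomingState *mis = opaque; /* needs a real opaque, not NULL */
        Error *local_err = NULL;

        bdrv_invalidate_cache_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
        }

        if (autostart) {
            vm_start();
        }

        qemu_bh_delete(mis->bh);  /* segfaults if the opaque above was NULL */
    }

    static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
    {
        /* The pulled patch passed NULL as the opaque here; it has to be 'mis'. */
        mis->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, mis);
        qemu_bh_schedule(mis->bh);

        /* Leave qemu_loadvm_state_main(); the bottom half finishes the job. */
        return LOADVM_QUIT;
    }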

Comments

Dr. David Alan Gilbert March 8, 2016, 10:45 a.m. UTC | #1
* Denis V. Lunev (den@openvz.org) wrote:
> On 03/07/2016 03:49 PM, Dr. David Alan Gilbert wrote:
> >* Amit Shah (amit.shah@redhat.com) wrote:
> >>From: "Denis V. Lunev" <den@openvz.org>
> >>
> >>There is a possibility to hit an assert in qcow2_get_specific_info because
> >>s->qcow_version is undefined. This happens when a VM is starting from a
> >>suspended state, i.e. it is processing an incoming migration, and
> >>'info block' is called at the same time.
> >>
> >>The problem is that qcow2_invalidate_cache() closes the image and
> >>memset()s BDRVQcowState in the middle.
> >>
> >>The patch moves processing of bdrv_invalidate_cache_all out of
> >>coroutine context for postcopy migration to avoid that. This function
> >>is called with the following stack:
> >>   process_incoming_migration_co
> >>   qemu_loadvm_state
> >>   qemu_loadvm_state_main
> >>   loadvm_process_command
> >>   loadvm_postcopy_handle_run
> >>
> >>Signed-off-by: Denis V. Lunev <den@openvz.org>
> >>Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >hmm; actually - this segs in a variety of different ways;
> >there are two problems:
> >
> >    a) +    bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
> >      That's the easy one; that NULL should be 'mis', because
> >      the bh is expecting to use it as a MigrationIncomingState
> >      so it segs fairly reliably in the qemu_bh_delete(mis->bh)
> >
> >    b) The harder problem is that there's a race where qemu_bh_delete
> >       segs, and I'm not 100% sure why yet - it only does it sometimes
> >       (i.e. run virt-test and leave it and it occasionally does it).
> >       From the core it looks like qemu->bh is corrupt (0x10101010...)
> >       so maybe mis has been freed at that point?
> >       I'm suspecting this is the postcopy_ram_listen_thread freeing
> >       mis at the end of it, but I don't know yet.
> >
> >Dave
> 
> Yes, this is exactly a use-after-free. I have looked into the code
> and this seems correct.
> 
> Could you try this simple patch?

Hmm no, that's not right.
The order for postcopy is that we are running the listen thread and then
receive the 'run', and the listening thread is still running - so you
can't destroy the incoming state during the run.
It can't get destroyed until both the main thread has finished loading
the migration AND the listen thread has finished.

Hmm - that does give me an idea about the other seg I saw; I need to check it;
but I think the problem is probably the case of a very short postcopy
where the listen thread exits before the handle_run_bh is triggered;
(and I've only seen it in my virt-test setup, which I know can do
very short postcopies)
I think the fix here is to pass loadvm_postcopy_handle_run_bh a pointer to its
own bh structure rather than store it in mis->bh; that way it doesn't use mis
at all.

Dave

> Den
> 
> 

> diff --git a/migration/savevm.c b/migration/savevm.c
> index 96e7db5..9a020ef 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1446,15 +1446,6 @@ static void *postcopy_ram_listen_thread(void *opaque)
>  
>      migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>                                     MIGRATION_STATUS_COMPLETED);
> -    /*
> -     * If everything has worked fine, then the main thread has waited
> -     * for us to start, and we're the last use of the mis.
> -     * (If something broke then qemu will have to exit anyway since it's
> -     * got a bad migration state).
> -     */
> -    migration_incoming_state_destroy();
> -
> -
>      return NULL;
>  }
>  
> @@ -1533,6 +1524,14 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>      }
>  
>      qemu_bh_delete(mis->bh);
> +
> +    /*
> +     * If everything has worked fine, then the main thread has waited
> +     * for us to start, and we're the last use of the mis.
> +     * (If something broke then qemu will have to exit anyway since it's
> +     * got a bad migration state).
> +     */
> +    migration_incoming_state_destroy();
>  }
>  
>  /* After all discards we can start running and asking for pages */

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Denis V. Lunev March 8, 2016, 10:54 a.m. UTC | #2
On 03/08/2016 01:45 PM, Dr. David Alan Gilbert wrote:
> * Denis V. Lunev (den@openvz.org) wrote:
>> On 03/07/2016 03:49 PM, Dr. David Alan Gilbert wrote:
>>> * Amit Shah (amit.shah@redhat.com) wrote:
>>>> From: "Denis V. Lunev" <den@openvz.org>
>>>>
>>>> There is a possibility to hit an assert in qcow2_get_specific_info because
>>>> s->qcow_version is undefined. This happens when a VM is starting from a
>>>> suspended state, i.e. it is processing an incoming migration, and
>>>> 'info block' is called at the same time.
>>>>
>>>> The problem is that qcow2_invalidate_cache() closes the image and
>>>> memset()s BDRVQcowState in the middle.
>>>>
>>>> The patch moves processing of bdrv_invalidate_cache_all out of
>>>> coroutine context for postcopy migration to avoid that. This function
>>>> is called with the following stack:
>>>>    process_incoming_migration_co
>>>>    qemu_loadvm_state
>>>>    qemu_loadvm_state_main
>>>>    loadvm_process_command
>>>>    loadvm_postcopy_handle_run
>>>>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> hmm; actually - this segs in a variety of different ways;
>>> there are two problems:
>>>
>>>     a) +    bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
>>>       That's the easy one; that NULL should be 'mis', because
>>>       the bh is expecting to use it as a MigrationIncomingState
>>>       so it segs fairly reliably in the qemu_bh_delete(mis->bh)
>>>
>>>     b) The harder problem is that there's a race where qemu_bh_delete
>>>        segs, and I'm not 100% sure why yet - it only does it sometimes
>>>        (i.e. run virt-test and leave it and it occasionally does it).
>>>        From the core it looks like qemu->bh is corrupt (0x10101010...)
>>>        so maybe mis has been freed at that point?
>>>        I'm suspecting this is the postcopy_ram_listen_thread freeing
>>>        mis at the end of it, but I don't know yet.
>>>
>>> Dave
>> Yes, this is exactly a use-after-free. I have looked into the code
>> and this seems correct.
>>
>> Could you try this simple patch?
> Hmm no, that's not right.
> The order for postcopy is that we are running the listen thread and then
> receive the 'run', and the listening thread is still running - so you
> can't destroy the incoming state during the run.
> It can't get destroyed until both the main thread has finished loading
> the migration AND the listen thread has finished.
>
> Hmm - that does give me an idea about the other seg I saw; I need to check it;
> but I think the problem is probably the case of a very short postcopy
> where the listen thread exits before the handle_run_bh is triggered;
> (and I've only seen it in my virt-test setup, which I know can do
> very short postcopies)
> I think the fix here is to pass loadvm_postcopy_handle_run_bh a pointer to its
> own bh structure rather than store it in mis->bh; that way it doesn't use mis
> at all.
>
> Dave
>
>> Den
>>
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 96e7db5..9a020ef 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1446,15 +1446,6 @@ static void *postcopy_ram_listen_thread(void *opaque)
>>   
>>       migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
>>                                      MIGRATION_STATUS_COMPLETED);
>> -    /*
>> -     * If everything has worked fine, then the main thread has waited
>> -     * for us to start, and we're the last use of the mis.
>> -     * (If something broke then qemu will have to exit anyway since it's
>> -     * got a bad migration state).
>> -     */
>> -    migration_incoming_state_destroy();
>> -
>> -
>>       return NULL;
>>   }
>>   
>> @@ -1533,6 +1524,14 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
>>       }
>>   
>>       qemu_bh_delete(mis->bh);
>> +
>> +    /*
>> +     * If everything has worked fine, then the main thread has waited
>> +     * for us to start, and we're the last use of the mis.
>> +     * (If something broke then qemu will have to exit anyway since it's
>> +     * got a bad migration state).
>> +     */
>> +    migration_incoming_state_destroy();
>>   }
>>   
>>   /* After all discards we can start running and asking for pages */
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
This will help for sure. The idea of reusing the migration state there seems wrong.

Den
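
The direction Dave outlines above - letting the bottom half own its QEMUBH
pointer instead of storing it in mis->bh, and leaving
migration_incoming_state_destroy() where it is - could look roughly like the
sketch below. The HandleRunBhData name and the handler body are illustrative
assumptions, not code from this thread:

    /* Sketch: the bottom half carries its own state, so it never touches
     * the MigrationIncomingState and does not care when the listen thread
     * destroys it. */
    typedef struct {
        QEMUBH *bh;
    } HandleRunBhData;

    static void loadvm_postcopy_handle_run_bh(void *opaque)
    {
        HandleRunBhData *data = opaque;
        Error *local_err = NULL;

        bdrv_invalidate_cache_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
        }

        if (autostart) {
            vm_start();
        }

        /* Only memory owned by this bottom half is freed here; no mis->bh. */
        qemu_bh_delete(data->bh);
        g_free(data);
    }

    static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
    {
        HandleRunBhData *data = g_new0(HandleRunBhData, 1);

        data->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, data);
        qemu_bh_schedule(data->bh);
        return LOADVM_QUIT;
    }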

Patch

diff --git a/migration/savevm.c b/migration/savevm.c
index 96e7db5..9a020ef 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1446,15 +1446,6 @@  static void *postcopy_ram_listen_thread(void *opaque)
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                                    MIGRATION_STATUS_COMPLETED);
-    /*
-     * If everything has worked fine, then the main thread has waited
-     * for us to start, and we're the last use of the mis.
-     * (If something broke then qemu will have to exit anyway since it's
-     * got a bad migration state).
-     */
-    migration_incoming_state_destroy();
-
-
     return NULL;
 }
 
@@ -1533,6 +1524,14 @@  static void loadvm_postcopy_handle_run_bh(void *opaque)
     }
 
     qemu_bh_delete(mis->bh);
+
+    /*
+     * If everything has worked fine, then the main thread has waited
+     * for us to start, and we're the last use of the mis.
+     * (If something broke then qemu will have to exit anyway since it's
+     * got a bad migration state).
+     */
+    migration_incoming_state_destroy();
 }
 
 /* After all discards we can start running and asking for pages */