Message ID | 1686860800-34667-3-git-send-email-steven.sistare@oracle.com (mailing list archive)
---|---
State | New, archived
Series | fix migration of suspended runstate
On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> Migration of a guest in the suspended state is broken. The incoming
> migration code automatically tries to wake the guest, which IMO is
> wrong -- the guest should end migration in the same state it started.
> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> bypasses vm_start(). The guest appears to be in the running state, but
> it is not.
>
> To fix, leave the guest in the suspended state, but call
> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> later, when the client sends a system_wakeup command.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  migration/migration.c | 11 ++++-------
>  softmmu/runstate.c    |  1 +
>  2 files changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 17b4b47..851fe6d 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>              vm_start();
>          } else {
>              runstate_set(global_state_get_runstate());
> +            if (runstate_check(RUN_STATE_SUSPENDED)) {
> +                /* Force vm_start to be called later. */
> +                qemu_system_start_on_wakeup_request();
> +            }

Is this really needed, along with patch 1?

I have very limited knowledge on suspension, so I'm prone to making
mistakes..

But from what I read this, qemu_system_wakeup_request() (existing one, not
after patch 1 applied) will set up wakeup_reason and kick the main thread
using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in
the main thread later on after qemu_wakeup_requested() returns true.

>          }
>          /*
>           * This must happen after any state changes since as soon as an external
> @@ -2101,7 +2105,6 @@ static int postcopy_start(MigrationState *ms)
>      qemu_mutex_lock_iothread();
>      trace_postcopy_start_set_run();
>
> -    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>      global_state_store();
>      ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>      if (ret < 0) {
> @@ -2307,7 +2310,6 @@ static void migration_completion(MigrationState *s)
>      if (s->state == MIGRATION_STATUS_ACTIVE) {
>          qemu_mutex_lock_iothread();
>          s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>
>          s->vm_old_state = runstate_get();
>          global_state_store();
> @@ -3102,11 +3104,6 @@ static void *bg_migration_thread(void *opaque)
>
>      qemu_mutex_lock_iothread();
>
> -    /*
> -     * If VM is currently in suspended state, then, to make a valid runstate
> -     * transition in vm_stop_force_state() we need to wakeup it up.
> -     */
> -    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);

Removal of these three places seems reasonable to me, or we won't persist
the SUSPEND state.

The above comment was the major reason I used to think it was needed
(again, based on zero knowledge around this..), but perhaps it was just
wrong? I would assume vm_stop_force_state() will still just work with
suspended, am I right?
>      s->vm_old_state = runstate_get();
>
>      global_state_store();
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index e127b21..771896c 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -159,6 +159,7 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>      { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>      { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
> +    { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED },
>      { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
>      { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
>
> --
> 1.8.3.1
>
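For context on the runstate.c hunk above: runstate_set() validates every transition against runstate_transitions_def[] and aborts on one that is not listed, so a new { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED } entry is needed once a guest may arrive on the destination in the suspended state and later be paused (e.g. by a stop command). A simplified sketch of that check, paraphrased rather than the verbatim QEMU source:

    /* Simplified sketch of the transition check in softmmu/runstate.c;
     * tracing and some details are omitted. */
    static bool runstate_valid_transitions[RUN_STATE__MAX][RUN_STATE__MAX];
    static RunState current_run_state;

    void runstate_set(RunState new_state)
    {
        assert(new_state < RUN_STATE__MAX);

        if (current_run_state == new_state) {
            return;
        }
        if (!runstate_valid_transitions[current_run_state][new_state]) {
            error_report("invalid runstate transition: '%s' -> '%s'",
                         RunState_str(current_run_state),
                         RunState_str(new_state));
            abort();
        }
        current_run_state = new_state;
    }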
On 6/20/2023 5:46 PM, Peter Xu wrote: > On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: >> Migration of a guest in the suspended state is broken. The incoming >> migration code automatically tries to wake the guest, which IMO is >> wrong -- the guest should end migration in the same state it started. >> Further, the wakeup is done by calling qemu_system_wakeup_request(), which >> bypasses vm_start(). The guest appears to be in the running state, but >> it is not. >> >> To fix, leave the guest in the suspended state, but call >> qemu_system_start_on_wakeup_request() so the guest is properly resumed >> later, when the client sends a system_wakeup command. >> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >> --- >> migration/migration.c | 11 ++++------- >> softmmu/runstate.c | 1 + >> 2 files changed, 5 insertions(+), 7 deletions(-) >> >> diff --git a/migration/migration.c b/migration/migration.c >> index 17b4b47..851fe6d 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque) >> vm_start(); >> } else { >> runstate_set(global_state_get_runstate()); >> + if (runstate_check(RUN_STATE_SUSPENDED)) { >> + /* Force vm_start to be called later. */ >> + qemu_system_start_on_wakeup_request(); >> + } > > Is this really needed, along with patch 1? > > I have a very limited knowledge on suspension, so I'm prone to making > mistakes.. > > But from what I read this, qemu_system_wakeup_request() (existing one, not > after patch 1 applied) will setup wakeup_reason and kick the main thread > using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in > the main thread later on after qemu_wakeup_requested() returns true. Correct, here: if (qemu_wakeup_requested()) { pause_all_vcpus(); qemu_system_wakeup(); notifier_list_notify(&wakeup_notifiers, &wakeup_reason); wakeup_reason = QEMU_WAKEUP_REASON_NONE; resume_all_vcpus(); qapi_event_send_wakeup(); } However, that is not sufficient, because vm_start() was never called on the incoming side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things. Without my fixes, it "works" because the outgoing migration automatically wakes a suspended guest, which sets the state to running, which is saved in global state: void migration_completion(MigrationState *s) qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); global_state_store() Then the incoming migration calls vm_start here: migration/migration.c if (!global_state_received() || global_state_get_runstate() == RUN_STATE_RUNNING) { if (autostart) { vm_start(); vm_start must be called for correctness. 
- Steve >> } >> /* >> * This must happen after any state changes since as soon as an external >> @@ -2101,7 +2105,6 @@ static int postcopy_start(MigrationState *ms) >> qemu_mutex_lock_iothread(); >> trace_postcopy_start_set_run(); >> >> - qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); >> global_state_store(); >> ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); >> if (ret < 0) { >> @@ -2307,7 +2310,6 @@ static void migration_completion(MigrationState *s) >> if (s->state == MIGRATION_STATUS_ACTIVE) { >> qemu_mutex_lock_iothread(); >> s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); >> - qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); >> >> s->vm_old_state = runstate_get(); >> global_state_store(); >> @@ -3102,11 +3104,6 @@ static void *bg_migration_thread(void *opaque) >> >> qemu_mutex_lock_iothread(); >> >> - /* >> - * If VM is currently in suspended state, then, to make a valid runstate >> - * transition in vm_stop_force_state() we need to wakeup it up. >> - */ >> - qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); > > Removal of these three places seems reasonable to me, or we won't persist > the SUSPEND state. > > Above comment was the major reason I used to have thought it was needed > (again, based on zero knowledge around this..), but perhaps it was just > wrong? I would assume vm_stop_force_state() will still just work with > suepended, am I right? > >> s->vm_old_state = runstate_get(); >> >> global_state_store(); >> diff --git a/softmmu/runstate.c b/softmmu/runstate.c >> index e127b21..771896c 100644 >> --- a/softmmu/runstate.c >> +++ b/softmmu/runstate.c >> @@ -159,6 +159,7 @@ static const RunStateTransition runstate_transitions_def[] = { >> { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED }, >> { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING }, >> { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE }, >> + { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED }, >> { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH }, >> { RUN_STATE_SUSPENDED, RUN_STATE_COLO}, >> >> -- >> 1.8.3.1 >> >
On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote: > On 6/20/2023 5:46 PM, Peter Xu wrote: > > On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: > >> Migration of a guest in the suspended state is broken. The incoming > >> migration code automatically tries to wake the guest, which IMO is > >> wrong -- the guest should end migration in the same state it started. > >> Further, the wakeup is done by calling qemu_system_wakeup_request(), which > >> bypasses vm_start(). The guest appears to be in the running state, but > >> it is not. > >> > >> To fix, leave the guest in the suspended state, but call > >> qemu_system_start_on_wakeup_request() so the guest is properly resumed > >> later, when the client sends a system_wakeup command. > >> > >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > >> --- > >> migration/migration.c | 11 ++++------- > >> softmmu/runstate.c | 1 + > >> 2 files changed, 5 insertions(+), 7 deletions(-) > >> > >> diff --git a/migration/migration.c b/migration/migration.c > >> index 17b4b47..851fe6d 100644 > >> --- a/migration/migration.c > >> +++ b/migration/migration.c > >> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque) > >> vm_start(); > >> } else { > >> runstate_set(global_state_get_runstate()); > >> + if (runstate_check(RUN_STATE_SUSPENDED)) { > >> + /* Force vm_start to be called later. */ > >> + qemu_system_start_on_wakeup_request(); > >> + } > > > > Is this really needed, along with patch 1? > > > > I have a very limited knowledge on suspension, so I'm prone to making > > mistakes.. > > > > But from what I read this, qemu_system_wakeup_request() (existing one, not > > after patch 1 applied) will setup wakeup_reason and kick the main thread > > using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in > > the main thread later on after qemu_wakeup_requested() returns true. > > Correct, here: > > if (qemu_wakeup_requested()) { > pause_all_vcpus(); > qemu_system_wakeup(); > notifier_list_notify(&wakeup_notifiers, &wakeup_reason); > wakeup_reason = QEMU_WAKEUP_REASON_NONE; > resume_all_vcpus(); > qapi_event_send_wakeup(); > } > > However, that is not sufficient, because vm_start() was never called on the incoming > side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things. > > > Without my fixes, it "works" because the outgoing migration automatically wakes a suspended > guest, which sets the state to running, which is saved in global state: > > void migration_completion(MigrationState *s) > qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); > global_state_store() > > Then the incoming migration calls vm_start here: > > migration/migration.c > if (!global_state_received() || > global_state_get_runstate() == RUN_STATE_RUNNING) { > if (autostart) { > vm_start(); > > vm_start must be called for correctness. I see. Though I had a feeling that this is still not the right way to do, at least not as clean. One question is, would above work for postcopy when VM is suspended during the switchover? I think I see your point that vm_start() (mostly vm_prepare_start()) contains a bunch of operations that maybe we must have before starting the VM, but then.. should we just make that vm_start() unconditional when loading VM completes? I just don't see anything won't need it (besides -S), even COLO. 
So I'm wondering about something like this:

===8<===
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
 
     dirty_bitmap_mig_before_vm_start();
 
-    if (!global_state_received() ||
-        global_state_get_runstate() == RUN_STATE_RUNNING) {
-        if (autostart) {
-            vm_start();
-        } else {
-            runstate_set(RUN_STATE_PAUSED);
-        }
-    } else if (migration_incoming_colo_enabled()) {
+    if (migration_incoming_colo_enabled()) {
         migration_incoming_disable_colo();
+        /* COLO should always have autostart=1 or we can enforce it here */
+    }
+
+    if (autostart) {
+        RunState run_state = global_state_get_runstate();
         vm_start();
+        switch (run_state) {
+        case RUN_STATE_RUNNING:
+            break;
+        case RUN_STATE_SUSPENDED:
+            qemu_system_suspend();
+            break;
+        default:
+            runstate_set(run_state);
+            break;
+        }
     } else {
-        runstate_set(global_state_get_runstate());
+        runstate_set(RUN_STATE_PAUSED);
     }
===8<===

IIUC this can drop qemu_system_start_on_wakeup_request() along with the
other global var. Would something like it work for us?
On 6/21/2023 4:28 PM, Peter Xu wrote: > On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote: >> On 6/20/2023 5:46 PM, Peter Xu wrote: >>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: >>>> Migration of a guest in the suspended state is broken. The incoming >>>> migration code automatically tries to wake the guest, which IMO is >>>> wrong -- the guest should end migration in the same state it started. >>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which >>>> bypasses vm_start(). The guest appears to be in the running state, but >>>> it is not. >>>> >>>> To fix, leave the guest in the suspended state, but call >>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed >>>> later, when the client sends a system_wakeup command. >>>> >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >>>> --- >>>> migration/migration.c | 11 ++++------- >>>> softmmu/runstate.c | 1 + >>>> 2 files changed, 5 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/migration/migration.c b/migration/migration.c >>>> index 17b4b47..851fe6d 100644 >>>> --- a/migration/migration.c >>>> +++ b/migration/migration.c >>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque) >>>> vm_start(); >>>> } else { >>>> runstate_set(global_state_get_runstate()); >>>> + if (runstate_check(RUN_STATE_SUSPENDED)) { >>>> + /* Force vm_start to be called later. */ >>>> + qemu_system_start_on_wakeup_request(); >>>> + } >>> >>> Is this really needed, along with patch 1? >>> >>> I have a very limited knowledge on suspension, so I'm prone to making >>> mistakes.. >>> >>> But from what I read this, qemu_system_wakeup_request() (existing one, not >>> after patch 1 applied) will setup wakeup_reason and kick the main thread >>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in >>> the main thread later on after qemu_wakeup_requested() returns true. >> >> Correct, here: >> >> if (qemu_wakeup_requested()) { >> pause_all_vcpus(); >> qemu_system_wakeup(); >> notifier_list_notify(&wakeup_notifiers, &wakeup_reason); >> wakeup_reason = QEMU_WAKEUP_REASON_NONE; >> resume_all_vcpus(); >> qapi_event_send_wakeup(); >> } >> >> However, that is not sufficient, because vm_start() was never called on the incoming >> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things. >> >> >> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended >> guest, which sets the state to running, which is saved in global state: >> >> void migration_completion(MigrationState *s) >> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); >> global_state_store() >> >> Then the incoming migration calls vm_start here: >> >> migration/migration.c >> if (!global_state_received() || >> global_state_get_runstate() == RUN_STATE_RUNNING) { >> if (autostart) { >> vm_start(); >> >> vm_start must be called for correctness. > > I see. Though I had a feeling that this is still not the right way to do, > at least not as clean. > > One question is, would above work for postcopy when VM is suspended during > the switchover? Good catch, that is broken. I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh and now it works. 
if (global_state_get_runstate() == RUN_STATE_RUNNING) { if (autostart) { vm_start(); } else { runstate_set(RUN_STATE_PAUSED); } } else { runstate_set(global_state_get_runstate()); if (runstate_check(RUN_STATE_SUSPENDED)) { qemu_system_start_on_wakeup_request(); } } > I think I see your point that vm_start() (mostly vm_prepare_start()) > contains a bunch of operations that maybe we must have before starting the > VM, but then.. should we just make that vm_start() unconditional when > loading VM completes? I just don't see anything won't need it (besides > -S), even COLO. > > So I'm wondering about something like this: > > ===8<=== > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque) > > dirty_bitmap_mig_before_vm_start(); > > - if (!global_state_received() || > - global_state_get_runstate() == RUN_STATE_RUNNING) { > - if (autostart) { > - vm_start(); > - } else { > - runstate_set(RUN_STATE_PAUSED); > - } > - } else if (migration_incoming_colo_enabled()) { > + if (migration_incoming_colo_enabled()) { > migration_incoming_disable_colo(); > + /* COLO should always have autostart=1 or we can enforce it here */ > + } > + > + if (autostart) { > + RunState run_state = global_state_get_runstate(); > vm_start(); This will resume the guest for a brief time, against the user's wishes. Not OK IMO. > + switch (run_state) { > + case RUN_STATE_RUNNING: > + break; > + case RUN_STATE_SUSPENDED: > + qemu_system_suspend(); qemu_system_suspend will not cause the guest to suspend. It is qemu's response after the guest initiates a suspend. > + break; > + default: > + runstate_set(run_state); > + break; > + } > } else { > - runstate_set(global_state_get_runstate()); > + runstate_set(RUN_STATE_PAUSED); > } > ===8<=== > > IIUC this can drop qemu_system_start_on_wakeup_request() along with the > other global var. Would something like it work for us? Afraid not. start_on_wake is the only correct solution I can think of. - Steve
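The loadvm_postcopy_handle_run_bh() change Steve mentions above is not shown in the thread. A rough sketch of where it could land, assuming that function's existing autostart tail; this is an illustration of the described fix, not his actual patch:

    /* Sketch only: migration/savevm.c, loadvm_postcopy_handle_run_bh().
     * The surrounding code is paraphrased from memory, and using the
     * received global state to detect suspension here is an assumption. */
    if (autostart) {
        if (global_state_get_runstate() == RUN_STATE_SUSPENDED) {
            /* Stay suspended; vm_start() happens later, on system_wakeup. */
            runstate_set(RUN_STATE_SUSPENDED);
            qemu_system_start_on_wakeup_request();
        } else {
            vm_start();
        }
    } else {
        runstate_set(RUN_STATE_PAUSED);
    }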
On 6/23/2023 2:25 PM, Steven Sistare wrote: > On 6/21/2023 4:28 PM, Peter Xu wrote: >> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote: >>> On 6/20/2023 5:46 PM, Peter Xu wrote: >>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: >>>>> Migration of a guest in the suspended state is broken. The incoming >>>>> migration code automatically tries to wake the guest, which IMO is >>>>> wrong -- the guest should end migration in the same state it started. >>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which >>>>> bypasses vm_start(). The guest appears to be in the running state, but >>>>> it is not. >>>>> >>>>> To fix, leave the guest in the suspended state, but call >>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed >>>>> later, when the client sends a system_wakeup command. >>>>> >>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >>>>> --- >>>>> migration/migration.c | 11 ++++------- >>>>> softmmu/runstate.c | 1 + >>>>> 2 files changed, 5 insertions(+), 7 deletions(-) >>>>> >>>>> diff --git a/migration/migration.c b/migration/migration.c >>>>> index 17b4b47..851fe6d 100644 >>>>> --- a/migration/migration.c >>>>> +++ b/migration/migration.c >>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque) >>>>> vm_start(); >>>>> } else { >>>>> runstate_set(global_state_get_runstate()); >>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) { >>>>> + /* Force vm_start to be called later. */ >>>>> + qemu_system_start_on_wakeup_request(); >>>>> + } >>>> >>>> Is this really needed, along with patch 1? >>>> >>>> I have a very limited knowledge on suspension, so I'm prone to making >>>> mistakes.. >>>> >>>> But from what I read this, qemu_system_wakeup_request() (existing one, not >>>> after patch 1 applied) will setup wakeup_reason and kick the main thread >>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in >>>> the main thread later on after qemu_wakeup_requested() returns true. >>> >>> Correct, here: >>> >>> if (qemu_wakeup_requested()) { >>> pause_all_vcpus(); >>> qemu_system_wakeup(); >>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason); >>> wakeup_reason = QEMU_WAKEUP_REASON_NONE; >>> resume_all_vcpus(); >>> qapi_event_send_wakeup(); >>> } >>> >>> However, that is not sufficient, because vm_start() was never called on the incoming >>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things. >>> >>> >>> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended >>> guest, which sets the state to running, which is saved in global state: >>> >>> void migration_completion(MigrationState *s) >>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); >>> global_state_store() >>> >>> Then the incoming migration calls vm_start here: >>> >>> migration/migration.c >>> if (!global_state_received() || >>> global_state_get_runstate() == RUN_STATE_RUNNING) { >>> if (autostart) { >>> vm_start(); >>> >>> vm_start must be called for correctness. >> >> I see. Though I had a feeling that this is still not the right way to do, >> at least not as clean. >> >> One question is, would above work for postcopy when VM is suspended during >> the switchover? > > Good catch, that is broken. > I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh > and now it works. 
> > if (global_state_get_runstate() == RUN_STATE_RUNNING) { > if (autostart) { > vm_start(); > } else { > runstate_set(RUN_STATE_PAUSED); > } > } else { > runstate_set(global_state_get_runstate()); > if (runstate_check(RUN_STATE_SUSPENDED)) { > qemu_system_start_on_wakeup_request(); > } > } > >> I think I see your point that vm_start() (mostly vm_prepare_start()) >> contains a bunch of operations that maybe we must have before starting the >> VM, but then.. should we just make that vm_start() unconditional when >> loading VM completes? I just don't see anything won't need it (besides >> -S), even COLO. >> >> So I'm wondering about something like this: >> >> ===8<=== >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque) >> >> dirty_bitmap_mig_before_vm_start(); >> >> - if (!global_state_received() || >> - global_state_get_runstate() == RUN_STATE_RUNNING) { >> - if (autostart) { >> - vm_start(); >> - } else { >> - runstate_set(RUN_STATE_PAUSED); >> - } >> - } else if (migration_incoming_colo_enabled()) { >> + if (migration_incoming_colo_enabled()) { >> migration_incoming_disable_colo(); >> + /* COLO should always have autostart=1 or we can enforce it here */ >> + } >> + >> + if (autostart) { >> + RunState run_state = global_state_get_runstate(); >> vm_start(); > > This will resume the guest for a brief time, against the user's wishes. Not OK IMO. > >> + switch (run_state) { >> + case RUN_STATE_RUNNING: >> + break; >> + case RUN_STATE_SUSPENDED: >> + qemu_system_suspend(); > > qemu_system_suspend will not cause the guest to suspend. It is qemu's response after > the guest initiates a suspend. > >> + break; >> + default: >> + runstate_set(run_state); >> + break; >> + } >> } else { >> - runstate_set(global_state_get_runstate()); >> + runstate_set(RUN_STATE_PAUSED); >> } >> ===8<=== >> >> IIUC this can drop qemu_system_start_on_wakeup_request() along with the >> other global var. Would something like it work for us? > > Afraid not. start_on_wake is the only correct solution I can think of. > > - Steve Actually, the start_on_wake concept is indeed required, but I can hide the details in softmmu/cpus.s: static bool vm_never_started = true; void vm_start(void) { if (!vm_prepare_start(false)) { vm_never_started = false; resume_all_vcpus(); } } void vm_wakeup(void) { if (vm_never_started) { vm_start(); } else { runstate_set(RUN_STATE_RUNNING); } } ... and qemu_system_wakeup_request() calls vm_wakeup(). Now the migration code simply sets the run state (eg SUSPENDED), no more calling qemu_system_start_on_wakeup_request. - Steve
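Steve's note that qemu_system_wakeup_request() would call vm_wakeup() could look roughly like the following; the body of the existing function is paraphrased from memory, and the exact placement of the call is an assumption rather than the posted patch:

    /* Sketch only: softmmu/runstate.c, qemu_system_wakeup_request(), with the
     * runstate_set(RUN_STATE_RUNNING) call replaced by vm_wakeup(). */
    void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
    {
        if (!runstate_check(RUN_STATE_SUSPENDED)) {
            error_setg(errp, "Unable to wake up: guest is not in suspended state");
            return;
        }
        if (!(wakeup_reason_mask & (1 << reason))) {
            return;
        }
        vm_wakeup();          /* first wakeup after migration does a full vm_start() */
        wakeup_reason = reason;
        qemu_notify_event(); /* main loop then finishes the wakeup as quoted above */
    }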
On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote: > On 6/21/2023 4:28 PM, Peter Xu wrote: > > On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote: > >> On 6/20/2023 5:46 PM, Peter Xu wrote: > >>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: > >>>> Migration of a guest in the suspended state is broken. The incoming > >>>> migration code automatically tries to wake the guest, which IMO is > >>>> wrong -- the guest should end migration in the same state it started. > >>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which > >>>> bypasses vm_start(). The guest appears to be in the running state, but > >>>> it is not. > >>>> > >>>> To fix, leave the guest in the suspended state, but call > >>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed > >>>> later, when the client sends a system_wakeup command. > >>>> > >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > >>>> --- > >>>> migration/migration.c | 11 ++++------- > >>>> softmmu/runstate.c | 1 + > >>>> 2 files changed, 5 insertions(+), 7 deletions(-) > >>>> > >>>> diff --git a/migration/migration.c b/migration/migration.c > >>>> index 17b4b47..851fe6d 100644 > >>>> --- a/migration/migration.c > >>>> +++ b/migration/migration.c > >>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque) > >>>> vm_start(); > >>>> } else { > >>>> runstate_set(global_state_get_runstate()); > >>>> + if (runstate_check(RUN_STATE_SUSPENDED)) { > >>>> + /* Force vm_start to be called later. */ > >>>> + qemu_system_start_on_wakeup_request(); > >>>> + } > >>> > >>> Is this really needed, along with patch 1? > >>> > >>> I have a very limited knowledge on suspension, so I'm prone to making > >>> mistakes.. > >>> > >>> But from what I read this, qemu_system_wakeup_request() (existing one, not > >>> after patch 1 applied) will setup wakeup_reason and kick the main thread > >>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in > >>> the main thread later on after qemu_wakeup_requested() returns true. > >> > >> Correct, here: > >> > >> if (qemu_wakeup_requested()) { > >> pause_all_vcpus(); > >> qemu_system_wakeup(); > >> notifier_list_notify(&wakeup_notifiers, &wakeup_reason); > >> wakeup_reason = QEMU_WAKEUP_REASON_NONE; > >> resume_all_vcpus(); > >> qapi_event_send_wakeup(); > >> } > >> > >> However, that is not sufficient, because vm_start() was never called on the incoming > >> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things. > >> > >> > >> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended > >> guest, which sets the state to running, which is saved in global state: > >> > >> void migration_completion(MigrationState *s) > >> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); > >> global_state_store() > >> > >> Then the incoming migration calls vm_start here: > >> > >> migration/migration.c > >> if (!global_state_received() || > >> global_state_get_runstate() == RUN_STATE_RUNNING) { > >> if (autostart) { > >> vm_start(); > >> > >> vm_start must be called for correctness. > > > > I see. Though I had a feeling that this is still not the right way to do, > > at least not as clean. > > > > One question is, would above work for postcopy when VM is suspended during > > the switchover? > > Good catch, that is broken. > I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh > and now it works. 
> > if (global_state_get_runstate() == RUN_STATE_RUNNING) { > if (autostart) { > vm_start(); > } else { > runstate_set(RUN_STATE_PAUSED); > } > } else { > runstate_set(global_state_get_runstate()); > if (runstate_check(RUN_STATE_SUSPENDED)) { > qemu_system_start_on_wakeup_request(); > } > } > > > I think I see your point that vm_start() (mostly vm_prepare_start()) > > contains a bunch of operations that maybe we must have before starting the > > VM, but then.. should we just make that vm_start() unconditional when > > loading VM completes? I just don't see anything won't need it (besides > > -S), even COLO. > > > > So I'm wondering about something like this: > > > > ===8<=== > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque) > > > > dirty_bitmap_mig_before_vm_start(); > > > > - if (!global_state_received() || > > - global_state_get_runstate() == RUN_STATE_RUNNING) { > > - if (autostart) { > > - vm_start(); > > - } else { > > - runstate_set(RUN_STATE_PAUSED); > > - } > > - } else if (migration_incoming_colo_enabled()) { > > + if (migration_incoming_colo_enabled()) { > > migration_incoming_disable_colo(); > > + /* COLO should always have autostart=1 or we can enforce it here */ > > + } > > + > > + if (autostart) { > > + RunState run_state = global_state_get_runstate(); > > vm_start(); > > This will resume the guest for a brief time, against the user's wishes. Not OK IMO. Ah okay.. Can we do whatever we need in vm_prepare_start(), then? I assume these chunks are needed: /* * WHPX accelerator needs to know whether we are going to step * any CPUs, before starting the first one. */ if (cpus_accel->synchronize_pre_resume) { cpus_accel->synchronize_pre_resume(step_pending); } /* We are sending this now, but the CPUs will be resumed shortly later */ qapi_event_send_resume(); cpu_enable_ticks(); While these may not be needed, but instead only needed if RUN_STATE_RUNNING below (runstate_set() will be needed regardless): runstate_set(RUN_STATE_RUNNING); vm_state_notify(1, RUN_STATE_RUNNING); So here's another of my attempt (this time also taking !global_state_received() into account)... RunState run_state; if (migration_incoming_colo_enabled()) { migration_incoming_disable_colo(); } if (!global_state_received()) { run_state = RUN_STATE_RUNNING; } else { run_state = global_state_get_runstate(); } if (autostart) { /* Part of vm_prepare_start(), may isolate into a helper? */ if (cpus_accel->synchronize_pre_resume) { cpus_accel->synchronize_pre_resume(step_pending); } qapi_event_send_resume(); cpu_enable_ticks(); /* Setup the runstate on src */ runstate_set(run_state); if (run_state == RUN_STATE_RUNNING) { vm_state_notify(1, RUN_STATE_RUNNING); } } else { runstate_set(RUN_STATE_PAUSED); } The whole idea is still do whatever needed here besides resuming the vcpus, rather than leaving the whole vm_start() into system wakeup. It then can avoid touching the system wakeup code which seems cleaner. > > IIUC this can drop qemu_system_start_on_wakeup_request() along with the > > other global var. Would something like it work for us? > > Afraid not. start_on_wake is the only correct solution I can think of. Please check again above, I just hope we can avoid yet another global to QEMU if possible. Thanks,
On Mon, Jun 26, 2023 at 02:27:32PM -0400, Peter Xu wrote: > On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote: > > On 6/21/2023 4:28 PM, Peter Xu wrote: > > > On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote: > > >> On 6/20/2023 5:46 PM, Peter Xu wrote: > > >>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: > > >>>> Migration of a guest in the suspended state is broken. The incoming > > >>>> migration code automatically tries to wake the guest, which IMO is > > >>>> wrong -- the guest should end migration in the same state it started. > > >>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which > > >>>> bypasses vm_start(). The guest appears to be in the running state, but > > >>>> it is not. > > >>>> > > >>>> To fix, leave the guest in the suspended state, but call > > >>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed > > >>>> later, when the client sends a system_wakeup command. > > >>>> > > >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > > >>>> --- > > >>>> migration/migration.c | 11 ++++------- > > >>>> softmmu/runstate.c | 1 + > > >>>> 2 files changed, 5 insertions(+), 7 deletions(-) > > >>>> > > >>>> diff --git a/migration/migration.c b/migration/migration.c > > >>>> index 17b4b47..851fe6d 100644 > > >>>> --- a/migration/migration.c > > >>>> +++ b/migration/migration.c > > >>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque) > > >>>> vm_start(); > > >>>> } else { > > >>>> runstate_set(global_state_get_runstate()); > > >>>> + if (runstate_check(RUN_STATE_SUSPENDED)) { > > >>>> + /* Force vm_start to be called later. */ > > >>>> + qemu_system_start_on_wakeup_request(); > > >>>> + } > > >>> > > >>> Is this really needed, along with patch 1? > > >>> > > >>> I have a very limited knowledge on suspension, so I'm prone to making > > >>> mistakes.. > > >>> > > >>> But from what I read this, qemu_system_wakeup_request() (existing one, not > > >>> after patch 1 applied) will setup wakeup_reason and kick the main thread > > >>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in > > >>> the main thread later on after qemu_wakeup_requested() returns true. > > >> > > >> Correct, here: > > >> > > >> if (qemu_wakeup_requested()) { > > >> pause_all_vcpus(); > > >> qemu_system_wakeup(); > > >> notifier_list_notify(&wakeup_notifiers, &wakeup_reason); > > >> wakeup_reason = QEMU_WAKEUP_REASON_NONE; > > >> resume_all_vcpus(); > > >> qapi_event_send_wakeup(); > > >> } > > >> > > >> However, that is not sufficient, because vm_start() was never called on the incoming > > >> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things. > > >> > > >> > > >> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended > > >> guest, which sets the state to running, which is saved in global state: > > >> > > >> void migration_completion(MigrationState *s) > > >> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); > > >> global_state_store() > > >> > > >> Then the incoming migration calls vm_start here: > > >> > > >> migration/migration.c > > >> if (!global_state_received() || > > >> global_state_get_runstate() == RUN_STATE_RUNNING) { > > >> if (autostart) { > > >> vm_start(); > > >> > > >> vm_start must be called for correctness. > > > > > > I see. Though I had a feeling that this is still not the right way to do, > > > at least not as clean. 
> > > > > > One question is, would above work for postcopy when VM is suspended during > > > the switchover? > > > > Good catch, that is broken. > > I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh > > and now it works. > > > > if (global_state_get_runstate() == RUN_STATE_RUNNING) { > > if (autostart) { > > vm_start(); > > } else { > > runstate_set(RUN_STATE_PAUSED); > > } > > } else { > > runstate_set(global_state_get_runstate()); > > if (runstate_check(RUN_STATE_SUSPENDED)) { > > qemu_system_start_on_wakeup_request(); > > } > > } > > > > > I think I see your point that vm_start() (mostly vm_prepare_start()) > > > contains a bunch of operations that maybe we must have before starting the > > > VM, but then.. should we just make that vm_start() unconditional when > > > loading VM completes? I just don't see anything won't need it (besides > > > -S), even COLO. > > > > > > So I'm wondering about something like this: > > > > > > ===8<=== > > > --- a/migration/migration.c > > > +++ b/migration/migration.c > > > @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque) > > > > > > dirty_bitmap_mig_before_vm_start(); > > > > > > - if (!global_state_received() || > > > - global_state_get_runstate() == RUN_STATE_RUNNING) { > > > - if (autostart) { > > > - vm_start(); > > > - } else { > > > - runstate_set(RUN_STATE_PAUSED); > > > - } > > > - } else if (migration_incoming_colo_enabled()) { > > > + if (migration_incoming_colo_enabled()) { > > > migration_incoming_disable_colo(); > > > + /* COLO should always have autostart=1 or we can enforce it here */ > > > + } > > > + > > > + if (autostart) { > > > + RunState run_state = global_state_get_runstate(); > > > vm_start(); > > > > This will resume the guest for a brief time, against the user's wishes. Not OK IMO. > > Ah okay.. > > Can we do whatever we need in vm_prepare_start(), then? I assume these > chunks are needed: > > /* > * WHPX accelerator needs to know whether we are going to step > * any CPUs, before starting the first one. > */ > if (cpus_accel->synchronize_pre_resume) { > cpus_accel->synchronize_pre_resume(step_pending); > } > > /* We are sending this now, but the CPUs will be resumed shortly later */ > qapi_event_send_resume(); > > cpu_enable_ticks(); > > While these may not be needed, but instead only needed if RUN_STATE_RUNNING > below (runstate_set() will be needed regardless): > > runstate_set(RUN_STATE_RUNNING); > vm_state_notify(1, RUN_STATE_RUNNING); > > So here's another of my attempt (this time also taking > !global_state_received() into account)... > > RunState run_state; > > if (migration_incoming_colo_enabled()) { > migration_incoming_disable_colo(); > } > > if (!global_state_received()) { > run_state = RUN_STATE_RUNNING; > } else { > run_state = global_state_get_runstate(); > } > > if (autostart) { > /* Part of vm_prepare_start(), may isolate into a helper? */ > if (cpus_accel->synchronize_pre_resume) { > cpus_accel->synchronize_pre_resume(step_pending); > } > qapi_event_send_resume(); > cpu_enable_ticks(); > /* Setup the runstate on src */ > runstate_set(run_state); > if (run_state == RUN_STATE_RUNNING) { > vm_state_notify(1, RUN_STATE_RUNNING); And here I at least forgot: resume_all_vcpus(); The pesudo code just to illustrate the whole idea. Definitely not tested in any form, so sorry if it won't even compile.. 
>     }
> } else {
>     runstate_set(RUN_STATE_PAUSED);
> }
>
> The whole idea is still do whatever needed here besides resuming the vcpus,
> rather than leaving the whole vm_start() into system wakeup. It then can
> avoid touching the system wakeup code which seems cleaner.
>
> > > IIUC this can drop qemu_system_start_on_wakeup_request() along with the
> > > other global var. Would something like it work for us?
> >
> > Afraid not. start_on_wake is the only correct solution I can think of.
>
> Please check again above, I just hope we can avoid yet another global to
> QEMU if possible.
>
> Thanks,
>
> --
> Peter Xu
On 6/26/2023 2:27 PM, Peter Xu wrote: > On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote: >> On 6/21/2023 4:28 PM, Peter Xu wrote: >>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote: >>>> On 6/20/2023 5:46 PM, Peter Xu wrote: >>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: >>>>>> Migration of a guest in the suspended state is broken. The incoming >>>>>> migration code automatically tries to wake the guest, which IMO is >>>>>> wrong -- the guest should end migration in the same state it started. >>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which >>>>>> bypasses vm_start(). The guest appears to be in the running state, but >>>>>> it is not. >>>>>> >>>>>> To fix, leave the guest in the suspended state, but call >>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed >>>>>> later, when the client sends a system_wakeup command. >>>>>> >>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >>>>>> --- >>>>>> migration/migration.c | 11 ++++------- >>>>>> softmmu/runstate.c | 1 + >>>>>> 2 files changed, 5 insertions(+), 7 deletions(-) >>>>>> >>>>>> diff --git a/migration/migration.c b/migration/migration.c >>>>>> index 17b4b47..851fe6d 100644 >>>>>> --- a/migration/migration.c >>>>>> +++ b/migration/migration.c >>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque) >>>>>> vm_start(); >>>>>> } else { >>>>>> runstate_set(global_state_get_runstate()); >>>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) { >>>>>> + /* Force vm_start to be called later. */ >>>>>> + qemu_system_start_on_wakeup_request(); >>>>>> + } >>>>> >>>>> Is this really needed, along with patch 1? >>>>> >>>>> I have a very limited knowledge on suspension, so I'm prone to making >>>>> mistakes.. >>>>> >>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not >>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread >>>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in >>>>> the main thread later on after qemu_wakeup_requested() returns true. >>>> >>>> Correct, here: >>>> >>>> if (qemu_wakeup_requested()) { >>>> pause_all_vcpus(); >>>> qemu_system_wakeup(); >>>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason); >>>> wakeup_reason = QEMU_WAKEUP_REASON_NONE; >>>> resume_all_vcpus(); >>>> qapi_event_send_wakeup(); >>>> } >>>> >>>> However, that is not sufficient, because vm_start() was never called on the incoming >>>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things. >>>> >>>> >>>> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended >>>> guest, which sets the state to running, which is saved in global state: >>>> >>>> void migration_completion(MigrationState *s) >>>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); >>>> global_state_store() >>>> >>>> Then the incoming migration calls vm_start here: >>>> >>>> migration/migration.c >>>> if (!global_state_received() || >>>> global_state_get_runstate() == RUN_STATE_RUNNING) { >>>> if (autostart) { >>>> vm_start(); >>>> >>>> vm_start must be called for correctness. >>> >>> I see. Though I had a feeling that this is still not the right way to do, >>> at least not as clean. >>> >>> One question is, would above work for postcopy when VM is suspended during >>> the switchover? >> >> Good catch, that is broken. 
>> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh >> and now it works. >> >> if (global_state_get_runstate() == RUN_STATE_RUNNING) { >> if (autostart) { >> vm_start(); >> } else { >> runstate_set(RUN_STATE_PAUSED); >> } >> } else { >> runstate_set(global_state_get_runstate()); >> if (runstate_check(RUN_STATE_SUSPENDED)) { >> qemu_system_start_on_wakeup_request(); >> } >> } >> >>> I think I see your point that vm_start() (mostly vm_prepare_start()) >>> contains a bunch of operations that maybe we must have before starting the >>> VM, but then.. should we just make that vm_start() unconditional when >>> loading VM completes? I just don't see anything won't need it (besides >>> -S), even COLO. >>> >>> So I'm wondering about something like this: >>> >>> ===8<=== >>> --- a/migration/migration.c >>> +++ b/migration/migration.c >>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque) >>> >>> dirty_bitmap_mig_before_vm_start(); >>> >>> - if (!global_state_received() || >>> - global_state_get_runstate() == RUN_STATE_RUNNING) { >>> - if (autostart) { >>> - vm_start(); >>> - } else { >>> - runstate_set(RUN_STATE_PAUSED); >>> - } >>> - } else if (migration_incoming_colo_enabled()) { >>> + if (migration_incoming_colo_enabled()) { >>> migration_incoming_disable_colo(); >>> + /* COLO should always have autostart=1 or we can enforce it here */ >>> + } >>> + >>> + if (autostart) { >>> + RunState run_state = global_state_get_runstate(); >>> vm_start(); >> >> This will resume the guest for a brief time, against the user's wishes. Not OK IMO. > > Ah okay.. > > Can we do whatever we need in vm_prepare_start(), then? I assume these > chunks are needed: > > /* > * WHPX accelerator needs to know whether we are going to step > * any CPUs, before starting the first one. > */ > if (cpus_accel->synchronize_pre_resume) { > cpus_accel->synchronize_pre_resume(step_pending); > } > > /* We are sending this now, but the CPUs will be resumed shortly later */ > qapi_event_send_resume(); > > cpu_enable_ticks(); > > While these may not be needed, but instead only needed if RUN_STATE_RUNNING > below (runstate_set() will be needed regardless): > > runstate_set(RUN_STATE_RUNNING); > vm_state_notify(1, RUN_STATE_RUNNING); > > So here's another of my attempt (this time also taking > !global_state_received() into account)... > > RunState run_state; > > if (migration_incoming_colo_enabled()) { > migration_incoming_disable_colo(); > } > > if (!global_state_received()) { > run_state = RUN_STATE_RUNNING; > } else { > run_state = global_state_get_runstate(); > } > > if (autostart) { > /* Part of vm_prepare_start(), may isolate into a helper? */ > if (cpus_accel->synchronize_pre_resume) { > cpus_accel->synchronize_pre_resume(step_pending); > } > qapi_event_send_resume(); > cpu_enable_ticks(); > /* Setup the runstate on src */ > runstate_set(run_state); > if (run_state == RUN_STATE_RUNNING) { > vm_state_notify(1, RUN_STATE_RUNNING); > } > } else { > runstate_set(RUN_STATE_PAUSED); > } > > The whole idea is still do whatever needed here besides resuming the vcpus, > rather than leaving the whole vm_start() into system wakeup. It then can > avoid touching the system wakeup code which seems cleaner. The problem is that some actions cannot be performed at migration finish time, such as vm_state_notify RUN_STATE_RUNNING. The wakeup code called later still needs to know that vm_state_notify has not been called, and call it. 
I just posted a new series with a cleaner wakeup, but it still uses a global.

- Steve

>>> IIUC this can drop qemu_system_start_on_wakeup_request() along with the
>>> other global var. Would something like it work for us?
>>
>> Afraid not. start_on_wake is the only correct solution I can think of.
>
> Please check again above, I just hope we can avoid yet another global to
> QEMU if possible.
>
> Thanks,
>
On Fri, Jun 30, 2023 at 09:50:41AM -0400, Steven Sistare wrote: > On 6/26/2023 2:27 PM, Peter Xu wrote: > > On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote: > >> On 6/21/2023 4:28 PM, Peter Xu wrote: > >>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote: > >>>> On 6/20/2023 5:46 PM, Peter Xu wrote: > >>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: > >>>>>> Migration of a guest in the suspended state is broken. The incoming > >>>>>> migration code automatically tries to wake the guest, which IMO is > >>>>>> wrong -- the guest should end migration in the same state it started. > >>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which > >>>>>> bypasses vm_start(). The guest appears to be in the running state, but > >>>>>> it is not. > >>>>>> > >>>>>> To fix, leave the guest in the suspended state, but call > >>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed > >>>>>> later, when the client sends a system_wakeup command. > >>>>>> > >>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > >>>>>> --- > >>>>>> migration/migration.c | 11 ++++------- > >>>>>> softmmu/runstate.c | 1 + > >>>>>> 2 files changed, 5 insertions(+), 7 deletions(-) > >>>>>> > >>>>>> diff --git a/migration/migration.c b/migration/migration.c > >>>>>> index 17b4b47..851fe6d 100644 > >>>>>> --- a/migration/migration.c > >>>>>> +++ b/migration/migration.c > >>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque) > >>>>>> vm_start(); > >>>>>> } else { > >>>>>> runstate_set(global_state_get_runstate()); > >>>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) { > >>>>>> + /* Force vm_start to be called later. */ > >>>>>> + qemu_system_start_on_wakeup_request(); > >>>>>> + } > >>>>> > >>>>> Is this really needed, along with patch 1? > >>>>> > >>>>> I have a very limited knowledge on suspension, so I'm prone to making > >>>>> mistakes.. > >>>>> > >>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not > >>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread > >>>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in > >>>>> the main thread later on after qemu_wakeup_requested() returns true. > >>>> > >>>> Correct, here: > >>>> > >>>> if (qemu_wakeup_requested()) { > >>>> pause_all_vcpus(); > >>>> qemu_system_wakeup(); > >>>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason); > >>>> wakeup_reason = QEMU_WAKEUP_REASON_NONE; > >>>> resume_all_vcpus(); > >>>> qapi_event_send_wakeup(); > >>>> } > >>>> > >>>> However, that is not sufficient, because vm_start() was never called on the incoming > >>>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things. > >>>> > >>>> > >>>> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended > >>>> guest, which sets the state to running, which is saved in global state: > >>>> > >>>> void migration_completion(MigrationState *s) > >>>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); > >>>> global_state_store() > >>>> > >>>> Then the incoming migration calls vm_start here: > >>>> > >>>> migration/migration.c > >>>> if (!global_state_received() || > >>>> global_state_get_runstate() == RUN_STATE_RUNNING) { > >>>> if (autostart) { > >>>> vm_start(); > >>>> > >>>> vm_start must be called for correctness. > >>> > >>> I see. Though I had a feeling that this is still not the right way to do, > >>> at least not as clean. 
> >>> > >>> One question is, would above work for postcopy when VM is suspended during > >>> the switchover? > >> > >> Good catch, that is broken. > >> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh > >> and now it works. > >> > >> if (global_state_get_runstate() == RUN_STATE_RUNNING) { > >> if (autostart) { > >> vm_start(); > >> } else { > >> runstate_set(RUN_STATE_PAUSED); > >> } > >> } else { > >> runstate_set(global_state_get_runstate()); > >> if (runstate_check(RUN_STATE_SUSPENDED)) { > >> qemu_system_start_on_wakeup_request(); > >> } > >> } > >> > >>> I think I see your point that vm_start() (mostly vm_prepare_start()) > >>> contains a bunch of operations that maybe we must have before starting the > >>> VM, but then.. should we just make that vm_start() unconditional when > >>> loading VM completes? I just don't see anything won't need it (besides > >>> -S), even COLO. > >>> > >>> So I'm wondering about something like this: > >>> > >>> ===8<=== > >>> --- a/migration/migration.c > >>> +++ b/migration/migration.c > >>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque) > >>> > >>> dirty_bitmap_mig_before_vm_start(); > >>> > >>> - if (!global_state_received() || > >>> - global_state_get_runstate() == RUN_STATE_RUNNING) { > >>> - if (autostart) { > >>> - vm_start(); > >>> - } else { > >>> - runstate_set(RUN_STATE_PAUSED); > >>> - } > >>> - } else if (migration_incoming_colo_enabled()) { > >>> + if (migration_incoming_colo_enabled()) { > >>> migration_incoming_disable_colo(); > >>> + /* COLO should always have autostart=1 or we can enforce it here */ > >>> + } > >>> + > >>> + if (autostart) { > >>> + RunState run_state = global_state_get_runstate(); > >>> vm_start(); > >> > >> This will resume the guest for a brief time, against the user's wishes. Not OK IMO. > > > > Ah okay.. > > > > Can we do whatever we need in vm_prepare_start(), then? I assume these > > chunks are needed: > > > > /* > > * WHPX accelerator needs to know whether we are going to step > > * any CPUs, before starting the first one. > > */ > > if (cpus_accel->synchronize_pre_resume) { > > cpus_accel->synchronize_pre_resume(step_pending); > > } > > > > /* We are sending this now, but the CPUs will be resumed shortly later */ > > qapi_event_send_resume(); > > > > cpu_enable_ticks(); > > > > While these may not be needed, but instead only needed if RUN_STATE_RUNNING > > below (runstate_set() will be needed regardless): > > > > runstate_set(RUN_STATE_RUNNING); > > vm_state_notify(1, RUN_STATE_RUNNING); > > > > So here's another of my attempt (this time also taking > > !global_state_received() into account)... > > > > RunState run_state; > > > > if (migration_incoming_colo_enabled()) { > > migration_incoming_disable_colo(); > > } > > > > if (!global_state_received()) { > > run_state = RUN_STATE_RUNNING; > > } else { > > run_state = global_state_get_runstate(); > > } > > > > if (autostart) { > > /* Part of vm_prepare_start(), may isolate into a helper? */ > > if (cpus_accel->synchronize_pre_resume) { > > cpus_accel->synchronize_pre_resume(step_pending); > > } > > qapi_event_send_resume(); > > cpu_enable_ticks(); > > /* Setup the runstate on src */ > > runstate_set(run_state); > > if (run_state == RUN_STATE_RUNNING) { > > vm_state_notify(1, RUN_STATE_RUNNING); > > } > > } else { > > runstate_set(RUN_STATE_PAUSED); > > } > > > > The whole idea is still do whatever needed here besides resuming the vcpus, > > rather than leaving the whole vm_start() into system wakeup. 
> > It then can
> > avoid touching the system wakeup code which seems cleaner.
>
> The problem is that some actions cannot be performed at migration finish time,
> such as vm_state_notify RUN_STATE_RUNNING. The wakeup code called later still
> needs to know that vm_state_notify has not been called, and call it.

Sorry, very late reply..

Can we just call vm_state_notify() earlier?

I know I'm not familiar with this code at all.. but I'm still not fully
convinced a new global is needed for this. IMHO QEMU becomes harder to
maintain with such globals, while already tons exist.

>
> I just posted a new series with a cleaner wakeup, but it still uses a global.

I think the new version does not apply anymore to master. Do you have a
tree somewhere?

Thanks,
On 7/26/2023 4:18 PM, Peter Xu wrote: > On Fri, Jun 30, 2023 at 09:50:41AM -0400, Steven Sistare wrote: >> On 6/26/2023 2:27 PM, Peter Xu wrote: >>> On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote: >>>> On 6/21/2023 4:28 PM, Peter Xu wrote: >>>>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote: >>>>>> On 6/20/2023 5:46 PM, Peter Xu wrote: >>>>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote: >>>>>>>> Migration of a guest in the suspended state is broken. The incoming >>>>>>>> migration code automatically tries to wake the guest, which IMO is >>>>>>>> wrong -- the guest should end migration in the same state it started. >>>>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which >>>>>>>> bypasses vm_start(). The guest appears to be in the running state, but >>>>>>>> it is not. >>>>>>>> >>>>>>>> To fix, leave the guest in the suspended state, but call >>>>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed >>>>>>>> later, when the client sends a system_wakeup command. >>>>>>>> >>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >>>>>>>> --- >>>>>>>> migration/migration.c | 11 ++++------- >>>>>>>> softmmu/runstate.c | 1 + >>>>>>>> 2 files changed, 5 insertions(+), 7 deletions(-) >>>>>>>> >>>>>>>> diff --git a/migration/migration.c b/migration/migration.c >>>>>>>> index 17b4b47..851fe6d 100644 >>>>>>>> --- a/migration/migration.c >>>>>>>> +++ b/migration/migration.c >>>>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque) >>>>>>>> vm_start(); >>>>>>>> } else { >>>>>>>> runstate_set(global_state_get_runstate()); >>>>>>>> + if (runstate_check(RUN_STATE_SUSPENDED)) { >>>>>>>> + /* Force vm_start to be called later. */ >>>>>>>> + qemu_system_start_on_wakeup_request(); >>>>>>>> + } >>>>>>> >>>>>>> Is this really needed, along with patch 1? >>>>>>> >>>>>>> I have a very limited knowledge on suspension, so I'm prone to making >>>>>>> mistakes.. >>>>>>> >>>>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not >>>>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread >>>>>>> using qemu_notify_event(). Then IIUC the e.g. vcpu wakeups will be done in >>>>>>> the main thread later on after qemu_wakeup_requested() returns true. >>>>>> >>>>>> Correct, here: >>>>>> >>>>>> if (qemu_wakeup_requested()) { >>>>>> pause_all_vcpus(); >>>>>> qemu_system_wakeup(); >>>>>> notifier_list_notify(&wakeup_notifiers, &wakeup_reason); >>>>>> wakeup_reason = QEMU_WAKEUP_REASON_NONE; >>>>>> resume_all_vcpus(); >>>>>> qapi_event_send_wakeup(); >>>>>> } >>>>>> >>>>>> However, that is not sufficient, because vm_start() was never called on the incoming >>>>>> side. vm_start calls the vm state notifiers for RUN_STATE_RUNNING, among other things. >>>>>> >>>>>> >>>>>> Without my fixes, it "works" because the outgoing migration automatically wakes a suspended >>>>>> guest, which sets the state to running, which is saved in global state: >>>>>> >>>>>> void migration_completion(MigrationState *s) >>>>>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); >>>>>> global_state_store() >>>>>> >>>>>> Then the incoming migration calls vm_start here: >>>>>> >>>>>> migration/migration.c >>>>>> if (!global_state_received() || >>>>>> global_state_get_runstate() == RUN_STATE_RUNNING) { >>>>>> if (autostart) { >>>>>> vm_start(); >>>>>> >>>>>> vm_start must be called for correctness. >>>>> >>>>> I see. 
Though I had a feeling that this is still not the right way to do, >>>>> at least not as clean. >>>>> >>>>> One question is, would above work for postcopy when VM is suspended during >>>>> the switchover? >>>> >>>> Good catch, that is broken. >>>> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh >>>> and now it works. >>>> >>>> if (global_state_get_runstate() == RUN_STATE_RUNNING) { >>>> if (autostart) { >>>> vm_start(); >>>> } else { >>>> runstate_set(RUN_STATE_PAUSED); >>>> } >>>> } else { >>>> runstate_set(global_state_get_runstate()); >>>> if (runstate_check(RUN_STATE_SUSPENDED)) { >>>> qemu_system_start_on_wakeup_request(); >>>> } >>>> } >>>> >>>>> I think I see your point that vm_start() (mostly vm_prepare_start()) >>>>> contains a bunch of operations that maybe we must have before starting the >>>>> VM, but then.. should we just make that vm_start() unconditional when >>>>> loading VM completes? I just don't see anything won't need it (besides >>>>> -S), even COLO. >>>>> >>>>> So I'm wondering about something like this: >>>>> >>>>> ===8<=== >>>>> --- a/migration/migration.c >>>>> +++ b/migration/migration.c >>>>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque) >>>>> >>>>> dirty_bitmap_mig_before_vm_start(); >>>>> >>>>> - if (!global_state_received() || >>>>> - global_state_get_runstate() == RUN_STATE_RUNNING) { >>>>> - if (autostart) { >>>>> - vm_start(); >>>>> - } else { >>>>> - runstate_set(RUN_STATE_PAUSED); >>>>> - } >>>>> - } else if (migration_incoming_colo_enabled()) { >>>>> + if (migration_incoming_colo_enabled()) { >>>>> migration_incoming_disable_colo(); >>>>> + /* COLO should always have autostart=1 or we can enforce it here */ >>>>> + } >>>>> + >>>>> + if (autostart) { >>>>> + RunState run_state = global_state_get_runstate(); >>>>> vm_start(); >>>> >>>> This will resume the guest for a brief time, against the user's wishes. Not OK IMO. >>> >>> Ah okay.. >>> >>> Can we do whatever we need in vm_prepare_start(), then? I assume these >>> chunks are needed: >>> >>> /* >>> * WHPX accelerator needs to know whether we are going to step >>> * any CPUs, before starting the first one. >>> */ >>> if (cpus_accel->synchronize_pre_resume) { >>> cpus_accel->synchronize_pre_resume(step_pending); >>> } >>> >>> /* We are sending this now, but the CPUs will be resumed shortly later */ >>> qapi_event_send_resume(); >>> >>> cpu_enable_ticks(); >>> >>> While these may not be needed, but instead only needed if RUN_STATE_RUNNING >>> below (runstate_set() will be needed regardless): >>> >>> runstate_set(RUN_STATE_RUNNING); >>> vm_state_notify(1, RUN_STATE_RUNNING); >>> >>> So here's another of my attempt (this time also taking >>> !global_state_received() into account)... >>> >>> RunState run_state; >>> >>> if (migration_incoming_colo_enabled()) { >>> migration_incoming_disable_colo(); >>> } >>> >>> if (!global_state_received()) { >>> run_state = RUN_STATE_RUNNING; >>> } else { >>> run_state = global_state_get_runstate(); >>> } >>> >>> if (autostart) { >>> /* Part of vm_prepare_start(), may isolate into a helper? 
 */
>>>         if (cpus_accel->synchronize_pre_resume) {
>>>             cpus_accel->synchronize_pre_resume(step_pending);
>>>         }
>>>         qapi_event_send_resume();
>>>         cpu_enable_ticks();
>>>         /* Setup the runstate on src */
>>>         runstate_set(run_state);
>>>         if (run_state == RUN_STATE_RUNNING) {
>>>             vm_state_notify(1, RUN_STATE_RUNNING);
>>>         }
>>>     } else {
>>>         runstate_set(RUN_STATE_PAUSED);
>>>     }
>>>
>>> The whole idea is still do whatever needed here besides resuming the vcpus,
>>> rather than leaving the whole vm_start() into system wakeup.  It then can
>>> avoid touching the system wakeup code which seems cleaner.
>>
>> The problem is that some actions cannot be performed at migration finish time,
>> such as vm_state_notify RUN_STATE_RUNNING.  The wakeup code called later still
>> needs to know that vm_state_notify has not been called, and call it.
>
> Sorry, very late reply..

And I just returned from vacation :)

> Can we just call vm_state_notify() earlier?

We cannot.  The guest is not running yet, and will not be until later.
We cannot call notifiers that perform actions that complete, or react to,
the guest entering a running state.

> I know I'm not familiar with this code at all.. but I'm still not fully
> convinced a new global is needed for this.  IMHO QEMU becomes harder to
> maintain with such globals, while already tons exist.
>
>> I just posted a new series with a cleaner wakeup, but it still uses a global.

See "[PATCH V2 01/10] vl: start on wakeup request" in the new series.
The transitions of the global are trivial and sensible:

    vm_start()
        vm_started = true;

    do_vm_stop()
        vm_started = false;

current_run_state is already a global, confined to runstate.c.  vm_started is
a new qualifier on the state, and hence is also global.  If runstate were a
field in MachineState, then I would add vm_started to MachineState, but
MachineState is not used in runstate.c.

> I think the new version does not apply anymore to master.  Do you have a
> tree somewhere?

I do not, but I will rebase and post V3 in a moment.

- Steve
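To make the flag idea above concrete, here is a minimal sketch of where the two
transitions and the wakeup check would sit.  The bodies are heavily simplified,
wakeup_from_suspend() is a made-up name standing in for the wakeup path, and
the vm_prepare_start() call is only approximate; this is not the actual patch code:

    /* Sketch only: where the flag transitions sit; not the real QEMU functions. */
    static bool vm_started;

    void vm_start(void)
    {
        if (!vm_prepare_start(false)) {      /* runstate, notifiers, ticks, RESUME */
            vm_started = true;
            resume_all_vcpus();
        }
    }

    static int do_vm_stop(RunState state, bool send_stop)
    {
        vm_started = false;
        /* ... pause vcpus, set the runstate, notify with running=false ... */
        return 0;
    }

    /* Hypothetical wakeup path for a guest restored in RUN_STATE_SUSPENDED: */
    static void wakeup_from_suspend(void)
    {
        if (!vm_started) {
            vm_start();                      /* notifiers have never run here */
        } else {
            runstate_set(RUN_STATE_RUNNING); /* guest suspended after running here */
        }
    }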
On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
> > Can we just call vm_state_notify() earlier?
>
> We cannot.  The guest is not running yet, and will not be until later.
> We cannot call notifiers that perform actions that complete, or react to,
> the guest entering a running state.

I tried to look at a few examples of the notifiees, and most of the ones I read
do not react to "vcpu running" but to "vm running" (in which case I think
"suspended" mode falls into the "vm running" case); most of them do not care
about the RunState parameter passed in, only the bool "running".

In reality, when running=true, it must be RUNNING so far.

In that case, does it mean we should notify right after the switchover, since
after migration the vm is indeed running; it is only the vcpus that are held
during suspend?

One example (of a possible issue) is vfio_vmstate_change(), where IIUC if we
try to suspend a VM it should keep VFIO_DEVICE_STATE_RUNNING for that device;
this kind of proves to me that SUSPEND is actually one of the running=true states.

If we postpone all notifiers here, even after we switch over to the dest qemu,
until the next suspend wakeup, I think it means these devices will not be in
VFIO_DEVICE_STATE_RUNNING after the switchover but perhaps
VFIO_DEVICE_STATE_STOP.

Ideally I think we should call vm_state_notify() here with running=true and
state=SUSPEND, but since I do see some hooks that are not well prepared for
SUSPEND with running=true, I would on the safe side call
vm_state_notify(running=true, state=RUNNING) even for SUSPEND at the switchover
phase.  With that, IIUC it will naturally work (e.g. when we wake up again
later we just need to call no notifiers).
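The pattern being described looks roughly like the sketch below.  ExampleDevice
and its helpers are hypothetical, but the callback signature is the
VMChangeStateHandler one invoked by vm_state_notify():

    /*
     * Typical shape of a vm change state callback: it branches on the bool
     * "running" and ignores the exact RunState, so with running=true a guest
     * in SUSPENDED would take the same path as one in RUNNING.
     */
    static void example_vm_state_change(void *opaque, bool running, RunState state)
    {
        ExampleDevice *dev = opaque;      /* hypothetical device */

        if (running) {
            example_device_resume(dev);   /* e.g. VFIO moves to VFIO_DEVICE_STATE_RUNNING */
        } else {
            example_device_halt(dev);     /* FINISH_MIGRATE, PAUSED, ... */
        }
    }

    /* registered once with:
     *     qemu_add_vm_change_state_handler(example_vm_state_change, dev);
     */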
On 8/14/2023 3:37 PM, Peter Xu wrote: > On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote: >>> Can we just call vm_state_notify() earlier? >> >> We cannot. The guest is not running yet, and will not be until later. >> We cannot call notifiers that perform actions that complete, or react to, >> the guest entering a running state. > > I tried to look at a few examples of the notifees and most of them I read > do not react to "vcpu running" but "vm running" (in which case I think > "suspended" mode falls into "vm running" case); most of them won't care on > the RunState parameter passed in, but only the bool "running". > > In reality, when running=true, it must be RUNNING so far. > > In that case does it mean we should notify right after the switchover, > since after migration the vm is indeed running only if the vcpus are not > during suspend? I cannot parse your question, but maybe this answers it. If the outgoing VM is running and not suspended, then the incoming side tests for autostart==true and calls vm_start, which calls the notifiers, right after the switchover. > One example (of possible issue) is vfio_vmstate_change(), where iiuc if we > try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that > device; this kind of prove to me that SUSPEND is actually one of > running=true states. > > If we postpone all notifiers here even after we switched over to dest qemu > to the next upcoming suspend wakeup, I think it means these devices will > not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps > VFIO_DEVICE_STATE_STOP. or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup. AFAIK it is OK to remain in that state until wakeup is called later. > Ideally I think we should here call vm_state_notify() with running=true and > state=SUSPEND, but since I do see some hooks are not well prepared for > SUSPEND over running=true, I'd think we should on the safe side call > vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch > over phase. With that IIUC it'll naturally work (e.g. when wakeup again > later we just need to call no notifiers). Notifiers are just one piece, all the code in vm_prepare_start must be called. Is it correct to call all of that long before we actually resume the CPUs in wakeup? I don't know, but what is the point? The wakeup code still needs modification to conditionally resume the vcpus. The scheme would be roughly: loadvm_postcopy_handle_run_bh() runstat = global_state_get_runstate(); if (runstate == RUN_STATE_RUNNING) { vm_start() } else if (runstate == RUN_STATE_SUSPENDED) vm_prepare_start(); // the start of vm_start() } qemu_system_wakeup_request() if (some condition) resume_all_vcpus(); // the remainder of vm_start() else runstate_set(RUN_STATE_RUNNING) How is that better than my patches [PATCH V3 01/10] vl: start on wakeup request [PATCH V3 02/10] migration: preserve suspended runstate loadvm_postcopy_handle_run_bh() runstate = global_state_get_runstate(); if (runstate == RUN_STATE_RUNNING) vm_start() else runstate_set(runstate); // eg RUN_STATE_SUSPENDED qemu_system_wakeup_request() if (!vm_started) vm_start(); else runstate_set(RUN_STATE_RUNNING); Recall this thread started with your comment "It then can avoid touching the system wakeup code which seems cleaner". We still need to touch the wakeup code. - Steve
On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote: > On 8/14/2023 3:37 PM, Peter Xu wrote: > > On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote: > >>> Can we just call vm_state_notify() earlier? > >> > >> We cannot. The guest is not running yet, and will not be until later. > >> We cannot call notifiers that perform actions that complete, or react to, > >> the guest entering a running state. > > > > I tried to look at a few examples of the notifees and most of them I read > > do not react to "vcpu running" but "vm running" (in which case I think > > "suspended" mode falls into "vm running" case); most of them won't care on > > the RunState parameter passed in, but only the bool "running". > > > > In reality, when running=true, it must be RUNNING so far. > > > > In that case does it mean we should notify right after the switchover, > > since after migration the vm is indeed running only if the vcpus are not > > during suspend? > > I cannot parse your question, but maybe this answers it. > If the outgoing VM is running and not suspended, then the incoming side > tests for autostart==true and calls vm_start, which calls the notifiers, > right after the switchover. I meant IMHO SUSPENDED should be seen as "vm running" case to me, just like RUNNING. Then, we should invoke vm_prepare_start(), just need some touch ups. > > > One example (of possible issue) is vfio_vmstate_change(), where iiuc if we > > try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that > > device; this kind of prove to me that SUSPEND is actually one of > > running=true states. > > > > If we postpone all notifiers here even after we switched over to dest qemu > > to the next upcoming suspend wakeup, I think it means these devices will > > not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps > > VFIO_DEVICE_STATE_STOP. > > or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup. > AFAIK it is OK to remain in that state until wakeup is called later. So let me provide another clue of why I think we should call vm_prepare_start().. Firstly, I think RESUME event should always be there right after we switched over, no matter suspeneded or not. I just noticed that your test case would work because you put "wakeup" to be before RESUME. I'm not sure whether that's correct. I'd bet people could rely on that RESUME to identify the switchover. More importantly, I'm wondering whether RTC should still be running during the suspended mode? Sorry again if my knowledge over there is just limited, so correct me otherwise - but my understanding is during suspend mode (s1 or s3, frankly I can't tell which one this belongs..), rtc should still be running along with the system clock. It means we _should_ at least call cpu_enable_ticks() to enable rtc: /* * enable cpu_get_ticks() * Caller must hold BQL which serves as mutex for vm_clock_seqlock. */ void cpu_enable_ticks(void) I think that'll enable cpu_get_tsc() and make it start to work right. > > > Ideally I think we should here call vm_state_notify() with running=true and > > state=SUSPEND, but since I do see some hooks are not well prepared for > > SUSPEND over running=true, I'd think we should on the safe side call > > vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch > > over phase. With that IIUC it'll naturally work (e.g. when wakeup again > > later we just need to call no notifiers). > > Notifiers are just one piece, all the code in vm_prepare_start must be called. 
> Is it correct to call all of that long before we actually resume the CPUs in > wakeup? I don't know, but what is the point? The point is not only for cleaness (again, I really, really don't like that new global.. sorry), but also now I think we should make the vm running. > The wakeup code still needs > modification to conditionally resume the vcpus. The scheme would be roughly: > > loadvm_postcopy_handle_run_bh() > runstat = global_state_get_runstate(); > if (runstate == RUN_STATE_RUNNING) { > vm_start() > } else if (runstate == RUN_STATE_SUSPENDED) > vm_prepare_start(); // the start of vm_start() > } > > qemu_system_wakeup_request() > if (some condition) > resume_all_vcpus(); // the remainder of vm_start() > else > runstate_set(RUN_STATE_RUNNING) No it doesn't. wakeup_reason is set there, main loop does the resuming. See: if (qemu_wakeup_requested()) { pause_all_vcpus(); qemu_system_wakeup(); notifier_list_notify(&wakeup_notifiers, &wakeup_reason); wakeup_reason = QEMU_WAKEUP_REASON_NONE; resume_all_vcpus(); qapi_event_send_wakeup(); } > > How is that better than my patches > [PATCH V3 01/10] vl: start on wakeup request > [PATCH V3 02/10] migration: preserve suspended runstate > > loadvm_postcopy_handle_run_bh() > runstate = global_state_get_runstate(); > if (runstate == RUN_STATE_RUNNING) > vm_start() > else > runstate_set(runstate); // eg RUN_STATE_SUSPENDED > > qemu_system_wakeup_request() > if (!vm_started) > vm_start(); > else > runstate_set(RUN_STATE_RUNNING); > > Recall this thread started with your comment "It then can avoid touching the > system wakeup code which seems cleaner". We still need to touch the wakeup > code. Let me provide some code and reply to your new patchset inlines. Thanks.
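Pulling those points together, the incoming-side dispatch being argued for here
would look roughly like this sketch; error handling, COLO, and the exact
vm_prepare_start() signature are glossed over, so treat it as an outline rather
than the eventual patch:

    /* Sketch of the incoming-side dispatch, e.g. in process_incoming_migration_bh(). */
    RunState state = global_state_received() ?
                     global_state_get_runstate() : RUN_STATE_RUNNING;

    if (!autostart) {
        runstate_set(RUN_STATE_PAUSED);
    } else if (state == RUN_STATE_SUSPENDED) {
        /*
         * Enable ticks so the RTC keeps advancing, send the RESUME event and
         * run the change state notifiers, but leave the vcpus halted; the main
         * loop resumes them later once qemu_wakeup_requested() returns true.
         */
        vm_prepare_start(false);
        runstate_set(RUN_STATE_SUSPENDED);
    } else {
        vm_start();
    }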
On 8/17/2023 2:19 PM, Peter Xu wrote: > On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote: >> On 8/14/2023 3:37 PM, Peter Xu wrote: >>> On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote: >>>>> Can we just call vm_state_notify() earlier? >>>> >>>> We cannot. The guest is not running yet, and will not be until later. >>>> We cannot call notifiers that perform actions that complete, or react to, >>>> the guest entering a running state. >>> >>> I tried to look at a few examples of the notifees and most of them I read >>> do not react to "vcpu running" but "vm running" (in which case I think >>> "suspended" mode falls into "vm running" case); most of them won't care on >>> the RunState parameter passed in, but only the bool "running". >>> >>> In reality, when running=true, it must be RUNNING so far. >>> >>> In that case does it mean we should notify right after the switchover, >>> since after migration the vm is indeed running only if the vcpus are not >>> during suspend? >> >> I cannot parse your question, but maybe this answers it. >> If the outgoing VM is running and not suspended, then the incoming side >> tests for autostart==true and calls vm_start, which calls the notifiers, >> right after the switchover. > > I meant IMHO SUSPENDED should be seen as "vm running" case to me, just like > RUNNING. Then, we should invoke vm_prepare_start(), just need some touch > ups. > >> >>> One example (of possible issue) is vfio_vmstate_change(), where iiuc if we >>> try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that >>> device; this kind of prove to me that SUSPEND is actually one of >>> running=true states. >>> >>> If we postpone all notifiers here even after we switched over to dest qemu >>> to the next upcoming suspend wakeup, I think it means these devices will >>> not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps >>> VFIO_DEVICE_STATE_STOP. >> >> or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup. >> AFAIK it is OK to remain in that state until wakeup is called later. > > So let me provide another clue of why I think we should call > vm_prepare_start().. > > Firstly, I think RESUME event should always be there right after we > switched over, no matter suspeneded or not. I just noticed that your test > case would work because you put "wakeup" to be before RESUME. I'm not sure > whether that's correct. I'd bet people could rely on that RESUME to > identify the switchover. Yes, possibly. > More importantly, I'm wondering whether RTC should still be running during > the suspended mode? Sorry again if my knowledge over there is just > limited, so correct me otherwise - but my understanding is during suspend > mode (s1 or s3, frankly I can't tell which one this belongs..), rtc should > still be running along with the system clock. It means we _should_ at > least call cpu_enable_ticks() to enable rtc: Agreed. The comment above cpu_get_ticks says: * return the time elapsed in VM between vm_start and vm_stop Suspending a guest does not call vm_stop, so ticks keeps running. I also verified that experimentally. > /* > * enable cpu_get_ticks() > * Caller must hold BQL which serves as mutex for vm_clock_seqlock. > */ > void cpu_enable_ticks(void) > > I think that'll enable cpu_get_tsc() and make it start to work right. 
> >> >>> Ideally I think we should here call vm_state_notify() with running=true and >>> state=SUSPEND, but since I do see some hooks are not well prepared for >>> SUSPEND over running=true, I'd think we should on the safe side call >>> vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch >>> over phase. With that IIUC it'll naturally work (e.g. when wakeup again >>> later we just need to call no notifiers). >> >> Notifiers are just one piece, all the code in vm_prepare_start must be called. >> Is it correct to call all of that long before we actually resume the CPUs in >> wakeup? I don't know, but what is the point? > > The point is not only for cleaness (again, I really, really don't like that > new global.. sorry), but also now I think we should make the vm running. I do believe it is safe to call vm_prepare_start immediately, since the vcpus are never running when it is called. >> The wakeup code still needs >> modification to conditionally resume the vcpus. The scheme would be roughly: >> >> loadvm_postcopy_handle_run_bh() >> runstat = global_state_get_runstate(); >> if (runstate == RUN_STATE_RUNNING) { >> vm_start() >> } else if (runstate == RUN_STATE_SUSPENDED) >> vm_prepare_start(); // the start of vm_start() >> } >> >> qemu_system_wakeup_request() >> if (some condition) >> resume_all_vcpus(); // the remainder of vm_start() >> else >> runstate_set(RUN_STATE_RUNNING) > > No it doesn't. wakeup_reason is set there, main loop does the resuming. > See: > > if (qemu_wakeup_requested()) { > pause_all_vcpus(); > qemu_system_wakeup(); > notifier_list_notify(&wakeup_notifiers, &wakeup_reason); > wakeup_reason = QEMU_WAKEUP_REASON_NONE; > resume_all_vcpus(); > qapi_event_send_wakeup(); > } Agreed, we can rely on that main loop code to call resume_all_vcpus, and not modify qemu_system_wakeup_request. That is cleaner. >> How is that better than my patches >> [PATCH V3 01/10] vl: start on wakeup request >> [PATCH V3 02/10] migration: preserve suspended runstate >> >> loadvm_postcopy_handle_run_bh() >> runstate = global_state_get_runstate(); >> if (runstate == RUN_STATE_RUNNING) >> vm_start() >> else >> runstate_set(runstate); // eg RUN_STATE_SUSPENDED >> >> qemu_system_wakeup_request() >> if (!vm_started) >> vm_start(); >> else >> runstate_set(RUN_STATE_RUNNING); >> >> Recall this thread started with your comment "It then can avoid touching the >> system wakeup code which seems cleaner". We still need to touch the wakeup >> code. > > Let me provide some code and reply to your new patchset inlines. Thank you, I have more comments there. - Steve
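For reference, the request side that can then stay untouched looks roughly like
this (paraphrased from softmmu/runstate.c, trace call omitted); it only flips
the runstate and kicks the main loop, which then runs the snippet quoted above
and calls resume_all_vcpus():

    void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
    {
        if (!runstate_check(RUN_STATE_SUSPENDED)) {
            error_setg(errp, "Unable to wake up: guest is not in suspended state");
            return;
        }
        if (!(wakeup_reason_mask & (1 << reason))) {
            return;
        }
        runstate_set(RUN_STATE_RUNNING);
        wakeup_reason = reason;
        qemu_notify_event();
    }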