diff mbox series

[V4,06/11] migration: preserve cpu ticks if suspended

Message ID 1693333086-392798-7-git-send-email-steven.sistare@oracle.com (mailing list archive)
State New, archived
Headers show
Series fix migration of suspended runstate | expand

Commit Message

Steven Sistare Aug. 29, 2023, 6:18 p.m. UTC
During RUN_STATE_SUSPENDED, the cpu clock remains enabled, so the
timers_state saved to the migration stream is stale, causing time errors
in the guest when it wakes from suspend.

To fix, maintain a shadow copy of timers_state, and update the shadow in a
pre_save handler that uses the same logic found in cpu_disable_ticks.  Copy
the shadow to timers_state in post_load.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 softmmu/cpu-timers.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

Comments

Peter Xu Aug. 30, 2023, 4:47 p.m. UTC | #1
On Tue, Aug 29, 2023 at 11:18:01AM -0700, Steve Sistare wrote:
> During RUN_STATE_SUSPENDED, the cpu clock remains enabled, so the
> timers_state saved to the migration stream is stale, causing time errors
> in the guest when it wakes from suspend.

Instead of having this, I'm wondering whether we should just let:

        ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);

stop the vm for suspended too - I think we reached a consensus that
SUSPENDED should be treated the same as running here (except the vcpu
beingg running or not).

So the more risky change is we should make runstate_is_running() cover
SUSPENDED, but of course that again can affect many other call sites.. and
I'm not sure whether it's 100% working everywhere.

I think I mentioned the other "easier" way, which is to modify
vm_stop_force_state() to take suspended:

int vm_stop_force_state(RunState state)
 {
-    if (runstate_is_running()) {
+    if (runstate_is_running() || runstate_is_suspended()) {
         return vm_stop(state);

That resides in cpus.c but it really only affects migration, so much less
risky.  Do you think this should be the better (and correct) way to go?
Steven Sistare Sept. 7, 2023, 3:50 p.m. UTC | #2
On 8/30/2023 12:47 PM, Peter Xu wrote:
> On Tue, Aug 29, 2023 at 11:18:01AM -0700, Steve Sistare wrote:
>> During RUN_STATE_SUSPENDED, the cpu clock remains enabled, so the
>> timers_state saved to the migration stream is stale, causing time errors
>> in the guest when it wakes from suspend.
> 
> Instead of having this, I'm wondering whether we should just let:
> 
>         ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> 
> stop the vm for suspended too - I think we reached a consensus that
> SUSPENDED should be treated the same as running here (except the vcpu
> beingg running or not).
> 
> So the more risky change is we should make runstate_is_running() cover
> SUSPENDED, but of course that again can affect many other call sites.. and
> I'm not sure whether it's 100% working everywhere.
> 
> I think I mentioned the other "easier" way, which is to modify
> vm_stop_force_state() to take suspended:
> 
> int vm_stop_force_state(RunState state)
>  {
> -    if (runstate_is_running()) {
> +    if (runstate_is_running() || runstate_is_suspended()) {
>          return vm_stop(state);
> 
> That resides in cpus.c but it really only affects migration, so much less
> risky.  Do you think this should be the better (and correct) way to go?

Hi Peter, thanks for your careful review.  I have some other work to do, but
I will get back to this soon.

- Steve
Steven Sistare Nov. 13, 2023, 6:32 p.m. UTC | #3
On 8/30/2023 12:47 PM, Peter Xu wrote:
> On Tue, Aug 29, 2023 at 11:18:01AM -0700, Steve Sistare wrote:
>> During RUN_STATE_SUSPENDED, the cpu clock remains enabled, so the
>> timers_state saved to the migration stream is stale, causing time errors
>> in the guest when it wakes from suspend.
> 
> Instead of having this, I'm wondering whether we should just let:
> 
>         ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> 
> stop the vm for suspended too - I think we reached a consensus that
> SUSPENDED should be treated the same as running here (except the vcpu
> beingg running or not).
> 
> So the more risky change is we should make runstate_is_running() cover
> SUSPENDED, but of course that again can affect many other call sites.. and
> I'm not sure whether it's 100% working everywhere.
> 
> I think I mentioned the other "easier" way, which is to modify
> vm_stop_force_state() to take suspended:
> 
> int vm_stop_force_state(RunState state)
>  {
> -    if (runstate_is_running()) {
> +    if (runstate_is_running() || runstate_is_suspended()) {
>          return vm_stop(state);
> 
> That resides in cpus.c but it really only affects migration, so much less
> risky.  Do you think this should be the better (and correct) way to go?

Agreed, good idea, done in V5 - steve
diff mbox series

Patch

diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c
index 117408c..d5af317 100644
--- a/softmmu/cpu-timers.c
+++ b/softmmu/cpu-timers.c
@@ -157,6 +157,36 @@  static bool icount_shift_state_needed(void *opaque)
     return icount_enabled() == 2;
 }
 
+static int cpu_pre_save_ticks(void *opaque)
+{
+    TimersState *t = &timers_state;
+    TimersState *snap = opaque;
+
+    seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+
+    if (t->cpu_ticks_enabled) {
+        snap->cpu_ticks_offset = t->cpu_ticks_offset + cpu_get_host_ticks();
+        snap->cpu_clock_offset = cpu_get_clock_locked();
+    } else {
+        snap->cpu_ticks_offset = t->cpu_ticks_offset;
+        snap->cpu_clock_offset = t->cpu_clock_offset;
+    }
+    seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+    return 0;
+}
+
+static int cpu_post_load_ticks(void *opaque, int version_id)
+{
+    TimersState *t = &timers_state;
+    TimersState *snap = opaque;
+
+    seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+    t->cpu_ticks_offset = snap->cpu_ticks_offset;
+    t->cpu_clock_offset = snap->cpu_clock_offset;
+    seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock);
+    return 0;
+}
+
 /*
  * Subsection for warp timer migration is optional, because may not be created
  */
@@ -221,6 +251,8 @@  static const VMStateDescription vmstate_timers = {
     .name = "timer",
     .version_id = 2,
     .minimum_version_id = 1,
+    .pre_save = cpu_pre_save_ticks,
+    .post_load = cpu_post_load_ticks,
     .fields = (VMStateField[]) {
         VMSTATE_INT64(cpu_ticks_offset, TimersState),
         VMSTATE_UNUSED(8),
@@ -269,9 +301,11 @@  TimersState timers_state;
 /* initialize timers state and the cpu throttle for convenience */
 void cpu_timers_init(void)
 {
+    static TimersState timers_snapshot;
+
     seqlock_init(&timers_state.vm_clock_seqlock);
     qemu_spin_init(&timers_state.vm_clock_lock);
-    vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
+    vmstate_register(NULL, 0, &vmstate_timers, &timers_snapshot);
 
     cpu_throttle_init();
 }