Message ID | 1693333086-392798-7-git-send-email-steven.sistare@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fix migration of suspended runstate | expand |
On Tue, Aug 29, 2023 at 11:18:01AM -0700, Steve Sistare wrote: > During RUN_STATE_SUSPENDED, the cpu clock remains enabled, so the > timers_state saved to the migration stream is stale, causing time errors > in the guest when it wakes from suspend. Instead of having this, I'm wondering whether we should just let: ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); stop the vm for suspended too - I think we reached a consensus that SUSPENDED should be treated the same as running here (except the vcpu beingg running or not). So the more risky change is we should make runstate_is_running() cover SUSPENDED, but of course that again can affect many other call sites.. and I'm not sure whether it's 100% working everywhere. I think I mentioned the other "easier" way, which is to modify vm_stop_force_state() to take suspended: int vm_stop_force_state(RunState state) { - if (runstate_is_running()) { + if (runstate_is_running() || runstate_is_suspended()) { return vm_stop(state); That resides in cpus.c but it really only affects migration, so much less risky. Do you think this should be the better (and correct) way to go?
On 8/30/2023 12:47 PM, Peter Xu wrote: > On Tue, Aug 29, 2023 at 11:18:01AM -0700, Steve Sistare wrote: >> During RUN_STATE_SUSPENDED, the cpu clock remains enabled, so the >> timers_state saved to the migration stream is stale, causing time errors >> in the guest when it wakes from suspend. > > Instead of having this, I'm wondering whether we should just let: > > ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); > > stop the vm for suspended too - I think we reached a consensus that > SUSPENDED should be treated the same as running here (except the vcpu > beingg running or not). > > So the more risky change is we should make runstate_is_running() cover > SUSPENDED, but of course that again can affect many other call sites.. and > I'm not sure whether it's 100% working everywhere. > > I think I mentioned the other "easier" way, which is to modify > vm_stop_force_state() to take suspended: > > int vm_stop_force_state(RunState state) > { > - if (runstate_is_running()) { > + if (runstate_is_running() || runstate_is_suspended()) { > return vm_stop(state); > > That resides in cpus.c but it really only affects migration, so much less > risky. Do you think this should be the better (and correct) way to go? Hi Peter, thanks for your careful review. I have some other work to do, but I will get back to this soon. - Steve
On 8/30/2023 12:47 PM, Peter Xu wrote: > On Tue, Aug 29, 2023 at 11:18:01AM -0700, Steve Sistare wrote: >> During RUN_STATE_SUSPENDED, the cpu clock remains enabled, so the >> timers_state saved to the migration stream is stale, causing time errors >> in the guest when it wakes from suspend. > > Instead of having this, I'm wondering whether we should just let: > > ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); > > stop the vm for suspended too - I think we reached a consensus that > SUSPENDED should be treated the same as running here (except the vcpu > beingg running or not). > > So the more risky change is we should make runstate_is_running() cover > SUSPENDED, but of course that again can affect many other call sites.. and > I'm not sure whether it's 100% working everywhere. > > I think I mentioned the other "easier" way, which is to modify > vm_stop_force_state() to take suspended: > > int vm_stop_force_state(RunState state) > { > - if (runstate_is_running()) { > + if (runstate_is_running() || runstate_is_suspended()) { > return vm_stop(state); > > That resides in cpus.c but it really only affects migration, so much less > risky. Do you think this should be the better (and correct) way to go? Agreed, good idea, done in V5 - steve
diff --git a/softmmu/cpu-timers.c b/softmmu/cpu-timers.c index 117408c..d5af317 100644 --- a/softmmu/cpu-timers.c +++ b/softmmu/cpu-timers.c @@ -157,6 +157,36 @@ static bool icount_shift_state_needed(void *opaque) return icount_enabled() == 2; } +static int cpu_pre_save_ticks(void *opaque) +{ + TimersState *t = &timers_state; + TimersState *snap = opaque; + + seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock); + + if (t->cpu_ticks_enabled) { + snap->cpu_ticks_offset = t->cpu_ticks_offset + cpu_get_host_ticks(); + snap->cpu_clock_offset = cpu_get_clock_locked(); + } else { + snap->cpu_ticks_offset = t->cpu_ticks_offset; + snap->cpu_clock_offset = t->cpu_clock_offset; + } + seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock); + return 0; +} + +static int cpu_post_load_ticks(void *opaque, int version_id) +{ + TimersState *t = &timers_state; + TimersState *snap = opaque; + + seqlock_write_lock(&t->vm_clock_seqlock, &t->vm_clock_lock); + t->cpu_ticks_offset = snap->cpu_ticks_offset; + t->cpu_clock_offset = snap->cpu_clock_offset; + seqlock_write_unlock(&t->vm_clock_seqlock, &t->vm_clock_lock); + return 0; +} + /* * Subsection for warp timer migration is optional, because may not be created */ @@ -221,6 +251,8 @@ static const VMStateDescription vmstate_timers = { .name = "timer", .version_id = 2, .minimum_version_id = 1, + .pre_save = cpu_pre_save_ticks, + .post_load = cpu_post_load_ticks, .fields = (VMStateField[]) { VMSTATE_INT64(cpu_ticks_offset, TimersState), VMSTATE_UNUSED(8), @@ -269,9 +301,11 @@ TimersState timers_state; /* initialize timers state and the cpu throttle for convenience */ void cpu_timers_init(void) { + static TimersState timers_snapshot; + seqlock_init(&timers_state.vm_clock_seqlock); qemu_spin_init(&timers_state.vm_clock_lock); - vmstate_register(NULL, 0, &vmstate_timers, &timers_state); + vmstate_register(NULL, 0, &vmstate_timers, &timers_snapshot); cpu_throttle_init(); }
During RUN_STATE_SUSPENDED, the cpu clock remains enabled, so the timers_state saved to the migration stream is stale, causing time errors in the guest when it wakes from suspend. To fix, maintain a shadow copy of timers_state, and update the shadow in a pre_save handler that uses the same logic found in cpu_disable_ticks. Copy the shadow to timers_state in post_load. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> --- softmmu/cpu-timers.c | 36 +++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-)