Message ID | 20220617144857.34189-6-peterx@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | CPU: Detect put cpu register errors for migrations | expand |
On Fri, 17 Jun 2022 at 15:53, Peter Xu <peterx@redhat.com> wrote: > > Leverage the new mechanism to pass over errors to upper stack for > kvm_arch_put_registers() when called for the post_init() accel hook. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > accel/kvm/kvm-all.c | 13 ++++++++++--- > accel/kvm/kvm-cpus.h | 2 +- > softmmu/cpus.c | 5 ++++- > 3 files changed, 15 insertions(+), 5 deletions(-) Checking for errors definitely does seem like the right thing to do. That said: (1) Why do we want to check for errors only on the call for post_init synchronize, and not any of the other places where we call kvm_arch_put_registers()? (2) I suspect this will uncover some situations where we've been happening-to-work because we ignore an error, and now will start to actively fail. But I guess there's not much we can do about that except say "we'll fix them as we encounter bug reports about them". (I know of at least one: on the Mac M1 running Linux, if the host doesn't have this kernel fix: https://lore.kernel.org/all/YnHz6Cw5ONR2e+KA@google.com/T/ then the first put_registers will fail (mostly harmlessly). I think that's the post_init sync but it might be the post_reset one.) thanks -- PMM
On Thu, Jun 23, 2022 at 02:09:43PM +0100, Peter Maydell wrote: > On Fri, 17 Jun 2022 at 15:53, Peter Xu <peterx@redhat.com> wrote: > > > > Leverage the new mechanism to pass over errors to upper stack for > > kvm_arch_put_registers() when called for the post_init() accel hook. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > accel/kvm/kvm-all.c | 13 ++++++++++--- > > accel/kvm/kvm-cpus.h | 2 +- > > softmmu/cpus.c | 5 ++++- > > 3 files changed, 15 insertions(+), 5 deletions(-) > > Checking for errors definitely does seem like the right thing to do. > That said: > > (1) Why do we want to check for errors only on the call > for post_init synchronize, and not any of the other places > where we call kvm_arch_put_registers()? Because I only know that's what we need to keep live migration honest on being successful, and I didn't want to spread the fire elsewhere, neither from knowledge nor time.. So I wanted to keep the series simple but useful. If we have reasons to cover some of the rest, I can still try. > > (2) I suspect this will uncover some situations where we've > been happening-to-work because we ignore an error, and now > will start to actively fail. But I guess there's not much > we can do about that except say "we'll fix them as we encounter > bug reports about them". (I know of at least one: on the > Mac M1 running Linux, if the host doesn't have this kernel fix: > https://lore.kernel.org/all/YnHz6Cw5ONR2e+KA@google.com/T/ > then the first put_registers will fail (mostly harmlessly). > I think that's the post_init sync but it might be the post_reset > one.) .. from what I read from the commit message in the link, hopefully that was only about the reset process since that sounds like a mismatched regs before/after gic created. When migration completes, I guess we're always fetching from the after-gic-created case? But it'll be great if we double check. Luckily it seems only for m1. What my series wanted to achieve is not affect anything else but migration (so if they fail elsewhere they keep the benign failing). What I worried is exactly when we have benign failures on put regs on live migration use cases, but hopefully not. Thanks,
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index df4f7c98f3..03e29ab1ed 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -2771,15 +2771,22 @@ void kvm_cpu_synchronize_post_reset(CPUState *cpu) run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); } -static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg) +static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg, + Error **errp) { - kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE); + int ret = kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE); + + if (ret) { + error_setg(errp, "kvm_arch_put_registers() failed with retval=%d", ret); + return; + } + cpu->vcpu_dirty = false; } void kvm_cpu_synchronize_post_init(CPUState *cpu, Error **errp) { - run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL); + run_on_cpu2(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL, errp); } static void do_kvm_cpu_synchronize_pre_loadvm(CPUState *cpu, run_on_cpu_data arg) diff --git a/accel/kvm/kvm-cpus.h b/accel/kvm/kvm-cpus.h index bf0bd1bee4..c9b8262704 100644 --- a/accel/kvm/kvm-cpus.h +++ b/accel/kvm/kvm-cpus.h @@ -16,7 +16,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp); int kvm_cpu_exec(CPUState *cpu); void kvm_destroy_vcpu(CPUState *cpu); void kvm_cpu_synchronize_post_reset(CPUState *cpu); -void kvm_cpu_synchronize_post_init(CPUState *cpu); +void kvm_cpu_synchronize_post_init(CPUState *cpu, Error **errp); void kvm_cpu_synchronize_pre_loadvm(CPUState *cpu); #endif /* KVM_CPUS_H */ diff --git a/softmmu/cpus.c b/softmmu/cpus.c index 59c70fd496..6c0b5b87f0 100644 --- a/softmmu/cpus.c +++ b/softmmu/cpus.c @@ -151,7 +151,10 @@ void cpu_synchronize_all_post_init(Error **errp) CPUState *cpu; CPU_FOREACH(cpu) { - cpu_synchronize_post_init(cpu); + cpu_synchronize_post_init_full(cpu, errp); + if (errp && *errp) { + break; + } } }
Leverage the new mechanism to pass over errors to upper stack for kvm_arch_put_registers() when called for the post_init() accel hook. Signed-off-by: Peter Xu <peterx@redhat.com> --- accel/kvm/kvm-all.c | 13 ++++++++++--- accel/kvm/kvm-cpus.h | 2 +- softmmu/cpus.c | 5 ++++- 3 files changed, 15 insertions(+), 5 deletions(-)