Message ID | 20170606181948.16238-17-rkagan@virtuozzo.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 06/06/2017 20:19, Roman Kagan wrote: > There is a design flaw in the Hyper-V SynIC implementation in KVM: when > message page or event flags page is enabled by setting the corresponding > msr, KVM zeroes it out. This violates the spec in general (per spec, > the pages have to be overlay ones and only zeroed at cpu reset), but > it's non-fatal in normal operation because the user exit happens after > the page is zeroed, so it's the underlying guest page which is zeroed > out, and sane guests don't depend on its contents to be preserved while > it's overlaid. > > However, in the case of vmstate load the overlay pages are set up before > msrs are set so the contents of those pages get lost. > > To work it around, avoid setting up overlay pages in .post_load. > Instead, postpone it until after the msrs are pushed to KVM. As a > result, KVM just zeroes out the underlying guest pages similar to how it > happens during guest-initiated msr writes, which is tolerable. Why not disable the zeroing for host-initiated MSR writes? This is pretty clearly a KVM bug, we can push it to stable kernels too. Paolo
On Wed, Jun 14, 2017 at 01:12:12PM +0200, Paolo Bonzini wrote: > > > On 06/06/2017 20:19, Roman Kagan wrote: > > There is a design flaw in the Hyper-V SynIC implementation in KVM: when > > message page or event flags page is enabled by setting the corresponding > > msr, KVM zeroes it out. This violates the spec in general (per spec, > > the pages have to be overlay ones and only zeroed at cpu reset), but > > it's non-fatal in normal operation because the user exit happens after > > the page is zeroed, so it's the underlying guest page which is zeroed > > out, and sane guests don't depend on its contents to be preserved while > > it's overlaid. > > > > However, in the case of vmstate load the overlay pages are set up before > > msrs are set so the contents of those pages get lost. > > > > To work it around, avoid setting up overlay pages in .post_load. > > Instead, postpone it until after the msrs are pushed to KVM. As a > > result, KVM just zeroes out the underlying guest pages similar to how it > > happens during guest-initiated msr writes, which is tolerable. > > Why not disable the zeroing for host-initiated MSR writes? This is > pretty clearly a KVM bug, we can push it to stable kernels too. The only problem with this is that QEMU will have no reliable way to know if the KVM it runs with has this bug fixed or not. Machines without vmbus work and even migrate fine with the current KVM despite this bug (the only user of those pages currently is synic timers which re-arm themselves and post messages regardless of zeroing). Now updating QEMU to a vmbus-enabled version without updating the kernel will make the migrations cause guest hangs. If that is tolerable I can happily drop this patch as it complicates code a little. Distros probably won't be affected as they can make sure their kernels have this bug fixed before they roll out a vmbus-capable QEMU. What do you think? Thanks, Roman.
On 14/06/2017 13:54, Roman Kagan wrote: >> Why not disable the zeroing for host-initiated MSR writes? This is >> pretty clearly a KVM bug, we can push it to stable kernels too. > > The only problem with this is that QEMU will have no reliable way to > know if the KVM it runs with has this bug fixed or not. Machines > without vmbus work and even migrate fine with the current KVM despite > this bug (the only user of those pages currently is synic timers which > re-arm themselves and post messages regardless of zeroing). Now > updating QEMU to a vmbus-enabled version without updating the kernel > will make the migrations cause guest hangs. Return 2 from KVM_CHECK_EXTENSION(KVM_CAP_HYPERV_SYNIC)? Then you can make new QEMU refuse to enable synic if a new kernel is not available. Paolo
On Wed, Jun 14, 2017 at 02:11:56PM +0200, Paolo Bonzini wrote: > On 14/06/2017 13:54, Roman Kagan wrote: > >> Why not disable the zeroing for host-initiated MSR writes? This is > >> pretty clearly a KVM bug, we can push it to stable kernels too. > > > > The only problem with this is that QEMU will have no reliable way to > > know if the KVM it runs with has this bug fixed or not. Machines > > without vmbus work and even migrate fine with the current KVM despite > > this bug (the only user of those pages currently is synic timers which > > re-arm themselves and post messages regardless of zeroing). Now > > updating QEMU to a vmbus-enabled version without updating the kernel > > will make the migrations cause guest hangs. > > Return 2 from KVM_CHECK_EXTENSION(KVM_CAP_HYPERV_SYNIC)? Then you can > make new QEMU refuse to enable synic if a new kernel is not available. Indeed, that's a possibility. I'll probably make it in both directions then: on KVM_ENABLE_CAP(KVM_CAP_HYPERV_SYNIC, 2) disable zeroing completely, including on guest writes, to better match Hyper-V. Or does it deserve a separate cap number? Thanks, Roman.
On 14/06/2017 14:41, Roman Kagan wrote: > On Wed, Jun 14, 2017 at 02:11:56PM +0200, Paolo Bonzini wrote: >> On 14/06/2017 13:54, Roman Kagan wrote: >>>> Why not disable the zeroing for host-initiated MSR writes? This is >>>> pretty clearly a KVM bug, we can push it to stable kernels too. >>> >>> The only problem with this is that QEMU will have no reliable way to >>> know if the KVM it runs with has this bug fixed or not. Machines >>> without vmbus work and even migrate fine with the current KVM despite >>> this bug (the only user of those pages currently is synic timers which >>> re-arm themselves and post messages regardless of zeroing). Now >>> updating QEMU to a vmbus-enabled version without updating the kernel >>> will make the migrations cause guest hangs. >> >> Return 2 from KVM_CHECK_EXTENSION(KVM_CAP_HYPERV_SYNIC)? Then you can >> make new QEMU refuse to enable synic if a new kernel is not available. > > Indeed, that's a possibility. > > I'll probably make it in both directions then: on > KVM_ENABLE_CAP(KVM_CAP_HYPERV_SYNIC, 2) disable zeroing completely, > including on guest writes, to better match Hyper-V. Or does it deserve > a separate cap number? Unfortunately kvm_vcpu_ioctl_enable_cap is not checking arguments and refusing nonzero values, so I'm a bit wary of doing that unconditionally---and I'm not sure it deserves a separate capability number. Maybe a flag KVM_ENABLE_CAP_STRICT_ARG_CHECK in cap->flags, but I don't want you to do too much unrelated work. Keeping the zeroing on guest writes doesn't seem too bad. Or the nuclear option, introduce KVM_CAP_HYPERV_SYNIC2 and drop the broken KVM_CAP_HYPERV_SYNIC altogether. Has its charm... Paolo
diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 433c912..b0b7595 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -2761,6 +2761,14 @@ int kvm_arch_put_registers(CPUState *cpu, int level) return ret; } } + /* + * to work around buggy KVM which zeroes out the message and event pages in + * KVM_SET_MSRS handler, only map the overlay pages after kvm_put_msrs, + * making vmstate load work similar to guest-initiated set_msr + */ + if (level >= KVM_PUT_RESET_STATE) { + hyperv_synic_update(x86_cpu); + } ret = kvm_put_tscdeadline_msr(x86_cpu); if (ret < 0) { diff --git a/target/i386/machine.c b/target/i386/machine.c index 8022c24..eb00b19 100644 --- a/target/i386/machine.c +++ b/target/i386/machine.c @@ -7,7 +7,6 @@ #include "hw/i386/pc.h" #include "hw/isa/isa.h" #include "migration/cpu.h" -#include "hyperv.h" #include "sysemu/kvm.h" @@ -634,19 +633,11 @@ static bool hyperv_synic_enable_needed(void *opaque) return false; } -static int hyperv_synic_post_load(void *opaque, int version_id) -{ - X86CPU *cpu = opaque; - hyperv_synic_update(cpu); - return 0; -} - static const VMStateDescription vmstate_msr_hyperv_synic = { .name = "cpu/msr_hyperv_synic", .version_id = 1, .minimum_version_id = 1, .needed = hyperv_synic_enable_needed, - .post_load = hyperv_synic_post_load, .fields = (VMStateField[]) { VMSTATE_UINT64(env.msr_hv_synic_control, X86CPU), VMSTATE_UINT64(env.msr_hv_synic_evt_page, X86CPU),
There is a design flaw in the Hyper-V SynIC implementation in KVM: when message page or event flags page is enabled by setting the corresponding msr, KVM zeroes it out. This violates the spec in general (per spec, the pages have to be overlay ones and only zeroed at cpu reset), but it's non-fatal in normal operation because the user exit happens after the page is zeroed, so it's the underlying guest page which is zeroed out, and sane guests don't depend on its contents to be preserved while it's overlaid. However, in the case of vmstate load the overlay pages are set up before msrs are set so the contents of those pages get lost. To work it around, avoid setting up overlay pages in .post_load. Instead, postpone it until after the msrs are pushed to KVM. As a result, KVM just zeroes out the underlying guest pages similar to how it happens during guest-initiated msr writes, which is tolerable. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> --- target/i386/kvm.c | 8 ++++++++ target/i386/machine.c | 9 --------- 2 files changed, 8 insertions(+), 9 deletions(-)