
[16/23] hyperv: map overlay pages after updating msrs

Message ID 20170606181948.16238-17-rkagan@virtuozzo.com (mailing list archive)
State New, archived

Commit Message

Roman Kagan June 6, 2017, 6:19 p.m. UTC
There is a design flaw in the Hyper-V SynIC implementation in KVM: when
message page or event flags page is enabled by setting the corresponding
msr, KVM zeroes it out.  This violates the spec in general (per spec,
the pages have to be overlay ones and only zeroed at cpu reset), but
it's non-fatal in normal operation because the user exit happens after
the page is zeroed, so it's the underlying guest page which is zeroed
out, and sane guests don't depend on its contents to be preserved while
it's overlaid.

However, in the case of vmstate load the overlay pages are set up before
msrs are set so the contents of those pages get lost.

To work around this, avoid setting up the overlay pages in .post_load.
Instead, postpone it until after the msrs are pushed to KVM.  As a
result, KVM just zeroes out the underlying guest pages similar to how it
happens during guest-initiated msr writes, which is tolerable.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
---
 target/i386/kvm.c     | 8 ++++++++
 target/i386/machine.c | 9 ---------
 2 files changed, 8 insertions(+), 9 deletions(-)

Comments

Paolo Bonzini June 14, 2017, 11:12 a.m. UTC | #1
On 06/06/2017 20:19, Roman Kagan wrote:
> There is a design flaw in the Hyper-V SynIC implementation in KVM: when
> message page or event flags page is enabled by setting the corresponding
> msr, KVM zeroes it out.  This violates the spec in general (per spec,
> the pages have to be overlay ones and only zeroed at cpu reset), but
> it's non-fatal in normal operation because the user exit happens after
> the page is zeroed, so it's the underlying guest page which is zeroed
> out, and sane guests don't depend on its contents to be preserved while
> it's overlaid.
> 
> However, in the case of vmstate load the overlay pages are set up before
> msrs are set so the contents of those pages get lost.
> 
> To work around this, avoid setting up the overlay pages in .post_load.
> Instead, postpone it until after the msrs are pushed to KVM.  As a
> result, KVM just zeroes out the underlying guest pages similar to how it
> happens during guest-initiated msr writes, which is tolerable.

Why not disable the zeroing for host-initiated MSR writes?  This is
pretty clearly a KVM bug, we can push it to stable kernels too.

Paolo
Roman Kagan June 14, 2017, 11:54 a.m. UTC | #2
On Wed, Jun 14, 2017 at 01:12:12PM +0200, Paolo Bonzini wrote:
> 
> 
> On 06/06/2017 20:19, Roman Kagan wrote:
> > There is a design flaw in the Hyper-V SynIC implementation in KVM: when
> > message page or event flags page is enabled by setting the corresponding
> > msr, KVM zeroes it out.  This violates the spec in general (per spec,
> > the pages have to be overlay ones and only zeroed at cpu reset), but
> > it's non-fatal in normal operation because the user exit happens after
> > the page is zeroed, so it's the underlying guest page which is zeroed
> > out, and sane guests don't depend on its contents to be preserved while
> > it's overlaid.
> > 
> > However, in the case of vmstate load the overlay pages are set up before
> > msrs are set so the contents of those pages get lost.
> > 
> > To work around this, avoid setting up the overlay pages in .post_load.
> > Instead, postpone it until after the msrs are pushed to KVM.  As a
> > result, KVM just zeroes out the underlying guest pages similar to how it
> > happens during guest-initiated msr writes, which is tolerable.
> 
> Why not disable the zeroing for host-initiated MSR writes?  This is
> pretty clearly a KVM bug, we can push it to stable kernels too.

The only problem with this is that QEMU will have no reliable way to
know if the KVM it runs with has this bug fixed or not.  Machines
without vmbus work and even migrate fine with the current KVM despite
this bug (the only user of those pages currently is synic timers which
re-arm themselves and post messages regardless of zeroing).  Now
updating QEMU to a vmbus-enabled version without updating the kernel
will make migrations cause guest hangs.

If that is tolerable, I can happily drop this patch, as it complicates the
code a little.  Distros probably won't be affected, as they can make sure
their kernels have this bug fixed before they roll out a vmbus-capable
QEMU.

What do you think?

Thanks,
Roman.
Paolo Bonzini June 14, 2017, 12:11 p.m. UTC | #3
On 14/06/2017 13:54, Roman Kagan wrote:
>> Why not disable the zeroing for host-initiated MSR writes?  This is
>> pretty clearly a KVM bug, we can push it to stable kernels too.
>
> The only problem with this is that QEMU will have no reliable way to
> know if the KVM it runs with has this bug fixed or not.  Machines
> without vmbus work and even migrate fine with the current KVM despite
> this bug (the only user of those pages currently is synic timers which
> re-arm themselves and post messages regardless of zeroing).  Now
> updating QEMU to a vmbus-enabled version without updating the kernel
> will make migrations cause guest hangs.

Return 2 from KVM_CHECK_EXTENSION(KVM_CAP_HYPERV_SYNIC)?  Then you can
make new QEMU refuse to enable synic if a new kernel is not available.

Paolo
Roman Kagan June 14, 2017, 12:41 p.m. UTC | #4
On Wed, Jun 14, 2017 at 02:11:56PM +0200, Paolo Bonzini wrote:
> On 14/06/2017 13:54, Roman Kagan wrote:
> >> Why not disable the zeroing for host-initiated MSR writes?  This is
> >> pretty clearly a KVM bug, we can push it to stable kernels too.
> >
> > The only problem with this is that QEMU will have no reliable way to
> > know if the KVM it runs with has this bug fixed or not.  Machines
> > without vmbus work and even migrate fine with the current KVM despite
> > this bug (the only user of those pages currently is synic timers which
> > re-arm themselves and post messages regardless of zeroing).  Now
> > updating QEMU to a vmbus-enabled version without updating the kernel
> > will make migrations cause guest hangs.
> 
> Return 2 from KVM_CHECK_EXTENSION(KVM_CAP_HYPERV_SYNIC)?  Then you can
> make new QEMU refuse to enable synic if a new kernel is not available.

Indeed, that's a possibility.

I'll probably make it in both directions then: on
KVM_ENABLE_CAP(KVM_CAP_HYPERV_SYNIC, 2) disable zeroing completely,
including on guest writes, to better match Hyper-V.  Or does it deserve
a separate cap number?

Thanks,
Roman.
Paolo Bonzini June 14, 2017, 12:46 p.m. UTC | #5
On 14/06/2017 14:41, Roman Kagan wrote:
> On Wed, Jun 14, 2017 at 02:11:56PM +0200, Paolo Bonzini wrote:
>> On 14/06/2017 13:54, Roman Kagan wrote:
>>>> Why not disable the zeroing for host-initiated MSR writes?  This is
>>>> pretty clearly a KVM bug, we can push it to stable kernels too.
>>>
>>> The only problem with this is that QEMU will have no reliable way to
>>> know if the KVM it runs with has this bug fixed or not.  Machines
>>> without vmbus work and even migrate fine with the current KVM despite
>>> this bug (the only user of those pages currently is synic timers which
>>> re-arm themselves and post messages regardless of zeroing).  Now
>>> updating QEMU to a vmbus-enabled version without updating the kernel
>>> will make migrations cause guest hangs.
>>
>> Return 2 from KVM_CHECK_EXTENSION(KVM_CAP_HYPERV_SYNIC)?  Then you can
>> make new QEMU refuse to enable synic if a new kernel is not available.
> 
> Indeed, that's a possibility.
> 
> I'll probably make it in both directions then: on
> KVM_ENABLE_CAP(KVM_CAP_HYPERV_SYNIC, 2) disable zeroing completely,
> including on guest writes, to better match Hyper-V.  Or does it deserve
> a separate cap number?

Unfortunately kvm_vcpu_ioctl_enable_cap does not check its arguments and
refuse nonzero values, so I'm a bit wary of doing that unconditionally,
and I'm not sure it deserves a separate capability number.

Maybe a flag KVM_ENABLE_CAP_STRICT_ARG_CHECK in cap->flags, but I don't
want you to do too much unrelated work.  Keeping the zeroing on guest
writes doesn't seem too bad.

Or the nuclear option, introduce KVM_CAP_HYPERV_SYNIC2 and drop the
broken KVM_CAP_HYPERV_SYNIC altogether.  Has its charm...

Paolo

Patch

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 433c912..b0b7595 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -2761,6 +2761,14 @@  int kvm_arch_put_registers(CPUState *cpu, int level)
             return ret;
         }
     }
+    /*
+     * To work around buggy KVM, which zeroes out the message and event pages
+     * in its KVM_SET_MSRS handler, only map the overlay pages after
+     * kvm_put_msrs, making vmstate load work like a guest-initiated set_msr.
+     */
+    if (level >= KVM_PUT_RESET_STATE) {
+        hyperv_synic_update(x86_cpu);
+    }
 
     ret = kvm_put_tscdeadline_msr(x86_cpu);
     if (ret < 0) {
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 8022c24..eb00b19 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -7,7 +7,6 @@ 
 #include "hw/i386/pc.h"
 #include "hw/isa/isa.h"
 #include "migration/cpu.h"
-#include "hyperv.h"
 
 #include "sysemu/kvm.h"
 
@@ -634,19 +633,11 @@  static bool hyperv_synic_enable_needed(void *opaque)
     return false;
 }
 
-static int hyperv_synic_post_load(void *opaque, int version_id)
-{
-    X86CPU *cpu = opaque;
-    hyperv_synic_update(cpu);
-    return 0;
-}
-
 static const VMStateDescription vmstate_msr_hyperv_synic = {
     .name = "cpu/msr_hyperv_synic",
     .version_id = 1,
     .minimum_version_id = 1,
     .needed = hyperv_synic_enable_needed,
-    .post_load = hyperv_synic_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_UINT64(env.msr_hv_synic_control, X86CPU),
         VMSTATE_UINT64(env.msr_hv_synic_evt_page, X86CPU),