Message ID | 1429789615-36001-1-git-send-email-pbonzini@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
2015-04-23 13:46+0200, Paolo Bonzini: > From: Radim Kr?má? <rkrcmar@redhat.com> > > The kvmclock spec says that the host will increment a version field to > an odd number, then update stuff, then increment it to an even number. > The host is buggy and doesn't do this, and the result is observable > when one vcpu reads another vcpu's kvmclock data. > > There's no good way for a guest kernel to keep its vdso from reading > a different vcpu's kvmclock data, but we don't need to care about > changing VCPUs as long as we read a consistent data from kvmclock. > (VCPU can change outside of this loop too, so it doesn't matter if we > return a value not fit for this VCPU.) > > Based on a patch by Radim Kr?má?. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- Nice, Reviewed-by: Radim Kr?má? <rkrcmar@redhat.com> -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 23, 2015 at 01:46:55PM +0200, Paolo Bonzini wrote: > From: Radim Kr?má? <rkrcmar@redhat.com> > > The kvmclock spec says that the host will increment a version field to > an odd number, then update stuff, then increment it to an even number. > The host is buggy and doesn't do this, and the result is observable > when one vcpu reads another vcpu's kvmclock data. > > There's no good way for a guest kernel to keep its vdso from reading > a different vcpu's kvmclock data, but we don't need to care about > changing VCPUs as long as we read a consistent data from kvmclock. > (VCPU can change outside of this loop too, so it doesn't matter if we > return a value not fit for this VCPU.) > > Based on a patch by Radim Kr?má?. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > arch/x86/kvm/x86.c | 33 ++++++++++++++++++++++++++++----- > 1 file changed, 28 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index ed31c31b2485..c73efcd03e29 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -1669,12 +1669,28 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) > &guest_hv_clock, sizeof(guest_hv_clock)))) > return 0; > > - /* > - * The interface expects us to write an even number signaling that the > - * update is finished. Since the guest won't see the intermediate > - * state, we just increase by 2 at the end. > + /* This VCPU is paused, but it's legal for a guest to read another > + * VCPU's kvmclock, so we really have to follow the specification where > + * it says that version is odd if data is being modified, and even after > + * it is consistent. > + * > + * Version field updates must be kept separate. This is because > + * kvm_write_guest_cached might use a "rep movs" instruction, and > + * writes within a string instruction are weakly ordered. So there > + * are three writes overall. > + * > + * As a small optimization, only write the version field in the first > + * and third write. The vcpu->pv_time cache is still valid, because the > + * version field is the first in the struct. > */ > - vcpu->hv_clock.version = guest_hv_clock.version + 2; > + BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); > + > + vcpu->hv_clock.version = guest_hv_clock.version + 1; > + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > + &vcpu->hv_clock, > + sizeof(vcpu->hv_clock.version)); > + > + smp_wmb(); > > /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ > pvclock_flags = (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED); > @@ -1695,6 +1711,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) > kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > &vcpu->hv_clock, > sizeof(vcpu->hv_clock)); > + > + smp_wmb(); > + > + vcpu->hv_clock.version++; > + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > + &vcpu->hv_clock, > + sizeof(vcpu->hv_clock.version)); > return 0; > } > > -- > 1.8.3.1 Acked-by: Marcelo Tosatti <mtosatti@redhat.com> -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ed31c31b2485..c73efcd03e29 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1669,12 +1669,28 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) &guest_hv_clock, sizeof(guest_hv_clock)))) return 0; - /* - * The interface expects us to write an even number signaling that the - * update is finished. Since the guest won't see the intermediate - * state, we just increase by 2 at the end. + /* This VCPU is paused, but it's legal for a guest to read another + * VCPU's kvmclock, so we really have to follow the specification where + * it says that version is odd if data is being modified, and even after + * it is consistent. + * + * Version field updates must be kept separate. This is because + * kvm_write_guest_cached might use a "rep movs" instruction, and + * writes within a string instruction are weakly ordered. So there + * are three writes overall. + * + * As a small optimization, only write the version field in the first + * and third write. The vcpu->pv_time cache is still valid, because the + * version field is the first in the struct. */ - vcpu->hv_clock.version = guest_hv_clock.version + 2; + BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); + + vcpu->hv_clock.version = guest_hv_clock.version + 1; + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, + &vcpu->hv_clock, + sizeof(vcpu->hv_clock.version)); + + smp_wmb(); /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ pvclock_flags = (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED); @@ -1695,6 +1711,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) kvm_write_guest_cached(v->kvm, &vcpu->pv_time, &vcpu->hv_clock, sizeof(vcpu->hv_clock)); + + smp_wmb(); + + vcpu->hv_clock.version++; + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, + &vcpu->hv_clock, + sizeof(vcpu->hv_clock.version)); return 0; }