From patchwork Thu Apr 23 11:46:55 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 6261851 Return-Path: X-Original-To: patchwork-kvm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 25B67BF4A6 for ; Thu, 23 Apr 2015 11:47:43 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 3DD3120380 for ; Thu, 23 Apr 2015 11:47:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3B8C520379 for ; Thu, 23 Apr 2015 11:47:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964832AbbDWLrK (ORCPT ); Thu, 23 Apr 2015 07:47:10 -0400 Received: from mail-wi0-f177.google.com ([209.85.212.177]:35904 "EHLO mail-wi0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757469AbbDWLrI (ORCPT ); Thu, 23 Apr 2015 07:47:08 -0400 Received: by wizk4 with SMTP id k4so212658608wiz.1; Thu, 23 Apr 2015 04:47:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:mime-version:content-type :content-transfer-encoding; bh=Gip60Kn+e0uhzUDlGiMRxWNtKnSJIhIpzqrkUgGO0Wc=; b=05Np9lRT5WbTAD1gTfarU4G2WnUGsr5+AfvZ6loJs6WlBbDouPgaeGolsmDbDSk8oB +agX+WxX9F/JRFwS7VAB4GOkLjPccXcf/43My1WEa6nCU3FvJdXtnBGUkSDxp4cAiICi xlablt+uIQP3kh1oh9hNxll8FxMsqhlpVa02LAzxlLqslLs0JJdMb4kxpxrv6pPw2PEs LuKtU62xjvDGqJvE4V+FMP3YQDSDqXs1qjc+wkWq6zH7jqDA8TkjVNh7+kv4Bp8EUa3a cN/0ogFt/8M7ilHaO1PBbNLTEpLDcjrvyhudk1wks3plxuaLNrCfCieg1Lj2KaObWosw AreQ== X-Received: by 10.180.10.102 with SMTP id h6mr1344983wib.37.1429789627253; Thu, 23 Apr 2015 04:47:07 -0700 (PDT) Received: from 640k.localdomain (net-93-66-123-41.cust.vodafonedsl.it. [93.66.123.41]) by mx.google.com with ESMTPSA id l20sm11770481wjw.42.2015.04.23.04.47.04 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 23 Apr 2015 04:47:06 -0700 (PDT) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: rkrcmar@redhat.com, luto@kernel.org, mtosatti@redhat.com Subject: [PATCH] kvm: x86: fix kvmclock update protocol Date: Thu, 23 Apr 2015 13:46:55 +0200 Message-Id: <1429789615-36001-1-git-send-email-pbonzini@redhat.com> X-Mailer: git-send-email 1.8.3.1 MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID,T_RP_MATCHES_RCVD,UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Radim Kr?má? The kvmclock spec says that the host will increment a version field to an odd number, then update stuff, then increment it to an even number. The host is buggy and doesn't do this, and the result is observable when one vcpu reads another vcpu's kvmclock data. There's no good way for a guest kernel to keep its vdso from reading a different vcpu's kvmclock data, but we don't need to care about changing VCPUs as long as we read a consistent data from kvmclock. (VCPU can change outside of this loop too, so it doesn't matter if we return a value not fit for this VCPU.) Based on a patch by Radim Kr?má?. Signed-off-by: Paolo Bonzini Reviewed-by: Radim Kr?má? Acked-by: Marcelo Tosatti --- arch/x86/kvm/x86.c | 33 ++++++++++++++++++++++++++++----- 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ed31c31b2485..c73efcd03e29 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1669,12 +1669,28 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) &guest_hv_clock, sizeof(guest_hv_clock)))) return 0; - /* - * The interface expects us to write an even number signaling that the - * update is finished. Since the guest won't see the intermediate - * state, we just increase by 2 at the end. + /* This VCPU is paused, but it's legal for a guest to read another + * VCPU's kvmclock, so we really have to follow the specification where + * it says that version is odd if data is being modified, and even after + * it is consistent. + * + * Version field updates must be kept separate. This is because + * kvm_write_guest_cached might use a "rep movs" instruction, and + * writes within a string instruction are weakly ordered. So there + * are three writes overall. + * + * As a small optimization, only write the version field in the first + * and third write. The vcpu->pv_time cache is still valid, because the + * version field is the first in the struct. */ - vcpu->hv_clock.version = guest_hv_clock.version + 2; + BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); + + vcpu->hv_clock.version = guest_hv_clock.version + 1; + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, + &vcpu->hv_clock, + sizeof(vcpu->hv_clock.version)); + + smp_wmb(); /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ pvclock_flags = (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED); @@ -1695,6 +1711,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) kvm_write_guest_cached(v->kvm, &vcpu->pv_time, &vcpu->hv_clock, sizeof(vcpu->hv_clock)); + + smp_wmb(); + + vcpu->hv_clock.version++; + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, + &vcpu->hv_clock, + sizeof(vcpu->hv_clock.version)); return 0; }