From patchwork Thu Jul 7 09:42:15 2016
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 9218699
Date: Thu, 7 Jul 2016 11:42:15 +0200
From: Peter Zijlstra
To: Wanpeng Li
Cc: Paolo Bonzini, Pan Xinhui, linux-s390, Davidlohr Bueso, mpe@ellerman.id.au,
 boqun.feng@gmail.com, will.deacon@arm.com, linux-kernel@vger.kernel.org,
 Waiman Long, virtualization@lists.linux-foundation.org, Ingo Molnar,
 Paul Mackerras, benh@kernel.crashing.org, schwidefsky@de.ibm.com,
 Paul McKenney, linuxppc-dev@lists.ozlabs.org, kvm
Subject: Re: [PATCH v2 0/4] implement vcpu preempted check
Message-ID: <20160707094215.GT30921@twins.programming.kicks-ass.net>
References: <1467124991-13164-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
 <20160706065255.GH30909@twins.programming.kicks-ass.net>
 <14a24854-9787-e4a1-c9a8-76eba4e97301@redhat.com>
 <8e8edf1b-b64b-3c44-b580-b9271663844c@redhat.com>
X-Mailing-List: kvm@vger.kernel.org

On Thu, Jul 07, 2016 at 04:48:05PM +0800, Wanpeng Li wrote:
> 2016-07-06 20:28 GMT+08:00 Paolo Bonzini:
> > Hmm, you're right. We can use bit 0 of struct kvm_steal_time's flags to
> > indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
> > VCPU has been scheduled out since the last time the guest reset the bit.
> > The guest can use an xchg to test-and-clear it. The bit can be
> > accessed at any time, independent of the version field.
>
> If one vCPU is preempted and the guest checks the field several times
> before that vCPU is scheduled in again, then the first time we get "vCPU
> is preempted"; however, since the field is cleared by the check, every
> later time we get "vCPU is running".
>
> Do you mean we should call record_steal_time() in both kvm_sched_in()
> and kvm_sched_out() to record this field? Btw, should we keep both
> vcpu->preempted and kvm_steal_time's "vCPU preempted" field
> simultaneously?

I suspect you want something like the patch below; except it has holes in it.
We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
again only after re-enabling it, which means that if we get preempted in
between, the vcpu is reported as running even though it very much is not.
Fixing that requires much larger surgery.

---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2766723c951..117270df43b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1997,8 +1997,29 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.pv_time_enabled = false;
 }
 
+static void update_steal_time_preempt(struct kvm_vcpu *vcpu)
+{
+	struct kvm_steal_time *st;
+
+	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
+		return;
+
+	if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
+		return;
+
+	st = &vcpu->arch.st.steal;
+
+	st->pad[KVM_ST_PAD_PREEMPT] = 1; /* we've stopped running */
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+		st, sizeof(struct kvm_steal_time));
+}
+
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
+	struct kvm_steal_time *st;
+
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
@@ -2006,29 +2027,34 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
 		return;
 
-	if (vcpu->arch.st.steal.version & 1)
-		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
+	st = &vcpu->arch.st.steal;
+
+	if (st->version & 1) {
+		st->flags = KVM_ST_FLAG_PREEMPT;
+		st->version += 1;  /* first time write, random junk */
+	}
 
-	vcpu->arch.st.steal.version += 1;
+	st->version += 1;
 
 	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+		st, sizeof(struct kvm_steal_time));
 
 	smp_wmb();
 
-	vcpu->arch.st.steal.steal += current->sched_info.run_delay -
+	st->steal += current->sched_info.run_delay -
 		vcpu->arch.st.last_steal;
 	vcpu->arch.st.last_steal = current->sched_info.run_delay;
 
+	st->pad[KVM_ST_PAD_PREEMPT] = 0; /* we're about to start running */
+
 	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+		st, sizeof(struct kvm_steal_time));
 
 	smp_wmb();
 
-	vcpu->arch.st.steal.version += 1;
+	st->version += 1;
 
 	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+		st, sizeof(struct kvm_steal_time));
 }
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -6693,6 +6719,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	preempt_enable();
 
+	update_steal_time_preempt(vcpu);
+
 	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 
 	/*