From patchwork Tue May 24 07:57:54 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Wanpeng Li
X-Patchwork-Id: 9132939
From: Wanpeng Li
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář, David Matlack,
 Christian Borntraeger, Yang Zhang
Subject: [PATCH v4] KVM: halt-polling: poll for the upcoming fire timers
Date: Tue, 24 May 2016 15:57:54 +0800
Message-Id: <1464076674-4024-1-git-send-email-wanpeng.li@hotmail.com>
X-Mailer: git-send-email 1.9.1
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Wanpeng Li

If an emulated lapic timer will fire soon (within 10us, the base of
dynamic halt-polling and the lower end of message-passing workload
latency: TCP_RR's poll time is < 10us), we can treat the halt as a short
one and poll until the timer fires. The timer callback apic_timer_fn()
sets KVM_REQ_PENDING_TIMER, and this flag is checked during the busy
poll. This avoids the context switch overhead and the latency of waking
up the vCPU.

This feature is slightly different from the current advance expiration
approach. Advance expiration relies on the vCPU running (it polls before
vmentry). But in some cases the timer interrupt may be blocked by
another thread (i.e., the IF bit is clear) and the vCPU cannot be
scheduled to run immediately. So even if the timer is advanced, the vCPU
may still see the latency. Polling is different: it ensures the vCPU is
aware of the timer expiration before it is scheduled out.

echo HRTICK > /sys/kernel/debug/sched_features in dynticks guests.

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
kernel     Linux 4.6.0+ 7.9800   11.0   10.8   14.6 9.4300    13.0    10.2  vanilla
kernel     Linux 4.6.0+   15.3   13.6   10.7   12.5 9.0000    12.8 7.38000  poll

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: David Matlack
Cc: Christian Borntraeger
Cc: Yang Zhang
Signed-off-by: Wanpeng Li
---
v3 -> v4:
 * add module parameter halt_poll_ns_timer
 * rename patch subject, since the lapic may be x86-specific
v2 -> v3:
 * add Yang's statement to patch description
v1 -> v2:
 * add return statement to non-x86 archs
 * capture never expire case for x86 (hrtimer is not started)

 arch/arm/include/asm/kvm_host.h     |  4 ++++
 arch/arm64/include/asm/kvm_host.h   |  4 ++++
 arch/mips/include/asm/kvm_host.h    |  4 ++++
 arch/powerpc/include/asm/kvm_host.h |  4 ++++
 arch/s390/include/asm/kvm_host.h    |  4 ++++
 arch/x86/kvm/lapic.c                | 11 +++++++++++
 arch/x86/kvm/lapic.h                |  1 +
 arch/x86/kvm/x86.c                  |  5 +++++
 include/linux/kvm_host.h            |  1 +
 virt/kvm/kvm_main.c                 | 15 +++++++++++----
 10 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 0df6b1f..fdfbed9 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -292,6 +292,10 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
+{
+	return -1ULL;
+}
 
 static inline void kvm_arm_init_debug(void) {}
 static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e63d23b..f510d71 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -371,6 +371,10 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
+{
+	return -1ULL;
+}
 
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 6733ac5..baf9472 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -814,6 +814,10 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
+{
+	return -1ULL;
+}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ec35af3..5986c79 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -729,5 +729,9 @@ static inline void kvm_arch_exit(void) {}
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
+{
+	return -1ULL;
+}
 
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 37b9017..bdb01a1 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -696,6 +696,10 @@ static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 		struct kvm_memory_slot *slot) {}
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
+{
+	return -1ULL;
+}
 
 void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bbb5b28..cfeeac3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -256,6 +256,17 @@ static inline int apic_lvtt_tscdeadline(struct kvm_lapic *apic)
 	return apic->lapic_timer.timer_mode == APIC_LVT_TIMER_TSCDEADLINE;
 }
 
+u64 apic_get_timer_expire(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct hrtimer *timer = &apic->lapic_timer.timer;
+
+	if (!hrtimer_active(timer))
+		return -1ULL;
+	else
+		return ktime_to_ns(hrtimer_get_remaining(timer));
+}
+
 static inline int apic_lvt_nmi_mode(u32 lvt_val)
 {
 	return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 891c6da..ee4da6c 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -212,4 +212,5 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq,
 			struct kvm_vcpu **dest_vcpu);
 int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
 			const unsigned long *bitmap, u32 bitmap_size);
+u64 apic_get_timer_expire(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c805cf4..1b89a68 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7623,6 +7623,11 @@ bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
 struct static_key kvm_no_apic_vcpu __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_no_apic_vcpu);
 
+u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
+{
+	return apic_get_timer_expire(vcpu);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	struct page *page;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b1fa8f1..14d6c23 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -663,6 +663,7 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu);
 
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dd4ac9d..afd15ba 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -78,6 +78,10 @@ module_param(halt_poll_ns_grow, uint, S_IRUGO | S_IWUSR);
 static unsigned int halt_poll_ns_shrink;
 module_param(halt_poll_ns_shrink, uint, S_IRUGO | S_IWUSR);
 
+/* lower-end of message passing workload latency TCP_RR's poll time < 10us */
+static unsigned int halt_poll_ns_timer = 10000;
+module_param(halt_poll_ns_timer, uint, S_IRUGO | S_IWUSR);
+
 /*
  * Ordering of locks:
  *
@@ -1966,7 +1970,7 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
 	grow = READ_ONCE(halt_poll_ns_grow);
 	/* 10us base */
 	if (val == 0 && grow)
-		val = 10000;
+		val = halt_poll_ns_timer;
 	else
 		val *= grow;
 
@@ -2014,12 +2018,15 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	ktime_t start, cur;
 	DECLARE_SWAITQUEUE(wait);
 	bool waited = false;
-	u64 block_ns;
+	u64 block_ns, delta, remaining;
 
+	remaining = kvm_arch_timer_remaining(vcpu);
 	start = cur = ktime_get();
-	if (vcpu->halt_poll_ns) {
-		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
+	if (vcpu->halt_poll_ns || remaining < halt_poll_ns_timer) {
+		ktime_t stop;
 
+		delta = vcpu->halt_poll_ns ? vcpu->halt_poll_ns : remaining;
+		stop = ktime_add_ns(ktime_get(), delta);
 		++vcpu->stat.halt_attempted_poll;
 		do {
 			/*