From patchwork Wed Jul 3 23:51:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 11030647 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 77EA61398 for ; Wed, 3 Jul 2019 23:59:55 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 668DB28A05 for ; Wed, 3 Jul 2019 23:59:55 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5A51128A11; Wed, 3 Jul 2019 23:59:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EE6E528A05 for ; Wed, 3 Jul 2019 23:59:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727395AbfGCX7v (ORCPT ); Wed, 3 Jul 2019 19:59:51 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40276 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727398AbfGCX7u (ORCPT ); Wed, 3 Jul 2019 19:59:50 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 21E83C04FFEF; Wed, 3 Jul 2019 23:59:50 +0000 (UTC) Received: from amt.cnet (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8AFE58351E; Wed, 3 Jul 2019 23:59:49 +0000 (UTC) Received: from amt.cnet (localhost [127.0.0.1]) by amt.cnet (Postfix) with ESMTP id 2EBDA10516E; Wed, 3 Jul 2019 20:59:00 -0300 (BRT) Received: (from marcelo@localhost) by amt.cnet (8.14.7/8.14.7/Submit) id x63NwxJm023712; Wed, 3 Jul 2019 20:58:59 -0300 Message-Id: <20190703235828.340866829@amt.cnet> User-Agent: quilt/0.60-1 Date: Wed, 03 Jul 2019 20:51:25 -0300 From: Marcelo Tosatti To: kvm-devel Cc: Paolo Bonzini , Radim Krcmar , Andrea Arcangeli , "Rafael J. Wysocki" , Peter Zijlstra , Wanpeng Li , Konrad Rzeszutek Wilk , Raslan KarimAllah , Boris Ostrovsky , Ankur Arora , Christian Borntraeger , linux-pm@vger.kernel.org, Marcelo Tosatti Subject: [patch 1/5] add cpuidle-haltpoll driver References: <20190703235124.783034907@amt.cnet> Content-Disposition: inline; filename=00-cpuidle-haltpoll X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Wed, 03 Jul 2019 23:59:50 +0000 (UTC) Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add a cpuidle driver that calls the architecture default_idle routine. To be used in conjunction with the haltpoll governor. Signed-off-by: Marcelo Tosatti --- arch/x86/kernel/process.c | 2 - drivers/cpuidle/Kconfig | 9 +++++ drivers/cpuidle/Makefile | 1 drivers/cpuidle/cpuidle-haltpoll.c | 65 +++++++++++++++++++++++++++++++++++++ 4 files changed, 76 insertions(+), 1 deletion(-) Index: linux-2.6-newcpuidle.git/arch/x86/kernel/process.c =================================================================== --- linux-2.6-newcpuidle.git.orig/arch/x86/kernel/process.c +++ linux-2.6-newcpuidle.git/arch/x86/kernel/process.c @@ -580,7 +580,7 @@ void __cpuidle default_idle(void) safe_halt(); trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id()); } -#ifdef CONFIG_APM_MODULE +#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE) EXPORT_SYMBOL(default_idle); #endif Index: linux-2.6-newcpuidle.git/drivers/cpuidle/Kconfig =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/Kconfig +++ linux-2.6-newcpuidle.git/drivers/cpuidle/Kconfig @@ -51,6 +51,15 @@ depends on PPC source "drivers/cpuidle/Kconfig.powerpc" endmenu +config HALTPOLL_CPUIDLE + tristate "Halt poll cpuidle driver" + depends on X86 && KVM_GUEST + default y + help + This option enables halt poll cpuidle driver, which allows to poll + before halting in the guest (more efficient than polling in the + host via halt_poll_ns for some scenarios). + endif config ARCH_NEEDS_CPU_IDLE_COUPLED Index: linux-2.6-newcpuidle.git/drivers/cpuidle/Makefile =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/Makefile +++ linux-2.6-newcpuidle.git/drivers/cpuidle/Makefile @@ -7,6 +7,7 @@ obj-y += cpuidle.o driver.o governor.o s obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o +obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o ################################################################################## # ARM SoC drivers Index: linux-2.6-newcpuidle.git/drivers/cpuidle/cpuidle-haltpoll.c =================================================================== --- /dev/null +++ linux-2.6-newcpuidle.git/drivers/cpuidle/cpuidle-haltpoll.c @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * cpuidle driver for haltpoll governor. + * + * Copyright 2019 Red Hat, Inc. and/or its affiliates. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Authors: Marcelo Tosatti + */ + +#include +#include +#include +#include +#include + +static int default_enter_idle(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + if (current_clr_polling_and_test()) { + local_irq_enable(); + return index; + } + default_idle(); + return index; +} + +static struct cpuidle_driver haltpoll_driver = { + .name = "haltpoll", + .owner = THIS_MODULE, + .states = { + { /* entry 0 is for polling */ }, + { + .enter = default_enter_idle, + .exit_latency = 1, + .target_residency = 1, + .power_usage = -1, + .name = "haltpoll idle", + .desc = "default architecture idle", + }, + }, + .safe_state_index = 0, + .state_count = 2, +}; + +static int __init haltpoll_init(void) +{ + struct cpuidle_driver *drv = &haltpoll_driver; + + cpuidle_poll_state_init(drv); + + if (!kvm_para_available()) + return 0; + + return cpuidle_register(&haltpoll_driver, NULL); +} + +static void __exit haltpoll_exit(void) +{ + cpuidle_unregister(&haltpoll_driver); +} + +module_init(haltpoll_init); +module_exit(haltpoll_exit); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Marcelo Tosatti "); + From patchwork Wed Jul 3 23:51:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 11030643 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 73C7C1902 for ; Wed, 3 Jul 2019 23:59:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6576028A05 for ; Wed, 3 Jul 2019 23:59:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 59CFC28A31; Wed, 3 Jul 2019 23:59:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D7B6628A05 for ; Wed, 3 Jul 2019 23:59:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727398AbfGCX7v (ORCPT ); Wed, 3 Jul 2019 19:59:51 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34634 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727100AbfGCX7v (ORCPT ); Wed, 3 Jul 2019 19:59:51 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id F1F9A859FB; Wed, 3 Jul 2019 23:59:49 +0000 (UTC) Received: from amt.cnet (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7E02F1001B1B; Wed, 3 Jul 2019 23:59:49 +0000 (UTC) Received: from amt.cnet (localhost [127.0.0.1]) by amt.cnet (Postfix) with ESMTP id 3F4A9105171; Wed, 3 Jul 2019 20:59:01 -0300 (BRT) Received: (from marcelo@localhost) by amt.cnet (8.14.7/8.14.7/Submit) id x63Nx03M023713; Wed, 3 Jul 2019 20:59:00 -0300 Message-Id: <20190703235828.405864213@amt.cnet> User-Agent: quilt/0.60-1 Date: Wed, 03 Jul 2019 20:51:26 -0300 From: Marcelo Tosatti To: kvm-devel Cc: Paolo Bonzini , Radim Krcmar , Andrea Arcangeli , "Rafael J. Wysocki" , Peter Zijlstra , Wanpeng Li , Konrad Rzeszutek Wilk , Raslan KarimAllah , Boris Ostrovsky , Ankur Arora , Christian Borntraeger , linux-pm@vger.kernel.org, Marcelo Tosatti Subject: [patch 2/5] cpuidle: add poll_limit_ns to cpuidle_device structure References: <20190703235124.783034907@amt.cnet> Content-Disposition: inline; filename=01-poll-time X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 03 Jul 2019 23:59:50 +0000 (UTC) Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add a poll_limit_ns variable to cpuidle_device structure. Calculate and configure it in the new cpuidle_poll_time function, in case its zero. Individual governors are allowed to override this value. Signed-off-by: Marcelo Tosatti --- drivers/cpuidle/cpuidle.c | 40 ++++++++++++++++++++++++++++++++++++++++ drivers/cpuidle/poll_state.c | 11 ++--------- include/linux/cpuidle.h | 8 ++++++++ 3 files changed, 50 insertions(+), 9 deletions(-) Index: linux-2.6-newcpuidle.git/drivers/cpuidle/cpuidle.c =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/cpuidle.c +++ linux-2.6-newcpuidle.git/drivers/cpuidle/cpuidle.c @@ -362,6 +362,36 @@ void cpuidle_reflect(struct cpuidle_devi } /** + * cpuidle_poll_time - return amount of time to poll for, + * governors can override dev->poll_limit_ns if necessary + * + * @drv: the cpuidle driver tied with the cpu + * @dev: the cpuidle device + * + */ +u64 cpuidle_poll_time(struct cpuidle_driver *drv, + struct cpuidle_device *dev) +{ + int i; + u64 limit_ns; + + if (dev->poll_limit_ns) + return dev->poll_limit_ns; + + limit_ns = TICK_NSEC; + for (i = 1; i < drv->state_count; i++) { + if (drv->states[i].disabled || dev->states_usage[i].disable) + continue; + + limit_ns = (u64)drv->states[i].target_residency * NSEC_PER_USEC; + } + + dev->poll_limit_ns = limit_ns; + + return dev->poll_limit_ns; +} + +/** * cpuidle_install_idle_handler - installs the cpuidle idle loop handler */ void cpuidle_install_idle_handler(void) Index: linux-2.6-newcpuidle.git/drivers/cpuidle/poll_state.c =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/poll_state.c +++ linux-2.6-newcpuidle.git/drivers/cpuidle/poll_state.c @@ -20,16 +20,9 @@ static int __cpuidle poll_idle(struct cp local_irq_enable(); if (!current_set_polling_and_test()) { unsigned int loop_count = 0; - u64 limit = TICK_NSEC; - int i; + u64 limit; - for (i = 1; i < drv->state_count; i++) { - if (drv->states[i].disabled || dev->states_usage[i].disable) - continue; - - limit = (u64)drv->states[i].target_residency * NSEC_PER_USEC; - break; - } + limit = cpuidle_poll_time(drv, dev); while (!need_resched()) { cpu_relax(); Index: linux-2.6-newcpuidle.git/include/linux/cpuidle.h =================================================================== --- linux-2.6-newcpuidle.git.orig/include/linux/cpuidle.h +++ linux-2.6-newcpuidle.git/include/linux/cpuidle.h @@ -86,6 +86,7 @@ struct cpuidle_device { ktime_t next_hrtimer; int last_residency; + u64 poll_limit_ns; struct cpuidle_state_usage states_usage[CPUIDLE_STATE_MAX]; struct cpuidle_state_kobj *kobjs[CPUIDLE_STATE_MAX]; struct cpuidle_driver_kobj *kobj_driver; @@ -132,6 +133,8 @@ extern int cpuidle_select(struct cpuidle extern int cpuidle_enter(struct cpuidle_driver *drv, struct cpuidle_device *dev, int index); extern void cpuidle_reflect(struct cpuidle_device *dev, int index); +extern u64 cpuidle_poll_time(struct cpuidle_driver *drv, + struct cpuidle_device *dev); extern int cpuidle_register_driver(struct cpuidle_driver *drv); extern struct cpuidle_driver *cpuidle_get_driver(void); @@ -166,6 +169,9 @@ static inline int cpuidle_enter(struct c struct cpuidle_device *dev, int index) {return -ENODEV; } static inline void cpuidle_reflect(struct cpuidle_device *dev, int index) { } +extern u64 cpuidle_poll_time(struct cpuidle_driver *drv, + struct cpuidle_device *dev) +{return 0; } static inline int cpuidle_register_driver(struct cpuidle_driver *drv) {return -ENODEV; } static inline struct cpuidle_driver *cpuidle_get_driver(void) {return NULL; } Index: linux-2.6-newcpuidle.git/drivers/cpuidle/sysfs.c =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/sysfs.c +++ linux-2.6-newcpuidle.git/drivers/cpuidle/sysfs.c @@ -334,6 +334,7 @@ struct cpuidle_state_kobj { struct cpuidle_state_usage *state_usage; struct completion kobj_unregister; struct kobject kobj; + struct cpuidle_device *device; }; #ifdef CONFIG_SUSPEND @@ -391,6 +392,7 @@ static inline void cpuidle_remove_s2idle #define kobj_to_state_obj(k) container_of(k, struct cpuidle_state_kobj, kobj) #define kobj_to_state(k) (kobj_to_state_obj(k)->state) #define kobj_to_state_usage(k) (kobj_to_state_obj(k)->state_usage) +#define kobj_to_device(k) (kobj_to_state_obj(k)->device) #define attr_to_stateattr(a) container_of(a, struct cpuidle_state_attr, attr) static ssize_t cpuidle_state_show(struct kobject *kobj, struct attribute *attr, @@ -414,10 +416,14 @@ static ssize_t cpuidle_state_store(struc struct cpuidle_state *state = kobj_to_state(kobj); struct cpuidle_state_usage *state_usage = kobj_to_state_usage(kobj); struct cpuidle_state_attr *cattr = attr_to_stateattr(attr); + struct cpuidle_device *dev = kobj_to_device(kobj); if (cattr->store) ret = cattr->store(state, state_usage, buf, size); + /* reset poll time cache */ + dev->poll_limit_ns = 0; + return ret; } @@ -468,6 +474,7 @@ static int cpuidle_add_state_sysfs(struc } kobj->state = &drv->states[i]; kobj->state_usage = &device->states_usage[i]; + kobj->device = device; init_completion(&kobj->kobj_unregister); ret = kobject_init_and_add(&kobj->kobj, &ktype_state_cpuidle, From patchwork Wed Jul 3 23:51:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 11030655 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BE96B1398 for ; Wed, 3 Jul 2019 23:59:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AC5ED28A05 for ; Wed, 3 Jul 2019 23:59:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A021F28A31; Wed, 3 Jul 2019 23:59:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1730128A05 for ; Wed, 3 Jul 2019 23:59:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727516AbfGCX7y (ORCPT ); Wed, 3 Jul 2019 19:59:54 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55722 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727508AbfGCX7y (ORCPT ); Wed, 3 Jul 2019 19:59:54 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5EB42368E3; Wed, 3 Jul 2019 23:59:53 +0000 (UTC) Received: from amt.cnet (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id D85DE19698; Wed, 3 Jul 2019 23:59:52 +0000 (UTC) Received: from amt.cnet (localhost [127.0.0.1]) by amt.cnet (Postfix) with ESMTP id C3A29105172; Wed, 3 Jul 2019 20:59:01 -0300 (BRT) Received: (from marcelo@localhost) by amt.cnet (8.14.7/8.14.7/Submit) id x63Nx1kk023714; Wed, 3 Jul 2019 20:59:01 -0300 Message-Id: <20190703235828.471136356@amt.cnet> User-Agent: quilt/0.60-1 Date: Wed, 03 Jul 2019 20:51:27 -0300 From: Marcelo Tosatti To: kvm-devel Cc: Paolo Bonzini , Radim Krcmar , Andrea Arcangeli , "Rafael J. Wysocki" , Peter Zijlstra , Wanpeng Li , Konrad Rzeszutek Wilk , Raslan KarimAllah , Boris Ostrovsky , Ankur Arora , Christian Borntraeger , linux-pm@vger.kernel.org, Marcelo Tosatti Subject: [patch 3/5] governors: unify last_state_idx References: <20190703235124.783034907@amt.cnet> Content-Disposition: inline; filename=02-unify-last-used-idx X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Wed, 03 Jul 2019 23:59:53 +0000 (UTC) Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Since this field is shared by all governors, move it to cpuidle device structure. Signed-off-by: Marcelo Tosatti Index: linux-2.6-newcpuidle.git/drivers/cpuidle/governors/ladder.c =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/governors/ladder.c +++ linux-2.6-newcpuidle.git/drivers/cpuidle/governors/ladder.c @@ -38,7 +38,6 @@ struct ladder_device_state { struct ladder_device { struct ladder_device_state states[CPUIDLE_STATE_MAX]; - int last_state_idx; }; static DEFINE_PER_CPU(struct ladder_device, ladder_devices); @@ -49,12 +48,13 @@ static DEFINE_PER_CPU(struct ladder_devi * @old_idx: the current state index * @new_idx: the new target state index */ -static inline void ladder_do_selection(struct ladder_device *ldev, +static inline void ladder_do_selection(struct cpuidle_device *dev, + struct ladder_device *ldev, int old_idx, int new_idx) { ldev->states[old_idx].stats.promotion_count = 0; ldev->states[old_idx].stats.demotion_count = 0; - ldev->last_state_idx = new_idx; + dev->last_state_idx = new_idx; } /** @@ -68,13 +68,13 @@ static int ladder_select_state(struct cp { struct ladder_device *ldev = this_cpu_ptr(&ladder_devices); struct ladder_device_state *last_state; - int last_residency, last_idx = ldev->last_state_idx; + int last_residency, last_idx = dev->last_state_idx; int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING ? 1 : 0; int latency_req = cpuidle_governor_latency_req(dev->cpu); /* Special case when user has set very strict latency requirement */ if (unlikely(latency_req == 0)) { - ladder_do_selection(ldev, last_idx, 0); + ladder_do_selection(dev, ldev, last_idx, 0); return 0; } @@ -91,7 +91,7 @@ static int ladder_select_state(struct cp last_state->stats.promotion_count++; last_state->stats.demotion_count = 0; if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) { - ladder_do_selection(ldev, last_idx, last_idx + 1); + ladder_do_selection(dev, ldev, last_idx, last_idx + 1); return last_idx + 1; } } @@ -107,7 +107,7 @@ static int ladder_select_state(struct cp if (drv->states[i].exit_latency <= latency_req) break; } - ladder_do_selection(ldev, last_idx, i); + ladder_do_selection(dev, ldev, last_idx, i); return i; } @@ -116,7 +116,7 @@ static int ladder_select_state(struct cp last_state->stats.demotion_count++; last_state->stats.promotion_count = 0; if (last_state->stats.demotion_count >= last_state->threshold.demotion_count) { - ladder_do_selection(ldev, last_idx, last_idx - 1); + ladder_do_selection(dev, ldev, last_idx, last_idx - 1); return last_idx - 1; } } @@ -139,7 +139,7 @@ static int ladder_enable_device(struct c struct ladder_device_state *lstate; struct cpuidle_state *state; - ldev->last_state_idx = first_idx; + dev->last_state_idx = first_idx; for (i = first_idx; i < drv->state_count; i++) { state = &drv->states[i]; @@ -167,9 +167,8 @@ static int ladder_enable_device(struct c */ static void ladder_reflect(struct cpuidle_device *dev, int index) { - struct ladder_device *ldev = this_cpu_ptr(&ladder_devices); if (index > 0) - ldev->last_state_idx = index; + dev->last_state_idx = index; } static struct cpuidle_governor ladder_governor = { Index: linux-2.6-newcpuidle.git/drivers/cpuidle/governors/menu.c =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/governors/menu.c +++ linux-2.6-newcpuidle.git/drivers/cpuidle/governors/menu.c @@ -117,7 +117,6 @@ */ struct menu_device { - int last_state_idx; int needs_update; int tick_wakeup; @@ -455,7 +454,7 @@ static void menu_reflect(struct cpuidle_ { struct menu_device *data = this_cpu_ptr(&menu_devices); - data->last_state_idx = index; + dev->last_state_idx = index; data->needs_update = 1; data->tick_wakeup = tick_nohz_idle_got_tick(); } @@ -468,7 +467,7 @@ static void menu_reflect(struct cpuidle_ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) { struct menu_device *data = this_cpu_ptr(&menu_devices); - int last_idx = data->last_state_idx; + int last_idx = dev->last_state_idx; struct cpuidle_state *target = &drv->states[last_idx]; unsigned int measured_us; unsigned int new_factor; Index: linux-2.6-newcpuidle.git/drivers/cpuidle/governors/teo.c =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/governors/teo.c +++ linux-2.6-newcpuidle.git/drivers/cpuidle/governors/teo.c @@ -96,7 +96,6 @@ struct teo_idle_state { * @time_span_ns: Time between idle state selection and post-wakeup update. * @sleep_length_ns: Time till the closest timer event (at the selection time). * @states: Idle states data corresponding to this CPU. - * @last_state: Idle state entered by the CPU last time. * @interval_idx: Index of the most recent saved idle interval. * @intervals: Saved idle duration values. */ @@ -104,7 +103,6 @@ struct teo_cpu { u64 time_span_ns; u64 sleep_length_ns; struct teo_idle_state states[CPUIDLE_STATE_MAX]; - int last_state; int interval_idx; unsigned int intervals[INTERVALS]; }; @@ -130,7 +128,9 @@ static void teo_update(struct cpuidle_dr */ measured_us = sleep_length_us; } else { - unsigned int lat = drv->states[cpu_data->last_state].exit_latency; + unsigned int lat; + + lat = drv->states[dev->last_state_idx].exit_latency; measured_us = ktime_to_us(cpu_data->time_span_ns); /* @@ -245,9 +245,9 @@ static int teo_select(struct cpuidle_dri int max_early_idx, idx, i; ktime_t delta_tick; - if (cpu_data->last_state >= 0) { + if (dev->last_state_idx >= 0) { teo_update(drv, dev); - cpu_data->last_state = -1; + dev->last_state_idx = -1; } cpu_data->time_span_ns = local_clock(); @@ -394,7 +394,7 @@ static void teo_reflect(struct cpuidle_d { struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); - cpu_data->last_state = state; + dev->last_state_idx = state; /* * If the wakeup was not "natural", but triggered by one of the safety * nets, assume that the CPU might have been idle for the entire sleep Index: linux-2.6-newcpuidle.git/include/linux/cpuidle.h =================================================================== --- linux-2.6-newcpuidle.git.orig/include/linux/cpuidle.h +++ linux-2.6-newcpuidle.git/include/linux/cpuidle.h @@ -85,6 +85,7 @@ struct cpuidle_device { unsigned int cpu; ktime_t next_hrtimer; + int last_state_idx; int last_residency; u64 poll_limit_ns; struct cpuidle_state_usage states_usage[CPUIDLE_STATE_MAX]; From patchwork Wed Jul 3 23:51:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 11030649 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F0341398 for ; Wed, 3 Jul 2019 23:59:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4E1B928A05 for ; Wed, 3 Jul 2019 23:59:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4172928A0B; Wed, 3 Jul 2019 23:59:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6E6C928A31 for ; Wed, 3 Jul 2019 23:59:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727526AbfGCX7y (ORCPT ); Wed, 3 Jul 2019 19:59:54 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55708 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727400AbfGCX7v (ORCPT ); Wed, 3 Jul 2019 19:59:51 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2C627A3B48; Wed, 3 Jul 2019 23:59:50 +0000 (UTC) Received: from amt.cnet (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9E9CF1001B20; Wed, 3 Jul 2019 23:59:49 +0000 (UTC) Received: from amt.cnet (localhost [127.0.0.1]) by amt.cnet (Postfix) with ESMTP id 90700105174; Wed, 3 Jul 2019 20:59:02 -0300 (BRT) Received: (from marcelo@localhost) by amt.cnet (8.14.7/8.14.7/Submit) id x63Nx2in023715; Wed, 3 Jul 2019 20:59:02 -0300 Message-Id: <20190703235828.538807297@amt.cnet> User-Agent: quilt/0.60-1 Date: Wed, 03 Jul 2019 20:51:28 -0300 From: Marcelo Tosatti To: kvm-devel Cc: Paolo Bonzini , Radim Krcmar , Andrea Arcangeli , "Rafael J. Wysocki" , Peter Zijlstra , Wanpeng Li , Konrad Rzeszutek Wilk , Raslan KarimAllah , Boris Ostrovsky , Ankur Arora , Christian Borntraeger , linux-pm@vger.kernel.org, Marcelo Tosatti Subject: [patch 4/5] cpuidle: add haltpoll governor References: <20190703235124.783034907@amt.cnet> Content-Disposition: inline; filename=03-cpuidle-haltpoll-governor X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Wed, 03 Jul 2019 23:59:50 +0000 (UTC) Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The cpuidle_haltpoll governor, in conjunction with the haltpoll cpuidle driver, allows guest vcpus to poll for a specified amount of time before halting. This provides the following benefits to host side polling: 1) The POLL flag is set while polling is performed, which allows a remote vCPU to avoid sending an IPI (and the associated cost of handling the IPI) when performing a wakeup. 2) The VM-exit cost can be avoided. The downside of guest side polling is that polling is performed even with other runnable tasks in the host. Results comparing halt_poll_ns and server/client application where a small packet is ping-ponged: host --> 31.33 halt_poll_ns=300000 / no guest busy spin --> 33.40 (93.8%) halt_poll_ns=0 / guest_halt_poll_ns=300000 --> 32.73 (95.7%) For the SAP HANA benchmarks (where idle_spin is a parameter of the previous version of the patch, results should be the same): hpns == halt_poll_ns idle_spin=0/ idle_spin=800/ idle_spin=0/ hpns=200000 hpns=0 hpns=800000 DeleteC06T03 (100 thread) 1.76 1.71 (-3%) 1.78 (+1%) InsertC16T02 (100 thread) 2.14 2.07 (-3%) 2.18 (+1.8%) DeleteC00T01 (1 thread) 1.34 1.28 (-4.5%) 1.29 (-3.7%) UpdateC00T03 (1 thread) 4.72 4.18 (-12%) 4.53 (-5%) Signed-off-by: Marcelo Tosatti --- Documentation/virtual/guest-halt-polling.txt | 79 ++++++++++++ drivers/cpuidle/Kconfig | 11 + drivers/cpuidle/governors/Makefile | 1 drivers/cpuidle/governors/haltpoll.c | 175 +++++++++++++++++++++++++++ 4 files changed, 266 insertions(+) Index: linux-2.6-newcpuidle.git/drivers/cpuidle/Kconfig =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/Kconfig +++ linux-2.6-newcpuidle.git/drivers/cpuidle/Kconfig @@ -33,6 +33,17 @@ config CPU_IDLE_GOV_TEO Some workloads benefit from using it and it generally should be safe to use. Say Y here if you are not happy with the alternatives. +config CPU_IDLE_GOV_HALTPOLL + bool "Haltpoll governor (for virtualized systems)" + depends on KVM_GUEST + help + This governor implements haltpoll idle state selection, to be + used in conjunction with the haltpoll cpuidle driver, allowing + for polling for a certain amount of time before entering idle + state. + + Some virtualized workloads benefit from using it. + config DT_IDLE_STATES bool Index: linux-2.6-newcpuidle.git/drivers/cpuidle/governors/Makefile =================================================================== --- linux-2.6-newcpuidle.git.orig/drivers/cpuidle/governors/Makefile +++ linux-2.6-newcpuidle.git/drivers/cpuidle/governors/Makefile @@ -6,3 +6,4 @@ obj-$(CONFIG_CPU_IDLE_GOV_LADDER) += ladder.o obj-$(CONFIG_CPU_IDLE_GOV_MENU) += menu.o obj-$(CONFIG_CPU_IDLE_GOV_TEO) += teo.o +obj-$(CONFIG_CPU_IDLE_GOV_HALTPOLL) += haltpoll.o Index: linux-2.6-newcpuidle.git/drivers/cpuidle/governors/haltpoll.c =================================================================== --- /dev/null +++ linux-2.6-newcpuidle.git/drivers/cpuidle/governors/haltpoll.c @@ -0,0 +1,150 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * haltpoll.c - haltpoll idle governor + * + * Copyright 2019 Red Hat, Inc. and/or its affiliates. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Authors: Marcelo Tosatti + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static unsigned int guest_halt_poll_ns __read_mostly = 200000; +module_param(guest_halt_poll_ns, uint, 0644); + +/* division factor to shrink halt_poll_ns */ +static unsigned int guest_halt_poll_shrink __read_mostly = 2; +module_param(guest_halt_poll_shrink, uint, 0644); + +/* multiplication factor to grow per-cpu poll_limit_ns */ +static unsigned int guest_halt_poll_grow __read_mostly = 2; +module_param(guest_halt_poll_grow, uint, 0644); + +/* value in us to start growing per-cpu halt_poll_ns */ +static unsigned int guest_halt_poll_grow_start __read_mostly = 50000; +module_param(guest_halt_poll_grow_start, uint, 0644); + +/* allow shrinking guest halt poll */ +static bool guest_halt_poll_allow_shrink __read_mostly = true; +module_param(guest_halt_poll_allow_shrink, bool, 0644); + +/** + * haltpoll_select - selects the next idle state to enter + * @drv: cpuidle driver containing state data + * @dev: the CPU + * @stop_tick: indication on whether or not to stop the tick + */ +static int haltpoll_select(struct cpuidle_driver *drv, + struct cpuidle_device *dev, + bool *stop_tick) +{ + int latency_req = cpuidle_governor_latency_req(dev->cpu); + + if (!drv->state_count || latency_req == 0) { + *stop_tick = false; + return 0; + } + + if (dev->poll_limit_ns == 0) + return 1; + + /* Last state was poll? */ + if (dev->last_state_idx == 0) { + /* Halt if no event occurred on poll window */ + if (dev->poll_time_limit == true) + return 1; + + *stop_tick = false; + /* Otherwise, poll again */ + return 0; + } + + *stop_tick = false; + /* Last state was halt: poll */ + return 0; +} + +static void adjust_poll_limit(struct cpuidle_device *dev, unsigned int block_us) +{ + unsigned int val; + u64 block_ns = block_us*NSEC_PER_USEC; + + /* Grow cpu_halt_poll_us if + * cpu_halt_poll_us < block_ns < guest_halt_poll_us + */ + if (block_ns > dev->poll_limit_ns && block_ns <= guest_halt_poll_ns) { + val = dev->poll_limit_ns * guest_halt_poll_grow; + + if (val < guest_halt_poll_grow_start) + val = guest_halt_poll_grow_start; + if (val > guest_halt_poll_ns) + val = guest_halt_poll_ns; + + dev->poll_limit_ns = val; + } else if (block_ns > guest_halt_poll_ns && + guest_halt_poll_allow_shrink) { + unsigned int shrink = guest_halt_poll_shrink; + + val = dev->poll_limit_ns; + if (shrink == 0) + val = 0; + else + val /= shrink; + dev->poll_limit_ns = val; + } +} + +/** + * haltpoll_reflect - update variables and update poll time + * @dev: the CPU + * @index: the index of actual entered state + */ +static void haltpoll_reflect(struct cpuidle_device *dev, int index) +{ + dev->last_state_idx = index; + + if (index != 0) + adjust_poll_limit(dev, dev->last_residency); +} + +/** + * haltpoll_enable_device - scans a CPU's states and does setup + * @drv: cpuidle driver + * @dev: the CPU + */ +static int haltpoll_enable_device(struct cpuidle_driver *drv, + struct cpuidle_device *dev) +{ + dev->poll_limit_ns = 0; + + return 0; +} + +static struct cpuidle_governor haltpoll_governor = { + .name = "haltpoll", + .rating = 21, + .enable = haltpoll_enable_device, + .select = haltpoll_select, + .reflect = haltpoll_reflect, +}; + +static int __init init_haltpoll(void) +{ + if (kvm_para_available()) + return cpuidle_register_governor(&haltpoll_governor); + + return 0; +} + +postcore_initcall(init_haltpoll); Index: linux-2.6-newcpuidle.git/Documentation/virtual/guest-halt-polling.txt =================================================================== --- /dev/null +++ linux-2.6-newcpuidle.git/Documentation/virtual/guest-halt-polling.txt @@ -0,0 +1,79 @@ +Guest halt polling +================== + +The cpuidle_haltpoll driver, with the haltpoll governor, allows +the guest vcpus to poll for a specified amount of time before +halting. +This provides the following benefits to host side polling: + + 1) The POLL flag is set while polling is performed, which allows + a remote vCPU to avoid sending an IPI (and the associated + cost of handling the IPI) when performing a wakeup. + + 2) The VM-exit cost can be avoided. + +The downside of guest side polling is that polling is performed +even with other runnable tasks in the host. + +The basic logic as follows: A global value, guest_halt_poll_ns, +is configured by the user, indicating the maximum amount of +time polling is allowed. This value is fixed. + +Each vcpu has an adjustable guest_halt_poll_ns +("per-cpu guest_halt_poll_ns"), which is adjusted by the algorithm +in response to events (explained below). + +Module Parameters +================= + +The haltpoll governor has 5 tunable module parameters: + +1) guest_halt_poll_ns: +Maximum amount of time, in nanoseconds, that polling is +performed before halting. + +Default: 200000 + +2) guest_halt_poll_shrink: +Division factor used to shrink per-cpu guest_halt_poll_ns when +wakeup event occurs after the global guest_halt_poll_ns. + +Default: 2 + +3) guest_halt_poll_grow: +Multiplication factor used to grow per-cpu guest_halt_poll_ns +when event occurs after per-cpu guest_halt_poll_ns +but before global guest_halt_poll_ns. + +Default: 2 + +4) guest_halt_poll_grow_start: +The per-cpu guest_halt_poll_ns eventually reaches zero +in case of an idle system. This value sets the initial +per-cpu guest_halt_poll_ns when growing. This can +be increased from 10000, to avoid misses during the initial +growth stage: + +10k, 20k, 40k, ... (example assumes guest_halt_poll_grow=2). + +Default: 50000 + +5) guest_halt_poll_allow_shrink: + +Bool parameter which allows shrinking. Set to N +to avoid it (per-cpu guest_halt_poll_ns will remain +high once achieves global guest_halt_poll_ns value). + +Default: Y + +The module parameters can be set from the debugfs files in: + + /sys/module/haltpoll/parameters/ + +Further Notes +============= + +- Care should be taken when setting the guest_halt_poll_ns parameter as a +large value has the potential to drive the cpu usage to 100% on a machine which +would be almost entirely idle otherwise. + From patchwork Wed Jul 3 23:51:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 11030657 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C331F18E8 for ; Wed, 3 Jul 2019 23:59:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B4BE328A33 for ; Wed, 3 Jul 2019 23:59:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A8EB428A11; Wed, 3 Jul 2019 23:59:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4312B28A32 for ; Wed, 3 Jul 2019 23:59:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726988AbfGCX7u (ORCPT ); Wed, 3 Jul 2019 19:59:50 -0400 Received: from mx1.redhat.com ([209.132.183.28]:50924 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727395AbfGCX7u (ORCPT ); Wed, 3 Jul 2019 19:59:50 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 24546308624A; Wed, 3 Jul 2019 23:59:50 +0000 (UTC) Received: from amt.cnet (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9168891F5E; Wed, 3 Jul 2019 23:59:49 +0000 (UTC) Received: from amt.cnet (localhost [127.0.0.1]) by amt.cnet (Postfix) with ESMTP id 4F9AA105179; Wed, 3 Jul 2019 20:59:03 -0300 (BRT) Received: (from marcelo@localhost) by amt.cnet (8.14.7/8.14.7/Submit) id x63Nx2kb023716; Wed, 3 Jul 2019 20:59:02 -0300 Message-Id: <20190703235828.625277350@amt.cnet> User-Agent: quilt/0.60-1 Date: Wed, 03 Jul 2019 20:51:29 -0300 From: Marcelo Tosatti To: kvm-devel Cc: Paolo Bonzini , Radim Krcmar , Andrea Arcangeli , "Rafael J. Wysocki" , Peter Zijlstra , Wanpeng Li , Konrad Rzeszutek Wilk , Raslan KarimAllah , Boris Ostrovsky , Ankur Arora , Christian Borntraeger , linux-pm@vger.kernel.org, Marcelo Tosatti Subject: [patch 5/5] cpuidle-haltpoll: disable host side polling when kvm virtualized References: <20190703235124.783034907@amt.cnet> Content-Disposition: inline; filename=04-pollcontrol-guest.patch X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Wed, 03 Jul 2019 23:59:50 +0000 (UTC) Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When performing guest side polling, it is not necessary to also perform host side polling. So disable host side polling, via the new MSR interface, when loading cpuidle-haltpoll driver. Signed-off-by: Marcelo Tosatti --- arch/x86/Kconfig | 7 +++++ arch/x86/include/asm/cpuidle_haltpoll.h | 8 ++++++ arch/x86/kernel/kvm.c | 42 ++++++++++++++++++++++++++++++++ drivers/cpuidle/cpuidle-haltpoll.c | 10 ++++++- include/linux/cpuidle_haltpoll.h | 16 ++++++++++++ 5 files changed, 82 insertions(+), 1 deletion(-) Index: kvm/arch/x86/include/asm/cpuidle_haltpoll.h =================================================================== --- /dev/null +++ kvm/arch/x86/include/asm/cpuidle_haltpoll.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ARCH_HALTPOLL_H +#define _ARCH_HALTPOLL_H + +void arch_haltpoll_enable(void); +void arch_haltpoll_disable(void); + +#endif Index: kvm/drivers/cpuidle/cpuidle-haltpoll.c =================================================================== --- kvm.orig/drivers/cpuidle/cpuidle-haltpoll.c +++ kvm/drivers/cpuidle/cpuidle-haltpoll.c @@ -15,6 +15,7 @@ #include #include #include +#include static int default_enter_idle(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) @@ -47,6 +48,7 @@ static struct cpuidle_driver haltpoll_dr static int __init haltpoll_init(void) { + int ret; struct cpuidle_driver *drv = &haltpoll_driver; cpuidle_poll_state_init(drv); @@ -54,11 +56,16 @@ static int __init haltpoll_init(void) if (!kvm_para_available()) return 0; - return cpuidle_register(&haltpoll_driver, NULL); + ret = cpuidle_register(&haltpoll_driver, NULL); + if (ret == 0) + arch_haltpoll_enable(); + + return ret; } static void __exit haltpoll_exit(void) { + arch_haltpoll_disable(); cpuidle_unregister(&haltpoll_driver); } Index: kvm/include/linux/cpuidle_haltpoll.h =================================================================== --- /dev/null +++ kvm/include/linux/cpuidle_haltpoll.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _CPUIDLE_HALTPOLL_H +#define _CPUIDLE_HALTPOLL_H + +#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL +#include +#else +static inline void arch_haltpoll_enable(void) +{ +} + +static inline void arch_haltpoll_disable(void) +{ +} +#endif +#endif Index: kvm/arch/x86/Kconfig =================================================================== --- kvm.orig/arch/x86/Kconfig +++ kvm/arch/x86/Kconfig @@ -787,6 +787,7 @@ config KVM_GUEST bool "KVM Guest support (including kvmclock)" depends on PARAVIRT select PARAVIRT_CLOCK + select ARCH_CPUIDLE_HALTPOLL default y ---help--- This option enables various optimizations for running under the KVM @@ -795,6 +796,12 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config ARCH_CPUIDLE_HALTPOLL + def_bool n + prompt "Disable host haltpoll when loading haltpoll driver" + help + If virtualized under KVM, disable host haltpoll. + config PVH bool "Support for running PVH guests" ---help--- Index: kvm/arch/x86/kernel/kvm.c =================================================================== --- kvm.orig/arch/x86/kernel/kvm.c +++ kvm/arch/x86/kernel/kvm.c @@ -874,3 +874,45 @@ void __init kvm_spinlock_init(void) } #endif /* CONFIG_PARAVIRT_SPINLOCKS */ + +#ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL + +static void kvm_disable_host_haltpoll(void *i) +{ + wrmsrl(MSR_KVM_POLL_CONTROL, 0); +} + +static void kvm_enable_host_haltpoll(void *i) +{ + wrmsrl(MSR_KVM_POLL_CONTROL, 1); +} + +void arch_haltpoll_enable(void) +{ + if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) { + printk(KERN_ERR "kvm: host does not support poll control\n"); + printk(KERN_ERR "kvm: host upgrade recommended\n"); + return; + } + + preempt_disable(); + /* Enable guest halt poll disables host halt poll */ + kvm_disable_host_haltpoll(NULL); + smp_call_function(kvm_disable_host_haltpoll, NULL, 1); + preempt_enable(); +} +EXPORT_SYMBOL_GPL(arch_haltpoll_enable); + +void arch_haltpoll_disable(void) +{ + if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) + return; + + preempt_disable(); + /* Enable guest halt poll disables host halt poll */ + kvm_enable_host_haltpoll(NULL); + smp_call_function(kvm_enable_host_haltpoll, NULL, 1); + preempt_enable(); +} +EXPORT_SYMBOL_GPL(arch_haltpoll_disable); +#endif