From patchwork Thu Aug 1 16:32:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Woodhouse
X-Patchwork-Id: 13750737
Message-ID: <6cd62b5058e11a6262cb2e798cc85cc5daead3b1.camel@infradead.org>
Subject: [PATCH] i8253: Disable PIT timer 0 when not in use
From: David Woodhouse
To: bp@alien8.de, dave.hansen@linux.intel.com, decui@microsoft.com,
 haiyangz@microsoft.com, kys@microsoft.com, linux-hyperv@vger.kernel.org,
 linux-kernel@vger.kernel.org, lirongqing@baidu.com, mingo@redhat.com,
 seanjc@google.com, tglx@linutronix.de, wei.liu@kernel.org, x86
Cc: mikelley@microsoft.com, kvm, Daniel Lezcano
Date: Thu, 01 Aug 2024 17:32:40 +0100
X-Mailing-List: kvm@vger.kernel.org

From: David Woodhouse

Leaving the PIT interrupt running can cause noticeable steal time for
virtual guests. The VMM generally has a timer which toggles the IRQ input
to the PIC and I/O APIC, which takes CPU time away from the guest. Make
sure it's turned off if it isn't going to be used.

Except on real hardware, because the less we change on real hardware the
better. There be dragons.

Signed-off-by: David Woodhouse
---
 arch/x86/kernel/i8253.c     | 13 +++++++++++--
 drivers/clocksource/i8253.c | 30 ++++++++++++++++++++++++++----
 include/linux/i8253.h       |  1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/i8253.c b/arch/x86/kernel/i8253.c
index 2b7999a1a50a..54bfbd2aa773 100644
--- a/arch/x86/kernel/i8253.c
+++ b/arch/x86/kernel/i8253.c
@@ -8,6 +8,7 @@
 #include <linux/timex.h>
 #include <linux/i8253.h>
 
+#include <asm/hypervisor.h>
 #include <asm/apic.h>
 #include <asm/hpet.h>
 #include <asm/time.h>
@@ -39,9 +40,17 @@ static bool __init use_pit(void)
 
 bool __init pit_timer_init(void)
 {
-	if (!use_pit())
-		return false;
+	if (!use_pit()) {
+		/*
+		 * Don't just ignore the PIT. Ensure it's stopped, because
+		 * VMMs otherwise steal CPU time just to pointlessly waggle
+		 * the (masked) IRQ.
+		 */
+		if (!hypervisor_is_type(X86_HYPER_NATIVE))
+			clockevent_i8253_disable();
+		return false;
+	}
 
 	clockevent_i8253_init(true);
 	global_clock_event = &i8253_clockevent;
 	return true;
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
index d4350bb10b83..51aab0a74481 100644
--- a/drivers/clocksource/i8253.c
+++ b/drivers/clocksource/i8253.c
@@ -108,21 +108,43 @@ int __init clocksource_i8253_init(void)
 #endif
 
 #ifdef CONFIG_CLKEVT_I8253
-static int pit_shutdown(struct clock_event_device *evt)
+void clockevent_i8253_disable(void)
 {
-	if (!clockevent_state_oneshot(evt) && !clockevent_state_periodic(evt))
-		return 0;
-
 	raw_spin_lock(&i8253_lock);
 
 	outb_p(0x30, PIT_MODE);
 
+	/*
+	 * The spec is a little bit ambiguous, although it does say that
+	 * "The actual order of the programming is quite flexible. Writing
+	 * out of the MODE control word can be in any sequence of counter
+	 * selection".
+	 *
+	 * Implementations differ, however, in whether a mode change takes
+	 * effect immediately or whether it only occurs when the counter is
+	 * subsequently written. The KVM in-kernel and AWS Nitro hypervisor
+	 * implementations need the counter to be written; QEMU does not.
+	 *
+	 * Theoretically, in one-shot mode, writing the counter will cause
+	 * the IRQ to trigger one last time before falling quiet. Allegedly,
+	 * under Hyper-V it keeps firing repeatedly, thus the existence of
+	 * the i8253_clear_counter_on_shutdown quirk to refrain from doing
+	 * so.
+	 */
 	if (i8253_clear_counter_on_shutdown) {
 		outb_p(0, PIT_CH0);
 		outb_p(0, PIT_CH0);
 	}
 
 	raw_spin_unlock(&i8253_lock);
+}
+
+static int pit_shutdown(struct clock_event_device *evt)
+{
+	if (!clockevent_state_oneshot(evt) && !clockevent_state_periodic(evt))
+		return 0;
+
+	clockevent_i8253_disable();
 	return 0;
 }
 
diff --git a/include/linux/i8253.h b/include/linux/i8253.h
index 8336b2f6f834..bf169cfef7f1 100644
--- a/include/linux/i8253.h
+++ b/include/linux/i8253.h
@@ -24,6 +24,7 @@ extern raw_spinlock_t i8253_lock;
 extern bool i8253_clear_counter_on_shutdown;
 extern struct clock_event_device i8253_clockevent;
 extern void clockevent_i8253_init(bool oneshot);
+extern void clockevent_i8253_disable(void);
 
 extern void setup_pit_timer(void);
 

From patchwork Fri Aug 2 10:21:23 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Woodhouse
X-Patchwork-Id: 13751406
Message-ID: <3bc237678ade809cc685fedb8c1a3d435e590639.camel@infradead.org>
Subject: [PATCH 2/1] i8253: Fix stop sequence for timer 0
From: David Woodhouse
To: bp@alien8.de, dave.hansen@linux.intel.com, decui@microsoft.com,
 haiyangz@microsoft.com, kys@microsoft.com, linux-hyperv@vger.kernel.org,
 linux-kernel@vger.kernel.org, lirongqing@baidu.com, mingo@redhat.com,
 seanjc@google.com, tglx@linutronix.de, wei.liu@kernel.org, x86
Cc: kvm, Daniel Lezcano, Michael Kelley
Date: Fri, 02 Aug 2024 11:21:23 +0100
In-Reply-To: <6cd62b5058e11a6262cb2e798cc85cc5daead3b1.camel@infradead.org>
References: <6cd62b5058e11a6262cb2e798cc85cc5daead3b1.camel@infradead.org>
X-Mailing-List: kvm@vger.kernel.org

From: David Woodhouse

According to the data sheet, writing the MODE register should stop the
counter (and thus the interrupts). This appears to work on real hardware,
at least modern Intel and AMD systems. It should also work on Hyper-V.

However, on some buggy virtual machines the mode change doesn't have any
effect until the counter is subsequently loaded (or perhaps when the IRQ
next fires).

So, set MODE 0 and then load the counter, to ensure that those buggy VMs
do the right thing and the interrupts stop. And then write MODE 0 *again*
to stop the counter on compliant implementations too.

Apparently, Hyper-V keeps firing the IRQ *repeatedly* even in mode zero
when it should only happen once, but the second MODE write stops that too.

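For readers decoding the values written to the MODE register (0x30 in the
stop sequence, 0x34 and 0x38 in the test program), here is a minimal sketch
of the 8254 control word layout. The struct and function names are
hypothetical and exist only for illustration; they are not part of the
kernel or of this patch.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not a kernel structure. */
struct pit_ctrl {
	unsigned int channel;	/* bits 7-6: counter select (3 = read-back command on 8254) */
	unsigned int access;	/* bits 5-4: 0 = latch, 1 = LSB only, 2 = MSB only, 3 = LSB then MSB */
	unsigned int mode;	/* bits 3-1: operating mode 0..5 */
	unsigned int bcd;	/* bit 0: 0 = binary count, 1 = BCD count */
};

/* Split an 8254 control word into its fields. */
static struct pit_ctrl pit_decode_ctrl(uint8_t val)
{
	struct pit_ctrl c;

	c.channel = (val >> 6) & 0x3;
	c.access = (val >> 4) & 0x3;
	c.mode = (val >> 1) & 0x7;
	/* Mode bits 110 and 111 are aliases for modes 2 and 3. */
	if (c.mode > 5)
		c.mode -= 4;
	c.bcd = val & 0x1;

	return c;
}
```

Under this layout 0x30 selects channel 0, LSB-then-MSB access, mode 0
(interrupt on terminal count); 0x34 is the same with mode 2 (rate
generator); and 0x38 is mode 4 (software triggered strobe).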
Userspace test program (mostly written by tglx):

=====
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/io.h>

typedef unsigned char uint8_t;
typedef unsigned short uint16_t;

#define BUILDIO(bwl, bw, type)						\
static __always_inline void __out##bwl(type value, uint16_t port)	\
{									\
	asm volatile("out" #bwl " %" #bw "0, %w1"			\
		     : : "a"(value), "Nd"(port));			\
}									\
									\
static __always_inline type __in##bwl(uint16_t port)			\
{									\
	type value;							\
	asm volatile("in" #bwl " %w1, %" #bw "0"			\
		     : "=a"(value) : "Nd"(port));			\
	return value;							\
}

BUILDIO(b, b, uint8_t)

#define inb __inb
#define outb __outb

#define PIT_MODE	0x43
#define PIT_CH0		0x40
#define PIT_CH2		0x42

static int is8254;

static void dump_pit(void)
{
	if (is8254) {
		// Latch and output counter and status
		outb(0xC2, PIT_MODE);
		printf("%02x %02x %02x\n", inb(PIT_CH0), inb(PIT_CH0), inb(PIT_CH0));
	} else {
		// Latch and output counter
		outb(0x0, PIT_MODE);
		printf("%02x %02x\n", inb(PIT_CH0), inb(PIT_CH0));
	}
}

int main(int argc, char* argv[])
{
	int nr_counts = 2;

	if (argc > 1)
		nr_counts = atoi(argv[1]);

	if (argc > 2)
		is8254 = 1;

	if (ioperm(0x40, 4, 1) != 0)
		return 1;

	dump_pit();

	printf("Set oneshot\n");
	outb(0x38, PIT_MODE);
	outb(0x00, PIT_CH0);
	outb(0x0F, PIT_CH0);

	dump_pit();
	usleep(1000);
	dump_pit();

	printf("Set periodic\n");
	outb(0x34, PIT_MODE);
	outb(0x00, PIT_CH0);
	outb(0x0F, PIT_CH0);

	dump_pit();
	usleep(1000);
	dump_pit();
	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	printf("Set stop (%d counter writes)\n", nr_counts);
	outb(0x30, PIT_MODE);
	while (nr_counts--)
		outb(0xFF, PIT_CH0);

	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	printf("Set MODE 0\n");
	outb(0x30, PIT_MODE);

	dump_pit();
	usleep(100000);
	dump_pit();
	usleep(100000);
	dump_pit();

	return 0;
}
=====

Signed-off-by: David Woodhouse
---
 arch/x86/kernel/cpu/mshyperv.c | 10 ----------
 drivers/clocksource/i8253.c    | 36 +++++++++++++++++++++++-----------
 include/linux/i8253.h          |  1 -
 3 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e0fd57a8ba84..64fdbada83db 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -522,16 +522,6 @@ static void __init ms_hyperv_init_platform(void)
 	if (efi_enabled(EFI_BOOT))
 		x86_platform.get_nmi_reason = hv_get_nmi_reason;
 
-	/*
-	 * Hyper-V VMs have a PIT emulation quirk such that zeroing the
-	 * counter register during PIT shutdown restarts the PIT. So it
-	 * continues to interrupt @18.2 HZ. Setting i8253_clear_counter
-	 * to false tells pit_shutdown() not to zero the counter so that
-	 * the PIT really is shutdown. Generation 2 VMs don't have a PIT,
-	 * and setting this value has no effect.
-	 */
-	i8253_clear_counter_on_shutdown = false;
-
 #if IS_ENABLED(CONFIG_HYPERV)
 	if ((hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS) ||
 	    ms_hyperv.paravisor_present)
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
index cb215e6f2e83..39f7c2d736d1 100644
--- a/drivers/clocksource/i8253.c
+++ b/drivers/clocksource/i8253.c
@@ -20,13 +20,6 @@
 DEFINE_RAW_SPINLOCK(i8253_lock);
 EXPORT_SYMBOL(i8253_lock);
 
-/*
- * Handle PIT quirk in pit_shutdown() where zeroing the counter register
- * restarts the PIT, negating the shutdown. On platforms with the quirk,
- * platform specific code can set this to false.
- */
-bool i8253_clear_counter_on_shutdown __ro_after_init = true;
-
 #ifdef CONFIG_CLKSRC_I8253
 /*
  * Since the PIT overflows every tick, its not very useful
@@ -112,12 +105,33 @@ void clockevent_i8253_disable(void)
 {
 	raw_spin_lock(&i8253_lock);
 
+	/*
+	 * Writing the MODE register should stop the counter, according to
+	 * the datasheet. This appears to work on real hardware (well, on
+	 * modern Intel and AMD boxes; I didn't dig the Pegasos out of the
+	 * shed).
+	 *
+	 * However, some virtual implementations differ, and the MODE change
+	 * doesn't have any effect until either the counter is written (KVM
+	 * in-kernel PIT) or the next interrupt (QEMU). And in those cases,
+	 * it may not stop the *count*, only the interrupts. Although in
+	 * the virt case, that probably doesn't matter, as the value of the
+	 * counter will only be calculated on demand if the guest reads it;
+	 * it's the interrupts which cause steal time.
+	 *
+	 * Hyper-V apparently has a bug where even in mode 0, the IRQ keeps
+	 * firing repeatedly if the counter is running. But it *does* do the
+	 * right thing when the MODE register is written.
+	 *
+	 * So: write the MODE and then load the counter, which ensures that
+	 * the IRQ is stopped on those buggy virt implementations. And then
+	 * write the MODE again, which is the right way to stop it.
+	 */
 	outb_p(0x30, PIT_MODE);
+	outb_p(0, PIT_CH0);
+	outb_p(0, PIT_CH0);
 
-	if (i8253_clear_counter_on_shutdown) {
-		outb_p(0, PIT_CH0);
-		outb_p(0, PIT_CH0);
-	}
+	outb_p(0x30, PIT_MODE);
 
 	raw_spin_unlock(&i8253_lock);
 }
diff --git a/include/linux/i8253.h b/include/linux/i8253.h
index bf169cfef7f1..56c280eb2d4f 100644
--- a/include/linux/i8253.h
+++ b/include/linux/i8253.h
@@ -21,7 +21,6 @@
 #define PIT_LATCH	((PIT_TICK_RATE + HZ/2) / HZ)
 
 extern raw_spinlock_t i8253_lock;
-extern bool i8253_clear_counter_on_shutdown;
 extern struct clock_event_device i8253_clockevent;
 extern void clockevent_i8253_init(bool oneshot);
 extern void clockevent_i8253_disable(void);
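
The order of port writes that clockevent_i8253_disable() ends up issuing
after this patch can be checked without touching hardware. The following is
only a sketch under stated assumptions: trace_outb(), struct port_trace and
pit_stop_sequence() are hypothetical stand-ins that record writes in memory
instead of calling outb_p(), and the locking is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define PIT_MODE	0x43
#define PIT_CH0		0x40

/* Hypothetical trace record; one entry per port write. */
struct port_write {
	uint16_t port;
	uint8_t val;
};

struct port_trace {
	struct port_write w[8];
	size_t n;
};

/* Stand-in for outb_p(): records the write instead of doing port I/O. */
static void trace_outb(struct port_trace *t, uint8_t val, uint16_t port)
{
	if (t->n < sizeof(t->w) / sizeof(t->w[0])) {
		t->w[t->n].port = port;
		t->w[t->n].val = val;
		t->n++;
	}
}

/* Same write order as the patched clockevent_i8253_disable(). */
static void pit_stop_sequence(struct port_trace *t)
{
	trace_outb(t, 0x30, PIT_MODE);	/* channel 0, LSB/MSB access, mode 0 */
	trace_outb(t, 0, PIT_CH0);	/* load counter low byte */
	trace_outb(t, 0, PIT_CH0);	/* load counter high byte */
	trace_outb(t, 0x30, PIT_MODE);	/* mode 0 again to stop the counter */
}
```

Calling pit_stop_sequence() on a zeroed struct port_trace leaves four
entries: MODE, CH0, CH0, MODE. The first three cover the implementations
that only act on a mode change once the counter is loaded, and the final
MODE write covers the ones that follow the data sheet.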