From patchwork Mon Jul 24 09:37:42 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Julien Thierry
X-Patchwork-Id: 9859035
From: Julien Thierry
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: use WFE for long delays
Date: Mon, 24 Jul 2017 10:37:42 +0100
Message-Id: <1500889062-30476-1-git-send-email-julien.thierry@arm.com>
X-Mailer: git-send-email 1.9.1
Cc: Mark Rutland, Catalin Marinas, Will Deacon, Arnd Bergmann, Julien Thierry

The current delay implementation uses the yield instruction, which is a
hint that it is beneficial to schedule another thread. As this is a hint,
it may be implemented as a NOP, causing all delays to be busy loops.
This is the case for many existing CPUs.

Taking advantage of the generic timer sending periodic events to all
cores, we can use WFE during delays to reduce power consumption. This is
beneficial only for delays longer than the period of the timer event
stream.

If the timer event stream is not enabled, delays will behave as yield/busy
loops.

Signed-off-by: Julien Thierry
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Mark Rutland
Cc: Arnd Bergmann
---
 arch/arm64/lib/delay.c      | 25 ++++++++++++++++++++-----
 include/asm-generic/delay.h |  9 +++++++--
 2 files changed, 27 insertions(+), 7 deletions(-)

-- 
1.9.1

diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
index dad4ec9..fdfe6ef 100644
--- a/arch/arm64/lib/delay.c
+++ b/arch/arm64/lib/delay.c
@@ -24,10 +24,28 @@
 #include
 #include
 
+#include
+
+#define USECS_TO_CYCLES(TIME_USECS)			\
+	(xloops_to_cycles(usecs_to_xloops(TIME_USECS)))
+
+static inline unsigned long xloops_to_cycles(unsigned long xloops)
+{
+	return (xloops * loops_per_jiffy * HZ) >> 32;
+}
+
 void __delay(unsigned long cycles)
 {
 	cycles_t start = get_cycles();
 
+	if (elf_hwcap & HWCAP_EVTSTRM) {
+		const cycles_t timer_evt_period =
+			USECS_TO_CYCLES(1000000 / ARCH_TIMER_EVT_STREAM_FREQ);
+
+		while (get_cycles() - start + timer_evt_period < cycles)
+			wfe();
+	}
+
 	while ((get_cycles() - start) < cycles)
 		cpu_relax();
 }
@@ -35,16 +53,13 @@ void __delay(unsigned long cycles)
 
 inline void __const_udelay(unsigned long xloops)
 {
-	unsigned long loops;
-
-	loops = xloops * loops_per_jiffy * HZ;
-	__delay(loops >> 32);
+	__delay(xloops_to_cycles(xloops));
 }
 EXPORT_SYMBOL(__const_udelay);
 
 void __udelay(unsigned long usecs)
 {
-	__const_udelay(usecs * 0x10C7UL); /* 2**32 / 1000000 (rounded up) */
+	__const_udelay(usecs_to_xloops(usecs));
 }
 EXPORT_SYMBOL(__udelay);
 
diff --git a/include/asm-generic/delay.h b/include/asm-generic/delay.h
index 0f79054..1538e58 100644
--- a/include/asm-generic/delay.h
+++ b/include/asm-generic/delay.h
@@ -10,19 +10,24 @@
 extern void __const_udelay(unsigned long xloops);
 extern void __delay(unsigned long loops);
 
+/* 0x10c7 is 2**32 / 1000000 (rounded up) */
+static inline unsigned long usecs_to_xloops(unsigned long usecs)
+{
+	return usecs * 0x10C7UL;
+}
+
 /*
  * The weird n/20000 thing suppresses a "comparison is always false due to
  * limited range of data type" warning with non-const 8-bit arguments.
  */
 
-/* 0x10c7 is 2**32 / 1000000 (rounded up) */
 #define udelay(n)							\
 	({								\
 		if (__builtin_constant_p(n)) {				\
 			if ((n) / 20000 >= 1)				\
 				 __bad_udelay();			\
 			else						\
-				__const_udelay((n) * 0x10c7ul);		\
+				__const_udelay(usecs_to_xloops(n));	\
 		} else {						\
 			__udelay(n);					\
 		}							\
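
Note (illustration only, not part of the patch): the fixed-point
conversions and the two-phase wait in __delay() can be checked in user
space with a small model. This is a rough sketch under stated assumptions:
the SKETCH_* constants (tick rate, a 50 MHz counter, a 10 kHz event
stream) are example values chosen here, and simulate_delay() is a
hypothetical helper that pretends each wfe() returns exactly at the next
event stream tick. Real hardware can wake from WFE for other reasons, in
which case the loop simply iterates more often.

```c
#include <assert.h>
#include <stdint.h>

/* All three constants are assumptions for illustration, not from the patch. */
#define SKETCH_HZ              250ULL       /* assumed kernel tick rate */
#define SKETCH_COUNTER_HZ      50000000ULL  /* hypothetical 50 MHz counter */
#define SKETCH_EVT_STREAM_FREQ 10000ULL     /* assumed event rate (100 us period) */

/* Assumes a timer-based delay loop: loops_per_jiffy = counter ticks per jiffy. */
#define SKETCH_LOOPS_PER_JIFFY (SKETCH_COUNTER_HZ / SKETCH_HZ)

/* Same arithmetic as the helper added to asm-generic/delay.h. */
static uint64_t usecs_to_xloops(uint64_t usecs)
{
	return usecs * 0x10C7ULL;	/* 2**32 / 1000000, rounded up */
}

/* Same arithmetic as the helper added to arch/arm64/lib/delay.c. */
static uint64_t xloops_to_cycles(uint64_t xloops)
{
	return (xloops * SKETCH_LOOPS_PER_JIFFY * SKETCH_HZ) >> 32;
}

struct delay_stats {
	uint64_t wfe_wakeups;	/* iterations of the WFE loop */
	uint64_t spin_cycles;	/* cycles left for the cpu_relax() loop */
};

/*
 * Model of the two loops in __delay(). The counter value is simulated:
 * each modelled wfe() jumps to the next multiple of 'period'.
 */
static struct delay_stats simulate_delay(uint64_t start, uint64_t cycles,
					 uint64_t period)
{
	struct delay_stats s = { 0, 0 };
	uint64_t now = start;

	assert(period > 0);

	/* Phase 1: sleep only while one more full period cannot overshoot. */
	while (now - start + period < cycles) {
		now = (now / period + 1) * period;
		s.wfe_wakeups++;
	}

	/* Phase 2: whatever remains is busy-waited. */
	if (now - start < cycles)
		s.spin_cycles = cycles - (now - start);

	return s;
}
```

With these assumed values, a 100 us event period converts to 5000 cycles
and a 1000 us delay to 50000 cycles. simulate_delay(0, 50000, 5000) gives
9 WFE wakeups followed by 5000 spun cycles, while a delay of exactly one
period, simulate_delay(0, 5000, 5000), never enters the WFE loop. That
matches the commit message: only delays longer than the event stream
period benefit.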