From patchwork Mon Jun 26 08:28:30 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dietmar Eggemann
X-Patchwork-Id: 9808799
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 55416603D7
 for ; Mon, 26 Jun 2017 08:28:37 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 64C7022A68
 for ; Mon, 26 Jun 2017 08:28:37 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 56F1624B44; Mon, 26 Jun 2017 08:28:37 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
 autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B18B522A68
 for ; Mon, 26 Jun 2017 08:28:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1751397AbdFZI2f (ORCPT ); Mon, 26 Jun 2017 04:28:35 -0400
Received: from foss.arm.com ([217.140.101.70]:40218 "EHLO foss.arm.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751364AbdFZI2e (ORCPT ); Mon, 26 Jun 2017 04:28:34 -0400
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 94F7B80D;
 Mon, 26 Jun 2017 01:28:33 -0700 (PDT)
Received: from [10.1.210.41] (e107985-lin.cambridge.arm.com [10.1.210.41])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id B7BA13F41F;
 Mon, 26 Jun 2017 01:28:31 -0700 (PDT)
Subject: Re: [PATCH 2/6] drivers base/arch_topology: frequency-invariant load-tracking support
From: Dietmar Eggemann
To: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org, linux@arm.linux.org.uk,
 linux-arm-kernel@lists.infradead.org,
 Greg Kroah-Hartman, Russell King, Catalin Marinas, Will Deacon,
 Juri Lelli, Vincent Guittot, Peter Zijlstra, Morten Rasmussen
References: <20170608075513.12475-1-dietmar.eggemann@arm.com>
 <20170608075513.12475-3-dietmar.eggemann@arm.com>
Message-ID: <7c6decdf-42e2-b5f3-6497-8a2d99a95435@arm.com>
Date: Mon, 26 Jun 2017 09:28:30 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1
MIME-Version: 1.0
In-Reply-To: <20170608075513.12475-3-dietmar.eggemann@arm.com>
Content-Language: en-GB
Sender: linux-pm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On 08/06/17 08:55, Dietmar Eggemann wrote:
> Implements an arch-specific frequency-scaling function
> topology_get_freq_scale() which provides the following frequency
> scaling factor:
>
>   current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_supported_freq(cpu)

[...]

Frequency-invariant and cpu-invariant load tracking are part of the task
scheduler's hot path, e.g.:

  __update_load_avg_se() -> ___update_load_avg() -> accumulate_sum()

That's why function calls should be avoided here.

I would like to fold the following changes into patch 2/6 in v2:

commit 1397770fe47ce5d34511e7062bd3a8bc96a74590
Author: Dietmar Eggemann
Date:   Sat Jun 24 16:46:45 2017 +0100

    drivers base/arch_topology: eliminate function call for cpu and frequency-invariant accounting

    topology_get_cpu_scale() and topology_get_freq_scale() are the arm/arm64
    architecture-specific implementations that provide cpu-invariant and
    frequency-invariant accounting support to the task scheduler.

    Define them as static inline functions to allow cpu-invariant and
    frequency-invariant accounting to happen without an extra function
    call involved.

    Test results on JUNO (arm64):

    root@juno:~# grep "__update_load_avg_\|update_group_capacity\|topology_get" available_filter_functions > set_ftrace_filter
    root@juno:~# echo function_graph > current_tracer
    root@juno:~# cat trace | tail -50

    w/ this patch:

    ...
     3)   0.700 us    |  __update_load_avg_se.isra.5();
    ...
     3)   0.750 us    |  __update_load_avg_cfs_rq();
    ...
     3)   0.780 us    |  update_group_capacity();
    ...

    w/o this patch:

     4)               |  __update_load_avg_cfs_rq() {
     4)   0.380 us    |    topology_get_freq_scale();
     4)   0.340 us    |    topology_get_cpu_scale();
     4)   6.420 us    |  }
    ...
     4)               |  __update_load_avg_se.isra.4() {
     4)   0.300 us    |    topology_get_freq_scale();
     4)   0.260 us    |    topology_get_cpu_scale();
     4)   5.800 us    |  }
    ...
     4)               |  update_group_capacity() {
     4)   0.260 us    |    topology_get_cpu_scale();
     4)   3.540 us    |  }
    ...

    So these extra function calls cost ~2.5us each (on Cortex A53,
    cpu0,3,4,5).

    Since this happens in the task scheduler hot-path, they have to be
    avoided.

    Signed-off-by: Dietmar Eggemann

[...]

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index d7e130c268fb..8dfa4c3dbfc2 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -23,18 +23,8 @@
 #include
 
 static DEFINE_MUTEX(cpu_scale_mutex);
-static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
-static DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
-
-unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu)
-{
-	return per_cpu(cpu_scale, cpu);
-}
-
-unsigned long topology_get_freq_scale(struct sched_domain *sd, int cpu)
-{
-	return per_cpu(freq_scale, cpu);
-}
+DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
+DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
 
 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
 {
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 3fb4d8ccb179..cf22631e6765 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -9,10 +9,21 @@ void topology_normalize_cpu_scale(void);
 struct device_node;
 int topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu);
 
+DECLARE_PER_CPU(unsigned long, cpu_scale);
+DECLARE_PER_CPU(unsigned long, freq_scale);
+
 struct sched_domain;
-unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu);
+static inline
+unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu)
+{
+	return per_cpu(cpu_scale, cpu);
+}
 
-unsigned long topology_get_freq_scale(struct sched_domain *sd, int cpu);
+static inline
+unsigned long topology_get_freq_scale(struct sched_domain *sd, int cpu)
+{
+	return per_cpu(freq_scale, cpu);
+}
 
 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);
 
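
For readers less familiar with the fixed-point convention, the scaling
factor quoted at the top of this mail and the accessor pattern from the
diff can be sketched in plain userspace C. This is only an illustration
under stated assumptions, not the kernel code: a plain array stands in for
the per-CPU variable, and NR_CPUS_SKETCH, sketch_init(),
sketch_freq_scale(), sketch_set_freq_scale() and sketch_get_freq_scale()
are hypothetical names. SCHED_CAPACITY_SHIFT is assumed to be 10, so that
SCHED_CAPACITY_SCALE is 1024. Note that the quoted formula is prose
notation; in C the shift has to be parenthesised, because '/' binds
tighter than '<<'.

```c
#include <assert.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/* Hypothetical CPU count for this sketch (Juno has six cores). */
#define NR_CPUS_SKETCH		6

/* Plain array standing in for DEFINE_PER_CPU(unsigned long, freq_scale). */
static unsigned long freq_scale[NR_CPUS_SKETCH];

/* Start every CPU at full scale, like the "= SCHED_CAPACITY_SCALE" initialiser. */
static void sketch_init(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		freq_scale[cpu] = SCHED_CAPACITY_SCALE;
}

/*
 * (cur << SHIFT) / max. The parentheses matter: without them C would
 * evaluate SHIFT / max first and shift by that result.
 */
static inline unsigned long sketch_freq_scale(unsigned long cur_khz,
					      unsigned long max_khz)
{
	assert(max_khz != 0 && cur_khz <= max_khz);
	return (cur_khz << SCHED_CAPACITY_SHIFT) / max_khz;
}

/* Slow path: store a new factor when the CPU frequency changes. */
static void sketch_set_freq_scale(int cpu, unsigned long cur_khz,
				  unsigned long max_khz)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	freq_scale[cpu] = sketch_freq_scale(cur_khz, max_khz);
}

/*
 * Hot path: a static inline read, so the compiler can fold it into the
 * caller instead of emitting a function call.
 */
static inline unsigned long sketch_get_freq_scale(int cpu)
{
	return freq_scale[cpu];
}
```

With these definitions, calling sketch_set_freq_scale(0, 500000, 1000000)
after sketch_init() leaves sketch_get_freq_scale(0) at 512, i.e. half of
SCHED_CAPACITY_SCALE, while the other CPUs stay at 1024.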