cpuidle: Select polling interval based on a c-state with a longer target residency

It was noted that a few workloads that idle rapidly regressed when commit
36fcb4292473 ("cpuidle: use first valid target residency as poll time")
was merged. The workloads in question were heavy communicators that idle
rapidly and were impacted by the c-state exit latency as the active CPUs
were not polling at the time of wakeup. As they were not particularly
realistic workloads, it was not considered to be a major problem.

Unfortunately, a bug was reported for a real workload in a production
environment that relied on large numbers of threads operating in a worker
pool pattern. These threads would idle for periods of time longer than the
C1 target residency and so incurred the c-state exit latency penalty. The
application is very sensitive to wakeup latency and was indirectly relying
on the behaviour prior to commit a37b969a61c1 ("cpuidle: poll_state: Add
time limit to poll_idle()") to poll for long enough to avoid the exit
latency cost.

The target residency of C1 is typically very short. On some x86 machines,
it can be as low as 2 microseconds. In poll_idle(), the clock is checked
every POLL_IDLE_RELAX_COUNT iterations of cpu_relax() and even one
iteration of that loop can be over 1 microsecond so the polling interval is
very close to the granularity of what poll_idle() can detect. Furthermore,
a basic ping-pong workload like perf bench pipe has a longer round-trip
time than the 2 microseconds meaning that the CPU will almost certainly
not be polling when the ping-pong completes.
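
The granularity problem can be illustrated with a small user-space
simulation. This is only a sketch and not the kernel's poll_idle(): the
sim_* names, the simulated clock and the fixed per-iteration cost are
hypothetical, and the POLL_IDLE_RELAX_COUNT value used here is an
assumption made for illustration.

```c
#include <stdint.h>

/* Assumed value, for illustration only. */
#define POLL_IDLE_RELAX_COUNT 200

/* Hypothetical simulated clock: every cpu_relax() costs a fixed time. */
struct sim_clock {
	uint64_t now_ns;
	uint64_t relax_cost_ns;
};

static void sim_cpu_relax(struct sim_clock *clk)
{
	clk->now_ns += clk->relax_cost_ns;
}

/*
 * Poll until limit_ns has elapsed, but only look at the clock after a
 * batch of relax iterations. Returns the time actually spent polling,
 * which can overshoot limit_ns by up to one batch.
 */
static uint64_t sim_poll_idle(struct sim_clock *clk, uint64_t limit_ns)
{
	uint64_t start = clk->now_ns;
	unsigned int loop_count = 0;

	for (;;) {
		sim_cpu_relax(clk);
		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
			continue;

		/* Clock check happens once per batch of relax calls. */
		loop_count = 0;
		if (clk->now_ns - start > limit_ns)
			break;
	}

	return clk->now_ns - start;
}
```

With a hypothetical relax cost of 6ns, the loop can only stop at multiples
of 1206ns, so a 2000ns limit results in 2412ns of polling. A limit that
short is resolved in one or two clock checks, which is why it sits so close
to what the loop can detect.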

This patch selects a polling interval based on an enabled c-state that
has a target residency longer than 10usec. If there is no such enabled
c-state then polling will last up to TICK_NSEC/16, similar to what it was
up until kernel 4.20.

As an example, consider the following c-state information from an Intel
CPU (times in microseconds):

	residency	exit_latency
C1	2		2
C1E	20		10
C3	100		33
C6	400		133

The polling interval selected is 20usec. If booted with
intel_idle.max_cstate=1 then the polling interval is 250usec as the deeper
c-states were not available.

On an AMD EPYC machine, the c-state information is more limited and looks
like this (times in microseconds):

	residency	exit_latency
C1	2		1
C2	800		400

The polling interval selected is 250usec. While C2 was considered, the
polling interval was clamped by CPUIDLE_POLL_MAX.
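
The selection described above can be sketched in user-space C as follows.
This is not the patch itself: struct idle_state, select_poll_limit_ns() and
TICK_NSEC_ASSUMED are hypothetical names, the 10usec value for
CPUIDLE_POLL_MIN is taken from the description, and the tick length assumes
HZ=250, inferred from the 250usec figures quoted in the examples.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_USEC		1000ULL
/* Assumption: HZ=250, i.e. a 4ms tick, so TICK_NSEC/16 is 250usec. */
#define TICK_NSEC_ASSUMED	(4000ULL * NSEC_PER_USEC)
#define CPUIDLE_POLL_MIN	(10ULL * NSEC_PER_USEC)
#define CPUIDLE_POLL_MAX	(TICK_NSEC_ASSUMED / 16)

/* Hypothetical, simplified view of a c-state. */
struct idle_state {
	const char *name;
	uint64_t target_residency_ns;
	bool enabled;
};

/*
 * Pick the target residency of the first enabled state that is longer
 * than CPUIDLE_POLL_MIN, clamped to CPUIDLE_POLL_MAX. If no state
 * qualifies, fall back to CPUIDLE_POLL_MAX.
 */
static uint64_t select_poll_limit_ns(const struct idle_state *states, size_t n)
{
	uint64_t limit = CPUIDLE_POLL_MAX;

	for (size_t i = 0; i < n; i++) {
		if (!states[i].enabled)
			continue;
		if (states[i].target_residency_ns <= CPUIDLE_POLL_MIN)
			continue;

		limit = states[i].target_residency_ns;
		break;
	}

	if (limit > CPUIDLE_POLL_MAX)
		limit = CPUIDLE_POLL_MAX;

	return limit;
}
```

Under these assumptions, the Intel table yields 20usec because C1 is too
short and C1E is the first state that qualifies. With only C1 enabled the
fallback of 250usec applies, and on the AMD table the 800usec residency of
C2 is clamped to 250usec.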

Note that it is not expected that polling will be a universal win. As
well as potentially trading power for performance, the performance is not
guaranteed if the extra polling prevented a turbo state being reached. The
patch allows the polling interval to be tuned in case a corner-case is
found and if a bug is filed, the tuning may advise how CPUIDLE_POLL_MIN
should be adjusted (e.g. optional overrides per architecture) or if a
different balance point than residency-exit_latency should be used.

tbench4
			     vanilla		    polling
Hmean     1        497.89 (   0.00%)      543.15 *   9.09%*
Hmean     2        975.88 (   0.00%)     1059.73 *   8.59%*
Hmean     4       1953.97 (   0.00%)     2081.37 *   6.52%*
Hmean     8       3645.76 (   0.00%)     4052.95 *  11.17%*
Hmean     16      6882.21 (   0.00%)     6995.93 *   1.65%*
Hmean     32     10752.20 (   0.00%)    10731.53 *  -0.19%*
Hmean     64     12875.08 (   0.00%)    12478.13 *  -3.08%*
Hmean     128    21500.54 (   0.00%)    21098.60 *  -1.87%*
Hmean     256    21253.70 (   0.00%)    21027.18 *  -1.07%*
Hmean     320    20813.50 (   0.00%)    20580.64 *  -1.12%*

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 Documentation/admin-guide/kernel-parameters.txt | 18 +++++++++
 drivers/cpuidle/cpuidle.c                       | 49 ++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 2 deletions(-)

Message ID	20201130092241.GR3371@techsingularity.net (mailing list archive)
State	Superseded, archived
Date	Mon, 30 Nov 2020 09:22:41 +0000
From	Mel Gorman <mgorman@techsingularity.net>
To	"Rafael J. Wysocki" <rafael@kernel.org>
Cc	Marcelo Tosatti <mtosatti@redhat.com>, Mel Gorman <mgorman@techsingularity.net>, Daniel Lezcano <daniel.lezcano@linaro.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Linux PM <linux-pm@vger.kernel.org>