drm/i915/gt: Limit frequency drop to RPe on parking


Message ID

20201124183521.28623-1-chris@chris-wilson.co.uk (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 984FC206D8
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Tue, 24 Nov 2020 18:35:21 +0000
Message-Id: <20201124183521.28623-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH] drm/i915/gt: Limit frequency drop to RPe on
 parking
Precedence: list
Cc: stable@vger.kernel.org, Chris Wilson <chris@chris-wilson.co.uk>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Series

drm/i915/gt: Limit frequency drop to RPe on parking

Commit Message

Chris Wilson Nov. 24, 2020, 6:35 p.m. UTC

We treat idling the GT (intel_rps_park) as a downclock event, and reduce
the frequency we intend to restart the GT with. Since the two workloads
are likely related (e.g. a compositor rendering every 16ms), we want to
carry the frequency and load information across the idle period.
However, we also need to update the frequencies so that workloads
that run for less than 1ms are autotuned by RPS (otherwise we leave
compositors running at max clocks, draining excess power). Conversely,
if we try to run too slowly, the next workload has to run longer. Since
there is a hysteresis in the power graph, below a certain frequency
running a short workload for longer consumes more energy than running it
at a slightly higher frequency for less time. The exact balance point is
unknown beforehand, but measurements with 30fps media playback indicate
that RPe is a better floor.

Reported-by: Edward Baker <edward.baker@intel.com>
Fixes: 043cd2d14ede ("drm/i915/gt: Leave rps->cur_freq on unpark")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org> # v5.8+
---
 drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++
 1 file changed, 4 insertions(+)
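The effect of the hunk can be sketched as a small userspace C model. It is not the driver: `struct rps_model`, `rps_park_model()` and `resume_freq_after_parks()` are hypothetical names, frequencies are abstract bins rather than MHz, and the doubling of a negative `last_adj` is assumed from the lines above the hunk (only the `adj = -2;` arm is visible in the patch). The model parks repeatedly with no RPS evaluation in between, which is the shorter-than-1ms case the commit message describes, and compares flooring at `min_freq` with flooring at RPe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct intel_rps fields touched by
 * intel_rps_park(); values are abstract frequency bins. */
struct rps_model {
	int cur_freq;       /* frequency the GT will be restarted with */
	int min_freq;       /* hardware minimum (RPn) */
	int efficient_freq; /* RPe */
	int last_adj;       /* last adjustment applied */
};

/* Userspace replacement for the kernel's max_t(int, a, b). */
static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/* Park-time downclock. ASSUMPTION: a negative last_adj is doubled, as in
 * the kernel source above the hunk. clamp_to_rpe selects the patched
 * behaviour; false models the code before this patch. */
static void rps_park_model(struct rps_model *rps, bool clamp_to_rpe)
{
	int adj = rps->last_adj;

	assert(rps->min_freq <= rps->efficient_freq); /* RPn <= RPe */
	if (adj < 0)
		adj *= 2;   /* keep accelerating an existing downward trend */
	else
		adj = -2;   /* start a new downward trend with an even step */
	rps->last_adj = adj;
	rps->cur_freq = max_int(rps->cur_freq + adj, rps->min_freq);

	if (clamp_to_rpe && rps->cur_freq < rps->efficient_freq) {
		rps->cur_freq = rps->efficient_freq;
		rps->last_adj = 0; /* next park restarts from a -2 step */
	}
}

/* Park n times back to back and return the resume frequency. Keep n small
 * for the unclamped model: there last_adj doubles on every park. */
static int resume_freq_after_parks(struct rps_model rps, int n,
				   bool clamp_to_rpe)
{
	for (int i = 0; i < n; i++)
		rps_park_model(&rps, clamp_to_rpe);
	return rps.cur_freq;
}
```

With the hypothetical values cur_freq = 40, min_freq = 6 and RPe = 14, four back-to-back parks leave the unclamped model at 10 and a fifth pins it at min_freq (6), while the clamped model settles at 14. Resetting last_adj to 0 on the clamp means the next park starts again from a -2 step instead of continuing the doubled one.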

Comments

Rodrigo Vivi Nov. 24, 2020, 7:46 p.m. UTC | #1

On Tue, Nov 24, 2020 at 06:35:21PM +0000, Chris Wilson wrote:
> We treat idling the GT (intel_rps_park) as a downclock event, and reduce
> the frequency we intend to restart the GT with. Since the two workloads
> are likely related (e.g. a compositor rendering every 16ms), we want to
> carry the frequency and load information from across the idling.
> However, we do also need to update the frequencies so that workloads
> that run for less than 1ms are autotuned by RPS (otherwise we leave
> compositors running at max clocks, draining excess power). Conversely,
> if we try to run too slowly, the next workload has to run longer. Since
> there is a hysteresis in the power graph, below a certain frequency
> running a short workload for longer consumes more energy than running it
> slightly higher for less time. The exact balance point is unknown
> beforehand, but measurements with 30fps media playback indicate that RPe
> is a better choice.
> 
> Reported-by: Edward Baker <edward.baker@intel.com>
> Fixes: 043cd2d14ede ("drm/i915/gt: Leave rps->cur_freq on unpark")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Edward Baker <edward.baker@intel.com>
> Cc: Andi Shyti <andi.shyti@intel.com>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: <stable@vger.kernel.org> # v5.8+
> ---
>  drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index b13e7845d483..f74d5e09e176 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -907,6 +907,10 @@ void intel_rps_park(struct intel_rps *rps)
>  		adj = -2;
>  	rps->last_adj = adj;
>  	rps->cur_freq = max_t(int, rps->cur_freq + adj, rps->min_freq);
> +	if (rps->cur_freq < rps->efficient_freq) {
> +		rps->cur_freq = rps->efficient_freq;
> +		rps->last_adj = 0;

this is indeed the smallest fix we can propagate:


Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

but I wonder if we couldn't simply kill last_adj now and always go
with RPe on park/unpark

> +	}
>  
>  	GT_TRACE(rps_to_gt(rps), "park:%x\n", rps->cur_freq);
>  }
> -- 
> 2.20.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Chris Wilson Nov. 24, 2020, 8:16 p.m. UTC | #2

Quoting Rodrigo Vivi (2020-11-24 19:46:29)
> On Tue, Nov 24, 2020 at 06:35:21PM +0000, Chris Wilson wrote:
> > We treat idling the GT (intel_rps_park) as a downclock event, and reduce
> > the frequency we intend to restart the GT with. Since the two workloads
> > are likely related (e.g. a compositor rendering every 16ms), we want to
> > carry the frequency and load information from across the idling.
> > However, we do also need to update the frequencies so that workloads
> > that run for less than 1ms are autotuned by RPS (otherwise we leave
> > compositors running at max clocks, draining excess power). Conversely,
> > if we try to run too slowly, the next workload has to run longer. Since
> > there is a hysteresis in the power graph, below a certain frequency
> > running a short workload for longer consumes more energy than running it
> > slightly higher for less time. The exact balance point is unknown
> > beforehand, but measurements with 30fps media playback indicate that RPe
> > is a better choice.
> > 
> > Reported-by: Edward Baker <edward.baker@intel.com>
> > Fixes: 043cd2d14ede ("drm/i915/gt: Leave rps->cur_freq on unpark")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Edward Baker <edward.baker@intel.com>
> > Cc: Andi Shyti <andi.shyti@intel.com>
> > Cc: Lyude Paul <lyude@redhat.com>
> > Cc: <stable@vger.kernel.org> # v5.8+
> > ---
> >  drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> > index b13e7845d483..f74d5e09e176 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> > @@ -907,6 +907,10 @@ void intel_rps_park(struct intel_rps *rps)
> >               adj = -2;
> >       rps->last_adj = adj;
> >       rps->cur_freq = max_t(int, rps->cur_freq + adj, rps->min_freq);
> > +     if (rps->cur_freq < rps->efficient_freq) {
> > +             rps->cur_freq = rps->efficient_freq;
> > +             rps->last_adj = 0;
> 
> this is indeed the smallest fix we can propagate:
> 
> 
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> 
> but I wonder now if we couldn't simply kill the last_adj now and always go
> with the rpe on park/unpark

Since we often have very bursty workloads that are less than 1ms, we do
want to keep the frequency across idling, or else we incur more latency
than is desired by the user (unpark latency is no joke, although that
is mostly the context switches). The compromise for always running
shorter than an RPS interval is to "gradually" reduce the frequency (so
that compositors do not get stuck at max clocks, yet those very same
compositors also require very quick autotuning so that animations are
smooth from idle). Compute is another case with both sustained and
bursty workloads, and the shorter-than-RPS bursty workloads are
naturally expected to be low latency.

So I still think keeping cur_freq is most often the best approach.
-Chris
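The trade-off between Rodrigo's suggestion and the behaviour Chris defends can be sketched with a similar userspace C model. Again the names (`park_decay_to_rpe()`, `park_always_rpe()`, `resume_after_one_park()`, `parks_until_rpe()`) and the frequency bins are hypothetical, and the doubling step is assumed from the kernel source rather than taken from the hunk. In this model, after one short idle from bin 40 the gradual policy resumes at 38 while always-RPe resumes at 14, and with no RPS evaluation in between the gradual policy needs four parks to reach RPe.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct intel_rps fields used on park;
 * frequencies are abstract bins, not MHz. */
struct rps_model {
	int cur_freq;       /* frequency the next burst starts at */
	int min_freq;       /* hardware minimum */
	int efficient_freq; /* RPe */
	int last_adj;       /* previous adjustment */
};

/* Behaviour kept by the patch: step cur_freq down on every park, doubling
 * a negative step (ASSUMED from the kernel source), never below RPe. */
static void park_decay_to_rpe(struct rps_model *rps)
{
	int adj = rps->last_adj < 0 ? rps->last_adj * 2 : -2;

	assert(rps->min_freq <= rps->efficient_freq);
	rps->last_adj = adj;
	rps->cur_freq += adj;
	if (rps->cur_freq < rps->min_freq)
		rps->cur_freq = rps->min_freq;
	if (rps->cur_freq < rps->efficient_freq) {
		rps->cur_freq = rps->efficient_freq;
		rps->last_adj = 0;
	}
}

/* Alternative raised in review: drop the history and always resume at RPe. */
static void park_always_rpe(struct rps_model *rps)
{
	rps->cur_freq = rps->efficient_freq;
	rps->last_adj = 0;
}

/* Frequency the next burst would start at after a single park. */
static int resume_after_one_park(struct rps_model rps,
				 void (*park)(struct rps_model *))
{
	park(&rps);
	return rps.cur_freq;
}

/* Consecutive parks (no RPS evaluation in between) the gradual policy
 * needs before the resume frequency reaches RPe. */
static int parks_until_rpe(struct rps_model rps)
{
	int n = 0;

	while (rps.cur_freq > rps.efficient_freq) {
		park_decay_to_rpe(&rps);
		n++;
	}
	return n;
}
```

The gradual policy lets a bursty client that idles briefly restart near its previous frequency, while repeated short idles still walk a compositor down towards RPe within a few parks; always-RPe gives up that carried-over frequency on every idle.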

Andi Shyti Nov. 24, 2020, 9:58 p.m. UTC | #3

Hi Chris,

On Tue, Nov 24, 2020 at 06:35:21PM +0000, Chris Wilson wrote:
> We treat idling the GT (intel_rps_park) as a downclock event, and reduce
> the frequency we intend to restart the GT with. Since the two workloads
> are likely related (e.g. a compositor rendering every 16ms), we want to
> carry the frequency and load information from across the idling.
> However, we do also need to update the frequencies so that workloads
> that run for less than 1ms are autotuned by RPS (otherwise we leave
> compositors running at max clocks, draining excess power). Conversely,
> if we try to run too slowly, the next workload has to run longer. Since
> there is a hysteresis in the power graph, below a certain frequency
> running a short workload for longer consumes more energy than running it
> slightly higher for less time. The exact balance point is unknown
> beforehand, but measurements with 30fps media playback indicate that RPe
> is a better choice.
> 
> Reported-by: Edward Baker <edward.baker@intel.com>
> Fixes: 043cd2d14ede ("drm/i915/gt: Leave rps->cur_freq on unpark")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Edward Baker <edward.baker@intel.com>
> Cc: Andi Shyti <andi.shyti@intel.com>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: <stable@vger.kernel.org> # v5.8+
> ---
>  drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index b13e7845d483..f74d5e09e176 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -907,6 +907,10 @@ void intel_rps_park(struct intel_rps *rps)
>  		adj = -2;
>  	rps->last_adj = adj;
>  	rps->cur_freq = max_t(int, rps->cur_freq + adj, rps->min_freq);
> +	if (rps->cur_freq < rps->efficient_freq) {
> +		rps->cur_freq = rps->efficient_freq;
> +		rps->last_adj = 0;
> +	}

looks OK to me, makes sense:

Reviewed-by: Andi Shyti <andi.shyti@intel.com>

Thanks,
Andi

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index b13e7845d483..f74d5e09e176 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -907,6 +907,10 @@ void intel_rps_park(struct intel_rps *rps)
 		adj = -2;
 	rps->last_adj = adj;
 	rps->cur_freq = max_t(int, rps->cur_freq + adj, rps->min_freq);
+	if (rps->cur_freq < rps->efficient_freq) {
+		rps->cur_freq = rps->efficient_freq;
+		rps->last_adj = 0;
+	}
 
 	GT_TRACE(rps_to_gt(rps), "park:%x\n", rps->cur_freq);
 }
