
[04/62] drm/i915: Restore waitboost credit to the synchronous waiter

Message ID 1464971847-15809-5-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State: New, archived

Commit Message

Chris Wilson June 3, 2016, 4:36 p.m. UTC
Ideally, we want to automagically have the GPU respond to the
instantaneous load by reclocking itself. However, reclocking occurs
relatively slowly, and to the client waiting for a result from the GPU,
too late. To compensate and reduce the client latency, we allow the
first wait from a client to boost the GPU clocks to maximum. This
overcomes the lag in autoreclocking, at the expense of forcing the GPU
clocks too high. So to offset the excessive power usage, we currently
allow a client to only boost the clocks once before we detect the GPU
is idle again. This works reasonably for, say, the first frame in a
benchmark, but for many workloads that are more synchronous (such as
OpenCL) we find the GPU clocks remain too low. By noting a wait which
would idle the GPU
(i.e. we just waited upon the last known request), we can give that
client the idle boost credit (for their next wait) without the 100ms
delay required for us to detect the GPU idle state. The intention is to
boost clients that are stalling in the process of feeding the GPU more
work (and who in doing so let the GPU idle), without granting boost
credits to clients that are throttling themselves (such as compositors).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_gem.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
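
[Editor's note: for background, the per-client boost credit is tracked
purely by list membership: an intel_rps_client linked into
dev_priv->rps.clients has already spent its boost, while one whose link
is empty may boost on its next wait. A simplified sketch of the
granting side, loosely modelled on the driver's gen6_rps_boost() of
this period (the function and helper names here are illustrative, not
the exact upstream code):

    /* Illustrative sketch (not the exact upstream code): a client may
     * boost only while it is NOT already on the boosted-clients list;
     * boosting links it in, consuming the credit until something takes
     * it back off.
     */
    static void rps_boost_sketch(struct drm_i915_private *dev_priv,
                                 struct intel_rps_client *rps)
    {
            spin_lock(&dev_priv->rps.client_lock);
            if (rps == NULL || list_empty(&rps->link)) {
                    request_max_clocks(dev_priv); /* hypothetical helper */
                    if (rps) {
                            list_add(&rps->link, &dev_priv->rps.clients);
                            dev_priv->rps.boosts++;
                    }
            }
            spin_unlock(&dev_priv->rps.client_lock);
    }

The patch exploits exactly this encoding: list_del_init(&rps->link)
takes the waiter back off the list, so its next wait sees list_empty()
and boosts again, without the ~100ms wait for idle detection to clear
the list.]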

Comments

Daniel Vetter June 8, 2016, 9:04 a.m. UTC | #1
On Fri, Jun 03, 2016 at 05:36:29PM +0100, Chris Wilson wrote:
> [commit message snipped]

I wonder a bit what will happen here for workloads that flip-flop between
engines, since you check for last request on a given engine. But maybe in
the future we'll get clock domains per engine ;-)

Anyway commit message needs to be touched up to say idle engine instead of
idle GPU. With that

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

since it makes sense indeed.

> [patch snipped]
Chris Wilson June 8, 2016, 10:38 a.m. UTC | #2
On Wed, Jun 08, 2016 at 11:04:57AM +0200, Daniel Vetter wrote:
> On Fri, Jun 03, 2016 at 05:36:29PM +0100, Chris Wilson wrote:
> > [commit message snipped]
> 
> I wonder a bit what will happen here for workloads that flip-flop between
> engines, since you check for last request on a given engine. But maybe in
> the future we'll get clock domains per engine ;-)

We disable RPS boosting for inter-ring synchronisation, so only if the
client does submit(RCS); submit(BCS); wait(RCS); wait(BCS) would it get
two bites at the cherry. That still falls under the notion of allowed
client behaviour as the second wait is presumably stalling the client
from submitting more work.

s/GPU idle/engine idle/ + tweaks
-Chris
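
[Editor's note: to illustrate the two-engine case Chris describes, a
hypothetical client sequence (the helper names are illustrative, not a
real API):

    /* Hypothetical client pattern: each wait drains the last request
     * on its own engine, so with this patch each wait refunds a
     * boost credit.
     */
    submit_batch(RCS);   /* render engine */
    submit_batch(BCS);   /* blitter engine */
    wait_batch(RCS);     /* RCS drained -> credit refunded */
    wait_batch(BCS);     /* BCS drained -> second credit refunded */

Both waits still stall the client from submitting further work, so the
double credit stays within the intended policy.]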

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index da44715c894f..bec02baef190 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1310,6 +1310,22 @@ complete:
 			*timeout = 0;
 	}
 
+	if (rps && req->seqno == req->engine->last_submitted_seqno) {
+		/* The GPU is now idle and this client has stalled.
+		 * Since no other client has submitted a request in the
+		 * meantime, assume that this client is the only one
+		 * supplying work to the GPU but is unable to keep that
+		 * work supplied because it is waiting. Since the GPU is
+		 * then never kept fully busy, RPS autoclocking will
+		 * keep the clocks relatively low, causing further delays.
+		 * Compensate by giving the synchronous client credit for
+		 * a waitboost next time.
+		 */
+		spin_lock(&req->i915->rps.client_lock);
+		list_del_init(&rps->link);
+		spin_unlock(&req->i915->rps.client_lock);
+	}
+
 	return ret;
 }
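
[Editor's note: for contrast, the ordinary credit refund runs only once
the GPU has been idle long enough for the RPS idle handler to fire
(roughly the 100ms mentioned in the commit message). A simplified
sketch, loosely modelled on the driver's gen6_rps_idle() of this period
(illustrative, not the exact upstream code):

    /* Illustrative sketch (not the exact upstream code): once the GPU
     * is detected idle, drop the clocks and unlink every boosted
     * client, re-arming their waitboosts.
     */
    static void rps_idle_sketch(struct drm_i915_private *dev_priv)
    {
            /* ... clocks lowered towards the minimum softlimit ... */
            spin_lock(&dev_priv->rps.client_lock);
            while (!list_empty(&dev_priv->rps.clients))
                    list_del_init(dev_priv->rps.clients.next);
            spin_unlock(&dev_priv->rps.client_lock);
    }

The hunk above short-circuits that delay for the one client whose wait
emptied the engine.]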