[2/3] drm/i915: Do not use iowait while waiting for the GPU

Message ID 20180730152522.31682-3-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series [1/3] drm/i915: Limit C-states when waiting for the active request

Commit Message

Chris Wilson July 30, 2018, 3:25 p.m. UTC
A recent trend for cpufreq is to boost the CPU frequencies for
iowaiters, in particular to benefit high frequency I/O. We do the same
and boost the GPU clocks to try and minimise time spent waiting for the
GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
frequency will result in the GPU being throttled and its frequency being
reduced. Thus declaring iowait negatively impacts GPU throughput.

v2: Both sleeps!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
---
 drivers/gpu/drm/i915/i915_request.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
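
For reference, the only behavioural difference between the two calls
being swapped is the iowait accounting around the sleep. A rough
paraphrase of kernel/sched/core.c from kernels of this era (details
vary between versions):

	int io_schedule_prepare(void)
	{
		int old_iowait = current->in_iowait;

		/* Flag this task as blocked on I/O; cpufreq iowait
		 * boosting and the cpuidle menu governor both read
		 * this state. */
		current->in_iowait = 1;
		blk_schedule_flush_plug(current);

		return old_iowait;
	}

	void io_schedule_finish(int token)
	{
		current->in_iowait = token;
	}

	long __sched io_schedule_timeout(long timeout)
	{
		int token;
		long ret;

		token = io_schedule_prepare();
		ret = schedule_timeout(timeout);
		io_schedule_finish(token);

		return ret;
	}

Switching to plain schedule_timeout() therefore only stops advertising
the wait as iowait; the sleep itself is unchanged.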

Comments

Mika Kuoppala July 31, 2018, 1:03 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> A recent trend for cpufreq is to boost the CPU frequencies for
> iowaiters, in particular to benefit high frequency I/O. We do the same
> and boost the GPU clocks to try and minimise time spent waiting for the
> GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
> frequency will result in the GPU being throttled and its frequency being
> reduced. Thus declaring iowait negatively impacts GPU throughput.
>
> v2: Both sleeps!
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
> References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")

The commit above has its own heuristics on when to actually ramp up,
inspecting the interval between io waits.
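
For context, that heuristic only starts boosting once it sees IO
wakeups on consecutive ticks; roughly (a simplified paraphrase of the
intel_pstate change, not the verbatim code):

	if (cpu->sched_flags & SCHED_CPUFREQ_IOWAIT) {
		bool do_io = false;

		cpu->sched_flags = 0;
		/*
		 * The IOWAIT flag fires for all sorts of incidental
		 * activity, so a single occurrence is not enough:
		 * only treat the CPU as IO bound if two wakeups
		 * arrive within two consecutive ticks.
		 */
		if (time_before64(time, cpu->last_io_update + 2 * TICK_NSEC))
			do_io = true;

		cpu->last_io_update = time;

		if (do_io)
			intel_pstate_hwp_boost_up(cpu);
	} else {
		intel_pstate_hwp_boost_down(cpu);
	}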

Regardless of that, with a shared TDP, the waiter should not stand in
the way. And it fixes a regression:

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Going the other way around, the atomic commit code for updating planes
could potentially benefit from changing to io_schedule_timeout()
(and/or adopting C-state limits).
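
For waits there that are not plain schedule_timeout() calls, the same
effect could be had by bracketing them with the scheduler helpers; a
hypothetical sketch (the waitqueue and condition are made up for
illustration):

	/* Hypothetical: declare an arbitrary display wait as iowait
	 * using the helpers from kernel/sched/core.c. The commit->wait
	 * waitqueue and commit->done flag are illustrative only. */
	int token = io_schedule_prepare();

	ret = wait_event_timeout(commit->wait, commit->done, timeout);

	io_schedule_finish(token);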

-Mika

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: Francisco Jerez <currojerez@riseup.net>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index f3ff8dbe363d..3e48ea87b324 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq,
>  			goto complete;
>  		}
>  
> -		timeout = io_schedule_timeout(timeout);
> +		timeout = schedule_timeout(timeout);
>  	} while (1);
>  
>  	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> @@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq,
>  				      wait.seqno - 1))
>  			qos = wait_dma_qos_add();
>  
> -		timeout = io_schedule_timeout(timeout);
> +		timeout = schedule_timeout(timeout);
>  
>  		if (intel_wait_complete(&wait) &&
>  		    intel_wait_check_request(&wait, rq))
> -- 
> 2.18.0
Francisco Jerez July 31, 2018, 7:25 p.m. UTC | #2
Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:

> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
>> A recent trend for cpufreq is to boost the CPU frequencies for
>> iowaiters, in particular to benefit high frequency I/O. We do the same
>> and boost the GPU clocks to try and minimise time spent waiting for the
>> GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
>> frequency will result in the GPU being throttled and its frequency being
>> reduced. Thus declaring iowait negatively impacts GPU throughput.
>>
>> v2: Both sleeps!
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
>> References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
>
> The commit above has its own heuristics on when to actually ramp up,
> inspecting the interval between io waits.
>
> Regardless of that, with a shared TDP, the waiter should not stand in
> the way.

I've been running some tests with this series (and your previous ones).
I still see statistically significant regressions in latency-sensitive
benchmarks with this series applied:

 qgears2/render-backend=XRender Extension/test-mode=Text: XXX ±0.26% x12 -> XXX ±0.36% x15 d=-0.97% ±0.32% p=0.00%
 lightsmark:                                              XXX ±0.51% x22 -> XXX ±0.49% x20 d=-1.58% ±0.50% p=0.00%
 gputest/triangle:                                        XXX ±0.67% x10 -> XXX ±1.76% x20 d=-1.73% ±1.47% p=0.52%
 synmark/OglMultithread:                                  XXX ±0.47% x10 -> XXX ±1.06% x20 d=-3.59% ±0.88% p=0.00%

Numbers above are from a partial benchmark run on BXT J3455 -- I'm still
waiting to get the results of a full run though.

Worse, in combination with my intel_pstate branch the effect of this
patch is strictly negative.  There are no improvements, because the
cpufreq governor is able to figure out by itself that boosting the
frequency of the CPU under GPU-bound conditions cannot possibly help
(the HWP boost logic could easily be fixed to do the same thing, which
would allow us to obtain the best of both worlds on big core).  The
reason for the regressions is that IOWAIT is a useful signal for the
cpufreq governor to provide reduced latency in applications that are
unable to parallelize enough work between the CPU and the IO device --
the upstream governor is just using it rather ineffectively.

> And it fixes a regression:
>

This patch isn't necessary anymore to fix the regression; there is
another change going in that mitigates the problem [1].  Can we please
keep the IO schedule calls here (and elsewhere in the atomic commit
code)?

[1] https://lkml.org/lkml/2018/7/30/880

> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>
> Going the other way around, the atomic commit code for updating
> planes could potentially benefit from changing to
> io_schedule_timeout() (and/or adopting C-state limits).
>
> -Mika
>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>> Cc: Francisco Jerez <currojerez@riseup.net>
>> ---
>>  drivers/gpu/drm/i915/i915_request.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> index f3ff8dbe363d..3e48ea87b324 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq,
>>  			goto complete;
>>  		}
>>  
>> -		timeout = io_schedule_timeout(timeout);
>> +		timeout = schedule_timeout(timeout);
>>  	} while (1);
>>  
>>  	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
>> @@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq,
>>  				      wait.seqno - 1))
>>  			qos = wait_dma_qos_add();
>>  
>> -		timeout = io_schedule_timeout(timeout);
>> +		timeout = schedule_timeout(timeout);
>>  
>>  		if (intel_wait_complete(&wait) &&
>>  		    intel_wait_check_request(&wait, rq))
>> -- 
>> 2.18.0

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f3ff8dbe363d..3e48ea87b324 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq,
 			goto complete;
 		}
 
-		timeout = io_schedule_timeout(timeout);
+		timeout = schedule_timeout(timeout);
 	} while (1);
 
 	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
@@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq,
 				      wait.seqno - 1))
 			qos = wait_dma_qos_add();
 
-		timeout = io_schedule_timeout(timeout);
+		timeout = schedule_timeout(timeout);
 
 		if (intel_wait_complete(&wait) &&
 		    intel_wait_check_request(&wait, rq))