diff mbox

[0/9] GPU-bound energy efficiency improvements for the intel_pstate driver.

Message ID 152397378196.27610.5253114574084892203@mail.alporthouse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson April 17, 2018, 2:03 p.m. UTC
I have to ask, if this is all just to work around iowait triggering high
frequencies for GPU bound applications, does it all just boil down to
i915 incorrectly using iowait. Does this patch set perform better than


Quite clearly the general framework could prove useful in a broader
range of situations, but does the above suffice? (And can be backported
to stable.)
-Chris

Comments

Srinivas Pandruvada April 17, 2018, 3:34 p.m. UTC | #1
On Tue, 2018-04-17 at 15:03 +0100, Chris Wilson wrote:
> I have to ask, if this is all just to work around iowait triggering
> high
> frequencies for GPU bound applications, does it all just boil down to
> i915 incorrectly using iowait. Does this patch set perform better
> than
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c
> b/drivers/gpu/drm/i915/i915_request.c
> index 9ca9c24b4421..7e7c95411bcd 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1267,7 +1267,7 @@ long i915_request_wait(struct i915_request *rq,
>                         goto complete;
>                 }
>  
> -               timeout = io_schedule_timeout(timeout);
> +               timeout = schedule_timeout(timeout);
>         } while (1);
>  
>         GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> 
> Quite clearly the general framework could prove useful in a broader
> range of situations, but does the above suffice? (And can be
> backported
> to stable.)

Definitely a very good test to do.

Thanks,
Srinivas

> -Chris
Francisco Jerez April 17, 2018, 7:27 p.m. UTC | #2
Hey Chris,

Chris Wilson <chris@chris-wilson.co.uk> writes:

> I have to ask, if this is all just to work around iowait triggering high
> frequencies for GPU bound applications, does it all just boil down to
> i915 incorrectly using iowait. Does this patch set perform better than
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 9ca9c24b4421..7e7c95411bcd 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1267,7 +1267,7 @@ long i915_request_wait(struct i915_request *rq,
>                         goto complete;
>                 }
>  
> -               timeout = io_schedule_timeout(timeout);
> +               timeout = schedule_timeout(timeout);
>         } while (1);
>  
>         GEM_BUG_ON(!intel_wait_has_seqno(&wait));
>
> Quite clearly the general framework could prove useful in a broader
> range of situations, but does the above suffice? (And can be backported
> to stable.)
> -Chris

Nope...  This hunk is one of the first things I tried when I started
looking into this.  It didn't cut it, and it seemed to lead to some
regressions in latency-bound test-cases that were relying on the upward
bias provided by IOWAIT boosting in combination with the upstream
P-state governor.

The reason why it's not sufficient is that the bulk of the energy
efficiency improvement from this series is obtained by dampening
high-frequency oscillations of the CPU P-state, which occur currently
for any periodically fluctuating workload (not only i915 rendering)
regardless of whether IOWAIT boosting kicks in.  i915 using IO waits
does exacerbate the problem with the upstream governor by amplifying the
oscillation, but it's not really the ultimate cause.

In combination with v2 (you can take a peek at the half-baked patch here
[1], planning to make a few more changes this week so it isn't quite
ready for review yet) this hunk will actually cause more serious
regressions because v2 is able to use the frequent IOWAIT stalls of the
i915 driver in combination with an IO-underutilized system as a strong
indication that the workload is latency-bound, which causes it to
transition to latency-minimizing mode, which significantly improves
performance of latency-bound rendering (most visible in a handful X11
test-cases and some GPU/CPU sync test-cases from SynMark2).

[1] https://people.freedesktop.org/~currojerez/intel_pstate-lp/0001-cpufreq-intel_pstate-Implement-variably-low-pass-fil-v1.8.patch
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9ca9c24b4421..7e7c95411bcd 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1267,7 +1267,7 @@  long i915_request_wait(struct i915_request *rq,
                        goto complete;
                }
 
-               timeout = io_schedule_timeout(timeout);
+               timeout = schedule_timeout(timeout);
        } while (1);
 
        GEM_BUG_ON(!intel_wait_has_seqno(&wait));