
[9/9] drm/i915/execlists: Report GPU rendering as IO activity to cpufreq.

Message ID 20180328063845.4884-10-currojerez@riseup.net (mailing list archive)
State New, archived

Commit Message

Francisco Jerez March 28, 2018, 6:38 a.m. UTC
This allows cpufreq governors to realize when the system becomes
non-CPU-bound due to GPU rendering activity, which will cause the
intel_pstate LP controller to behave more conservatively: CPU energy
usage will be reduced when there isn't a good chance for system
performance to scale with CPU frequency.  This leaves additional TDP
budget available for the GPU to reach higher frequencies, which is
translated into an improvement in graphics performance to the extent
that the workload remains TDP-limited (most non-trivial graphics
benchmarks out there improve significantly on TDP-constrained
platforms; see the cover letter for some numbers).  If the workload
isn't (or is no longer) TDP-limited, performance should stay roughly
constant, but energy usage will be divided by a similar factor.

The intel_pstate LP controller is only enabled on BXT+ small-core
platforms at this point, so this shouldn't have any effect on other
systems.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
---
 drivers/gpu/drm/i915/intel_lrc.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Chris Wilson March 28, 2018, 8:02 a.m. UTC | #1
Quoting Francisco Jerez (2018-03-28 07:38:45)
> This allows cpufreq governors to realize when the system becomes
> non-CPU-bound due to GPU rendering activity, which will cause the
> intel_pstate LP controller to behave more conservatively: CPU energy
> usage will be reduced when there isn't a good chance for system
> performance to scale with CPU frequency.  This leaves additional TDP
> budget available for the GPU to reach higher frequencies, which is
> translated into an improvement in graphics performance to the extent
> that the workload remains TDP-limited (most non-trivial graphics
> benchmarks out there improve significantly on TDP-constrained
> platforms; see the cover letter for some numbers).  If the workload
> isn't (or is no longer) TDP-limited, performance should stay roughly
> constant, but energy usage will be divided by a similar factor.

And that's what I thought IPS was already meant to be achieving;
intelligent distribution between different units...
 
> The intel_pstate LP controller is only enabled on BXT+ small-core
> platforms at this point, so this shouldn't have any effect on other
> systems.

Although that's probably only a feature for big core :)
 
> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 3a69b367e565..721f915115bd 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -132,6 +132,7 @@
>   *
>   */
>  #include <linux/interrupt.h>
> +#include <linux/cpufreq.h>
>  
>  #include <drm/drmP.h>
>  #include <drm/i915_drm.h>
> @@ -379,11 +380,13 @@ execlists_context_schedule_in(struct i915_request *rq)
>  {
>         execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
>         intel_engine_context_in(rq->engine);
> +       cpufreq_io_active_begin();

Since you only care about a binary value for GPU activity, we don't need
to do this on each context, just between submitting the first request
and the final completion, i.e. couple this to EXECLISTS_ACTIVE_USER.

Haven't yet gone back to check how heavy io_active_begin/end are, but I
trust you appreciate that this code is particularly latency sensitive.
-Chris
Francisco Jerez March 28, 2018, 6:55 p.m. UTC | #2
Hi Chris,

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Francisco Jerez (2018-03-28 07:38:45)
>> This allows cpufreq governors to realize when the system becomes
>> non-CPU-bound due to GPU rendering activity, which will cause the
>> intel_pstate LP controller to behave more conservatively: CPU energy
>> usage will be reduced when there isn't a good chance for system
>> performance to scale with CPU frequency.  This leaves additional TDP
>> budget available for the GPU to reach higher frequencies, which is
>> translated into an improvement in graphics performance to the extent
>> that the workload remains TDP-limited (most non-trivial graphics
>> benchmarks out there improve significantly on TDP-constrained
>> platforms; see the cover letter for some numbers).  If the workload
>> isn't (or is no longer) TDP-limited, performance should stay roughly
>> constant, but energy usage will be divided by a similar factor.
>
> And that's what I thought IPS was already meant to be achieving;
> intelligent distribution between different units...
>  

I'm not aware of IPS ever being available on any small core systems.

>> The intel_pstate LP controller is only enabled on BXT+ small-core
>> platforms at this point, so this shouldn't have any effect on other
>> systems.
>
> Although that's probably only a feature for big core :)
>

Actually I wouldn't rule out it being useful on big core up front.  I've
been playing around with the idea of hooking up this same interface to
the EPP preference used by HWP, which would allow the HWP controller to
reduce energy usage in GPU-bound conditions, potentially leaving
additional TDP headroom available for the GPU -- it's not uncommon for
graphics workloads to be TDP-limited even on big core.  And even when
not TDP-limited (where neither IPS nor IBC will ever help you), it
would still allow optimizing CPU energy usage for workloads that are
consistently GPU-bound (which would mean longer battery life while
gaming on a big-core laptop).

>> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
>> ---
>>  drivers/gpu/drm/i915/intel_lrc.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 3a69b367e565..721f915115bd 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -132,6 +132,7 @@
>>   *
>>   */
>>  #include <linux/interrupt.h>
>> +#include <linux/cpufreq.h>
>>  
>>  #include <drm/drmP.h>
>>  #include <drm/i915_drm.h>
>> @@ -379,11 +380,13 @@ execlists_context_schedule_in(struct i915_request *rq)
>>  {
>>         execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
>>         intel_engine_context_in(rq->engine);
>> +       cpufreq_io_active_begin();
>
> Since you only care about a binary value for GPU activity, we don't need
> to do this on each context, just between submitting the first request
> and the final completion, i.e. couple this to EXECLISTS_ACTIVE_USER.
>
> Haven't yet gone back to check how heavy io_active_begin/end are, but I
> trust you appreciate that this code is particularly latency sensitive.

The cpufreq_io_active_begin/end() interfaces are probably as lightweight
as they can be.  There's no locking involved.  In cases where there
already is an overlapping request, cpufreq_io_active_begin() and
cpufreq_io_active_end() should return without doing much more than an
atomic increment and an atomic cmpxchg respectively.

That said, it wouldn't hurt to call each of them once per sequence of
overlapping requests.  Do you want me to call them from
execlists_set/clear_active() themselves when bit == EXECLISTS_ACTIVE_USER,
or at each callsite of execlists_set/clear_active()? [possibly protected
with a check that execlists_is_active(EXECLISTS_ACTIVE_USER) wasn't
already the expected value]

> -Chris
Chris Wilson March 28, 2018, 7:20 p.m. UTC | #3
Quoting Francisco Jerez (2018-03-28 19:55:12)
> Hi Chris,
> 
[snip]
> That said, it wouldn't hurt to call each of them once per sequence of
> overlapping requests.  Do you want me to call them from
> execlists_set/clear_active() themselves when bit == EXECLISTS_ACTIVE_USER,
> or at each callsite of execlists_set/clear_active()? [possibly protected
> with a check that execlists_is_active(EXECLISTS_ACTIVE_USER) wasn't
> already the expected value]

No, I was thinking of adding an execlists_start()/execlists_stop()
[naming suggestions welcome, begin/end?] where we could hook additional
bookkeeping into.
-Chris
Chris Wilson March 28, 2018, 11:19 p.m. UTC | #4
Quoting Chris Wilson (2018-03-28 20:20:19)
> Quoting Francisco Jerez (2018-03-28 19:55:12)
> > Hi Chris,
> > 
> [snip]
> > That said, it wouldn't hurt to call each of them once per sequence of
> > overlapping requests.  Do you want me to call them from
> > execlists_set/clear_active() themselves when bit == EXECLISTS_ACTIVE_USER,
> > or at each callsite of execlists_set/clear_active()? [possibly protected
> > with a check that execlists_is_active(EXECLISTS_ACTIVE_USER) wasn't
> > already the expected value]
> 
> No, I was thinking of adding an execlists_start()/execlists_stop()
> [naming suggestions welcome, begin/end?] where we could hook additional
> bookkeeping into.

Trying to call execlist_begin() once didn't pan out. It's easier to
reuse for similar bookkeeping used in future patches if execlist_begin()
(or whatever name suits best) is called at the start of each context.

Something along the lines of:

@@ -374,6 +374,19 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status)
                                   status, rq);
 }
 
+static inline void
+execlists_begin(struct intel_engine_execlists *execlists,
+               struct execlist_port *port)
+{
+       execlists_set_active_once(execlists, EXECLISTS_ACTIVE_USER);
+}
+
+static inline void
+execlists_end(struct intel_engine_execlists *execlists)
+{
+       execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER);
+}
+
 static inline void
 execlists_context_schedule_in(struct i915_request *rq)
 {
@@ -710,7 +723,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
        spin_unlock_irq(&engine->timeline->lock);
 
        if (submit) {
-               execlists_set_active(execlists, EXECLISTS_ACTIVE_USER);
+               execlists_begin(execlists, execlists->port);
                execlists_submit_ports(engine);
        }
 
@@ -741,7 +754,7 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
                port++;
        }
 
-       execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER);
+       execlists_end(execlists);
 }
 
 static void clear_gtiir(struct intel_engine_cs *engine)
@@ -872,7 +885,7 @@ static void execlists_submission_tasklet(unsigned long data)
 {
        struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
        struct intel_engine_execlists * const execlists = &engine->execlists;
-       struct execlist_port * const port = execlists->port;
+       struct execlist_port *port = execlists->port;
        struct drm_i915_private *dev_priv = engine->i915;
        bool fw = false;
 
@@ -1010,9 +1023,19 @@ static void execlists_submission_tasklet(unsigned long data)
 
                        GEM_BUG_ON(count == 0);
                        if (--count == 0) {
+                               /*
+                                * On the final event corresponding to the
+                                * submission of this context, we expect either
+                                * an element-switch event or a completion
+                                * event (and on completion, the active-idle
+                                * marker). No more preemptions, lite-restore
+                                * or otherwise
+                                */
                                GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
                                GEM_BUG_ON(port_isset(&port[1]) &&
                                           !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
+                               GEM_BUG_ON(!port_isset(&port[1]) &&
+                                          !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
                                GEM_BUG_ON(!i915_request_completed(rq));
                                execlists_context_schedule_out(rq);
                                trace_i915_request_out(rq);
@@ -1021,17 +1044,14 @@ static void execlists_submission_tasklet(unsigned long data)
                                GEM_TRACE("%s completed ctx=%d\n",
                                          engine->name, port->context_id);
 
-                               execlists_port_complete(execlists, port);
+                               port = execlists_port_complete(execlists, port);
+                               if (port_isset(port))
+                                       execlists_begin(execlists, port);
+                               else
+                                       execlists_end(execlists);
                        } else {
                                port_set(port, port_pack(rq, count));
                        }

-Chris
Francisco Jerez March 29, 2018, 12:32 a.m. UTC | #5
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Chris Wilson (2018-03-28 20:20:19)
>> Quoting Francisco Jerez (2018-03-28 19:55:12)
>> > Hi Chris,
>> > 
>> [snip]
>> > That said, it wouldn't hurt to call each of them once per sequence of
>> > overlapping requests.  Do you want me to call them from
>> > execlists_set/clear_active() themselves when bit == EXECLISTS_ACTIVE_USER,
>> > or at each callsite of execlists_set/clear_active()? [possibly protected
>> > with a check that execlists_is_active(EXECLISTS_ACTIVE_USER) wasn't
>> > already the expected value]
>> 
>> No, I was thinking of adding an execlists_start()/execlists_stop()
>> [naming suggestions welcome, begin/end?] where we could hook additional
>> bookkeeping into.
>
> Trying to call execlist_begin() once didn't pan out. It's easier to
> reuse for similar bookkeeping used in future patches if execlist_begin()
> (or whatever name suits best) is called at the start of each context.
>
> Something along the lines of:
>
> @@ -374,6 +374,19 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status)
>                                    status, rq);
>  }
>  
> +static inline void
> +execlists_begin(struct intel_engine_execlists *execlists,

I had started writing something along the same lines in my working tree
called execlists_active_user_begin/end -- Which name do you prefer?

> +               struct execlist_port *port)
> +{

What do you expect the port argument to be useful for?  Is it ever going
to be anything other than execlists->port?

> +       execlists_set_active_once(execlists, EXECLISTS_ACTIVE_USER);
> +}
> +
> +static inline void
> +execlists_end(struct intel_engine_execlists *execlists)
> +{
> +       execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER);
> +}
> +
>  static inline void
>  execlists_context_schedule_in(struct i915_request *rq)
>  {
> @@ -710,7 +723,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>         spin_unlock_irq(&engine->timeline->lock);
>  
>         if (submit) {
> -               execlists_set_active(execlists, EXECLISTS_ACTIVE_USER);
> +               execlists_begin(execlists, execlists->port);
>                 execlists_submit_ports(engine);
>         }
>  
> @@ -741,7 +754,7 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
>                 port++;
>         }
>  
> -       execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER);

This line doesn't seem to exist in my working tree.  I guess it was just
added?

> +       execlists_end(execlists);
>  }
>  
>  static void clear_gtiir(struct intel_engine_cs *engine)
> @@ -872,7 +885,7 @@ static void execlists_submission_tasklet(unsigned long data)
>  {
>         struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
>         struct intel_engine_execlists * const execlists = &engine->execlists;
> -       struct execlist_port * const port = execlists->port;
> +       struct execlist_port *port = execlists->port;
>         struct drm_i915_private *dev_priv = engine->i915;
>         bool fw = false;
>  
> @@ -1010,9 +1023,19 @@ static void execlists_submission_tasklet(unsigned long data)
>  
>                         GEM_BUG_ON(count == 0);
>                         if (--count == 0) {
> +                               /*
> +                                * On the final event corresponding to the
> +                                * submission of this context, we expect either
> +                                * an element-switch event or a completion
> +                                * event (and on completion, the active-idle
> +                                * marker). No more preemptions, lite-restore
> +                                * or otherwise
> +                                */
>                                 GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
>                                 GEM_BUG_ON(port_isset(&port[1]) &&
>                                            !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
> +                               GEM_BUG_ON(!port_isset(&port[1]) &&
> +                                          !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
>                                 GEM_BUG_ON(!i915_request_completed(rq));
>                                 execlists_context_schedule_out(rq);
>                                 trace_i915_request_out(rq);
> @@ -1021,17 +1044,14 @@ static void execlists_submission_tasklet(unsigned long data)
>                                 GEM_TRACE("%s completed ctx=%d\n",
>                                           engine->name, port->context_id);
>  
> -                               execlists_port_complete(execlists, port);
> +                               port = execlists_port_complete(execlists, port);
> +                               if (port_isset(port))
> +                                       execlists_begin(execlists, port);

Isn't this going to call execlists_begin() roughly once per request?
What's the purpose if we expect it to be a no-op except for the first
request submitted after execlists_end()?  Isn't the intention to provide
a hook for bookkeeping that depends on idle to active and active to idle
transitions of the hardware?

> +                               else
> +                                       execlists_end(execlists);
>                         } else {
>                                 port_set(port, port_pack(rq, count));
>                         }
>

Isn't there an "execlists_clear_active(execlists,
EXECLISTS_ACTIVE_USER);" call in your tree a few lines below that is now
redundant?

> -Chris
Chris Wilson March 29, 2018, 1:01 a.m. UTC | #6
Quoting Francisco Jerez (2018-03-29 01:32:07)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Quoting Chris Wilson (2018-03-28 20:20:19)
> >> Quoting Francisco Jerez (2018-03-28 19:55:12)
> >> > Hi Chris,
> >> > 
> >> [snip]
> >> > That said, it wouldn't hurt to call each of them once per sequence of
> >> > overlapping requests.  Do you want me to call them from
> >> > execlists_set/clear_active() themselves when bit == EXECLISTS_ACTIVE_USER,
> >> > or at each callsite of execlists_set/clear_active()? [possibly protected
> >> > with a check that execlists_is_active(EXECLISTS_ACTIVE_USER) wasn't
> >> > already the expected value]
> >> 
> >> No, I was thinking of adding an execlists_start()/execlists_stop()
> >> [naming suggestions welcome, begin/end?] where we could hook additional
> >> bookkeeping into.
> >
> > Trying to call execlist_begin() once didn't pan out. It's easier to
> > reuse for similar bookkeeping used in future patches if execlist_begin()
> > (or whatever name suits best) is called at the start of each context.
> >
> > Something along the lines of:
> >
> > @@ -374,6 +374,19 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status)
> >                                    status, rq);
> >  }
> >  
> > +static inline void
> > +execlists_begin(struct intel_engine_execlists *execlists,
> 
> I had started writing something along the same lines in my working tree
> called execlists_active_user_begin/end -- Which name do you prefer?

Hmm, so the reason why the user distinction exists is that we may
momentarily remain active after the last user context is switched out, if
there is a preemption event still pending. Keeping the user tag does
help maintain that distinction.

> 
> > +               struct execlist_port *port)
> > +{
> 
> What do you expect the port argument to be useful for?  Is it ever going
> to be anything other than execlists->port?

There are patches afoot to change that, so since I needed to inspect
port here it seemed to tie in nicely with the proposed changes to
execlists_port_complete.

> > +       execlists_set_active_once(execlists, EXECLISTS_ACTIVE_USER);
> > +}
> > +
> > +static inline void
> > +execlists_end(struct intel_engine_execlists *execlists)
> > +{
> > +       execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER);
> > +}
> > +
> >  static inline void
> >  execlists_context_schedule_in(struct i915_request *rq)
> >  {
> > @@ -710,7 +723,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >         spin_unlock_irq(&engine->timeline->lock);
> >  
> >         if (submit) {
> > -               execlists_set_active(execlists, EXECLISTS_ACTIVE_USER);
> > +               execlists_begin(execlists, execlists->port);
> >                 execlists_submit_ports(engine);
> >         }
> >  
> > @@ -741,7 +754,7 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
> >                 port++;
> >         }
> >  
> > -       execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER);
> 
> This line doesn't seem to exist in my working tree.  I guess it was just
> added?

A few days ago, yes.

> > +       execlists_end(execlists);
> >  }
> >  
> >  static void clear_gtiir(struct intel_engine_cs *engine)
> > @@ -872,7 +885,7 @@ static void execlists_submission_tasklet(unsigned long data)
> >  {
> >         struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
> >         struct intel_engine_execlists * const execlists = &engine->execlists;
> > -       struct execlist_port * const port = execlists->port;
> > +       struct execlist_port *port = execlists->port;
> >         struct drm_i915_private *dev_priv = engine->i915;
> >         bool fw = false;
> >  
> > @@ -1010,9 +1023,19 @@ static void execlists_submission_tasklet(unsigned long data)
> >  
> >                         GEM_BUG_ON(count == 0);
> >                         if (--count == 0) {
> > +                               /*
> > +                                * On the final event corresponding to the
> > +                                * submission of this context, we expect either
> > +                                * an element-switch event or a completion
> > +                                * event (and on completion, the active-idle
> > +                                * marker). No more preemptions, lite-restore
> > +                                * or otherwise
> > +                                */
> >                                 GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
> >                                 GEM_BUG_ON(port_isset(&port[1]) &&
> >                                            !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
> > +                               GEM_BUG_ON(!port_isset(&port[1]) &&
> > +                                          !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
> >                                 GEM_BUG_ON(!i915_request_completed(rq));
> >                                 execlists_context_schedule_out(rq);
> >                                 trace_i915_request_out(rq);
> > @@ -1021,17 +1044,14 @@ static void execlists_submission_tasklet(unsigned long data)
> >                                 GEM_TRACE("%s completed ctx=%d\n",
> >                                           engine->name, port->context_id);
> >  
> > -                               execlists_port_complete(execlists, port);
> > +                               port = execlists_port_complete(execlists, port);
> > +                               if (port_isset(port))
> > +                                       execlists_begin(execlists, port);
> 
> Isn't this going to call execlists_begin() roughly once per request?
> What's the purpose if we expect it to be a no-op except for the first
> request submitted after execlists_end()?  Isn't the intention to provide
> a hook for bookkeeping that depends on idle to active and active to idle
> transitions of the hardware?

Because on a context switch I need to update the GPU clocks, update
tracking for preemption, and those are just the two I have in my tree that
need to land in the next couple of weeks...

Currently I have,

static inline void
execlists_begin(struct intel_engine_execlists *execlists,
                struct execlist_port *port)
{
        struct intel_engine_cs *engine =
                container_of(execlists, typeof(*engine), execlists);
        struct i915_request *rq = port_request(port);

        execlists_set_active_once(execlists, EXECLISTS_ACTIVE_USER);

        intel_rps_update_engine(engine, rq->ctx);
        execlists->current_priority = rq_prio(rq);
}

static inline void
execlists_end(struct intel_engine_execlists *execlists)
{
        struct intel_engine_cs *engine =
                container_of(execlists, typeof(*engine), execlists);

        execlists->current_priority = INT_MIN;
        intel_rps_update_engine(engine, NULL);

        execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER);
}

> > +                               else
> > +                                       execlists_end(execlists);
> >                         } else {
> >                                 port_set(port, port_pack(rq, count));
> >                         }
> >
> 
> Isn't there an "execlists_clear_active(execlists,
> EXECLISTS_ACTIVE_USER);" call in your tree a few lines below that is now
> redundant?

This is only half the patch to show the gist. :)
-Chris
Chris Wilson March 29, 2018, 1:20 a.m. UTC | #7
Quoting Chris Wilson (2018-03-29 02:01:37)
> Quoting Francisco Jerez (2018-03-29 01:32:07)
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > > +                               else
> > > +                                       execlists_end(execlists);
> > >                         } else {
> > >                                 port_set(port, port_pack(rq, count));
> > >                         }
> > >
> > 
> > Isn't there an "execlists_clear_active(execlists,
> > EXECLISTS_ACTIVE_USER);" call in your tree a few lines below that is now
> > redundant?
> 
> This is only half the patch to show the gist. :)

A more concrete sketch,
https://patchwork.freedesktop.org/patch/213594/
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3a69b367e565..721f915115bd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -132,6 +132,7 @@ 
  *
  */
 #include <linux/interrupt.h>
+#include <linux/cpufreq.h>
 
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
@@ -379,11 +380,13 @@  execlists_context_schedule_in(struct i915_request *rq)
 {
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
 	intel_engine_context_in(rq->engine);
+	cpufreq_io_active_begin();
 }
 
 static inline void
 execlists_context_schedule_out(struct i915_request *rq)
 {
+	cpufreq_io_active_end();
 	intel_engine_context_out(rq->engine);
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
 }
@@ -726,6 +729,7 @@  execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 		struct i915_request *rq = port_request(port);
 
 		GEM_BUG_ON(!execlists->active);
+		cpufreq_io_active_end();
 		intel_engine_context_out(rq->engine);
 
 		execlists_context_status_change(rq,