[v2,0/4] Dynamic EU configuration of Slice/Subslice/EU.

Message ID

1541477601-10883-1-git-send-email-ankit.p.navik@intel.com (mailing list archive)

Headers

From: Ankit Navik <ankit.p.navik@intel.com>
To: intel-gfx@lists.freedesktop.org
Date: Tue,  6 Nov 2018 09:43:17 +0530
Message-Id: <1541477601-10883-1-git-send-email-ankit.p.navik@intel.com>
Subject: [Intel-gfx] [PATCH v2 0/4] Dynamic EU configuration of
 Slice/Subslice/EU.
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Message

Ankit Navik Nov. 6, 2018, 4:13 a.m. UTC

drm/i915: Context-aware, user-agnostic EU/Slice/Sub-slice control within the kernel

The current GPU configuration code for i915 does not allow us to change the
EU/Slice/Sub-slice configuration dynamically. It is done only once, when the
context is created.

While a particular graphics application is running, if we examine the command
requests from user space, we observe that command density is not consistent.
This means there is scope to change the graphics configuration dynamically,
even while the context is actively running. This patch series proposes a
solution that finds the active pending load for all active contexts at a given
time and, based on that, dynamically configures the graphics hardware for each
context.
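
As a rough illustration of "active pending load" (a minimal user-space sketch,
not the code from this series), a per-context counter can be incremented when
a request is queued and decremented when it completes. The struct and function
names below are hypothetical and do not correspond to i915 structures.

```c
#include <stdatomic.h>

/* Hypothetical per-context bookkeeping; not the i915 structures. */
struct gfx_context {
	atomic_int pending;	/* requests submitted but not yet completed */
};

/* Call when user space queues a request for this context. */
static void ctx_submit(struct gfx_context *ctx)
{
	atomic_fetch_add(&ctx->pending, 1);
}

/* Call when the GPU finishes a request for this context. */
static void ctx_complete(struct gfx_context *ctx)
{
	atomic_fetch_sub(&ctx->pending, 1);
}

/* Snapshot of the context's current pending load. */
static int ctx_pending(struct gfx_context *ctx)
{
	return atomic_load(&ctx->pending);
}
```

An atomic counter is used here because submission and completion would
normally happen in different execution contexts.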

We use a high-resolution (hr) timer in the i915 driver to get a callback every
few milliseconds. The timer value can be configured through debugfs; the
default is '0', which means the timer is disabled, i.e. the original system
without any intervention. In the timer callback, we examine the pending
commands for a context in the queue. Essentially, we intercept them before
they are executed by the GPU and update the context with the required number
of EUs.
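
The shape of such a periodic callback could look roughly like the sketch
below. It is plain user-space C with hypothetical names (struct governor,
governor_tick), it does not use the kernel hrtimer API, and it only models
the "period 0 means disabled" behaviour described above.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not the i915 code. */
struct sampled_ctx {
	int pending;		/* requests currently queued for the context */
	int last_sample;	/* pending count seen at the last timer tick */
};

struct governor {
	unsigned int period_ms;	/* 0 means disabled */
};

/*
 * Body of the periodic callback: record each context's pending count.
 * Returns the delay until the next tick in ms, or 0 if the timer must
 * not be re-armed because the governor is disabled.
 */
static unsigned int governor_tick(const struct governor *gov,
				  struct sampled_ctx *ctxs, size_t n)
{
	size_t i;

	if (gov->period_ms == 0)
		return 0;	/* disabled: leave every context untouched */

	for (i = 0; i < n; i++)
		ctxs[i].last_sample = ctxs[i].pending;

	return gov->period_ms;
}
```

Returning the next delay keeps the re-arm decision in one place, so a period
of 0 leaves the system exactly as it was without the governor.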

Two questions: how did we arrive at the right timer value, and what is the
right number of EUs? For the former, we used empirical data to achieve the
best performance at the least power. For the latter, we roughly categorized
the number of EUs logically based on the platform. We compare the number of
pending commands with a particular threshold and then set the number of EUs
accordingly when updating the context. That threshold is also based on
experiments and findings. If the GPU is able to keep up with the CPU, there
are typically no pending commands and the EU configuration remains unchanged.
If there are more pending commands, we reprogram the context with a higher
number of EUs. Please note that we change the EUs even while the context is
running, by examining the pending commands every 'x' milliseconds.
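
The threshold comparison can be sketched as follows. This is only an
illustration: the load types, the threshold values and the EU counts are
made-up placeholders (the real values were found by experiment and depend on
the platform), and it assumes one reading of the text above, namely that a
context with no pending commands keeps its current EU count.

```c
enum load_type { LOAD_LOW, LOAD_MEDIUM, LOAD_HIGH };

/* Made-up thresholds; the real ones were tuned experimentally. */
#define PENDING_MEDIUM_THRESHOLD 2
#define PENDING_HIGH_THRESHOLD   4

/* Made-up EU counts per load type, for illustration only. */
static const int eu_for_load[] = {
	[LOAD_LOW]    = 4,
	[LOAD_MEDIUM] = 6,
	[LOAD_HIGH]   = 8,
};

/* Map a pending-request count to a coarse load type. */
static enum load_type classify_load(int pending)
{
	if (pending >= PENDING_HIGH_THRESHOLD)
		return LOAD_HIGH;
	if (pending >= PENDING_MEDIUM_THRESHOLD)
		return LOAD_MEDIUM;
	return LOAD_LOW;
}

/*
 * Pick the EU count to program for the next period. With nothing
 * pending the GPU is keeping up, so the current count is kept.
 */
static int next_eu_count(int current_eus, int pending)
{
	if (pending == 0)
		return current_eus;
	return eu_for_load[classify_load(pending)];
}
```

With these placeholder values, a context with 10 pending requests would be
moved to 8 EUs, while a context with none keeps whatever it already had.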

With this solution in place, on KBL-GT3 + Android we saw the following PnP
benefits. The power numbers mentioned here are system power.

App / KPI              | Power benefit (%) |
-------------------------------------------|
3D Mark (Ice storm)    | 2.30%             |
TRex On screen         | 2.49%             |
TRex Off screen        | 1.32%             |
Manhattan On screen    | 3.11%             |
Manhattan Off screen   | 0.89%             |
AnTuTu 6.1.4           | 3.42%             |

Note: On KBL (GEN9) we cannot control at the sub-slice level; this was always
a constraint. We always controlled the number of EUs rather than
sub-slices/slices.

Praveen Diwakar (4):
  drm/i915: Get active pending request for given context
  drm/i915: Update render power clock state configuration for given
    context
  drm/i915: set optimum eu/slice/sub-slice configuration based on load
    type
  drm/i915: Predictive governor to control eu/slice/subslice

 drivers/gpu/drm/i915/i915_debugfs.c        | 88 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_drv.h            | 10 ++++
 drivers/gpu/drm/i915/i915_gem_context.c    | 26 +++++++++
 drivers/gpu/drm/i915/i915_gem_context.h    | 45 +++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++
 drivers/gpu/drm/i915/intel_device_info.c   | 44 ++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c           | 20 ++++++-
 8 files changed, 235 insertions(+), 4 deletions(-)

Comments

Tvrtko Ursulin Nov. 7, 2018, 10:38 a.m. UTC | #1

On 06/11/2018 04:13, Ankit Navik wrote:
> App / KPI              | Power benefit (%) |
> -------------------------------------------|
> 3D Mark (Ice storm)    | 2.30%             |
> TRex On screen         | 2.49%             |
> TRex Off screen        | 1.32%             |
> Manhattan On screen    | 3.11%             |
> Manhattan Off screen   | 0.89%             |
> AnTuTu 6.1.4           | 3.42%             |

Were you able to find some benchmarks which regress? Maybe try Synmark2 
and more from gfxbench? Not all benchmarks there are equally important, 
and regressions on some are fine, but I think a fuller set would be 
interesting to see.

Regards,

Tvrtko


Ankit Navik Dec. 11, 2018, 9:58 a.m. UTC | #2

Hi Tvrtko,

> On Wed, Nov 7, 2018 at 4:08 PM Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
> Were you able to find some benchmarks which regress? Maybe try Synmark2
> and more from gfxbench? Not all benchmarks there are equally important, and
> regressions on some are fine, but I think a fuller set would be interesting to see.

We have not seen much improvement in GFXBench Car Chase, but there was no
degradation in performance.

Regards,
Ankit