[4/5] drm/i915: Increase initial busyspin limit

Message ID	20180728164623.10613-4-chris@chris-wilson.co.uk (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Chris Wilson <chris@chris-wilson.co.uk> To: intel-gfx@lists.freedesktop.org Date: Sat, 28 Jul 2018 17:46:22 +0100 Message-Id: <20180728164623.10613-4-chris@chris-wilson.co.uk> In-Reply-To: <20180728164623.10613-1-chris@chris-wilson.co.uk> References: <20180728164623.10613-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 4/5] drm/i915: Increase initial busyspin limit Precedence: list Cc: Ben Widawsky <ben@bwidawsk.net>, Eero Tamminen <eero.t.tamminen@intel.com> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	[1/5] drm/i915: Expose the busyspin durations for i915_wait_request \| expand [1/5] drm/i915: Expose the busyspin durations for i915_wait_request [2/5] drm/i915: Expose idle delays to Kconfig [3/5] drm/i915: Increase busyspin limit before a context-switch [4/5] drm/i915: Increase initial busyspin limit [5/5] drm/i915: Do not use iowait while waiting for the GPU

Message ID

20180728164623.10613-4-chris@chris-wilson.co.uk (mailing list archive)

State

New, archived

Headers

From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Sat, 28 Jul 2018 17:46:22 +0100
Message-Id: <20180728164623.10613-4-chris@chris-wilson.co.uk>
In-Reply-To: <20180728164623.10613-1-chris@chris-wilson.co.uk>
References: <20180728164623.10613-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 4/5] drm/i915: Increase initial busyspin limit
Precedence: list
Cc: Ben Widawsky <ben@bwidawsk.net>, Eero Tamminen <eero.t.tamminen@intel.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Series

[1/5] drm/i915: Expose the busyspin durations for i915_wait_request | expand

Commit Message

Chris Wilson July 28, 2018, 4:46 p.m. UTC

Here we bump the busyspin for the current request to avoid sleeping and
the cost of both idling and downclocking the CPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sagar Kamble <sagar.a.kamble@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/Kconfig.profile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Tvrtko Ursulin Aug. 2, 2018, 2:47 p.m. UTC | #1

On 28/07/2018 17:46, Chris Wilson wrote:
> Here we bump the busyspin for the current request to avoid sleeping and
> the cost of both idling and downclocking the CPU.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Sagar Kamble <sagar.a.kamble@intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: Francisco Jerez <currojerez@riseup.net>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> ---
>   drivers/gpu/drm/i915/Kconfig.profile | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> index de394dea4a14..54e0ba4051e7 100644
> --- a/drivers/gpu/drm/i915/Kconfig.profile
> +++ b/drivers/gpu/drm/i915/Kconfig.profile
> @@ -1,6 +1,6 @@
>   config DRM_I915_SPIN_REQUEST_IRQ
>   	int
> -	default 5 # microseconds
> +	default 250 # microseconds
>   	help
>   	  Before sleeping waiting for a request (GPU operation) to complete,
>   	  we may spend some time polling for its completion. As the IRQ may
> 

But the histograms from previous patch do not suggest 250us busy spin 
gets us into interesting territory. And we have to know the distribution 
between IRQ and CS spins.

Regards,

Tvrtko

Chris Wilson Aug. 2, 2018, 2:54 p.m. UTC | #2

Quoting Tvrtko Ursulin (2018-08-02 15:47:37)
> 
> On 28/07/2018 17:46, Chris Wilson wrote:
> > Here we bump the busyspin for the current request to avoid sleeping and
> > the cost of both idling and downclocking the CPU.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Sagar Kamble <sagar.a.kamble@intel.com>
> > Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> > Cc: Francisco Jerez <currojerez@riseup.net>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > ---
> >   drivers/gpu/drm/i915/Kconfig.profile | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > index de394dea4a14..54e0ba4051e7 100644
> > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -1,6 +1,6 @@
> >   config DRM_I915_SPIN_REQUEST_IRQ
> >       int
> > -     default 5 # microseconds
> > +     default 250 # microseconds
> >       help
> >         Before sleeping waiting for a request (GPU operation) to complete,
> >         we may spend some time polling for its completion. As the IRQ may
> > 
> 
> But the histograms from previous patch do not suggest 250us busy spin 
> gets us into interesting territory. And we have to know the distribution 
> between IRQ and CS spins.

This is for a different problem than presented in those histograms. This
is to hide the reclocking that occurred if we sleep.

Previous justification for the initial spin was to try and hide the irq
setup costs, this is to try and hide the sleep costs.

The purpose of this patch was just to dip the toe into the water of what
is possible and solicit feedback since we are still waiting for the
panacea patches to improve cpufreq. (Why it had such a lackluster
justification.)
-Chris

Rogozhkin, Dmitry V Aug. 2, 2018, 7:37 p.m. UTC | #3

On Thu, 2018-08-02 at 15:54 +0100, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-02 15:47:37)
> > 
> > On 28/07/2018 17:46, Chris Wilson wrote:
> > > Here we bump the busyspin for the current request to avoid
> > > sleeping and
> > > the cost of both idling and downclocking the CPU.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Sagar Kamble <sagar.a.kamble@intel.com>
> > > Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> > > Cc: Francisco Jerez <currojerez@riseup.net>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/Kconfig.profile | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile
> > > b/drivers/gpu/drm/i915/Kconfig.profile
> > > index de394dea4a14..54e0ba4051e7 100644
> > > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > > @@ -1,6 +1,6 @@
> > >   config DRM_I915_SPIN_REQUEST_IRQ
> > >       int
> > > -     default 5 # microseconds
> > > +     default 250 # microseconds
> > >       help
> > >         Before sleeping waiting for a request (GPU operation) to
> > > complete,
> > >         we may spend some time polling for its completion. As the
> > > IRQ may
> > > 
> > 
> > But the histograms from previous patch do not suggest 250us busy
> > spin 
> > gets us into interesting territory. And we have to know the
> > distribution 
> > between IRQ and CS spins.
> 
> This is for a different problem than presented in those histograms.
> This
> is to hide the reclocking that occurred if we sleep.
> 
> Previous justification for the initial spin was to try and hide the
> irq
> setup costs, this is to try and hide the sleep costs.
> 
> The purpose of this patch was just to dip the toe into the water of
> what
> is possible and solicit feedback since we are still waiting for the
> panacea patches to improve cpufreq. (Why it had such a lackluster
> justification.)

This patch somewhat resembles with the issue we just met in userspace
for the opencl and media. Long story short there was CPU% regression
between 2 opencl releases root caused to the following. OpenCL rewrote
the code how they wait for the task completion. Essentially they have
an optimistic spin implemented on the userspace side and logic their is
quite complex: they commit to different duration of the spin depending
on the scenario. And they had some pretty long spin, like 10ms. This is
a killer thing if opencl is coupled with media workloads, especially
when media is dominant. That's because usual execution time of media
batches is pretty long, few or dozen milliseconds is quite typical. As
a result opencl workload is really being submitted and executed with
significant delay, thus optimistic spin gives up and it comes with the
CPU% cost. This story was discussed here: https://github.com/intel/comp
ute-runtime/commit/d53e1c3979f6d822f55645bfcd753be1658f53ca#comments.

I was actually surprised why they don't rely on i915 wait function.
They responded in was: "I would gladly always use bo_wait ioctl, but it
has big overhead for short workloads." Thus the question: do we have an
opportunity to improve? Can we have some dynamic way to configure
spinning for the particular requests/contexts? /if I understand
correctly current patch it introduces kernel config-level setting
rather than some dynamic setting/

> -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Chris Wilson Aug. 2, 2018, 8:21 p.m. UTC | #4

Quoting Rogozhkin, Dmitry V (2018-08-02 20:37:36)
> I was actually surprised why they don't rely on i915 wait function.
> They responded in was: "I would gladly always use bo_wait ioctl, but it
> has big overhead for short workloads." Thus the question: do we have an
> opportunity to improve? Can we have some dynamic way to configure
> spinning for the particular requests/contexts? /if I understand
> correctly current patch it introduces kernel config-level setting
> rather than some dynamic setting/

Better yet, tell us about the problem with the overhead!

The latency from batch to client (measuring RCS timestamp in batch to
RCS timestamp in userspace) is 10us (bdw and the like), under controlled
conditions ;) Hmm, that does drop a bit for a RT client which does also
suggest that cpufreq is a factor there.

A roundtrip through the wait ioctl doing nothing is 100ns; minimum wait
due to irq setup is around 7us (spin then mmio before detecting
completion immediately afterwards). If your measurements are orders of
magnitude worse that too is enlightening.

To justify a 10ms spin is that sleeping (hitting the scheduler) is
unacceptable, which seems to be a widerscale problem. But that I
recommend we look at https://patchwork.freedesktop.org/patch/241571/ to
see if something like that helps us in the short term.
-Chris

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index de394dea4a14..54e0ba4051e7 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -1,6 +1,6 @@ 
 config DRM_I915_SPIN_REQUEST_IRQ
 	int
-	default 5 # microseconds
+	default 250 # microseconds
 	help
 	  Before sleeping waiting for a request (GPU operation) to complete,
 	  we may spend some time polling for its completion. As the IRQ may

[4/5] drm/i915: Increase initial busyspin limit

Commit Message

Comments

Patch