Message ID | 20180728164623.10613-4-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/5] drm/i915: Expose the busyspin durations for i915_wait_request | expand |
On 28/07/2018 17:46, Chris Wilson wrote: > Here we bump the busyspin for the current request to avoid sleeping and > the cost of both idling and downclocking the CPU. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Sagar Kamble <sagar.a.kamble@intel.com> > Cc: Eero Tamminen <eero.t.tamminen@intel.com> > Cc: Francisco Jerez <currojerez@riseup.net> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Ben Widawsky <ben@bwidawsk.net> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Michał Winiarski <michal.winiarski@intel.com> > --- > drivers/gpu/drm/i915/Kconfig.profile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile > index de394dea4a14..54e0ba4051e7 100644 > --- a/drivers/gpu/drm/i915/Kconfig.profile > +++ b/drivers/gpu/drm/i915/Kconfig.profile > @@ -1,6 +1,6 @@ > config DRM_I915_SPIN_REQUEST_IRQ > int > - default 5 # microseconds > + default 250 # microseconds > help > Before sleeping waiting for a request (GPU operation) to complete, > we may spend some time polling for its completion. As the IRQ may > But the histograms from previous patch do not suggest 250us busy spin gets us into interesting territory. And we have to know the distribution between IRQ and CS spins. Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-08-02 15:47:37) > > On 28/07/2018 17:46, Chris Wilson wrote: > > Here we bump the busyspin for the current request to avoid sleeping and > > the cost of both idling and downclocking the CPU. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Sagar Kamble <sagar.a.kamble@intel.com> > > Cc: Eero Tamminen <eero.t.tamminen@intel.com> > > Cc: Francisco Jerez <currojerez@riseup.net> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Cc: Ben Widawsky <ben@bwidawsk.net> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > Cc: Michał Winiarski <michal.winiarski@intel.com> > > --- > > drivers/gpu/drm/i915/Kconfig.profile | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile > > index de394dea4a14..54e0ba4051e7 100644 > > --- a/drivers/gpu/drm/i915/Kconfig.profile > > +++ b/drivers/gpu/drm/i915/Kconfig.profile > > @@ -1,6 +1,6 @@ > > config DRM_I915_SPIN_REQUEST_IRQ > > int > > - default 5 # microseconds > > + default 250 # microseconds > > help > > Before sleeping waiting for a request (GPU operation) to complete, > > we may spend some time polling for its completion. As the IRQ may > > > > But the histograms from previous patch do not suggest 250us busy spin > gets us into interesting territory. And we have to know the distribution > between IRQ and CS spins. This is for a different problem than presented in those histograms. This is to hide the reclocking that occurred if we sleep. Previous justification for the initial spin was to try and hide the irq setup costs, this is to try and hide the sleep costs. The purpose of this patch was just to dip the toe into the water of what is possible and solicit feedback since we are still waiting for the panacea patches to improve cpufreq. (Why it had such a lackluster justification.) -Chris
On Thu, 2018-08-02 at 15:54 +0100, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2018-08-02 15:47:37) > > > > On 28/07/2018 17:46, Chris Wilson wrote: > > > Here we bump the busyspin for the current request to avoid > > > sleeping and > > > the cost of both idling and downclocking the CPU. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > Cc: Sagar Kamble <sagar.a.kamble@intel.com> > > > Cc: Eero Tamminen <eero.t.tamminen@intel.com> > > > Cc: Francisco Jerez <currojerez@riseup.net> > > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > > Cc: Ben Widawsky <ben@bwidawsk.net> > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > > Cc: Michał Winiarski <michal.winiarski@intel.com> > > > --- > > > drivers/gpu/drm/i915/Kconfig.profile | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile > > > b/drivers/gpu/drm/i915/Kconfig.profile > > > index de394dea4a14..54e0ba4051e7 100644 > > > --- a/drivers/gpu/drm/i915/Kconfig.profile > > > +++ b/drivers/gpu/drm/i915/Kconfig.profile > > > @@ -1,6 +1,6 @@ > > > config DRM_I915_SPIN_REQUEST_IRQ > > > int > > > - default 5 # microseconds > > > + default 250 # microseconds > > > help > > > Before sleeping waiting for a request (GPU operation) to > > > complete, > > > we may spend some time polling for its completion. As the > > > IRQ may > > > > > > > But the histograms from previous patch do not suggest 250us busy > > spin > > gets us into interesting territory. And we have to know the > > distribution > > between IRQ and CS spins. > > This is for a different problem than presented in those histograms. > This > is to hide the reclocking that occurred if we sleep. > > Previous justification for the initial spin was to try and hide the > irq > setup costs, this is to try and hide the sleep costs. > > The purpose of this patch was just to dip the toe into the water of > what > is possible and solicit feedback since we are still waiting for the > panacea patches to improve cpufreq. (Why it had such a lackluster > justification.) This patch somewhat resembles with the issue we just met in userspace for the opencl and media. Long story short there was CPU% regression between 2 opencl releases root caused to the following. OpenCL rewrote the code how they wait for the task completion. Essentially they have an optimistic spin implemented on the userspace side and logic their is quite complex: they commit to different duration of the spin depending on the scenario. And they had some pretty long spin, like 10ms. This is a killer thing if opencl is coupled with media workloads, especially when media is dominant. That's because usual execution time of media batches is pretty long, few or dozen milliseconds is quite typical. As a result opencl workload is really being submitted and executed with significant delay, thus optimistic spin gives up and it comes with the CPU% cost. This story was discussed here: https://github.com/intel/comp ute-runtime/commit/d53e1c3979f6d822f55645bfcd753be1658f53ca#comments. I was actually surprised why they don't rely on i915 wait function. They responded in was: "I would gladly always use bo_wait ioctl, but it has big overhead for short workloads." Thus the question: do we have an opportunity to improve? Can we have some dynamic way to configure spinning for the particular requests/contexts? /if I understand correctly current patch it introduces kernel config-level setting rather than some dynamic setting/ > -Chris > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Quoting Rogozhkin, Dmitry V (2018-08-02 20:37:36) > I was actually surprised why they don't rely on i915 wait function. > They responded in was: "I would gladly always use bo_wait ioctl, but it > has big overhead for short workloads." Thus the question: do we have an > opportunity to improve? Can we have some dynamic way to configure > spinning for the particular requests/contexts? /if I understand > correctly current patch it introduces kernel config-level setting > rather than some dynamic setting/ Better yet, tell us about the problem with the overhead! The latency from batch to client (measuring RCS timestamp in batch to RCS timestamp in userspace) is 10us (bdw and the like), under controlled conditions ;) Hmm, that does drop a bit for a RT client which does also suggest that cpufreq is a factor there. A roundtrip through the wait ioctl doing nothing is 100ns; minimum wait due to irq setup is around 7us (spin then mmio before detecting completion immediately afterwards). If your measurements are orders of magnitude worse that too is enlightening. To justify a 10ms spin is that sleeping (hitting the scheduler) is unacceptable, which seems to be a widerscale problem. But that I recommend we look at https://patchwork.freedesktop.org/patch/241571/ to see if something like that helps us in the short term. -Chris
diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile index de394dea4a14..54e0ba4051e7 100644 --- a/drivers/gpu/drm/i915/Kconfig.profile +++ b/drivers/gpu/drm/i915/Kconfig.profile @@ -1,6 +1,6 @@ config DRM_I915_SPIN_REQUEST_IRQ int - default 5 # microseconds + default 250 # microseconds help Before sleeping waiting for a request (GPU operation) to complete, we may spend some time polling for its completion. As the IRQ may
Here we bump the busyspin for the current request to avoid sleeping and the cost of both idling and downclocking the CPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Sagar Kamble <sagar.a.kamble@intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: Francisco Jerez <currojerez@riseup.net> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> --- drivers/gpu/drm/i915/Kconfig.profile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)