Message ID | 20170505114321.20348-1-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, May 05, 2017 at 12:43:21PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> It seems that the DMC likes to transition between the DC states
> a lot when there are no connected displays (no active power
> domains) during simple command submission.
>
> This frantic activity on DC states has a terrible impact on the
> performance of the overall chip with huge latencies observed in
> the interrupt handlers and elsewhere. Simple tests like
> igt/gem_latency -n 0 are slowed down by a factor of eight.
>
> Work around it by grabbing a modeset display power domain whilst
> there is any GT activity. This seems to be effective in making
> the DMC keep its paws off the chip.
>
> On the other hand this may have a negative impact on the overall
> power budget of the chip and so could still affect performance.

Please add this as a comment to the code, I think in mark_busy(). I
don't think this w/a will remain applicable forever and so merits a
continual reminder and being discussed again in future.

> This version limits the workaround to SKL GT3 and GT4 parts but
> this is just due to the absence of testing on other platforms. It
> is possible we will have to apply it wider.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100572
> Testcase: igt/gem_exec_nop/headless
> Cc: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 5 +++++
>  drivers/gpu/drm/i915/i915_gem.c         | 4 ++++
>  drivers/gpu/drm/i915/i915_gem_request.c | 3 +++
>  3 files changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 320c16df1c9c..4d58e2e28c2f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2990,6 +2990,11 @@ intel_info(const struct drm_i915_private *dev_priv)
>
>  #define HAS_DECOUPLED_MMIO(dev_priv) (INTEL_INFO(dev_priv)->has_decoupled_mmio)
>
> +#define NEEDS_CSR_GT_PERF_WA(dev_priv) \
> +	HAS_CSR(dev_priv) && \
> +	(IS_SKL_GT3(dev_priv) || IS_SKL_GT4(dev_priv)) && \
> +	(dev_priv)->csr.dmc_payload

csr.dmc_payload is a bit of a surprise, but looks correct.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
On Fri, May 05, 2017 at 12:43:21PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> It seems that the DMC likes to transition between the DC states
> a lot when there are no connected displays (no active power
> domains) during simple command submission.

Is it trapping on some interrupt register accesses or what? And
if so which registers are affected?

> This frantic activity on DC states has a terrible impact on the
> performance of the overall chip with huge latencies observed in
> the interrupt handlers and elsewhere. Simple tests like
> igt/gem_latency -n 0 are slowed down by a factor of eight.
>
> Work around it by grabbing a modeset display power domain whilst
> there is any GT activity. This seems to be effective in making
> the DMC keep its paws off the chip.
>
> On the other hand this may have a negative impact on the overall
> power budget of the chip and so could still affect performance.
>
> This version limits the workaround to SKL GT3 and GT4 parts but
> this is just due to the absence of testing on other platforms. It
> is possible we will have to apply it wider.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100572
> Testcase: igt/gem_exec_nop/headless
> Cc: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 5 +++++
>  drivers/gpu/drm/i915/i915_gem.c         | 4 ++++
>  drivers/gpu/drm/i915/i915_gem_request.c | 3 +++
>  3 files changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 320c16df1c9c..4d58e2e28c2f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2990,6 +2990,11 @@ intel_info(const struct drm_i915_private *dev_priv)
>
>  #define HAS_DECOUPLED_MMIO(dev_priv) (INTEL_INFO(dev_priv)->has_decoupled_mmio)
>
> +#define NEEDS_CSR_GT_PERF_WA(dev_priv) \
> +	HAS_CSR(dev_priv) && \
> +	(IS_SKL_GT3(dev_priv) || IS_SKL_GT4(dev_priv)) && \
> +	(dev_priv)->csr.dmc_payload
> +
>  #include "i915_trace.h"
>
>  static inline bool intel_scanout_needs_vtd_wa(struct drm_i915_private *dev_priv)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b2727905ef2b..c52d863f409c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3200,7 +3200,11 @@ i915_gem_idle_work_handler(struct work_struct *work)
>
>  	if (INTEL_GEN(dev_priv) >= 6)
>  		gen6_rps_idle(dev_priv);
> +
>  	intel_runtime_pm_put(dev_priv);
> +
> +	if (NEEDS_CSR_GT_PERF_WA(dev_priv))
> +		intel_display_power_put(dev_priv, POWER_DOMAIN_MODESET);
>  out_unlock:
>  	mutex_unlock(&dev->struct_mutex);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 10361c7e3b37..10a3b51f6362 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -873,6 +873,9 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
>
>  	GEM_BUG_ON(!dev_priv->gt.active_requests);
>
> +	if (NEEDS_CSR_GT_PERF_WA(dev_priv))
> +		intel_display_power_get(dev_priv, POWER_DOMAIN_MODESET);
> +
>  	intel_runtime_pm_get_noresume(dev_priv);
>  	dev_priv->gt.awake = true;
>
> --
> 2.9.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 05/05/2017 15:49, Ville Syrjälä wrote:
> On Fri, May 05, 2017 at 12:43:21PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> It seems that the DMC likes to transition between the DC states
>> a lot when there are no connected displays (no active power
>> domains) during simple command submission.
>
> Is it trapping on some interrupt register accesses or what? And
> if so which registers are affected?

It looks like GT IIR or something along those lines, but I couldn't
say with total confidence. It is just a guess. The firmware binary
definitely "mentions" those registers, as can be seen by inspecting it
with a hex editor.

The data I collected at least seems to present a correlation between the
batch frequency and DC state transition frequency:

 tgt     DC           irqs    irqs/s    irq/    batch/s   DC/s      DC/batch
 submit  transitions                    batch
 freq
 ===========================================================================
 10000    20000       78300   7830.00   1.96    4000.00   2000.00   0.50
  9901    14000       52855   7550.71   1.32    5714.29   2000.00   0.35
  9524    13500       49100   7328.36   1.23    5970.15   2014.93   0.34
  9091    13500       49200   7235.29   1.23    5882.35   1985.29   0.34
  5000    16900       33290   3916.47   0.83    4705.88   1988.24   0.42
  3333    27800       69550   4932.62   1.74    2836.88   1971.63   0.70
  1667    57200       80200   2655.63   2.01    1324.50   1894.04   1.43
   909    80000       80034   1482.11   2.00     740.74   1481.48   2.00
   476    87000       80039    820.91   2.00     410.26    892.31   2.18
   196   160000       80055    334.40   2.00     167.08    668.34   4.00

Submitted batches were ~100us long in all cases. So with low batch
frequency it looks pretty believable. For example when we have 167.08
batches/s, we have 334.40 irq/s - which is double - as expected for
execlists. And we get again double that in terms of DC transitions per
second. Each irq is one GT IIR write from the GPU side, and another from
the CPU side.

This was actually Imre's suggestion btw. Before, I was only looking at
the irq/s to DC/s correlation, which was confusing me a lot since there
are more of the latter, so I thought it couldn't be the trigger. But once
Imre mentioned the possibility that things are triggered by IIR register
writes, the numbers started making more sense.

Then with higher batch frequencies the ratio starts falling - whether
because DC transitions are too slow to keep up, or for some other reason,
I am not sure.

Regards,

Tvrtko
On Fri, May 05, 2017 at 05:13:58PM +0100, Tvrtko Ursulin wrote:
>
> On 05/05/2017 15:49, Ville Syrjälä wrote:
> > On Fri, May 05, 2017 at 12:43:21PM +0100, Tvrtko Ursulin wrote:
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> It seems that the DMC likes to transition between the DC states
> >> a lot when there are no connected displays (no active power
> >> domains) during simple command submission.
> >
> > Is it trapping on some interrupt register accesses or what? And
> > if so which registers are affected?
>
> It looks like GT IIR or something along those lines, but I couldn't
> say with total confidence.

<read DC counters>
for i in `seq 1 100` ; do IGT_NO_FORCEWAKE=1 intel_reg read <whatever> ; done
<read DC counters>

Should be pretty trivial to run that against the suspect registers.

> It is just a guess. The firmware binary
> definitely "mentions" those registers, as can be seen by inspecting it
> with a hex editor.
>
> The data I collected at least seems to present a correlation between the
> batch frequency and DC state transition frequency:
>
>  tgt     DC           irqs    irqs/s    irq/    batch/s   DC/s      DC/batch
>  submit  transitions                    batch
>  freq
>  ===========================================================================
>  10000    20000       78300   7830.00   1.96    4000.00   2000.00   0.50
>   9901    14000       52855   7550.71   1.32    5714.29   2000.00   0.35
>   9524    13500       49100   7328.36   1.23    5970.15   2014.93   0.34
>   9091    13500       49200   7235.29   1.23    5882.35   1985.29   0.34
>   5000    16900       33290   3916.47   0.83    4705.88   1988.24   0.42
>   3333    27800       69550   4932.62   1.74    2836.88   1971.63   0.70
>   1667    57200       80200   2655.63   2.01    1324.50   1894.04   1.43
>    909    80000       80034   1482.11   2.00     740.74   1481.48   2.00
>    476    87000       80039    820.91   2.00     410.26    892.31   2.18
>    196   160000       80055    334.40   2.00     167.08    668.34   4.00
>
> Submitted batches were ~100us long in all cases. So with low batch
> frequency it looks pretty believable. For example when we have 167.08
> batches/s, we have 334.40 irq/s - which is double - as expected for
> execlists. And we get again double that in terms of DC transitions per
> second. Each irq is one GT IIR write from the GPU side, and another from
> the CPU side.

GPU doesn't actually write the IIRs. It's just latching stuff from the
ISR. Whether the ISR edge or some higher level interrupt event actually
causes the DMC to kick into action isn't clear at all. My original
impression was that it just traps the register accesses.
On Fri, 05 May 2017, Tvrtko Ursulin <tursulin@ursulin.net> wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> It seems that the DMC likes to transition between the DC states
> a lot when there are no connected displays (no active power
> domains) during simple command submission.
>
> This frantic activity on DC states has a terrible impact on the
> performance of the overall chip with huge latencies observed in
> the interrupt handlers and elsewhere. Simple tests like
> igt/gem_latency -n 0 are slowed down by a factor of eight.
>
> Work around it by grabbing a modeset display power domain whilst
> there is any GT activity. This seems to be effective in making
> the DMC keep its paws off the chip.
>
> On the other hand this may have a negative impact on the overall
> power budget of the chip and so could still affect performance.
>
> This version limits the workaround to SKL GT3 and GT4 parts but
> this is just due to the absence of testing on other platforms. It
> is possible we will have to apply it wider.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100572
> Testcase: igt/gem_exec_nop/headless
> Cc: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 5 +++++
>  drivers/gpu/drm/i915/i915_gem.c         | 4 ++++
>  drivers/gpu/drm/i915/i915_gem_request.c | 3 +++
>  3 files changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 320c16df1c9c..4d58e2e28c2f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2990,6 +2990,11 @@ intel_info(const struct drm_i915_private *dev_priv)
>
>  #define HAS_DECOUPLED_MMIO(dev_priv) (INTEL_INFO(dev_priv)->has_decoupled_mmio)
>
> +#define NEEDS_CSR_GT_PERF_WA(dev_priv) \
> +	HAS_CSR(dev_priv) && \
> +	(IS_SKL_GT3(dev_priv) || IS_SKL_GT4(dev_priv)) && \
> +	(dev_priv)->csr.dmc_payload

Nitpick, the whole thing could use braces around it for consistency,
although I don't see any sane use of the macro causing precedence
surprises.

BR,
Jani.

> +
>  #include "i915_trace.h"
>
>  static inline bool intel_scanout_needs_vtd_wa(struct drm_i915_private *dev_priv)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b2727905ef2b..c52d863f409c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3200,7 +3200,11 @@ i915_gem_idle_work_handler(struct work_struct *work)
>
>  	if (INTEL_GEN(dev_priv) >= 6)
>  		gen6_rps_idle(dev_priv);
> +
>  	intel_runtime_pm_put(dev_priv);
> +
> +	if (NEEDS_CSR_GT_PERF_WA(dev_priv))
> +		intel_display_power_put(dev_priv, POWER_DOMAIN_MODESET);
>  out_unlock:
>  	mutex_unlock(&dev->struct_mutex);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 10361c7e3b37..10a3b51f6362 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -873,6 +873,9 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
>
>  	GEM_BUG_ON(!dev_priv->gt.active_requests);
>
> +	if (NEEDS_CSR_GT_PERF_WA(dev_priv))
> +		intel_display_power_get(dev_priv, POWER_DOMAIN_MODESET);
> +
>  	intel_runtime_pm_get_noresume(dev_priv);
>  	dev_priv->gt.awake = true;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 320c16df1c9c..4d58e2e28c2f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2990,6 +2990,11 @@ intel_info(const struct drm_i915_private *dev_priv)
 
 #define HAS_DECOUPLED_MMIO(dev_priv) (INTEL_INFO(dev_priv)->has_decoupled_mmio)
 
+#define NEEDS_CSR_GT_PERF_WA(dev_priv) \
+	HAS_CSR(dev_priv) && \
+	(IS_SKL_GT3(dev_priv) || IS_SKL_GT4(dev_priv)) && \
+	(dev_priv)->csr.dmc_payload
+
 #include "i915_trace.h"
 
 static inline bool intel_scanout_needs_vtd_wa(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b2727905ef2b..c52d863f409c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3200,7 +3200,11 @@ i915_gem_idle_work_handler(struct work_struct *work)
 
 	if (INTEL_GEN(dev_priv) >= 6)
 		gen6_rps_idle(dev_priv);
+
 	intel_runtime_pm_put(dev_priv);
+
+	if (NEEDS_CSR_GT_PERF_WA(dev_priv))
+		intel_display_power_put(dev_priv, POWER_DOMAIN_MODESET);
 out_unlock:
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 10361c7e3b37..10a3b51f6362 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -873,6 +873,9 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
 
 	GEM_BUG_ON(!dev_priv->gt.active_requests);
 
+	if (NEEDS_CSR_GT_PERF_WA(dev_priv))
+		intel_display_power_get(dev_priv, POWER_DOMAIN_MODESET);
+
 	intel_runtime_pm_get_noresume(dev_priv);
 	dev_priv->gt.awake = true;