Message ID | 20180506171328.30034-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Chris Wilson <chris@chris-wilson.co.uk> writes: > We rely on ksoftirqd to run in a timely fashion in order to drain the > execlists queue. Quite frequently, it does not. In some cases we may see > latencies of over 200ms triggering our idle timeouts and forcing us to > declare the driver wedged! > > Thus we can speed up idle detection by bypassing ksoftirqd in these > cases and flush our tasklet to confirm if we are indeed still waiting > for the ELSP to drain. > > v2: Put the execlists.first check back; it is required for handling > reset! > > References: https://bugs.freedesktop.org/show_bug.cgi?id=106373 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > --- > drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > index 70325e0824e3..a3111511ea1d 100644 > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > @@ -945,10 +945,19 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) > return true; > > /* Waiting to drain ELSP? */ > - if (READ_ONCE(engine->execlists.active)) > - return false; > + if (READ_ONCE(engine->execlists.active)) { > + struct intel_engine_execlists *execlists = &engine->execlists; > + > + if (tasklet_trylock(&execlists->tasklet)) { Now that we have the lock, sample active again to catch the late tasklet run and skip running if so? -Mika > + execlists->tasklet.func(execlists->tasklet.data); > + tasklet_unlock(&execlists->tasklet); > + } > + > + if (READ_ONCE(execlists->active)) > + return false; > + } > > - /* ELSP is empty, but there are ready requests? */ > + /* ELSP is empty, but there are ready requests? E.g. after reset */ > if (READ_ONCE(engine->execlists.first)) > return false; > > -- > 2.17.0
Quoting Mika Kuoppala (2018-05-07 09:34:24) > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > We rely on ksoftirqd to run in a timely fashion in order to drain the > > execlists queue. Quite frequently, it does not. In some cases we may see > > latencies of over 200ms triggering our idle timeouts and forcing us to > > declare the driver wedged! > > > > Thus we can speed up idle detection by bypassing ksoftirqd in these > > cases and flush our tasklet to confirm if we are indeed still waiting > > for the ELSP to drain. > > > > v2: Put the execlists.first check back; it is required for handling > > reset! > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=106373 > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > --- > > drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++--- > > 1 file changed, 12 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > > index 70325e0824e3..a3111511ea1d 100644 > > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > > @@ -945,10 +945,19 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) > > return true; > > > > /* Waiting to drain ELSP? */ > > - if (READ_ONCE(engine->execlists.active)) > > - return false; > > + if (READ_ONCE(engine->execlists.active)) { > > + struct intel_engine_execlists *execlists = &engine->execlists; > > + > > + if (tasklet_trylock(&execlists->tasklet)) { > > Now that we have the lock, sample active again to catch > the late tasklet run and skip running if so? It becomes a nop in the submission tasklet, it's not dangerous. So it comes down to what looks less of a wart! -Chris
Chris Wilson <chris@chris-wilson.co.uk> writes: > Quoting Mika Kuoppala (2018-05-07 09:34:24) >> Chris Wilson <chris@chris-wilson.co.uk> writes: >> >> > We rely on ksoftirqd to run in a timely fashion in order to drain the >> > execlists queue. Quite frequently, it does not. In some cases we may see >> > latencies of over 200ms triggering our idle timeouts and forcing us to >> > declare the driver wedged! >> > >> > Thus we can speed up idle detection by bypassing ksoftirqd in these >> > cases and flush our tasklet to confirm if we are indeed still waiting >> > for the ELSP to drain. >> > >> > v2: Put the execlists.first check back; it is required for handling >> > reset! >> > >> > References: https://bugs.freedesktop.org/show_bug.cgi?id=106373 >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> >> > --- >> > drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++--- >> > 1 file changed, 12 insertions(+), 3 deletions(-) >> > >> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c >> > index 70325e0824e3..a3111511ea1d 100644 >> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c >> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c >> > @@ -945,10 +945,19 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) >> > return true; >> > >> > /* Waiting to drain ELSP? */ >> > - if (READ_ONCE(engine->execlists.active)) >> > - return false; >> > + if (READ_ONCE(engine->execlists.active)) { >> > + struct intel_engine_execlists *execlists = &engine->execlists; >> > + >> > + if (tasklet_trylock(&execlists->tasklet)) { >> >> Now that we have the lock, sample active again to catch >> the late tasklet run and skip running if so? > > It becomes a nop in the submission tasklet, it's not dangerous. So it > comes down to what looks less of a wart! The nice side effect of this kick is that now the hangcheck also won't fall a victim. Should we have a test which adds random and long tasklet delays? Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Quoting Mika Kuoppala (2018-05-07 09:54:27) > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > Quoting Mika Kuoppala (2018-05-07 09:34:24) > >> Chris Wilson <chris@chris-wilson.co.uk> writes: > >> > >> > We rely on ksoftirqd to run in a timely fashion in order to drain the > >> > execlists queue. Quite frequently, it does not. In some cases we may see > >> > latencies of over 200ms triggering our idle timeouts and forcing us to > >> > declare the driver wedged! > >> > > >> > Thus we can speed up idle detection by bypassing ksoftirqd in these > >> > cases and flush our tasklet to confirm if we are indeed still waiting > >> > for the ELSP to drain. > >> > > >> > v2: Put the execlists.first check back; it is required for handling > >> > reset! > >> > > >> > References: https://bugs.freedesktop.org/show_bug.cgi?id=106373 > >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > >> > --- > >> > drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++--- > >> > 1 file changed, 12 insertions(+), 3 deletions(-) > >> > > >> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > >> > index 70325e0824e3..a3111511ea1d 100644 > >> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > >> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > >> > @@ -945,10 +945,19 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) > >> > return true; > >> > > >> > /* Waiting to drain ELSP? */ > >> > - if (READ_ONCE(engine->execlists.active)) > >> > - return false; > >> > + if (READ_ONCE(engine->execlists.active)) { > >> > + struct intel_engine_execlists *execlists = &engine->execlists; > >> > + > >> > + if (tasklet_trylock(&execlists->tasklet)) { > >> > >> Now that we have the lock, sample active again to catch > >> the late tasklet run and skip running if so? > > > > It becomes a nop in the submission tasklet, it's not dangerous. So it > > comes down to what looks less of a wart! > > The nice side effect of this kick is that now the hangcheck also won't > fall a victim. > > Should we have a test which adds random and long tasklet delays? We do, it's called userspace :) We stumbled across a good way of using a RT hog to delay ksoftirqd arbitrarily; it just has the requirement of forcing the tasklet onto the ksoftirqd for the same cpu... -Chris
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index 70325e0824e3..a3111511ea1d 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -945,10 +945,19 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) return true; /* Waiting to drain ELSP? */ - if (READ_ONCE(engine->execlists.active)) - return false; + if (READ_ONCE(engine->execlists.active)) { + struct intel_engine_execlists *execlists = &engine->execlists; + + if (tasklet_trylock(&execlists->tasklet)) { + execlists->tasklet.func(execlists->tasklet.data); + tasklet_unlock(&execlists->tasklet); + } + + if (READ_ONCE(execlists->active)) + return false; + } - /* ELSP is empty, but there are ready requests? */ + /* ELSP is empty, but there are ready requests? E.g. after reset */ if (READ_ONCE(engine->execlists.first)) return false;
We rely on ksoftirqd to run in a timely fashion in order to drain the execlists queue. Quite frequently, it does not. In some cases we may see latencies of over 200ms triggering our idle timeouts and forcing us to declare the driver wedged! Thus we can speed up idle detection by bypassing ksoftirqd in these cases and flush our tasklet to confirm if we are indeed still waiting for the ELSP to drain. v2: Put the execlists.first check back; it is required for handling reset! References: https://bugs.freedesktop.org/show_bug.cgi?id=106373 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> --- drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)