diff mbox

[v2] drm/i915: Speed up idle detection by kicking the tasklets

Message ID 20180506171328.30034-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson May 6, 2018, 5:13 p.m. UTC
We rely on ksoftirqd to run in a timely fashion in order to drain the
execlists queue. Quite frequently, it does not. In some cases we may see
latencies of over 200ms triggering our idle timeouts and forcing us to
declare the driver wedged!

Thus we can speed up idle detection by bypassing ksoftirqd in these
cases and flush our tasklet to confirm if we are indeed still waiting
for the ELSP to drain.

v2: Put the execlists.first check back; it is required for handling
reset!

References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Mika Kuoppala May 7, 2018, 8:34 a.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> We rely on ksoftirqd to run in a timely fashion in order to drain the
> execlists queue. Quite frequently, it does not. In some cases we may see
> latencies of over 200ms triggering our idle timeouts and forcing us to
> declare the driver wedged!
>
> Thus we can speed up idle detection by bypassing ksoftirqd in these
> cases and flush our tasklet to confirm if we are indeed still waiting
> for the ELSP to drain.
>
> v2: Put the execlists.first check back; it is required for handling
> reset!
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 70325e0824e3..a3111511ea1d 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -945,10 +945,19 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
>  		return true;
>  
>  	/* Waiting to drain ELSP? */
> -	if (READ_ONCE(engine->execlists.active))
> -		return false;
> +	if (READ_ONCE(engine->execlists.active)) {
> +		struct intel_engine_execlists *execlists = &engine->execlists;
> +
> +		if (tasklet_trylock(&execlists->tasklet)) {

Now that we have the lock, sample active again to catch
the late tasklet run and skip running if so?

-Mika

> +			execlists->tasklet.func(execlists->tasklet.data);
> +			tasklet_unlock(&execlists->tasklet);
> +		}
> +
> +		if (READ_ONCE(execlists->active))
> +			return false;
> +	}
>  
> -	/* ELSP is empty, but there are ready requests? */
> +	/* ELSP is empty, but there are ready requests? E.g. after reset */
>  	if (READ_ONCE(engine->execlists.first))
>  		return false;
>  
> -- 
> 2.17.0
Chris Wilson May 7, 2018, 8:38 a.m. UTC | #2
Quoting Mika Kuoppala (2018-05-07 09:34:24)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > We rely on ksoftirqd to run in a timely fashion in order to drain the
> > execlists queue. Quite frequently, it does not. In some cases we may see
> > latencies of over 200ms triggering our idle timeouts and forcing us to
> > declare the driver wedged!
> >
> > Thus we can speed up idle detection by bypassing ksoftirqd in these
> > cases and flush our tasklet to confirm if we are indeed still waiting
> > for the ELSP to drain.
> >
> > v2: Put the execlists.first check back; it is required for handling
> > reset!
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 70325e0824e3..a3111511ea1d 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -945,10 +945,19 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
> >               return true;
> >  
> >       /* Waiting to drain ELSP? */
> > -     if (READ_ONCE(engine->execlists.active))
> > -             return false;
> > +     if (READ_ONCE(engine->execlists.active)) {
> > +             struct intel_engine_execlists *execlists = &engine->execlists;
> > +
> > +             if (tasklet_trylock(&execlists->tasklet)) {
> 
> Now that we have the lock, sample active again to catch
> the late tasklet run and skip running if so?

It becomes a nop in the submission tasklet, it's not dangerous. So it
comes down to what looks less of a wart!
-Chris
Mika Kuoppala May 7, 2018, 8:54 a.m. UTC | #3
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-05-07 09:34:24)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > We rely on ksoftirqd to run in a timely fashion in order to drain the
>> > execlists queue. Quite frequently, it does not. In some cases we may see
>> > latencies of over 200ms triggering our idle timeouts and forcing us to
>> > declare the driver wedged!
>> >
>> > Thus we can speed up idle detection by bypassing ksoftirqd in these
>> > cases and flush our tasklet to confirm if we are indeed still waiting
>> > for the ELSP to drain.
>> >
>> > v2: Put the execlists.first check back; it is required for handling
>> > reset!
>> >
>> > References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++---
>> >  1 file changed, 12 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
>> > index 70325e0824e3..a3111511ea1d 100644
>> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>> > @@ -945,10 +945,19 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
>> >               return true;
>> >  
>> >       /* Waiting to drain ELSP? */
>> > -     if (READ_ONCE(engine->execlists.active))
>> > -             return false;
>> > +     if (READ_ONCE(engine->execlists.active)) {
>> > +             struct intel_engine_execlists *execlists = &engine->execlists;
>> > +
>> > +             if (tasklet_trylock(&execlists->tasklet)) {
>> 
>> Now that we have the lock, sample active again to catch
>> the late tasklet run and skip running if so?
>
> It becomes a nop in the submission tasklet, it's not dangerous. So it
> comes down to what looks less of a wart!

The nice side effect of this kick is that now the hangcheck also won't
fall a victim.

Should we have a test which adds random and long tasklet delays?

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Chris Wilson May 7, 2018, 9:03 a.m. UTC | #4
Quoting Mika Kuoppala (2018-05-07 09:54:27)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Quoting Mika Kuoppala (2018-05-07 09:34:24)
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> 
> >> > We rely on ksoftirqd to run in a timely fashion in order to drain the
> >> > execlists queue. Quite frequently, it does not. In some cases we may see
> >> > latencies of over 200ms triggering our idle timeouts and forcing us to
> >> > declare the driver wedged!
> >> >
> >> > Thus we can speed up idle detection by bypassing ksoftirqd in these
> >> > cases and flush our tasklet to confirm if we are indeed still waiting
> >> > for the ELSP to drain.
> >> >
> >> > v2: Put the execlists.first check back; it is required for handling
> >> > reset!
> >> >
> >> > References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
> >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/intel_engine_cs.c | 15 ++++++++++++---
> >> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> >> > index 70325e0824e3..a3111511ea1d 100644
> >> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> >> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> >> > @@ -945,10 +945,19 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
> >> >               return true;
> >> >  
> >> >       /* Waiting to drain ELSP? */
> >> > -     if (READ_ONCE(engine->execlists.active))
> >> > -             return false;
> >> > +     if (READ_ONCE(engine->execlists.active)) {
> >> > +             struct intel_engine_execlists *execlists = &engine->execlists;
> >> > +
> >> > +             if (tasklet_trylock(&execlists->tasklet)) {
> >> 
> >> Now that we have the lock, sample active again to catch
> >> the late tasklet run and skip running if so?
> >
> > It becomes a nop in the submission tasklet, it's not dangerous. So it
> > comes down to what looks less of a wart!
> 
> The nice side effect of this kick is that now the hangcheck also won't
> fall a victim.
> 
> Should we have a test which adds random and long tasklet delays?

We do, it's called userspace :)

We stumbled across a good way of using a RT hog to delay ksoftirqd
arbitrarily; it just has the requirement of forcing the tasklet onto the
ksoftirqd for the same cpu...
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 70325e0824e3..a3111511ea1d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -945,10 +945,19 @@  bool intel_engine_is_idle(struct intel_engine_cs *engine)
 		return true;
 
 	/* Waiting to drain ELSP? */
-	if (READ_ONCE(engine->execlists.active))
-		return false;
+	if (READ_ONCE(engine->execlists.active)) {
+		struct intel_engine_execlists *execlists = &engine->execlists;
+
+		if (tasklet_trylock(&execlists->tasklet)) {
+			execlists->tasklet.func(execlists->tasklet.data);
+			tasklet_unlock(&execlists->tasklet);
+		}
+
+		if (READ_ONCE(execlists->active))
+			return false;
+	}
 
-	/* ELSP is empty, but there are ready requests? */
+	/* ELSP is empty, but there are ready requests? E.g. after reset */
 	if (READ_ONCE(engine->execlists.first))
 		return false;