[3/4] drm/i915: Use full serialisation around engine->irq_posted

Message ID 20180322073533.5313-3-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Commit Message

Chris Wilson March 22, 2018, 7:35 a.m. UTC
Using engine->irq_posted for execlists, we are not always serialised by
the tasklet as we supposed. On the reset paths, the tasklet is disabled
and ignored. Instead, we manipulate the engine->irq_posted directly to
account for the reset, but if an interrupt fired before the reset and so
wrote to engine->irq_posted, that write may not be flushed from the
local CPU's cacheline until much later as the tasklet is already active
and so does not generate a mb(). To correctly serialise the interrupt
with reset, we need serialisation on the set_bit() itself.

And at last Mika can be happy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
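
To illustrate the semantics the commit message relies on: in the kernel, test_and_set_bit() is an atomic read-modify-write that returns the previous value of the bit and is fully ordered, whereas __set_bit() is the non-atomic, unordered variant. The following is a minimal userspace sketch using C11 atomics, not kernel code. The model_* names are hypothetical, and sequentially consistent atomics are assumed to be an adequate stand-in for the kernel's ordering guarantee.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical userspace analogue of test_and_set_bit(): an atomic
 * read-modify-write with sequentially consistent ordering that reports
 * whether the bit was already set before the call.
 */
static bool model_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask;

	assert(nr < MODEL_BITS_PER_LONG);
	mask = 1UL << nr;

	return (atomic_fetch_or_explicit(addr, mask,
					 memory_order_seq_cst) & mask) != 0;
}

/* Hypothetical analogue of an ordered clear, as the consumer would do. */
static void model_clear_bit(unsigned int nr, atomic_ulong *addr)
{
	assert(nr < MODEL_BITS_PER_LONG);
	atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_seq_cst);
}
```

With this model, the first caller to set a clear bit sees "previously clear" and every later caller sees "previously set" until the bit is cleared again, which is the property the patch uses to decide whether to kick the tasklet.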

Comments

Mika Kuoppala March 22, 2018, 2:35 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Using engine->irq_posted for execlists, we are not always serialised by
> the tasklet as we supposed. On the reset paths, the tasklet is disabled
> and ignored. Instead, we manipulate the engine->irq_posted directly to
> account for the reset, but if an interrupt fired before the reset and so
> wrote to engine->irq_posted, that write may not be flushed from the
> local CPU's cacheline until much later as the tasklet is already active
> and so does not generate a mb(). To correctly serialise the interrupt
> with reset, we need serialisation on the set_bit() itself.
>
> And at last Mika can be happy.

Yes.

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> CC: Michel Thierry <michel.thierry@intel.com>
> Cc: Jeff McGee <jeff.mcgee@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_irq.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index fa7310766217..27aee25429b7 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1405,10 +1405,9 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>  	bool tasklet = false;
>  
>  	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
> -		if (READ_ONCE(engine->execlists.active)) {
> -			__set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> -			tasklet = true;
> -		}
> +		if (READ_ONCE(engine->execlists.active))
> +			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> +						    &engine->irq_posted);
>  	}
>  
>  	if (iir & GT_RENDER_USER_INTERRUPT) {
> -- 
> 2.16.2
Jeff McGee March 22, 2018, 3:34 p.m. UTC | #2
On Thu, Mar 22, 2018 at 07:35:32AM +0000, Chris Wilson wrote:
> Using engine->irq_posted for execlists, we are not always serialised by
> the tasklet as we supposed. On the reset paths, the tasklet is disabled
> and ignored. Instead, we manipulate the engine->irq_posted directly to
> account for the reset, but if an interrupt fired before the reset and so
> wrote to engine->irq_posted, that write may not be flushed from the
> local CPU's cacheline until much later as the tasklet is already active
> and so does not generate a mb(). To correctly serialise the interrupt
> with reset, we need serialisation on the set_bit() itself.
> 
> And at last Mika can be happy.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> CC: Michel Thierry <michel.thierry@intel.com>
> Cc: Jeff McGee <jeff.mcgee@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index fa7310766217..27aee25429b7 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1405,10 +1405,9 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>  	bool tasklet = false;
>  
>  	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
> -		if (READ_ONCE(engine->execlists.active)) {
> -			__set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> -			tasklet = true;
> -		}
> +		if (READ_ONCE(engine->execlists.active))
> +			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> +						    &engine->irq_posted);
>  	}
>  
>  	if (iir & GT_RENDER_USER_INTERRUPT) {
> -- 
> 2.16.2
> 

Confirmed that this along with the interrupt flush eliminates the cases
of finding CSB tail at its reset value (0x7) in the tasklet in my force
preemption tests.

Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Chris Wilson March 22, 2018, 5:01 p.m. UTC | #3
Quoting Jeff McGee (2018-03-22 15:34:45)
> On Thu, Mar 22, 2018 at 07:35:32AM +0000, Chris Wilson wrote:
> > Using engine->irq_posted for execlists, we are not always serialised by
> > the tasklet as we supposed. On the reset paths, the tasklet is disabled
> > and ignored. Instead, we manipulate the engine->irq_posted directly to
> > account for the reset, but if an interrupt fired before the reset and so
> > wrote to engine->irq_posted, that write may not be flushed from the
> > local CPU's cacheline until much later as the tasklet is already active
> > and so does not generate a mb(). To correctly serialise the interrupt
> > with reset, we need serialisation on the set_bit() itself.
> > 
> > And at last Mika can be happy.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > CC: Michel Thierry <michel.thierry@intel.com>
> > Cc: Jeff McGee <jeff.mcgee@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index fa7310766217..27aee25429b7 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1405,10 +1405,9 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
> >       bool tasklet = false;
> >  
> >       if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
> > -             if (READ_ONCE(engine->execlists.active)) {
> > -                     __set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> > -                     tasklet = true;
> > -             }
> > +             if (READ_ONCE(engine->execlists.active))
> > +                     tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> > +                                                 &engine->irq_posted);
> >       }
> >  
> >       if (iir & GT_RENDER_USER_INTERRUPT) {
> > -- 
> > 2.16.2
> > 
> 
> Confirmed that this along with the interrupt flush eliminates the cases
> of finding CSB tail at its reset value (0x7) in the tasklet in my force
> preemption tests.

At the moment, I'm concerned about the failures we have in CI before we
go building on top. So care to complete the set of r-b for us to move
on?
-Chris
Chris Wilson March 30, 2018, 11:08 p.m. UTC | #4
Quoting Chris Wilson (2018-03-22 07:35:32)
> Using engine->irq_posted for execlists, we are not always serialised by
> the tasklet as we supposed. On the reset paths, the tasklet is disabled
> and ignored. Instead, we manipulate the engine->irq_posted directly to
> account for the reset, but if an interrupt fired before the reset and so
> wrote to engine->irq_posted, that write may not be flushed from the
> local CPU's cacheline until much later as the tasklet is already active
> and so does not generate a mb(). To correctly serialise the interrupt
> with reset, we need serialisation on the set_bit() itself.
> 
> And at last Mika can be happy.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> CC: Michel Thierry <michel.thierry@intel.com>
> Cc: Jeff McGee <jeff.mcgee@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index fa7310766217..27aee25429b7 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1405,10 +1405,9 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>         bool tasklet = false;
>  
>         if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
> -               if (READ_ONCE(engine->execlists.active)) {
> -                       __set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> -                       tasklet = true;
> -               }
> +               if (READ_ONCE(engine->execlists.active))
> +                       tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> +                                                   &engine->irq_posted);

This is driving me mad. A very rare missed interrupt unless we
unconditionally kick tasklet:

        if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-               if (READ_ONCE(engine->execlists.active))
-                       tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-                                                   &engine->irq_posted);
+               if (READ_ONCE(engine->execlists.active)) {
+                       set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+                       tasklet = true;
+               }
        }

I can't see why.

Hmm, I wonder if we are seeing READ_ONCE(execlists->active) false
negatives.

Getting close to admitting defeat :(
-Chris
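
The two handler variants discussed in this message differ only in when the tasklet is kicked. The sketch below is a single-threaded userspace model of that difference, not a diagnosis of the missed interrupt. All model_* names and the kick counter are hypothetical. Both variants use a sequentially consistent atomic OR, so only the kick logic is modelled and not the ordering difference between set_bit() and test_and_set_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_IRQ_EXECLIST 0 /* hypothetical bit index for ENGINE_IRQ_EXECLIST */

struct model_engine {
	atomic_ulong irq_posted; /* models engine->irq_posted */
	bool active;             /* models engine->execlists.active */
	unsigned int kicks;      /* times the tasklet would have been scheduled */
};

/* Variant from the posted patch: kick only on a 0 -> 1 transition. */
static void model_irq_conditional(struct model_engine *e)
{
	unsigned long mask = 1UL << MODEL_IRQ_EXECLIST;

	if (e->active && !(atomic_fetch_or(&e->irq_posted, mask) & mask))
		e->kicks++;
}

/* Variant from this message: set the bit atomically, then always kick. */
static void model_irq_unconditional(struct model_engine *e)
{
	if (e->active) {
		atomic_fetch_or(&e->irq_posted, 1UL << MODEL_IRQ_EXECLIST);
		e->kicks++;
	}
}

/* Deliver nr_irqs interrupts with no tasklet run in between; count kicks. */
static unsigned int model_count_kicks(void (*handler)(struct model_engine *),
				      unsigned long initial_posted,
				      unsigned int nr_irqs)
{
	struct model_engine e = { .active = true, .kicks = 0 };

	assert(handler);
	atomic_init(&e.irq_posted, initial_posted);
	while (nr_irqs--)
		handler(&e);

	return e.kicks;
}
```

In this model, three back-to-back interrupts starting from a clear bit give one kick with the conditional variant and three with the unconditional one. If the bit is already set when the interrupts arrive, the conditional variant never kicks, so it depends on the bit being set only while a tasklet run is pending.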
Chris Wilson March 31, 2018, 8:59 a.m. UTC | #5
Quoting Chris Wilson (2018-03-31 00:08:47)
> Quoting Chris Wilson (2018-03-22 07:35:32)
> > Using engine->irq_posted for execlists, we are not always serialised by
> > the tasklet as we supposed. On the reset paths, the tasklet is disabled
> > and ignored. Instead, we manipulate the engine->irq_posted directly to
> > account for the reset, but if an interrupt fired before the reset and so
> > wrote to engine->irq_posted, that write may not be flushed from the
> > local CPU's cacheline until much later as the tasklet is already active
> > and so does not generate a mb(). To correctly serialise the interrupt
> > with reset, we need serialisation on the set_bit() itself.
> > 
> > And at last Mika can be happy.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > CC: Michel Thierry <michel.thierry@intel.com>
> > Cc: Jeff McGee <jeff.mcgee@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index fa7310766217..27aee25429b7 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1405,10 +1405,9 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
> >         bool tasklet = false;
> >  
> >         if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
> > -               if (READ_ONCE(engine->execlists.active)) {
> > -                       __set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> > -                       tasklet = true;
> > -               }
> > +               if (READ_ONCE(engine->execlists.active))
> > +                       tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> > +                                                   &engine->irq_posted);
> 
> This is driving me mad. A very rare missed interrupt unless we
> unconditionally kick tasklet:
> 
>         if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
> -               if (READ_ONCE(engine->execlists.active))
> -                       tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
> -                                                   &engine->irq_posted);
> +               if (READ_ONCE(engine->execlists.active)) {
> +                       set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +                       tasklet = true;
> +               }
>         }
> 
> I can't see why.
> 
> Hmm, I wonder if we are seeing READ_ONCE(execlists->active) false
> negatives.

Fortunately, doesn't appear to be that.

@@ -1405,9 +1405,10 @@  gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	bool tasklet = false;
 
 	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-		if (READ_ONCE(engine->execlists.active))
-			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-						    &engine->irq_posted);
+		GEM_BUG_ON(!READ_ONCE(execlists->tasklet.state) &&
+			   test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
+		tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
+					    &engine->irq_posted);
 	}

Hasn't even hit a BUG, which is a little disconcerting.
-Chris
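
The GEM_BUG_ON() in the debug diff above checks one invariant: if ENGINE_IRQ_EXECLIST is posted, the tasklet state must be non-zero (scheduled or running). The following is a minimal single-threaded sketch of that invariant in userspace C. The model_* names, the single "scheduled" bit and the tasklet body are assumptions made for illustration and do not reproduce the driver's actual tasklet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_IRQ_EXECLIST  (1UL << 0) /* stands in for ENGINE_IRQ_EXECLIST */
#define MODEL_TASKLET_SCHED (1UL << 0) /* stands in for a "scheduled" state bit */

struct model_engine {
	atomic_ulong irq_posted;    /* models engine->irq_posted */
	atomic_ulong tasklet_state; /* models execlists->tasklet.state */
};

/* Inverse of the GEM_BUG_ON() condition: true while the state is consistent. */
static bool model_posted_implies_scheduled(struct model_engine *e)
{
	bool posted = atomic_load(&e->irq_posted) & MODEL_IRQ_EXECLIST;
	bool idle = atomic_load(&e->tasklet_state) == 0;

	return !(idle && posted);
}

/* Interrupt side: check the invariant, post the bit, schedule on 0 -> 1. */
static void model_irq(struct model_engine *e)
{
	assert(model_posted_implies_scheduled(e));
	if (!(atomic_fetch_or(&e->irq_posted, MODEL_IRQ_EXECLIST) &
	      MODEL_IRQ_EXECLIST))
		atomic_fetch_or(&e->tasklet_state, MODEL_TASKLET_SCHED);
}

/* Tasklet side (assumed behaviour): consume the posted bit, then go idle. */
static void model_tasklet_run(struct model_engine *e)
{
	atomic_fetch_and(&e->irq_posted, ~MODEL_IRQ_EXECLIST);
	atomic_fetch_and(&e->tasklet_state, ~MODEL_TASKLET_SCHED);
}
```

In this model the invariant holds across any sequence of model_irq() and model_tasklet_run() calls. It only breaks if something sets the posted bit without scheduling the tasklet, which is the situation the GEM_BUG_ON() was added to catch.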
Patch

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index fa7310766217..27aee25429b7 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1405,10 +1405,9 @@  gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	bool tasklet = false;
 
 	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-		if (READ_ONCE(engine->execlists.active)) {
-			__set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-			tasklet = true;
-		}
+		if (READ_ONCE(engine->execlists.active))
+			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
+						    &engine->irq_posted);
 	}
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {