drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt

Message ID 20180321091027.21034-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson March 21, 2018, 9:10 a.m. UTC
We were relying on the uncached reads when processing the CSB to provide
ourselves with the serialisation with the interrupt handler (so we could
detect new interrupts in the middle of processing the old one). However,
in commit 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD
from the HWSP") those uncached reads were eliminated (on one path at
least) and along with them our serialisation. The result is that we
would very rarely miss notification of a new interrupt and leave a
context-switch unprocessed, hanging the GPU.

Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)
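
The race described in the commit message can be sketched in portable C11. This is a userspace illustration under stated assumptions, not the driver code: `fake_engine`, `fake_irq()` and `fake_tasklet()` are hypothetical names, `IRQ_EXECLIST_BIT` stands in for ENGINE_IRQ_EXECLIST, and a sequentially consistent `atomic_fetch_and()` is used where the patch uses clear_bit() followed by smp_mb__after_atomic(). The `hook` argument simulates an interrupt that arrives after the event buffer has already been read.

```c
#include <stdatomic.h>
#include <stddef.h>

#define IRQ_EXECLIST_BIT 0x1u /* hypothetical stand-in for ENGINE_IRQ_EXECLIST */

struct fake_engine {
	atomic_uint irq_posted; /* set by the simulated interrupt handler */
	atomic_uint csb_tail;   /* advanced by the simulated hardware */
	unsigned int csb_head;  /* consumer position, private to the tasklet */
};

static void fake_engine_init(struct fake_engine *e)
{
	atomic_init(&e->irq_posted, 0);
	atomic_init(&e->csb_tail, 0);
	e->csb_head = 0;
}

/* Interrupt side: publish a new event, then post the flag. */
static void fake_irq(struct fake_engine *e)
{
	atomic_fetch_add(&e->csb_tail, 1);
	atomic_fetch_or(&e->irq_posted, IRQ_EXECLIST_BIT);
}

/*
 * Tasklet side: returns the number of events consumed. If hook is not
 * NULL it is called once, after the first read of the tail, to simulate
 * an interrupt landing in the middle of processing.
 */
static unsigned int fake_tasklet(struct fake_engine *e,
				 void (*hook)(struct fake_engine *))
{
	unsigned int processed = 0;

	while (atomic_load_explicit(&e->irq_posted, memory_order_relaxed) &
	       IRQ_EXECLIST_BIT) {
		/* Clear before reading to catch new interrupts. */
		atomic_fetch_and(&e->irq_posted, ~IRQ_EXECLIST_BIT);

		unsigned int tail = atomic_load(&e->csb_tail);

		while (e->csb_head != tail) {
			e->csb_head++;
			processed++;
		}

		if (hook) {
			hook(e); /* late interrupt: sets the flag again */
			hook = NULL;
		}
	}

	return processed;
}
```

Because the flag is cleared before the tail is read, an event that lands after the read leaves the flag set and forces another pass of the loop. If the clear could be reordered after the read, that event could be left unprocessed, which is the failure the commit message describes.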

Comments

Tvrtko Ursulin March 21, 2018, 10:14 a.m. UTC | #1
On 21/03/2018 09:10, Chris Wilson wrote:
> We were relying on the uncached reads when processing the CSB to provide
> ourselves with the serialisation with the interrupt handler (so we could
> detect new interrupts in the middle of processing the old one). However,
> in commit 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD
> from the HWSP") those uncached reads were eliminated (on one path at
> least) and along with them our serialisation. The result is that we
> would very rarely miss notification of a new interrupt and leave a
> context-switch unprocessed, hanging the GPU.
> 
> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 21 ++++++++-------------
>   1 file changed, 8 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 53f1c009ed7b..67b6a0f658d6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -831,7 +831,8 @@ static void execlists_submission_tasklet(unsigned long data)
>   	struct drm_i915_private *dev_priv = engine->i915;
>   	bool fw = false;
>   
> -	/* We can skip acquiring intel_runtime_pm_get() here as it was taken
> +	/*
> +	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
>   	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
>   	 * not be relinquished until the device is idle (see
>   	 * i915_gem_idle_work_handler()). As a precaution, we make sure
> @@ -840,7 +841,8 @@ static void execlists_submission_tasklet(unsigned long data)
>   	 */
>   	GEM_BUG_ON(!dev_priv->gt.awake);
>   
> -	/* Prefer doing test_and_clear_bit() as a two stage operation to avoid
> +	/*
> +	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
>   	 * imposing the cost of a locked atomic transaction when submitting a
>   	 * new request (outside of the context-switch interrupt).
>   	 */
> @@ -856,17 +858,10 @@ static void execlists_submission_tasklet(unsigned long data)
>   			execlists->csb_head = -1; /* force mmio read of CSB ptrs */
>   		}
>   
> -		/* The write will be ordered by the uncached read (itself
> -		 * a memory barrier), so we do not need another in the form
> -		 * of a locked instruction. The race between the interrupt
> -		 * handler and the split test/clear is harmless as we order
> -		 * our clear before the CSB read. If the interrupt arrived
> -		 * first between the test and the clear, we read the updated
> -		 * CSB and clear the bit. If the interrupt arrives as we read
> -		 * the CSB or later (i.e. after we had cleared the bit) the bit
> -		 * is set and we do a new loop.
> -		 */
> -		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +		/* Clear before reading to catch new interrupts */
> +		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +		smp_mb__after_atomic();
> +

Could theoretically avoid the locked cost in the mmio case by having two
flavours of bit clearing in the "if" branches below, but it doesn't
sound like a worthwhile complication in the wider context.
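
For illustration only, the "two flavours" idea might look something like the following C11 sketch. Everything here is an assumption: `clear_irq_posted()` and its `mmio_read_follows` parameter are hypothetical, C11 has no notion of an uncached mmio read, and the unlocked branch is only safe if nothing else modifies the word concurrently (the same caveat that applies to __clear_bit()).

```c
#include <stdatomic.h>
#include <stdbool.h>

#define IRQ_EXECLIST_BIT 0x1u /* hypothetical stand-in for ENGINE_IRQ_EXECLIST */

static void clear_irq_posted(atomic_uint *irq_posted, bool mmio_read_follows)
{
	if (mmio_read_follows) {
		/*
		 * Unlocked flavour: separate load and store, no
		 * read-modify-write. Assumes the uncached read that
		 * follows supplies the ordering, as the removed comment
		 * in the patch argued.
		 */
		unsigned int v = atomic_load_explicit(irq_posted,
						      memory_order_relaxed);

		atomic_store_explicit(irq_posted, v & ~IRQ_EXECLIST_BIT,
				      memory_order_relaxed);
	} else {
		/* Locked flavour: atomic read-modify-write, fully ordered. */
		atomic_fetch_and(irq_posted, ~IRQ_EXECLIST_BIT);
	}
}
```

The extra branch and the second code path to reason about are the complication weighed against the saved locked instruction.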

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Now off to apply it in the desperate hope that it might affect my weird reset issues...

Regards,

Tvrtko

>   		if (unlikely(execlists->csb_head == -1)) { /* following a reset */
>   			if (!fw) {
>   				intel_uncore_forcewake_get(dev_priv,
>
Mika Kuoppala March 21, 2018, 10:46 a.m. UTC | #2
Chris Wilson <chris@chris-wilson.co.uk> writes:

> We were relying on the uncached reads when processing the CSB to provide
> ourselves with the serialisation with the interrupt handler (so we could
> detect new interrupts in the middle of processing the old one). However,
> in commit 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD
> from the HWSP") those uncached reads were eliminated (on one path at
> least) and along with them our serialisation. The result is that we
> would very rarely miss notification of a new interrupt and leave a
> context-switch unprocessed, hanging the GPU.
>
> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 21 ++++++++-------------
>  1 file changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 53f1c009ed7b..67b6a0f658d6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -831,7 +831,8 @@ static void execlists_submission_tasklet(unsigned long data)
>  	struct drm_i915_private *dev_priv = engine->i915;
>  	bool fw = false;
>  
> -	/* We can skip acquiring intel_runtime_pm_get() here as it was taken
> +	/*
> +	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
>  	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
>  	 * not be relinquished until the device is idle (see
>  	 * i915_gem_idle_work_handler()). As a precaution, we make sure
> @@ -840,7 +841,8 @@ static void execlists_submission_tasklet(unsigned long data)
>  	 */
>  	GEM_BUG_ON(!dev_priv->gt.awake);
>  
> -	/* Prefer doing test_and_clear_bit() as a two stage operation to avoid
> +	/*
> +	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
>  	 * imposing the cost of a locked atomic transaction when submitting a
>  	 * new request (outside of the context-switch interrupt).
>  	 */
> @@ -856,17 +858,10 @@ static void execlists_submission_tasklet(unsigned long data)
>  			execlists->csb_head = -1; /* force mmio read of CSB ptrs */
>  		}
>  
> -		/* The write will be ordered by the uncached read (itself
> -		 * a memory barrier), so we do not need another in the form
> -		 * of a locked instruction. The race between the interrupt
> -		 * handler and the split test/clear is harmless as we order
> -		 * our clear before the CSB read. If the interrupt arrived
> -		 * first between the test and the clear, we read the updated
> -		 * CSB and clear the bit. If the interrupt arrives as we read
> -		 * the CSB or later (i.e. after we had cleared the bit) the bit
> -		 * is set and we do a new loop.
> -		 */
> -		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +		/* Clear before reading to catch new interrupts */
> +		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +		smp_mb__after_atomic();

I was confused about this memory barrier as our test is in the
same context and ordered wrt this. Chris noted on IRC that it is there
to document the ordering wrt the code that follows.

I am fine with that so,
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
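
As a rough userspace analogy for the pairing discussed above, clear_bit() followed by smp_mb__after_atomic() can be modelled in C11 as a relaxed atomic read-modify-write followed by a full fence. The helper names below are hypothetical and the mapping is an approximation, not the kernel's definition. On x86 the locked instruction is already a full barrier, so the kernel macro reduces to at most a compiler barrier there.

```c
#include <stdatomic.h>

/* Rough analogue of clear_bit(): atomic RMW with no ordering guarantee. */
static inline void clear_bit_relaxed(atomic_ulong *word, unsigned int nr)
{
	atomic_fetch_and_explicit(word, ~(1UL << nr), memory_order_relaxed);
}

/* Rough analogue of smp_mb__after_atomic(): a full fence. */
static inline void full_fence_after_atomic(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Clear the flag, then read the data it guards. The fence documents and
 * enforces that the load of *data cannot be satisfied before the clear
 * is visible to other threads.
 */
static inline unsigned long clear_then_read(atomic_ulong *flags,
					    unsigned int nr,
					    atomic_ulong *data)
{
	clear_bit_relaxed(flags, nr);
	full_fence_after_atomic();
	return atomic_load_explicit(data, memory_order_relaxed);
}
```

Keeping the fence as a separate call mirrors the patch: the statement is there as much for the reader as for the CPU.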

> +
>  		if (unlikely(execlists->csb_head == -1)) { /* following a reset */
>  			if (!fw) {
>  				intel_uncore_forcewake_get(dev_priv,
> -- 
> 2.16.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Michel Thierry March 21, 2018, 5:01 p.m. UTC | #3
On 3/21/2018 3:46 AM, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
>> We were relying on the uncached reads when processing the CSB to provide
>> ourselves with the serialisation with the interrupt handler (so we could
>> detect new interrupts in the middle of processing the old one). However,
>> in commit 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD
>> from the HWSP") those uncached reads were eliminated (on one path at
>> least) and along with them our serialisation. The result is that we
>> would very rarely miss notification of a new interrupt and leave a
>> context-switch unprocessed, hanging the GPU.
>>
>> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Michel Thierry <michel.thierry@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_lrc.c | 21 ++++++++-------------
>>   1 file changed, 8 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 53f1c009ed7b..67b6a0f658d6 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -831,7 +831,8 @@ static void execlists_submission_tasklet(unsigned long data)
>>   	struct drm_i915_private *dev_priv = engine->i915;
>>   	bool fw = false;
>>   
>> -	/* We can skip acquiring intel_runtime_pm_get() here as it was taken
>> +	/*
>> +	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
>>   	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
>>   	 * not be relinquished until the device is idle (see
>>   	 * i915_gem_idle_work_handler()). As a precaution, we make sure
>> @@ -840,7 +841,8 @@ static void execlists_submission_tasklet(unsigned long data)
>>   	 */
>>   	GEM_BUG_ON(!dev_priv->gt.awake);
>>   
>> -	/* Prefer doing test_and_clear_bit() as a two stage operation to avoid
>> +	/*
>> +	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
>>   	 * imposing the cost of a locked atomic transaction when submitting a
>>   	 * new request (outside of the context-switch interrupt).
>>   	 */
>> @@ -856,17 +858,10 @@ static void execlists_submission_tasklet(unsigned long data)
>>   			execlists->csb_head = -1; /* force mmio read of CSB ptrs */
>>   		}
>>   
>> -		/* The write will be ordered by the uncached read (itself
>> -		 * a memory barrier), so we do not need another in the form
>> -		 * of a locked instruction. The race between the interrupt
>> -		 * handler and the split test/clear is harmless as we order
>> -		 * our clear before the CSB read. If the interrupt arrived
>> -		 * first between the test and the clear, we read the updated
>> -		 * CSB and clear the bit. If the interrupt arrives as we read
>> -		 * the CSB or later (i.e. after we had cleared the bit) the bit
>> -		 * is set and we do a new loop.
>> -		 */
>> -		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>> +		/* Clear before reading to catch new interrupts */
>> +		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>> +		smp_mb__after_atomic();

Checkpatch wants a comment for the memory barrier... Are we being strict 
about it? (https://patchwork.freedesktop.org/series/40359/)

> 
> I was confused about this memory barrier as our test is in the
> same context and ordered wrt this. Chris noted in irc that this is for
> the documentation for ordering wrt the code that follows.
> 
> I am fine with that so,
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 

Fine by me too,

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

>> +
>>   		if (unlikely(execlists->csb_head == -1)) { /* following a reset */
>>   			if (!fw) {
>>   				intel_uncore_forcewake_get(dev_priv,
>> -- 
>> 2.16.2
>>
Chris Wilson March 21, 2018, 5:05 p.m. UTC | #4
Quoting Michel Thierry (2018-03-21 17:01:12)
> On 3/21/2018 3:46 AM, Mika Kuoppala wrote:
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > 
> >> We were relying on the uncached reads when processing the CSB to provide
> >> ourselves with the serialisation with the interrupt handler (so we could
> >> detect new interrupts in the middle of processing the old one). However,
> >> in commit 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD
> >> from the HWSP") those uncached reads were eliminated (on one path at
> >> least) and along with them our serialisation. The result is that we
> >> would very rarely miss notification of a new interrupt and leave a
> >> context-switch unprocessed, hanging the GPU.
> >>
> >> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Michel Thierry <michel.thierry@intel.com>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/intel_lrc.c | 21 ++++++++-------------
> >>   1 file changed, 8 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >> index 53f1c009ed7b..67b6a0f658d6 100644
> >> --- a/drivers/gpu/drm/i915/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> >> @@ -831,7 +831,8 @@ static void execlists_submission_tasklet(unsigned long data)
> >>      struct drm_i915_private *dev_priv = engine->i915;
> >>      bool fw = false;
> >>   
> >> -    /* We can skip acquiring intel_runtime_pm_get() here as it was taken
> >> +    /*
> >> +     * We can skip acquiring intel_runtime_pm_get() here as it was taken
> >>       * on our behalf by the request (see i915_gem_mark_busy()) and it will
> >>       * not be relinquished until the device is idle (see
> >>       * i915_gem_idle_work_handler()). As a precaution, we make sure
> >> @@ -840,7 +841,8 @@ static void execlists_submission_tasklet(unsigned long data)
> >>       */
> >>      GEM_BUG_ON(!dev_priv->gt.awake);
> >>   
> >> -    /* Prefer doing test_and_clear_bit() as a two stage operation to avoid
> >> +    /*
> >> +     * Prefer doing test_and_clear_bit() as a two stage operation to avoid
> >>       * imposing the cost of a locked atomic transaction when submitting a
> >>       * new request (outside of the context-switch interrupt).
> >>       */
> >> @@ -856,17 +858,10 @@ static void execlists_submission_tasklet(unsigned long data)
> >>                      execlists->csb_head = -1; /* force mmio read of CSB ptrs */
> >>              }
> >>   
> >> -            /* The write will be ordered by the uncached read (itself
> >> -             * a memory barrier), so we do not need another in the form
> >> -             * of a locked instruction. The race between the interrupt
> >> -             * handler and the split test/clear is harmless as we order
> >> -             * our clear before the CSB read. If the interrupt arrived
> >> -             * first between the test and the clear, we read the updated
> >> -             * CSB and clear the bit. If the interrupt arrives as we read
> >> -             * the CSB or later (i.e. after we had cleared the bit) the bit
> >> -             * is set and we do a new loop.
> >> -             */
> >> -            __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> >> +            /* Clear before reading to catch new interrupts */
> >> +            clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> >> +            smp_mb__after_atomic();
> 
> Checkpatch wants a comment for the memory barrier... Are we being strict 
> about it? (https://patchwork.freedesktop.org/series/40359/)

There's a comment for it not two lines above! Silly perl script.
-Chris
Chris Wilson March 21, 2018, 5:07 p.m. UTC | #5
Quoting Chris Wilson (2018-03-21 17:05:06)
> Quoting Michel Thierry (2018-03-21 17:01:12)
> > On 3/21/2018 3:46 AM, Mika Kuoppala wrote:
> > > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > > 
> > >> We were relying on the uncached reads when processing the CSB to provide
> > >> ourselves with the serialisation with the interrupt handler (so we could
> > >> detect new interrupts in the middle of processing the old one). However,
> > >> in commit 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD
> > >> from the HWSP") those uncached reads were eliminated (on one path at
> > >> least) and along with them our serialisation. The result is that we
> > >> would very rarely miss notification of a new interrupt and leave a
> > >> context-switch unprocessed, hanging the GPU.
> > >>
> > >> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
> > >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > >> Cc: Michel Thierry <michel.thierry@intel.com>
> > >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > >> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > >> ---
> > >>   drivers/gpu/drm/i915/intel_lrc.c | 21 ++++++++-------------
> > >>   1 file changed, 8 insertions(+), 13 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > >> index 53f1c009ed7b..67b6a0f658d6 100644
> > >> --- a/drivers/gpu/drm/i915/intel_lrc.c
> > >> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > >> @@ -831,7 +831,8 @@ static void execlists_submission_tasklet(unsigned long data)
> > >>      struct drm_i915_private *dev_priv = engine->i915;
> > >>      bool fw = false;
> > >>   
> > >> -    /* We can skip acquiring intel_runtime_pm_get() here as it was taken
> > >> +    /*
> > >> +     * We can skip acquiring intel_runtime_pm_get() here as it was taken
> > >>       * on our behalf by the request (see i915_gem_mark_busy()) and it will
> > >>       * not be relinquished until the device is idle (see
> > >>       * i915_gem_idle_work_handler()). As a precaution, we make sure
> > >> @@ -840,7 +841,8 @@ static void execlists_submission_tasklet(unsigned long data)
> > >>       */
> > >>      GEM_BUG_ON(!dev_priv->gt.awake);
> > >>   
> > >> -    /* Prefer doing test_and_clear_bit() as a two stage operation to avoid
> > >> +    /*
> > >> +     * Prefer doing test_and_clear_bit() as a two stage operation to avoid
> > >>       * imposing the cost of a locked atomic transaction when submitting a
> > >>       * new request (outside of the context-switch interrupt).
> > >>       */
> > >> @@ -856,17 +858,10 @@ static void execlists_submission_tasklet(unsigned long data)
> > >>                      execlists->csb_head = -1; /* force mmio read of CSB ptrs */
> > >>              }
> > >>   
> > >> -            /* The write will be ordered by the uncached read (itself
> > >> -             * a memory barrier), so we do not need another in the form
> > >> -             * of a locked instruction. The race between the interrupt
> > >> -             * handler and the split test/clear is harmless as we order
> > >> -             * our clear before the CSB read. If the interrupt arrived
> > >> -             * first between the test and the clear, we read the updated
> > >> -             * CSB and clear the bit. If the interrupt arrives as we read
> > >> -             * the CSB or later (i.e. after we had cleared the bit) the bit
> > >> -             * is set and we do a new loop.
> > >> -             */
> > >> -            __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> > >> +            /* Clear before reading to catch new interrupts */
> > >> +            clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> > >> +            smp_mb__after_atomic();
> > 
> > Checkpatch wants a comment for the memory barrier... Are we being strict 
> > about it? (https://patchwork.freedesktop.org/series/40359/)
> 
> There's a comment for it not two lines above! Silly perl script.

Besides it being only a simulacrum of a mb. Silly perl script :) 
-Chris
Chris Wilson March 21, 2018, 5:10 p.m. UTC | #6
Quoting Michel Thierry (2018-03-21 17:01:12)
> On 3/21/2018 3:46 AM, Mika Kuoppala wrote:
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > 
> >> We were relying on the uncached reads when processing the CSB to provide
> >> ourselves with the serialisation with the interrupt handler (so we could
> >> detect new interrupts in the middle of processing the old one). However,
> >> in commit 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD
> >> from the HWSP") those uncached reads were eliminated (on one path at
> >> least) and along with them our serialisation. The result is that we
> >> would very rarely miss notification of a new interrupt and leave a
> >> context-switch unprocessed, hanging the GPU.
> >>
> >> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Michel Thierry <michel.thierry@intel.com>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/intel_lrc.c | 21 ++++++++-------------
> >>   1 file changed, 8 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >> index 53f1c009ed7b..67b6a0f658d6 100644
> >> --- a/drivers/gpu/drm/i915/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> >> @@ -831,7 +831,8 @@ static void execlists_submission_tasklet(unsigned long data)
> >>      struct drm_i915_private *dev_priv = engine->i915;
> >>      bool fw = false;
> >>   
> >> -    /* We can skip acquiring intel_runtime_pm_get() here as it was taken
> >> +    /*
> >> +     * We can skip acquiring intel_runtime_pm_get() here as it was taken
> >>       * on our behalf by the request (see i915_gem_mark_busy()) and it will
> >>       * not be relinquished until the device is idle (see
> >>       * i915_gem_idle_work_handler()). As a precaution, we make sure
> >> @@ -840,7 +841,8 @@ static void execlists_submission_tasklet(unsigned long data)
> >>       */
> >>      GEM_BUG_ON(!dev_priv->gt.awake);
> >>   
> >> -    /* Prefer doing test_and_clear_bit() as a two stage operation to avoid
> >> +    /*
> >> +     * Prefer doing test_and_clear_bit() as a two stage operation to avoid
> >>       * imposing the cost of a locked atomic transaction when submitting a
> >>       * new request (outside of the context-switch interrupt).
> >>       */
> >> @@ -856,17 +858,10 @@ static void execlists_submission_tasklet(unsigned long data)
> >>                      execlists->csb_head = -1; /* force mmio read of CSB ptrs */
> >>              }
> >>   
> >> -            /* The write will be ordered by the uncached read (itself
> >> -             * a memory barrier), so we do not need another in the form
> >> -             * of a locked instruction. The race between the interrupt
> >> -             * handler and the split test/clear is harmless as we order
> >> -             * our clear before the CSB read. If the interrupt arrived
> >> -             * first between the test and the clear, we read the updated
> >> -             * CSB and clear the bit. If the interrupt arrives as we read
> >> -             * the CSB or later (i.e. after we had cleared the bit) the bit
> >> -             * is set and we do a new loop.
> >> -             */
> >> -            __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> >> +            /* Clear before reading to catch new interrupts */
> >> +            clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> >> +            smp_mb__after_atomic();
> 
> Checkpatch wants a comment for the memory barrier... Are we being strict 
> about it? (https://patchwork.freedesktop.org/series/40359/)
> 
> > 
> > I was confused about this memory barrier as our test is in the
> > same context and ordered wrt this. Chris noted in irc that this is for
> > the documentation for ordering wrt the code that follows.
> > 
> > I am fine with that so,
> > Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > 
> 
> Fine by me too,
> 
> Reviewed-by: Michel Thierry <michel.thierry@intel.com>

It definitely appears to be fixing an issue I've been seeing for the
last few months since using HWSP for execlists. But I was only seeing it in
conjunction with another set of patches, so my presumption was that
those were at fault and not drm-tip (which kept on testing clear).

Thanks for the review, pushed. I'm sure I'll moan about the locked
instruction appearing in the profiles, just as much as I moan about the
locked instructions for tasklet_schedule() dominating some profiles.
-Chris
Jani Nikula March 22, 2018, 9:34 a.m. UTC | #7
On Wed, 21 Mar 2018, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Michel Thierry (2018-03-21 17:01:12)
>> On 3/21/2018 3:46 AM, Mika Kuoppala wrote:
>> > Chris Wilson <chris@chris-wilson.co.uk> writes:
>> >> -            /* The write will be ordered by the uncached read (itself
>> >> -             * a memory barrier), so we do not need another in the form
>> >> -             * of a locked instruction. The race between the interrupt
>> >> -             * handler and the split test/clear is harmless as we order
>> >> -             * our clear before the CSB read. If the interrupt arrived
>> >> -             * first between the test and the clear, we read the updated
>> >> -             * CSB and clear the bit. If the interrupt arrives as we read
>> >> -             * the CSB or later (i.e. after we had cleared the bit) the bit
>> >> -             * is set and we do a new loop.
>> >> -             */
>> >> -            __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>> >> +            /* Clear before reading to catch new interrupts */
>> >> +            clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>> >> +            smp_mb__after_atomic();
>> 
>> Checkpatch wants a comment for the memory barrier... Are we being strict 
>> about it? (https://patchwork.freedesktop.org/series/40359/)
>
> There's a comment for it not two lines above! Silly perl script.

Sure, it's nowhere near perfect. But I do like to get the reminder about
this, "hey don't forget to document your memory barriers, locks,
etc.". It does mean we can't use checkpatch for gating, but I think it
can make the reviewer's life easier to be able to just point at the
results, and ask the author to fix the relevant stuff. I think it's less
tedious and less offensive than the reviewer doing the job manually.

BR,
Jani.
Chris Wilson March 22, 2018, 9:36 a.m. UTC | #8
Quoting Jani Nikula (2018-03-22 09:34:18)
> On Wed, 21 Mar 2018, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Quoting Michel Thierry (2018-03-21 17:01:12)
> >> On 3/21/2018 3:46 AM, Mika Kuoppala wrote:
> >> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> >> -            /* The write will be ordered by the uncached read (itself
> >> >> -             * a memory barrier), so we do not need another in the form
> >> >> -             * of a locked instruction. The race between the interrupt
> >> >> -             * handler and the split test/clear is harmless as we order
> >> >> -             * our clear before the CSB read. If the interrupt arrived
> >> >> -             * first between the test and the clear, we read the updated
> >> >> -             * CSB and clear the bit. If the interrupt arrives as we read
> >> >> -             * the CSB or later (i.e. after we had cleared the bit) the bit
> >> >> -             * is set and we do a new loop.
> >> >> -             */
> >> >> -            __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> >> >> +            /* Clear before reading to catch new interrupts */
> >> >> +            clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> >> >> +            smp_mb__after_atomic();
> >> 
> >> Checkpatch wants a comment for the memory barrier... Are we being strict 
> >> about it? (https://patchwork.freedesktop.org/series/40359/)
> >
> > There's a comment for it not two lines above! Silly perl script.
> 
> Sure, it's nowhere near perfect. But I do like to get the reminder about
> this, "hey don't forget to document your memory barriers, locks,
> etc.". It does mean we can't use checkpatch for gating, but I think it
> can make the reviewer's life easier to be able to just point at the
> results, and ask the author to fix the relevant stuff. I think it's less
> tedious and less offensive than the reviewer doing the job manually.

The complaint was only in jest. The reminder to document locks and mb is
indeed invaluable, it's just that sometimes the limitations of being a "dumb"
perl script show through.
-Chris
Jani Nikula March 22, 2018, 10:04 a.m. UTC | #9
On Thu, 22 Mar 2018, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Jani Nikula (2018-03-22 09:34:18)
>> On Wed, 21 Mar 2018, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > Quoting Michel Thierry (2018-03-21 17:01:12)
>> >> On 3/21/2018 3:46 AM, Mika Kuoppala wrote:
>> >> > Chris Wilson <chris@chris-wilson.co.uk> writes:
>> >> >> -            /* The write will be ordered by the uncached read (itself
>> >> >> -             * a memory barrier), so we do not need another in the form
>> >> >> -             * of a locked instruction. The race between the interrupt
>> >> >> -             * handler and the split test/clear is harmless as we order
>> >> >> -             * our clear before the CSB read. If the interrupt arrived
>> >> >> -             * first between the test and the clear, we read the updated
>> >> >> -             * CSB and clear the bit. If the interrupt arrives as we read
>> >> >> -             * the CSB or later (i.e. after we had cleared the bit) the bit
>> >> >> -             * is set and we do a new loop.
>> >> >> -             */
>> >> >> -            __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>> >> >> +            /* Clear before reading to catch new interrupts */
>> >> >> +            clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>> >> >> +            smp_mb__after_atomic();
>> >> 
>> >> Checkpatch wants a comment for the memory barrier... Are we being strict 
>> >> about it? (https://patchwork.freedesktop.org/series/40359/)
>> >
>> > There's a comment for it not two lines above! Silly perl script.
>> 
>> Sure, it's nowhere near perfect. But I do like to get the reminder about
>> this, "hey don't forget to document your memory barriers, locks,
>> etc.". It does mean we can't use checkpatch for gating, but I think it
>> can make the reviewer's life easier to be able to just point at the
>> results, and ask the author to fix the relevant stuff. I think it's less
>> tedious and less offensive than the reviewer doing the job manually.
>
> The complaint was only in jest. The reminder to document locks and mb is
> indeed invaluable, just sometimes the limitation of being a "dumb" perl
> script show through.

Oh, I didn't misread you. I just switched to serious mode because we do
need to evaluate whether the checkpatch reports from CI are net positive
or negative, and, either way, what we can do to further improve the S/N.

BR,
Jani.

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 53f1c009ed7b..67b6a0f658d6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -831,7 +831,8 @@  static void execlists_submission_tasklet(unsigned long data)
 	struct drm_i915_private *dev_priv = engine->i915;
 	bool fw = false;
 
-	/* We can skip acquiring intel_runtime_pm_get() here as it was taken
+	/*
+	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
 	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
 	 * not be relinquished until the device is idle (see
 	 * i915_gem_idle_work_handler()). As a precaution, we make sure
@@ -840,7 +841,8 @@  static void execlists_submission_tasklet(unsigned long data)
 	 */
 	GEM_BUG_ON(!dev_priv->gt.awake);
 
-	/* Prefer doing test_and_clear_bit() as a two stage operation to avoid
+	/*
+	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
 	 * imposing the cost of a locked atomic transaction when submitting a
 	 * new request (outside of the context-switch interrupt).
 	 */
@@ -856,17 +858,10 @@  static void execlists_submission_tasklet(unsigned long data)
 			execlists->csb_head = -1; /* force mmio read of CSB ptrs */
 		}
 
-		/* The write will be ordered by the uncached read (itself
-		 * a memory barrier), so we do not need another in the form
-		 * of a locked instruction. The race between the interrupt
-		 * handler and the split test/clear is harmless as we order
-		 * our clear before the CSB read. If the interrupt arrived
-		 * first between the test and the clear, we read the updated
-		 * CSB and clear the bit. If the interrupt arrives as we read
-		 * the CSB or later (i.e. after we had cleared the bit) the bit
-		 * is set and we do a new loop.
-		 */
-		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+		/* Clear before reading to catch new interrupts */
+		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+		smp_mb__after_atomic();
+
 		if (unlikely(execlists->csb_head == -1)) { /* following a reset */
 			if (!fw) {
 				intel_uncore_forcewake_get(dev_priv,