
[04/11] drm/i915/execlists: Pull CSB reset under the timeline.lock

Message ID 20180531185204.19520-5-chris@chris-wilson.co.uk (mailing list archive)
State: New, archived

Commit Message

Chris Wilson May 31, 2018, 6:51 p.m. UTC
In the following patch, we will process the CSB interrupt under the
timeline.lock and not under the tasklet lock. This also means that we
will need to protect access to common variables such as
execlists->csb_head with the timeline.lock during reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
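
In sketch form, using only the names that appear in the diff below, the
reset path then touches the shared CSB state under the same lock:

	spin_lock_irqsave(&engine->timeline.lock, flags);
	/* serialised against an irq-time CSB writer taking the same lock */
	__unwind_incomplete_requests(engine);
	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
	spin_unlock_irqrestore(&engine->timeline.lock, flags);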

Comments

Tvrtko Ursulin June 4, 2018, 2:25 p.m. UTC | #1
On 31/05/2018 19:51, Chris Wilson wrote:
> In the following patch, we will process the CSB interrupt under the
> timeline.lock and not under the tasklet lock. This also means that we

Implied tasklet lock? Or tasklet serialization?

> will need to protect access to common variables such as
> execlists->csb_head with the timeline.lock during reset.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index d207a1bf9dc9..ec54c29b610f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1927,8 +1927,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
>   		  engine->name, request ? request->global_seqno : 0,
>   		  intel_engine_get_seqno(engine));
>   
> -	/* See execlists_cancel_requests() for the irq/spinlock split. */
> -	local_irq_save(flags);
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
>   
>   	/*
>   	 * Catch up with any missed context-switch interrupts.
> @@ -1943,14 +1942,12 @@ static void execlists_reset(struct intel_engine_cs *engine,
>   	reset_irq(engine);

There is a synchronize_hardirq() in this one, so this could be a 
deadlock, I think, depending on lock ordering. If another CPU enters the 
irq handler and tries to take the timeline lock, which this thread 
already holds, then they deadlock each other.

Regards,

Tvrtko

>   
>   	/* Push back any incomplete requests for replay after the reset. */
> -	spin_lock(&engine->timeline.lock);
>   	__unwind_incomplete_requests(engine);
> -	spin_unlock(&engine->timeline.lock);
>   
>   	/* Following the reset, we need to reload the CSB read/write pointers */
>   	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
>   
> -	local_irq_restore(flags);
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   
>   	/*
>   	 * If the request was innocent, we leave the request in the ELSP
>
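
For illustration, the interleaving Tvrtko describes above, assuming
reset_irq() reaches synchronize_hardirq() while the irq handler (after
the follow-up patch) takes engine->timeline.lock:

	/*
	 *  CPU0: execlists_reset()            CPU1: hard irq
	 *  spin_lock_irqsave(&timeline.lock)
	 *                                     handler entered
	 *                                     spin_lock(&timeline.lock)
	 *                                       ... spins, waiting for CPU0 ...
	 *  reset_irq()
	 *    synchronize_hardirq()
	 *      ... waits for CPU1's handler to return ...
	 *
	 * CPU0 cannot release the lock until the handler completes, and the
	 * handler cannot complete until CPU0 releases the lock.
	 */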
Chris Wilson June 4, 2018, 3:29 p.m. UTC | #2
Quoting Tvrtko Ursulin (2018-06-04 15:25:26)
> 
> On 31/05/2018 19:51, Chris Wilson wrote:
> > In the following patch, we will process the CSB interrupt under the
> > timeline.lock and not under the tasklet lock. This also means that we
> 
> Implied tasklet lock? Or tasklet serialization?

We were using the tasklet to provide exclusive access to the port[], so
were using it as a serialisation primitive and lock.
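
A minimal sketch of that tasklet-as-lock idea (illustrative only, not
the exact upstream sequence): a given tasklet never runs concurrently
with itself, so the callback owns port[] by construction, and any other
context wanting the same exclusivity must first shut the tasklet down:

	tasklet_disable(&engine->execlists.tasklet); /* waits out, then blocks, the callback */
	/* ... exclusive access to engine->execlists.port[] ... */
	tasklet_enable(&engine->execlists.tasklet);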
 
> > will need to protect access to common variables such as
> > execlists->csb_head with the timeline.lock during reset.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 7 ++-----
> >   1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index d207a1bf9dc9..ec54c29b610f 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1927,8 +1927,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
> >                 engine->name, request ? request->global_seqno : 0,
> >                 intel_engine_get_seqno(engine));
> >   
> > -     /* See execlists_cancel_requests() for the irq/spinlock split. */
> > -     local_irq_save(flags);
> > +     spin_lock_irqsave(&engine->timeline.lock, flags);
> >   
> >       /*
> >        * Catch up with any missed context-switch interrupts.
> > @@ -1943,14 +1942,12 @@ static void execlists_reset(struct intel_engine_cs *engine,
> >       reset_irq(engine);
> 
> There is a synchronize_hardirq() in this one, so this could be a 
> deadlock, I think, depending on lock ordering. If another CPU enters 
> the irq handler and tries to take the timeline lock, which this thread 
> already holds, then they deadlock each other.

Nothing, and that would explain the deadlock in the other thread.
-Chris
Tvrtko Ursulin June 5, 2018, 8:25 a.m. UTC | #3
On 04/06/2018 16:29, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-04 15:25:26)
>>
>> On 31/05/2018 19:51, Chris Wilson wrote:
>>> In the following patch, we will process the CSB interrupt under the
>>> timeline.lock and not under the tasklet lock. This also means that we
>>
>> Implied tasklet lock? Or tasklet serialization?
> 
> We were using the tasklet to provide exclusive access to the port[], so
> were using it as a serialisation primitive and lock.

Yes of course, I was just thinking whether it's worth being more verbose 
with the "tasklet lock" wording. I just think of it more as "serialized 
by the tasklet" and not so much as a lock.

>>> will need to protect access to common variables such as
>>> execlists->csb_head with the timeline.lock during reset.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/intel_lrc.c | 7 ++-----
>>>    1 file changed, 2 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index d207a1bf9dc9..ec54c29b610f 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -1927,8 +1927,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
>>>                  engine->name, request ? request->global_seqno : 0,
>>>                  intel_engine_get_seqno(engine));
>>>    
>>> -     /* See execlists_cancel_requests() for the irq/spinlock split. */
>>> -     local_irq_save(flags);
>>> +     spin_lock_irqsave(&engine->timeline.lock, flags);
>>>    
>>>        /*
>>>         * Catch up with any missed context-switch interrupts.
>>> @@ -1943,14 +1942,12 @@ static void execlists_reset(struct intel_engine_cs *engine,
>>>        reset_irq(engine);
>>
>> There is a synchronize_hardirq() in this one, so this could be a
>> deadlock, I think, depending on lock ordering. If another CPU enters
>> the irq handler and tries to take the timeline lock, which this thread
>> already holds, then they deadlock each other.
> 
> Nothing, and that would explain the deadlock in the other thread.

Nothing what? :)

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d207a1bf9dc9..ec54c29b610f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1927,8 +1927,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
 		  engine->name, request ? request->global_seqno : 0,
 		  intel_engine_get_seqno(engine));
 
-	/* See execlists_cancel_requests() for the irq/spinlock split. */
-	local_irq_save(flags);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	/*
 	 * Catch up with any missed context-switch interrupts.
@@ -1943,14 +1942,12 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	reset_irq(engine);
 
 	/* Push back any incomplete requests for replay after the reset. */
-	spin_lock(&engine->timeline.lock);
 	__unwind_incomplete_requests(engine);
-	spin_unlock(&engine->timeline.lock);
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
 	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
 
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
 	/*
 	 * If the request was innocent, we leave the request in the ELSP