diff mbox

[1/1] drm/i915: Fix for nested_enable_signaling BUG_ON

Message ID 20171128091059.33709-2-John.C.Harrison@Intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

John Harrison Nov. 28, 2017, 9:10 a.m. UTC
From: John Harrison <John.C.Harrison@Intel.com>

The call to enable signaling was occuring after the request had been
sent to the GuC for execution on the hardware. That means that it is
possible for the request to actually complete before the code to
enable signaling has executed.

Potentially that means the request could be signalled before anyone is
ready to receive the signal and thus lead to a deadlock where the
driver is now waiting for a signal that will never arrive because it
has already missed it. Certainly it will lead to the BUG in
nested_enable_signaling() firing complaining that the request has
already been signaled.

This patch reverses the order. Signaling is now enabled before sending
the request to the GuC.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Chris Wilson Nov. 28, 2017, 9:40 a.m. UTC | #1
Quoting John.C.Harrison@Intel.com (2017-11-28 09:10:59)
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The call to enable signaling was occuring after the request had been
> sent to the GuC for execution on the hardware. That means that it is
> possible for the request to actually complete before the code to
> enable signaling has executed.

No, it wasn't, unless you had modified the code. Note also we switched
to a different scheme in upstream
-Chris
John Harrison Nov. 28, 2017, 3:38 p.m. UTC | #2
On 11/28/2017 1:40 AM, Chris Wilson wrote:
> Quoting John.C.Harrison@Intel.com (2017-11-28 09:10:59)
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The call to enable signaling was occuring after the request had been
>> sent to the GuC for execution on the hardware. That means that it is
>> possible for the request to actually complete before the code to
>> enable signaling has executed.
> No, it wasn't, unless you had modified the code. Note also we switched
> to a different scheme in upstream
> -Chris

It was as of:

    commit d99c1a7aff71536e909f54709023dcc5d6b559c0
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Thu Mar 16 12:56:18 2017 +0000

    drm/i915/scheduler: emulate a scheduler for guc


The code introduced was:

    +               ...
    +               if (last && rq->ctx != last->ctx) {
    +                       ...
    *+                       nested_enable_signaling(last);**
    * +                       port++;
    +               }
    +               ...
    *+               i915_guc_submit(rq);**
    *+               last = rq;
    +               submit = true;
    +       }
    +       if (submit) {
    +               i915_gem_request_assign(&port->request, last);
    *+               nested_enable_signaling(last);**
    *+               engine->execlist_first = rb;
    +       }
    +       ...


That quite definitely calls nested_enable_signaling() after calling 
i915_guc_submit(). It is either called on the last request submitted for 
a given context or on the last request submitted overall. But in either 
case it is only called after the actual submission occurs.

Maybe that was not exactly 4.11, there might have been a few more recent 
patches pulled in to this tree. It is definitely not latest upstream 
though. And as I noted in the cover letter, moving to latest upstream is 
not an option for them at this point.

John.
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index ff82f0561ec1..edf29758f35a 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -721,7 +721,6 @@  static bool i915_guc_dequeue(struct intel_engine_cs *engine)
 			}
 
 			i915_gem_request_assign(&port->request, last);
-			nested_enable_signaling(last);
 			port++;
 		}
 
@@ -749,6 +748,7 @@  static bool i915_guc_dequeue(struct intel_engine_cs *engine)
 			}
 #endif
 		}
+		nested_enable_signaling(rq);
 		i915_guc_submit(rq);
 		trace_i915_gem_request_in(rq, port - engine->execlist_port);
 		last = rq;
@@ -758,7 +758,6 @@  static bool i915_guc_dequeue(struct intel_engine_cs *engine)
 	}
 	if (submit) {
 		i915_gem_request_assign(&port->request, last);
-		nested_enable_signaling(last);
 		engine->execlist_first = rb;
 	}
 out: