diff mbox

Revert "drm/i915: Skip execlists_dequeue() early if the list is empty"

Message ID 20170329100052.29505-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson March 29, 2017, 10 a.m. UTC
This reverts commit 6c943de6686f ("drm/i915: Skip execlists_dequeue()
early if the list is empty").

The validity of using READ_ONCE there depends upon having a mb to
coordinate the assignment of engine->execlist_first inside
submit_request() and checking prior to taking the spinlock in
execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a
memory barrier making this cross-cpu checking safe", but failed to
notice that this mb was *conditional* on the execlists being ready, i.e.
there wasn't the request mb when it was most necessary!

We could install an unconditional memory barrier to fixup the
READ_ONCE():

Comments

Chris Wilson March 29, 2017, 11:18 a.m. UTC | #1
On Wed, Mar 29, 2017 at 11:00:52AM +0100, Chris Wilson wrote:
> This reverts commit 6c943de6686f ("drm/i915: Skip execlists_dequeue()
> early if the list is empty").
> 
> The validity of using READ_ONCE there depends upon having a mb to
> coordinate the assignment of engine->execlist_first inside
> submit_request() and checking prior to taking the spinlock in
> execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a
> memory barrier making this cross-cpu checking safe", but failed to
> notice that this mb was *conditional* on the execlists being ready, i.e.
> there wasn't the request mb when it was most necessary!

s/request/required/

> We could install an unconditional memory barrier to fixup the
> READ_ONCE():
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> b/drivers/gpu/drm/i915/intel_lrc.c
> index 7dd732cb9f57..1ed164b16d44 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -616,6 +616,7 @@ static void execlists_submit_request(struct
> drm_i915_gem_request *request)
> 
>         if (insert_request(&request->priotree, &engine->execlist_queue))
> {
>                 engine->execlist_first = &request->priotree.node;
> +               smp_wmb();
>                 if (execlists_elsp_ready(engine))
> 
> But we have opted to remove the race as it should be rarely effective,
> and saves us having to explain the necessary memory barriers which we
> quite clearly failed at.
> 

Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Fixes: 6c943de6686f ("drm/i915: Skip execlists_dequeue() early if the list is empty")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
-Chris
Chris Wilson March 29, 2017, 12:03 p.m. UTC | #2
On Wed, Mar 29, 2017 at 12:18:57PM +0100, Chris Wilson wrote:
> On Wed, Mar 29, 2017 at 11:00:52AM +0100, Chris Wilson wrote:
> > This reverts commit 6c943de6686f ("drm/i915: Skip execlists_dequeue()
> > early if the list is empty").
> > 
> > The validity of using READ_ONCE there depends upon having a mb to
> > coordinate the assignment of engine->execlist_first inside
> > submit_request() and checking prior to taking the spinlock in
> > execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a
> > memory barrier making this cross-cpu checking safe", but failed to
> > notice that this mb was *conditional* on the execlists being ready, i.e.
> > there wasn't the request mb when it was most necessary!
> 
> s/request/required/
> 
> > We could install an unconditional memory barrier to fixup the
> > READ_ONCE():
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 7dd732cb9f57..1ed164b16d44 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -616,6 +616,7 @@ static void execlists_submit_request(struct
> > drm_i915_gem_request *request)
> > 
> >         if (insert_request(&request->priotree, &engine->execlist_queue))
> > {
> >                 engine->execlist_first = &request->priotree.node;
> > +               smp_wmb();
> >                 if (execlists_elsp_ready(engine))
> > 
> > But we have opted to remove the race as it should be rarely effective,
> > and saves us having to explain the necessary memory barriers which we
> > quite clearly failed at.
> > 
> 
> Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Fixes: 6c943de6686f ("drm/i915: Skip execlists_dequeue() early if the list is empty")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>

Pushed with Tvrtko's r-b from irc.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c
b/drivers/gpu/drm/i915/intel_lrc.c
index 7dd732cb9f57..1ed164b16d44 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -616,6 +616,7 @@  static void execlists_submit_request(struct
drm_i915_gem_request *request)

        if (insert_request(&request->priotree, &engine->execlist_queue))
{
                engine->execlist_first = &request->priotree.node;
+               smp_wmb();
                if (execlists_elsp_ready(engine))

But we have opted to remove the race as it should be rarely effective,
and saves us having to explain the necessary memory barriers which we
quite clearly failed at.

Fixes: 6c943de6686f ("drm/i915: Skip execlists_dequeue() early if the list is empty")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 12 ------------
 drivers/gpu/drm/i915/intel_lrc.c           | 12 ------------
 2 files changed, 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 6193ad7edcf4..1642fff9cf13 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -662,18 +662,6 @@  static bool i915_guc_dequeue(struct intel_engine_cs *engine)
 	struct rb_node *rb;
 	bool submit = false;
 
-	/* After execlist_first is updated, the tasklet will be rescheduled.
-	 *
-	 * If we are currently running (inside the tasklet) and a third
-	 * party queues a request and so updates engine->execlist_first under
-	 * the spinlock (which we have elided), it will atomically set the
-	 * TASKLET_SCHED flag causing the us to be re-executed and pick up
-	 * the change in state (the update to TASKLET_SCHED incurs a memory
-	 * barrier making this cross-cpu checking safe).
-	 */
-	if (!READ_ONCE(engine->execlist_first))
-		return false;
-
 	spin_lock_irq(&engine->timeline->lock);
 	rb = engine->execlist_first;
 	while (rb) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 91e38e80a095..a0ea89c2bb16 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -402,18 +402,6 @@  static void execlists_dequeue(struct intel_engine_cs *engine)
 	struct rb_node *rb;
 	bool submit = false;
 
-	/* After execlist_first is updated, the tasklet will be rescheduled.
-	 *
-	 * If we are currently running (inside the tasklet) and a third
-	 * party queues a request and so updates engine->execlist_first under
-	 * the spinlock (which we have elided), it will atomically set the
-	 * TASKLET_SCHED flag causing the us to be re-executed and pick up
-	 * the change in state (the update to TASKLET_SCHED incurs a memory
-	 * barrier making this cross-cpu checking safe).
-	 */
-	if (!READ_ONCE(engine->execlist_first))
-		return;
-
 	last = port->request;
 	if (last)
 		/* WaIdleLiteRestore:bdw,skl