diff mbox

drm/i915: avoid false fence signaling when enabling preemption

Message ID 1520835430-23933-1-git-send-email-min.he@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

He, Min March 12, 2018, 6:17 a.m. UTC
In i915_request_wait(), it adds a local intel_wait variable into rb tree
of waiters. However, when the corresponding request is preempted, the
seqno of this wait will not be updated, which will lead to a false
signaling to the request and cause the i915_request_wait() to return
early before the request is really completed.

In this patch, we check if the seqno in wait is same as the request
before we signal the fence, and thus fix the above issue.

Signed-off-by: Min He <min.he@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chris Wilson March 12, 2018, 10:42 a.m. UTC | #1
Quoting Min He (2018-03-12 06:17:10)
> In i915_request_wait(), it adds a local intel_wait variable into rb tree
> of waiters. However, when the corresponding request is preempted, the
> seqno of this wait will not be updated, which will lead to a false
> signaling to the request and cause the i915_request_wait() to return
> early before the request is really completed.

Incorrect. We confirm that the request wasn't preempted before
signaling. Once i915_seqno_passed + intel_wait_check_request, we cannot
unsubmit the request.
-Chris
Chris Wilson March 12, 2018, 10:44 a.m. UTC | #2
Quoting Chris Wilson (2018-03-12 10:42:56)
> Quoting Min He (2018-03-12 06:17:10)
> > In i915_request_wait(), it adds a local intel_wait variable into rb tree
> > of waiters. However, when the corresponding request is preempted, the
> > seqno of this wait will not be updated, which will lead to a false
> > signaling to the request and cause the i915_request_wait() to return
> > early before the request is really completed.
> 
> Incorrect. We confirm that the request wasn't preempted before
> signaling. Once i915_seqno_passed + intel_wait_check_request, we cannot
> unsubmit the request.

Or to put it another way, there is an assertion there that would fire if
we make a mistake; you should cite that oops in the commit message.
-Chris
Chris Wilson March 12, 2018, 10:46 a.m. UTC | #3
Quoting Chris Wilson (2018-03-12 10:42:56)
> Quoting Min He (2018-03-12 06:17:10)
> > In i915_request_wait(), it adds a local intel_wait variable into rb tree
> > of waiters. However, when the corresponding request is preempted, the
> > seqno of this wait will not be updated, which will lead to a false
> > signaling to the request and cause the i915_request_wait() to return
> > early before the request is really completed.
> 
> Incorrect. We confirm that the request wasn't preempted before
> signaling. Once i915_seqno_passed + intel_wait_check_request, we cannot
> unsubmit the request.

I should also point out that the locking around accessing the wait is
absent...
-Chris
Chris Wilson March 12, 2018, 12:44 p.m. UTC | #4
Quoting Patchwork (2018-03-12 12:40:11)
> == Series Details ==
> 
> Series: drm/i915: avoid false fence signaling when enabling preemption
> URL   : https://patchwork.freedesktop.org/series/39768/
> State : failure
> 
> == Summary ==
> 
> ---- Possible new issues:
> 
> Test kms_cursor_crc:
>         Subgroup cursor-64x64-suspend:
>                 fail       -> PASS       (shard-apl)
> Test kms_plane_multiple:
>         Subgroup legacy-pipe-a-tiling-x:
>                 pass       -> FAIL       (shard-apl)
> Test kms_vblank:
>         Subgroup pipe-a-wait-busy-hang:
>                 skip       -> PASS       (shard-snb)
> 
> ---- Known issues:
> 
> Test drv_hangman:
>         Subgroup error-state-capture-blt:
>                 pass       -> INCOMPLETE (shard-snb) fdo#104058
> Test drv_selftest:
>         Subgroup live_gtt:
>                 incomplete -> PASS       (shard-apl) fdo#103927
> Test gem_eio:
>         Subgroup in-flight-contexts:
>                 pass       -> INCOMPLETE (shard-apl) fdo#105341
> Test kms_chv_cursor_fail:
>         Subgroup pipe-a-64x64-top-edge:
>                 skip       -> PASS       (shard-snb) fdo#105185
> Test kms_flip:
>         Subgroup dpms-vs-vblank-race-interruptible:
>                 pass       -> FAIL       (shard-hsw) fdo#103060
> Test pm_lpsp:
>         Subgroup screens-disabled:
>                 pass       -> FAIL       (shard-hsw) fdo#104941
> Test pm_rpm:
>         Subgroup system-suspend:
>                 incomplete -> PASS       (shard-hsw) fdo#103375

There's a challenge then. Write a small enough test to demonstrate this
patch has issues. gem_exec_whisper would be my goto.
-Chris
He, Min March 13, 2018, 2:02 a.m. UTC | #5
Chris,
I just found that I was based on an older code branch, and your commit
" drm/i915: Check waiter->seqno carefully in case of preemption" already fixed
this issue I mentioned. 

I should be more careful and thanks a lot for your time and patience. 

> -----Original Message-----

> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]

> Sent: Monday, March 12, 2018 6:47 PM

> To: He, Min <min.he@intel.com>; intel-gfx@lists.freedesktop.org

> Subject: Re: [Intel-gfx] [PATCH] drm/i915: avoid false fence signaling when

> enabling preemption

> 

> Quoting Chris Wilson (2018-03-12 10:42:56)

> > Quoting Min He (2018-03-12 06:17:10)

> > > In i915_request_wait(), it adds a local intel_wait variable into rb tree

> > > of waiters. However, when the corresponding request is preempted, the

> > > seqno of this wait will not be updated, which will lead to a false

> > > signaling to the request and cause the i915_request_wait() to return

> > > early before the request is really completed.

> >

> > Incorrect. We confirm that the request wasn't preempted before

> > signaling. Once i915_seqno_passed + intel_wait_check_request, we cannot

> > unsubmit the request.

> 

> I should also point out that the locking around accessing the wait is

> absent...

> -Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 828f310..1ef961d 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1121,7 +1121,7 @@  static void notify_ring(struct intel_engine_cs *engine)
 	}
 	spin_unlock(&engine->breadcrumbs.irq_lock);
 
-	if (rq) {
+	if (rq && intel_wait_check_request(wait, rq)) {
 		dma_fence_signal(&rq->fence);
 		GEM_BUG_ON(!i915_request_completed(rq));
 		i915_request_put(rq);