Message ID | 1520835430-23933-1-git-send-email-min.he@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Min He (2018-03-12 06:17:10) > In i915_request_wait(), it adds a local intel_wait variable into rb tree > of waiters. However, when the corresponding request is preempted, the > seqno of this wait will not be updated, which will lead to a false > signaling to the request and cause the i915_request_wait() to return > early before the request is really completed. Incorrect. We confirm that the request wasn't preempted before signaling. Once i915_seqno_passed + intel_wait_check_request, we cannot unsubmit the request. -Chris
Quoting Chris Wilson (2018-03-12 10:42:56) > Quoting Min He (2018-03-12 06:17:10) > > In i915_request_wait(), it adds a local intel_wait variable into rb tree > > of waiters. However, when the corresponding request is preempted, the > > seqno of this wait will not be updated, which will lead to a false > > signaling to the request and cause the i915_request_wait() to return > > early before the request is really completed. > > Incorrect. We confirm that the request wasn't preempted before > signaling. Once i915_seqno_passed + intel_wait_check_request, we cannot > unsubmit the request. Or to put it another way, there is an assertion there that would fire if we make a mistake; you should cite that oops in the commit message. -Chris
Quoting Chris Wilson (2018-03-12 10:42:56) > Quoting Min He (2018-03-12 06:17:10) > > In i915_request_wait(), it adds a local intel_wait variable into rb tree > > of waiters. However, when the corresponding request is preempted, the > > seqno of this wait will not be updated, which will lead to a false > > signaling to the request and cause the i915_request_wait() to return > > early before the request is really completed. > > Incorrect. We confirm that the request wasn't preempted before > signaling. Once i915_seqno_passed + intel_wait_check_request, we cannot > unsubmit the request. I should also point out that the locking around accessing the wait is absent... -Chris
Quoting Patchwork (2018-03-12 12:40:11) > == Series Details == > > Series: drm/i915: avoid false fence signaling when enabling preemption > URL : https://patchwork.freedesktop.org/series/39768/ > State : failure > > == Summary == > > ---- Possible new issues: > > Test kms_cursor_crc: > Subgroup cursor-64x64-suspend: > fail -> PASS (shard-apl) > Test kms_plane_multiple: > Subgroup legacy-pipe-a-tiling-x: > pass -> FAIL (shard-apl) > Test kms_vblank: > Subgroup pipe-a-wait-busy-hang: > skip -> PASS (shard-snb) > > ---- Known issues: > > Test drv_hangman: > Subgroup error-state-capture-blt: > pass -> INCOMPLETE (shard-snb) fdo#104058 > Test drv_selftest: > Subgroup live_gtt: > incomplete -> PASS (shard-apl) fdo#103927 > Test gem_eio: > Subgroup in-flight-contexts: > pass -> INCOMPLETE (shard-apl) fdo#105341 > Test kms_chv_cursor_fail: > Subgroup pipe-a-64x64-top-edge: > skip -> PASS (shard-snb) fdo#105185 > Test kms_flip: > Subgroup dpms-vs-vblank-race-interruptible: > pass -> FAIL (shard-hsw) fdo#103060 > Test pm_lpsp: > Subgroup screens-disabled: > pass -> FAIL (shard-hsw) fdo#104941 > Test pm_rpm: > Subgroup system-suspend: > incomplete -> PASS (shard-hsw) fdo#103375 There's a challenge then. Write a small enough test to demonstrate this patch has issues. gem_exec_whisper would be my goto. -Chris
Chris, I just found that I was based on an older code branch, and your commit " drm/i915: Check waiter->seqno carefully in case of preemption" already fixed this issue I mentioned. I should be more careful and thanks a lot for your time and patience. > -----Original Message----- > From: Chris Wilson [mailto:chris@chris-wilson.co.uk] > Sent: Monday, March 12, 2018 6:47 PM > To: He, Min <min.he@intel.com>; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: avoid false fence signaling when > enabling preemption > > Quoting Chris Wilson (2018-03-12 10:42:56) > > Quoting Min He (2018-03-12 06:17:10) > > > In i915_request_wait(), it adds a local intel_wait variable into rb tree > > > of waiters. However, when the corresponding request is preempted, the > > > seqno of this wait will not be updated, which will lead to a false > > > signaling to the request and cause the i915_request_wait() to return > > > early before the request is really completed. > > > > Incorrect. We confirm that the request wasn't preempted before > > signaling. Once i915_seqno_passed + intel_wait_check_request, we cannot > > unsubmit the request. > > I should also point out that the locking around accessing the wait is > absent... > -Chris
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 828f310..1ef961d 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1121,7 +1121,7 @@ static void notify_ring(struct intel_engine_cs *engine) } spin_unlock(&engine->breadcrumbs.irq_lock); - if (rq) { + if (rq && intel_wait_check_request(wait, rq)) { dma_fence_signal(&rq->fence); GEM_BUG_ON(!i915_request_completed(rq)); i915_request_put(rq);
In i915_request_wait(), it adds a local intel_wait variable into rb tree of waiters. However, when the corresponding request is preempted, the seqno of this wait will not be updated, which will lead to a false signaling to the request and cause the i915_request_wait() to return early before the request is really completed. In this patch, we check if the seqno in wait is same as the request before we signal the fence, and thus fix the above issue. Signed-off-by: Min He <min.he@intel.com> --- drivers/gpu/drm/i915/i915_irq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)