diff mbox series

[1/2] drm/i915: Seal races between async GPU cancellation, retirement and signaling

Message ID 20190508120542.28377-2-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [1/2] drm/i915: Seal races between async GPU cancellation, retirement and signaling | expand

Commit Message

Chris Wilson May 8, 2019, 12:05 p.m. UTC
Currently there is an underlying assumption that i915_request_unsubmit()
is synchronous wrt the GPU -- that is the request is no longer in flight
as we remove it. In the near future that may change, and this may upset
our signaling as we can process an interrupt for that request while it
is no longer in flight.

CPU0					CPU1
intel_engine_breadcrumbs_irq
(queue request completion)
					i915_request_cancel_signaling
...					...
					i915_request_enable_signaling
dma_fence_signal

Hence in the time it took us to drop the lock to signal the request, a
preemption event may have occurred and re-queued the request. In the
process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
so reused the rq->signal_link that was in use on CPU0, leading to bad
pointer chasing in intel_engine_breadcrumbs_irq.

A related issue was that if someone started listening for a signal on a
completed but no longer in-flight request, we missed the opportunity to
immediately signal that request.

Furthermore, as intel_contexts may be immediately released during
request retirement, in order to be entirely sure that
intel_engine_breadcrumbs_irq may no longer dereference the intel_context
(ce->signals and ce->signal_link), we must wait for irq spinlock.

In order to prevent the race, we use a bit in the fence.flags to signal
the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
quickly signals to any outside observer that the fence is indeed signaled.

v2: Sketch out potential dma-fence API for manual signaling
v3: And the test_and_set_bit()

Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/dma-buf/dma-fence.c                 |  1 +
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 78 +++++++++++++++------
 drivers/gpu/drm/i915/i915_request.c         |  1 +
 drivers/gpu/drm/i915/intel_guc_submission.c |  1 -
 4 files changed, 59 insertions(+), 22 deletions(-)

Comments

Daniel Vetter May 8, 2019, 12:53 p.m. UTC | #1
On Wed, May 8, 2019 at 2:06 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Currently there is an underlying assumption that i915_request_unsubmit()
> is synchronous wrt the GPU -- that is the request is no longer in flight
> as we remove it. In the near future that may change, and this may upset
> our signaling as we can process an interrupt for that request while it
> is no longer in flight.
>
> CPU0                                    CPU1
> intel_engine_breadcrumbs_irq
> (queue request completion)
>                                         i915_request_cancel_signaling
> ...                                     ...
>                                         i915_request_enable_signaling
> dma_fence_signal
>
> Hence in the time it took us to drop the lock to signal the request, a
> preemption event may have occurred and re-queued the request. In the
> process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
> so reused the rq->signal_link that was in use on CPU0, leading to bad
> pointer chasing in intel_engine_breadcrumbs_irq.
>
> A related issue was that if someone started listening for a signal on a
> completed but no longer in-flight request, we missed the opportunity to
> immediately signal that request.
>
> Furthermore, as intel_contexts may be immediately released during
> request retirement, in order to be entirely sure that
> intel_engine_breadcrumbs_irq may no longer dereference the intel_context
> (ce->signals and ce->signal_link), we must wait for irq spinlock.
>
> In order to prevent the race, we use a bit in the fence.flags to signal
> the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
> For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
> quickly signals to any outside observer that the fence is indeed signaled.
>
> v2: Sketch out potential dma-fence API for manual signaling
> v3: And the test_and_set_bit()
>
> Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Ok chatted a bit with Chris on irc, and we already allow drivers to
pass whatever spinlock is suitable for their request/fence
handling/retiring, so allowing to split up this all makes sense I
think. Has my ack on the approach.

I think it'd be good to spell out the optimization this allows
(synchronous cleanup instead of refcounting all. the. things.) plus
showcase the link between the fence->lock pointer and the split-up
__dma_signal functions in the kerneldoc. Definitely for the core
patch.

Also not sure why __notify and __timestamp has a double underscore in
the middle. That color choice confuses me a bit :-)

Cheers, Daniel

> ---
>  drivers/dma-buf/dma-fence.c                 |  1 +
>  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 78 +++++++++++++++------
>  drivers/gpu/drm/i915/i915_request.c         |  1 +
>  drivers/gpu/drm/i915/intel_guc_submission.c |  1 -
>  4 files changed, 59 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 3aa8733f832a..9bf06042619a 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -29,6 +29,7 @@
>
>  EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
>  EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
>
>  static DEFINE_SPINLOCK(dma_fence_stub_lock);
>  static struct dma_fence dma_fence_stub;
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index fe455f01aa65..c092bdf5f0bf 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -23,6 +23,7 @@
>   */
>
>  #include <linux/kthread.h>
> +#include <trace/events/dma_fence.h>
>  #include <uapi/linux/sched/types.h>
>
>  #include "i915_drv.h"
> @@ -96,9 +97,39 @@ check_signal_order(struct intel_context *ce, struct i915_request *rq)
>         return true;
>  }
>
> +static bool
> +__dma_fence_signal(struct dma_fence *fence)
> +{
> +       return !test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
> +}
> +
> +static void
> +__dma_fence_signal__timestamp(struct dma_fence *fence, ktime_t timestamp)
> +{
> +       fence->timestamp = timestamp;
> +       set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> +       trace_dma_fence_signaled(fence);
> +}
> +
> +static void
> +__dma_fence_signal__notify(struct dma_fence *fence)
> +{
> +       struct dma_fence_cb *cur, *tmp;
> +
> +       lockdep_assert_held(fence->lock);
> +       lockdep_assert_irqs_disabled();
> +
> +       list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> +               INIT_LIST_HEAD(&cur->node);
> +               cur->func(fence, cur);
> +       }
> +       INIT_LIST_HEAD(&fence->cb_list);
> +}
> +
>  void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>  {
>         struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +       const ktime_t timestamp = ktime_get();
>         struct intel_context *ce, *cn;
>         struct list_head *pos, *next;
>         LIST_HEAD(signal);
> @@ -122,6 +153,10 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>
>                         GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
>                                              &rq->fence.flags));
> +                       clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +
> +                       if (!__dma_fence_signal(&rq->fence))
> +                               continue;
>
>                         /*
>                          * Queue for execution after dropping the signaling
> @@ -129,14 +164,6 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>                          * more signalers to the same context or engine.
>                          */
>                         i915_request_get(rq);
> -
> -                       /*
> -                        * We may race with direct invocation of
> -                        * dma_fence_signal(), e.g. i915_request_retire(),
> -                        * so we need to acquire our reference to the request
> -                        * before we cancel the breadcrumb.
> -                        */
> -                       clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>                         list_add_tail(&rq->signal_link, &signal);
>                 }
>
> @@ -159,7 +186,12 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>                 struct i915_request *rq =
>                         list_entry(pos, typeof(*rq), signal_link);
>
> -               dma_fence_signal(&rq->fence);
> +               __dma_fence_signal__timestamp(&rq->fence, timestamp);
> +
> +               spin_lock(&rq->lock);
> +               __dma_fence_signal__notify(&rq->fence);
> +               spin_unlock(&rq->lock);
> +
>                 i915_request_put(rq);
>         }
>  }
> @@ -261,19 +293,17 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
>
>  bool i915_request_enable_breadcrumb(struct i915_request *rq)
>  {
> -       struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
> -
> -       GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
> +       lockdep_assert_held(&rq->lock);
> +       lockdep_assert_irqs_disabled();
>
> -       if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
> -               return true;
> -
> -       spin_lock(&b->irq_lock);
> -       if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
> -           !__request_completed(rq)) {
> +       if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) {
> +               struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
>                 struct intel_context *ce = rq->hw_context;
>                 struct list_head *pos;
>
> +               spin_lock(&b->irq_lock);
> +               GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
> +
>                 __intel_breadcrumbs_arm_irq(b);
>
>                 /*
> @@ -303,8 +333,8 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
>                 GEM_BUG_ON(!check_signal_order(ce, rq));
>
>                 set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +               spin_unlock(&b->irq_lock);
>         }
> -       spin_unlock(&b->irq_lock);
>
>         return !__request_completed(rq);
>  }
> @@ -313,9 +343,15 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
>  {
>         struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
>
> -       if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
> -               return;
> +       lockdep_assert_held(&rq->lock);
> +       lockdep_assert_irqs_disabled();
>
> +       /*
> +        * We must wait for b->irq_lock so that we know the interrupt handler
> +        * has released its reference to the intel_context and has completed
> +        * the DMA_FENCE_FLAG_SIGNALED_BIT/I915_FENCE_FLAG_SIGNAL dance (if
> +        * required).
> +        */
>         spin_lock(&b->irq_lock);
>         if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
>                 struct intel_context *ce = rq->hw_context;
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index fa955b7b6def..bed213148cbb 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -437,6 +437,7 @@ void __i915_request_submit(struct i915_request *request)
>         set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
>
>         if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
> +           !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &request->fence.flags) &&
>             !i915_request_enable_breadcrumb(request))
>                 intel_engine_queue_breadcrumbs(engine);
>
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 380d83a2bfb6..ea0e3734d37c 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -23,7 +23,6 @@
>   */
>
>  #include <linux/circ_buf.h>
> -#include <trace/events/dma_fence.h>
>
>  #include "gt/intel_engine_pm.h"
>  #include "gt/intel_lrc_reg.h"
> --
> 2.20.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Chris Wilson May 8, 2019, 8:30 p.m. UTC | #2
Quoting Daniel Vetter (2019-05-08 13:53:30)
> On Wed, May 8, 2019 at 2:06 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > Currently there is an underlying assumption that i915_request_unsubmit()
> > is synchronous wrt the GPU -- that is the request is no longer in flight
> > as we remove it. In the near future that may change, and this may upset
> > our signaling as we can process an interrupt for that request while it
> > is no longer in flight.
> >
> > CPU0                                    CPU1
> > intel_engine_breadcrumbs_irq
> > (queue request completion)
> >                                         i915_request_cancel_signaling
> > ...                                     ...
> >                                         i915_request_enable_signaling
> > dma_fence_signal
> >
> > Hence in the time it took us to drop the lock to signal the request, a
> > preemption event may have occurred and re-queued the request. In the
> > process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
> > so reused the rq->signal_link that was in use on CPU0, leading to bad
> > pointer chasing in intel_engine_breadcrumbs_irq.
> >
> > A related issue was that if someone started listening for a signal on a
> > completed but no longer in-flight request, we missed the opportunity to
> > immediately signal that request.
> >
> > Furthermore, as intel_contexts may be immediately released during
> > request retirement, in order to be entirely sure that
> > intel_engine_breadcrumbs_irq may no longer dereference the intel_context
> > (ce->signals and ce->signal_link), we must wait for irq spinlock.
> >
> > In order to prevent the race, we use a bit in the fence.flags to signal
> > the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
> > For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
> > quickly signals to any outside observer that the fence is indeed signaled.
> >
> > v2: Sketch out potential dma-fence API for manual signaling
> > v3: And the test_and_set_bit()
> >
> > Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Ok chatted a bit with Chris on irc, and we already allow drivers to
> pass whatever spinlock is suitable for their request/fence
> handling/retiring, so allowing to split up this all makes sense I
> think. Has my ack on the approach.
> 
> I think it'd be good to spell out the optimization this allows
> (synchronous cleanup instead of refcounting all. the. things.) plus
> showcase the link between the fence->lock pointer and the split-up
> __dma_signal functions in the kerneldoc. Definitely for the core
> patch.
> 
> Also not sure why __notify and __timestamp has a double underscore in
> the middle. That color choice confuses me a bit :-)

I like it for subphases. The overall action here is still the dma-fence
'signal', but now we are doing the 'set timestamp', 'notify callbacks'
etc. Otherwise we gain a subject to the verb, 'signal_notify' which says
to go and signal the notify, rather than do the notifications; for me
the '__' breaks the association. Maybe dma_fence_signal_do_notifies.
-Chris
Daniel Vetter May 9, 2019, 7:59 a.m. UTC | #3
On Wed, May 08, 2019 at 09:30:42PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2019-05-08 13:53:30)
> > On Wed, May 8, 2019 at 2:06 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > >
> > > Currently there is an underlying assumption that i915_request_unsubmit()
> > > is synchronous wrt the GPU -- that is the request is no longer in flight
> > > as we remove it. In the near future that may change, and this may upset
> > > our signaling as we can process an interrupt for that request while it
> > > is no longer in flight.
> > >
> > > CPU0                                    CPU1
> > > intel_engine_breadcrumbs_irq
> > > (queue request completion)
> > >                                         i915_request_cancel_signaling
> > > ...                                     ...
> > >                                         i915_request_enable_signaling
> > > dma_fence_signal
> > >
> > > Hence in the time it took us to drop the lock to signal the request, a
> > > preemption event may have occurred and re-queued the request. In the
> > > process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and
> > > so reused the rq->signal_link that was in use on CPU0, leading to bad
> > > pointer chasing in intel_engine_breadcrumbs_irq.
> > >
> > > A related issue was that if someone started listening for a signal on a
> > > completed but no longer in-flight request, we missed the opportunity to
> > > immediately signal that request.
> > >
> > > Furthermore, as intel_contexts may be immediately released during
> > > request retirement, in order to be entirely sure that
> > > intel_engine_breadcrumbs_irq may no longer dereference the intel_context
> > > (ce->signals and ce->signal_link), we must wait for irq spinlock.
> > >
> > > In order to prevent the race, we use a bit in the fence.flags to signal
> > > the transfer onto the signal list inside intel_engine_breadcrumbs_irq.
> > > For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then
> > > quickly signals to any outside observer that the fence is indeed signaled.
> > >
> > > v2: Sketch out potential dma-fence API for manual signaling
> > > v3: And the test_and_set_bit()
> > >
> > > Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking")
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Ok chatted a bit with Chris on irc, and we already allow drivers to
> > pass whatever spinlock is suitable for their request/fence
> > handling/retiring, so allowing to split up this all makes sense I
> > think. Has my ack on the approach.
> > 
> > I think it'd be good to spell out the optimization this allows
> > (synchronous cleanup instead of refcounting all. the. things.) plus
> > showcase the link between the fence->lock pointer and the split-up
> > __dma_signal functions in the kerneldoc. Definitely for the core
> > patch.
> > 
> > Also not sure why __notify and __timestamp has a double underscore in
> > the middle. That color choice confuses me a bit :-)
> 
> I like it for subphases. The overall action here is still the dma-fence
> 'signal', but now we are doing the 'set timestamp', 'notify callbacks'
> etc. Otherwise we gain a subject to the verb, 'signal_notify' which says
> to go and signal the notify, rather than do the notifications; for me
> the '__' breaks the association. Maybe dma_fence_signal_do_notifies.

at the cost of 2 more letters in already long function names, I think
_do_step1, _do_step2 are clearer ...

Aside: Could we merge the timestampe and do_notify steps together, maybe
with the spinlock in there? I think materially it doesn't matter whether
we set the timestampe before or in the spinlock protected section, as long
as we don't set it afterwards. And would simplify the interface a bit.
-Daniel
diff mbox series

Patch

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 3aa8733f832a..9bf06042619a 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -29,6 +29,7 @@ 
 
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
+EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
 
 static DEFINE_SPINLOCK(dma_fence_stub_lock);
 static struct dma_fence dma_fence_stub;
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index fe455f01aa65..c092bdf5f0bf 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -23,6 +23,7 @@ 
  */
 
 #include <linux/kthread.h>
+#include <trace/events/dma_fence.h>
 #include <uapi/linux/sched/types.h>
 
 #include "i915_drv.h"
@@ -96,9 +97,39 @@  check_signal_order(struct intel_context *ce, struct i915_request *rq)
 	return true;
 }
 
+static bool
+__dma_fence_signal(struct dma_fence *fence)
+{
+	return !test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
+}
+
+static void
+__dma_fence_signal__timestamp(struct dma_fence *fence, ktime_t timestamp)
+{
+	fence->timestamp = timestamp;
+	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
+	trace_dma_fence_signaled(fence);
+}
+
+static void
+__dma_fence_signal__notify(struct dma_fence *fence)
+{
+	struct dma_fence_cb *cur, *tmp;
+
+	lockdep_assert_held(fence->lock);
+	lockdep_assert_irqs_disabled();
+
+	list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
+		INIT_LIST_HEAD(&cur->node);
+		cur->func(fence, cur);
+	}
+	INIT_LIST_HEAD(&fence->cb_list);
+}
+
 void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	const ktime_t timestamp = ktime_get();
 	struct intel_context *ce, *cn;
 	struct list_head *pos, *next;
 	LIST_HEAD(signal);
@@ -122,6 +153,10 @@  void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
 
 			GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
 					     &rq->fence.flags));
+			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+
+			if (!__dma_fence_signal(&rq->fence))
+				continue;
 
 			/*
 			 * Queue for execution after dropping the signaling
@@ -129,14 +164,6 @@  void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
 			 * more signalers to the same context or engine.
 			 */
 			i915_request_get(rq);
-
-			/*
-			 * We may race with direct invocation of
-			 * dma_fence_signal(), e.g. i915_request_retire(),
-			 * so we need to acquire our reference to the request
-			 * before we cancel the breadcrumb.
-			 */
-			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 			list_add_tail(&rq->signal_link, &signal);
 		}
 
@@ -159,7 +186,12 @@  void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
 		struct i915_request *rq =
 			list_entry(pos, typeof(*rq), signal_link);
 
-		dma_fence_signal(&rq->fence);
+		__dma_fence_signal__timestamp(&rq->fence, timestamp);
+
+		spin_lock(&rq->lock);
+		__dma_fence_signal__notify(&rq->fence);
+		spin_unlock(&rq->lock);
+
 		i915_request_put(rq);
 	}
 }
@@ -261,19 +293,17 @@  void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 
 bool i915_request_enable_breadcrumb(struct i915_request *rq)
 {
-	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
-
-	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
+	lockdep_assert_held(&rq->lock);
+	lockdep_assert_irqs_disabled();
 
-	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
-		return true;
-
-	spin_lock(&b->irq_lock);
-	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
-	    !__request_completed(rq)) {
+	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) {
+		struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 		struct intel_context *ce = rq->hw_context;
 		struct list_head *pos;
 
+		spin_lock(&b->irq_lock);
+		GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
+
 		__intel_breadcrumbs_arm_irq(b);
 
 		/*
@@ -303,8 +333,8 @@  bool i915_request_enable_breadcrumb(struct i915_request *rq)
 		GEM_BUG_ON(!check_signal_order(ce, rq));
 
 		set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+		spin_unlock(&b->irq_lock);
 	}
-	spin_unlock(&b->irq_lock);
 
 	return !__request_completed(rq);
 }
@@ -313,9 +343,15 @@  void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
 	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 
-	if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
-		return;
+	lockdep_assert_held(&rq->lock);
+	lockdep_assert_irqs_disabled();
 
+	/*
+	 * We must wait for b->irq_lock so that we know the interrupt handler
+	 * has released its reference to the intel_context and has completed
+	 * the DMA_FENCE_FLAG_SIGNALED_BIT/I915_FENCE_FLAG_SIGNAL dance (if
+	 * required).
+	 */
 	spin_lock(&b->irq_lock);
 	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
 		struct intel_context *ce = rq->hw_context;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index fa955b7b6def..bed213148cbb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -437,6 +437,7 @@  void __i915_request_submit(struct i915_request *request)
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
+	    !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &request->fence.flags) &&
 	    !i915_request_enable_breadcrumb(request))
 		intel_engine_queue_breadcrumbs(engine);
 
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 380d83a2bfb6..ea0e3734d37c 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -23,7 +23,6 @@ 
  */
 
 #include <linux/circ_buf.h>
-#include <trace/events/dma_fence.h>
 
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_lrc_reg.h"