Message ID | 1463087196-11688-5-git-send-email-John.C.Harrison@Intel.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, May 12, 2016 at 10:06:34PM +0100, John.C.Harrison@Intel.com wrote:
> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> +{
> +	struct drm_i915_gem_request *req, *req_next;
> +	unsigned long flags;
>  	u32 seqno;
>
> -	seqno = req->engine->get_seqno(req->engine);
> +	if (list_empty(&engine->fence_signal_list))
> +		return;
> +
> +	if (!fence_locked)
> +		spin_lock_irqsave(&engine->fence_lock, flags);
> +
> +	if (engine->irq_seqno_barrier)
> +		engine->irq_seqno_barrier(engine);
> +	seqno = engine->get_seqno(engine);
> +
> +	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {

NO, NO, NO. As we said the very first time, you cannot do this from an
irq handler.

The current code is already bad enough, this is making it large constant
+ N times worse. Please do look at how to do signal driven fences in O(1)
that I posted many months ago and several times since.
-Chris
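The complexity objection is that i915_gem_request_notify() runs in the
interrupt handler and walks every outstanding request on the engine under a
spinlock, so each user interrupt costs O(N) on top of the seqno barrier. The
direction Chris refers to (roughly what later landed as the intel_breadcrumbs
work) bounds the interrupt-time work by keeping waiters ordered and only ever
inspecting the oldest one; that waiter, once woken, signals its own fence and
hands off to the next. A loose, hypothetical sketch of the idea, not the
actual posted code:

	/*
	 * Hypothetical sketch: bounded interrupt-time work. Only the oldest
	 * request is examined; the waiter for the next request re-arms the
	 * interrupt itself after being woken. Assumes fence_signal_list is
	 * kept in submission (and therefore seqno) order.
	 */
	static void engine_notify_oldest(struct intel_engine_cs *engine)
	{
		struct drm_i915_gem_request *req;
		u32 seqno = engine->get_seqno(engine);

		spin_lock(&engine->fence_lock); /* hardirq context */
		req = list_first_entry_or_null(&engine->fence_signal_list,
					       struct drm_i915_gem_request,
					       signal_link);
		if (req && i915_seqno_passed(seqno, req->seqno)) {
			list_del_init(&req->signal_link);
			fence_signal_locked(&req->fence);
		}
		spin_unlock(&engine->fence_lock);
	}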
On 13/05/2016 08:27, Chris Wilson wrote:
> On Thu, May 12, 2016 at 10:06:34PM +0100, John.C.Harrison@Intel.com wrote:
>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
>> +{
>> +	struct drm_i915_gem_request *req, *req_next;
>> +	unsigned long flags;
>>  	u32 seqno;
>>
>> -	seqno = req->engine->get_seqno(req->engine);
>> +	if (list_empty(&engine->fence_signal_list))
>> +		return;
>> +
>> +	if (!fence_locked)
>> +		spin_lock_irqsave(&engine->fence_lock, flags);
>> +
>> +	if (engine->irq_seqno_barrier)
>> +		engine->irq_seqno_barrier(engine);
>> +	seqno = engine->get_seqno(engine);
>> +
>> +	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
> NO, NO, NO. As we said the very first time, you cannot do this from an
> irq handler.
>
> The current code is already bad enough, this is making it large constant
> + N times worse. Please do look at how to do signal driven fences in O(1)
> that I posted many months ago and several times since.

If you have a better solution available then please post it in a form
that can be merged and get it reviewed and accepted.

> -Chris
>
On 12-05-16 at 23:06, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called every time the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the
> cancellation request might occur after/during the completion interrupt
> actually arriving.
>
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signalling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Update list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit a static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being written to memory and the driver's ISR running can
> be significant, at least for the wait request micro-benchmark. This
> can be greatly improved by explicitly checking for seqno updates in
> the pre-wait busy poll loop. Also added some documentation comments to
> the busy poll code.
>
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then check that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non-IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquiring spinlocks and doing list processing.
>
> v8: Updated to newer nightly - changes to request clean up code mean
> none of the deferred free mess is needed any more.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |   7 +
>  drivers/gpu/drm/i915/i915_gem.c         | 235 +++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_irq.c         |   2 +
>  drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
>  6 files changed, 225 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 433d99a..3adac59 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2278,6 +2278,10 @@ struct drm_i915_gem_request {
>  	 */
>  	struct fence fence;
>  	struct rcu_head rcu_head;
> +	struct list_head signal_link;
> +	bool cancelled;
> +	bool irq_enabled;
> +	bool signal_requested;
>
>  	/** On Which ring this request was generated */
>  	struct drm_i915_private *i915;
> @@ -2379,6 +2383,9 @@ struct drm_i915_gem_request {
>  struct drm_i915_gem_request * __must_check
>  i915_gem_request_alloc(struct intel_engine_cs *engine,
>  		       struct intel_context *ctx);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +				       bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>
>  static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 958edcb..6669508 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -39,6 +39,8 @@
>  #include <linux/pci.h>
>  #include <linux/dma-buf.h>
>
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>  static void
> @@ -1238,9 +1240,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>  	struct drm_device *dev = engine->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	const bool irq_test_in_progress =
> -		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>  	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +	uint32_t seqno;
>  	DEFINE_WAIT(wait);
>  	unsigned long timeout_expire;
>  	s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1248,9 +1249,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
>  	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>
> -	if (list_empty(&req->list))
> -		return 0;
> -
>  	if (i915_gem_request_completed(req))
>  		return 0;
>
> @@ -1276,15 +1274,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	trace_i915_gem_request_wait_begin(req);
>
>  	/* Optimistic spin for the next jiffie before touching IRQs */
> -	ret = __i915_spin_request(req, state);
> -	if (ret == 0)
> -		goto out;
> -
> -	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> -		ret = -ENODEV;
> -		goto out;
> +	if (req->seqno) {
> +		ret = __i915_spin_request(req, state);
> +		if (ret == 0)
> +			goto out;
>  	}
>
> +	/*
> +	 * Enable interrupt completion of the request.
> +	 */
> +	fence_enable_sw_signaling(&req->fence);
> +
>  	for (;;) {
>  		struct timer_list timer;
>
> @@ -1307,6 +1307,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			break;
>  		}
>
> +		if (req->seqno) {
> +			/*
> +			 * There is quite a lot of latency in the user interrupt
> +			 * path. So do an explicit seqno check and potentially
> +			 * remove all that delay.
> +			 */
> +			if (req->engine->irq_seqno_barrier)
> +				req->engine->irq_seqno_barrier(req->engine);
> +			seqno = engine->get_seqno(engine);
> +			if (i915_seqno_passed(seqno, req->seqno)) {
> +				ret = 0;
> +				break;
> +			}
> +		}
> +
>  		if (signal_pending_state(state, current)) {
>  			ret = -ERESTARTSYS;
>  			break;
> @@ -1333,14 +1348,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			destroy_timer_on_stack(&timer);
>  		}
>  	}
> -	if (!irq_test_in_progress)
> -		engine->irq_put(engine);
>
>  	finish_wait(&engine->irq_queue, &wait);
>
>  out:
>  	trace_i915_gem_request_wait_end(req);
>
> +	if ((ret == 0) && (req->seqno)) {
> +		if (req->engine->irq_seqno_barrier)
> +			req->engine->irq_seqno_barrier(req->engine);
> +		seqno = engine->get_seqno(engine);
> +		if (i915_seqno_passed(seqno, req->seqno) &&
> +		    !i915_gem_request_completed(req)) {
> +			/*
> +			 * Make sure the request is marked as completed before
> +			 * returning. NB: Need to acquire the spinlock around
> +			 * the whole call to avoid a race condition where the
> +			 * interrupt handler is running concurrently and could
> +			 * cause this invocation to exit early even though the
> +			 * request has not actually been fully processed yet.
> +			 */
> +			spin_lock_irq(&req->engine->fence_lock);
> +			i915_gem_request_notify(req->engine, true);
> +			spin_unlock_irq(&req->engine->fence_lock);
> +		}
> +	}
> +
>  	if (timeout) {
>  		s64 tres = *timeout - (ktime_get_raw_ns() - before);
>
> @@ -1406,6 +1439,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  {
>  	trace_i915_gem_request_retire(request);
>
> +	if (request->irq_enabled) {
> +		request->engine->irq_put(request->engine);
> +		request->irq_enabled = false;
> +	}
> +
>  	/* We know the GPU must have read the request to have
>  	 * sent us the seqno + interrupt, so use the position
>  	 * of tail of the request to update the last known position
> @@ -1419,6 +1457,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  	list_del_init(&request->list);
>  	i915_gem_request_remove_from_client(request);
>
> +	/*
> +	 * In case the request is still in the signal pending list,
> +	 * e.g. due to being cancelled by TDR, preemption, etc.
> +	 */
> +	if (!list_empty(&request->signal_link)) {
> +		/*
> +		 * The request must be marked as cancelled and the underlying
> +		 * fence as failed. NB: There is no explicit fence fail API,
> +		 * there is only a manual poke and signal.
> +		 */
> +		request->cancelled = true;
> +		/* How to propagate to any associated sync_fence??? */
> +		request->fence.status = -EIO;
> +		fence_signal_locked(&request->fence);
> +	}
> +
>  	if (request->previous_context) {
>  		if (i915.enable_execlists)
>  			intel_lr_context_unpin(request->previous_context,
> @@ -2650,6 +2704,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>  	 */
>  	request->postfix = intel_ring_get_tail(ringbuf);
>
> +	/*
> +	 * Add the fence to the pending list before emitting the commands to
> +	 * generate a seqno notification interrupt.
> +	 */
> +	i915_gem_request_submit(request);
> +
>  	if (i915.enable_execlists)
>  		ret = engine->emit_request(request);
>  	else {
> @@ -2735,25 +2795,141 @@ static void i915_gem_request_free(struct fence *req_fence)
>  	struct drm_i915_gem_request *req;
>
>  	req = container_of(req_fence, typeof(*req), fence);
> +
> +	WARN_ON(req->irq_enabled);
> +
>  	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>  }
>
> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +/*
> + * The request is about to be submitted to the hardware so add the fence to
> + * the list of signalable fences.
> + *
> + * NB: This does not necessarily enable interrupts yet. That only occurs on
> + * demand when the request is actually waited on. However, adding it to the
> + * list early ensures that there is no race condition where the interrupt
> + * could pop out prematurely and thus be completely lost. The race is merely
> + * that the interrupt must be manually checked for after being enabled.
> + */
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
>  {
> -	/* Interrupt driven fences are not implemented yet.*/
> -	WARN(true, "This should not be called!");
> -	return true;
> +	/*
> +	 * Always enable signal processing for the request's fence object
> +	 * before that request is submitted to the hardware. Thus there is no
> +	 * race condition whereby the interrupt could pop out before the
> +	 * request has been added to the signal list. Hence no need to check
> +	 * for completion, undo the list add and return false.
> +	 */
> +	i915_gem_request_reference(req);
> +	spin_lock_irq(&req->engine->fence_lock);
> +	WARN_ON(!list_empty(&req->signal_link));
> +	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
> +	spin_unlock_irq(&req->engine->fence_lock);
> +
> +	/*
> +	 * NB: Interrupts are only enabled on demand. Thus there is still a
> +	 * race where the request could complete before the interrupt has
> +	 * been enabled. Thus care must be taken at that point.
> +	 */
> +
> +	/* Have interrupts already been requested? */
> +	if (req->signal_requested)
> +		i915_gem_request_enable_interrupt(req, false);
> +}
> +
> +/*
> + * The request is being actively waited on, so enable interrupt based
> + * completion signalling.
> + */
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +				       bool fence_locked)
> +{
> +	struct drm_i915_private *dev_priv = to_i915(req->engine->dev);
> +	const bool irq_test_in_progress =
> +		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
> +						intel_engine_flag(req->engine);
> +
> +	if (req->irq_enabled)
> +		return;
> +
> +	if (irq_test_in_progress)
> +		return;
> +
> +	if (!WARN_ON(!req->engine->irq_get(req->engine)))
> +		req->irq_enabled = true;
> +
> +	/*
> +	 * Because the interrupt is only enabled on demand, there is a race
> +	 * where the interrupt can fire before anyone is looking for it. So
> +	 * do an explicit check for missed interrupts.
> +	 */
> +	i915_gem_request_notify(req->engine, fence_locked);
>  }
>
> -static bool i915_gem_request_is_completed(struct fence *req_fence)
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>  {
>  	struct drm_i915_gem_request *req = container_of(req_fence,
>  						 typeof(*req), fence);
> +
> +	/*
> +	 * No need to actually enable interrupt based processing until the
> +	 * request has been submitted to the hardware. At which point
> +	 * 'i915_gem_request_submit()' is called. So only really enable
> +	 * signalling in there. Just set a flag to say that interrupts are
> +	 * wanted when the request is eventually submitted. On the other hand
> +	 * if the request has already been submitted then interrupts do need
> +	 * to be enabled now.
> +	 */
> +
> +	req->signal_requested = true;
> +
> +	if (!list_empty(&req->signal_link))
> +		i915_gem_request_enable_interrupt(req, true);
> +
> +	return true;
> +}
> +
> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> +{
> +	struct drm_i915_gem_request *req, *req_next;
> +	unsigned long flags;
>  	u32 seqno;
>
> -	seqno = req->engine->get_seqno(req->engine);
> +	if (list_empty(&engine->fence_signal_list))
> +		return;
> +

If you don't hold the lock then you should probably use
list_empty_careful.

~Maarten
On Fri, May 13, 2016 at 10:19:11AM +0100, John Harrison wrote:
> On 13/05/2016 08:27, Chris Wilson wrote:
> >On Thu, May 12, 2016 at 10:06:34PM +0100, John.C.Harrison@Intel.com wrote:
> >>+void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> >>+{
> >>+	struct drm_i915_gem_request *req, *req_next;
> >>+	unsigned long flags;
> >>	u32 seqno;
> >>
> >>-	seqno = req->engine->get_seqno(req->engine);
> >>+	if (list_empty(&engine->fence_signal_list))
> >>+		return;
> >>+
> >>+	if (!fence_locked)
> >>+		spin_lock_irqsave(&engine->fence_lock, flags);
> >>+
> >>+	if (engine->irq_seqno_barrier)
> >>+		engine->irq_seqno_barrier(engine);
> >>+	seqno = engine->get_seqno(engine);
> >>+
> >>+	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
> >NO, NO, NO. As we said the very first time, you cannot do this from an
> >irq handler.
> >
> >The current code is already bad enough, this is making it large constant
> >+ N times worse. Please do look at how to do signal driven fences in O(1)
> >that I posted many months ago and several times since.
>
> If you have a better solution available then please post it in a
> form that can be merged and get it reviewed and accepted.

Recently reposted as
https://lists.freedesktop.org/archives/intel-gfx/2016-May/095040.html

It is currently blocking getting request + vma tracking overhauled to fix
our leaks, as well as the regression fixes that depend upon them.
-Chris
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 433d99a..3adac59 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2278,6 +2278,10 @@ struct drm_i915_gem_request {
 	 */
 	struct fence fence;
 	struct rcu_head rcu_head;
+	struct list_head signal_link;
+	bool cancelled;
+	bool irq_enabled;
+	bool signal_requested;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2379,6 +2383,9 @@ struct drm_i915_gem_request {
 struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct intel_context *ctx);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked);
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 958edcb..6669508 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -39,6 +39,8 @@
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
 
+static void i915_gem_request_submit(struct drm_i915_gem_request *req);
+
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
 static void
@@ -1238,9 +1240,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
 	struct drm_device *dev = engine->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	const bool irq_test_in_progress =
-		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	uint32_t seqno;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
 	s64 before = 0; /* Only to silence a compiler warning. */
@@ -1248,9 +1249,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
-	if (list_empty(&req->list))
-		return 0;
-
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -1276,15 +1274,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	trace_i915_gem_request_wait_begin(req);
 
 	/* Optimistic spin for the next jiffie before touching IRQs */
-	ret = __i915_spin_request(req, state);
-	if (ret == 0)
-		goto out;
-
-	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
-		ret = -ENODEV;
-		goto out;
+	if (req->seqno) {
+		ret = __i915_spin_request(req, state);
+		if (ret == 0)
+			goto out;
 	}
 
+	/*
+	 * Enable interrupt completion of the request.
+	 */
+	fence_enable_sw_signaling(&req->fence);
+
 	for (;;) {
 		struct timer_list timer;
 
@@ -1307,6 +1307,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		if (req->seqno) {
+			/*
+			 * There is quite a lot of latency in the user interrupt
+			 * path. So do an explicit seqno check and potentially
+			 * remove all that delay.
+			 */
+			if (req->engine->irq_seqno_barrier)
+				req->engine->irq_seqno_barrier(req->engine);
+			seqno = engine->get_seqno(engine);
+			if (i915_seqno_passed(seqno, req->seqno)) {
+				ret = 0;
+				break;
+			}
+		}
+
 		if (signal_pending_state(state, current)) {
 			ret = -ERESTARTSYS;
 			break;
@@ -1333,14 +1348,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			destroy_timer_on_stack(&timer);
 		}
 	}
-	if (!irq_test_in_progress)
-		engine->irq_put(engine);
 
 	finish_wait(&engine->irq_queue, &wait);
 
 out:
 	trace_i915_gem_request_wait_end(req);
 
+	if ((ret == 0) && (req->seqno)) {
+		if (req->engine->irq_seqno_barrier)
+			req->engine->irq_seqno_barrier(req->engine);
+		seqno = engine->get_seqno(engine);
+		if (i915_seqno_passed(seqno, req->seqno) &&
+		    !i915_gem_request_completed(req)) {
+			/*
+			 * Make sure the request is marked as completed before
+			 * returning. NB: Need to acquire the spinlock around
+			 * the whole call to avoid a race condition where the
+			 * interrupt handler is running concurrently and could
+			 * cause this invocation to exit early even though the
+			 * request has not actually been fully processed yet.
+			 */
+			spin_lock_irq(&req->engine->fence_lock);
+			i915_gem_request_notify(req->engine, true);
+			spin_unlock_irq(&req->engine->fence_lock);
+		}
+	}
+
 	if (timeout) {
 		s64 tres = *timeout - (ktime_get_raw_ns() - before);
 
@@ -1406,6 +1439,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 {
 	trace_i915_gem_request_retire(request);
 
+	if (request->irq_enabled) {
+		request->engine->irq_put(request->engine);
+		request->irq_enabled = false;
+	}
+
 	/* We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
 	 * of tail of the request to update the last known position
@@ -1419,6 +1457,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	list_del_init(&request->list);
 	i915_gem_request_remove_from_client(request);
 
+	/*
+	 * In case the request is still in the signal pending list,
+	 * e.g. due to being cancelled by TDR, preemption, etc.
+	 */
+	if (!list_empty(&request->signal_link)) {
+		/*
+		 * The request must be marked as cancelled and the underlying
+		 * fence as failed. NB: There is no explicit fence fail API,
+		 * there is only a manual poke and signal.
+		 */
+		request->cancelled = true;
+		/* How to propagate to any associated sync_fence??? */
+		request->fence.status = -EIO;
+		fence_signal_locked(&request->fence);
+	}
+
 	if (request->previous_context) {
 		if (i915.enable_execlists)
 			intel_lr_context_unpin(request->previous_context,
@@ -2650,6 +2704,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);
 
+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	i915_gem_request_submit(request);
+
 	if (i915.enable_execlists)
 		ret = engine->emit_request(request);
 	else {
@@ -2735,25 +2795,141 @@ static void i915_gem_request_free(struct fence *req_fence)
 	struct drm_i915_gem_request *req;
 
 	req = container_of(req_fence, typeof(*req), fence);
+
+	WARN_ON(req->irq_enabled);
+
 	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
 }
 
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request is about to be submitted to the hardware so add the fence to
+ * the list of signalable fences.
+ *
+ * NB: This does not necessarily enable interrupts yet. That only occurs on
+ * demand when the request is actually waited on. However, adding it to the
+ * list early ensures that there is no race condition where the interrupt
+ * could pop out prematurely and thus be completely lost. The race is merely
+ * that the interrupt must be manually checked for after being enabled.
+ */
+static void i915_gem_request_submit(struct drm_i915_gem_request *req)
 {
-	/* Interrupt driven fences are not implemented yet.*/
-	WARN(true, "This should not be called!");
-	return true;
+	/*
+	 * Always enable signal processing for the request's fence object
+	 * before that request is submitted to the hardware. Thus there is no
+	 * race condition whereby the interrupt could pop out before the
+	 * request has been added to the signal list. Hence no need to check
+	 * for completion, undo the list add and return false.
+	 */
+	i915_gem_request_reference(req);
+	spin_lock_irq(&req->engine->fence_lock);
+	WARN_ON(!list_empty(&req->signal_link));
+	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
+	spin_unlock_irq(&req->engine->fence_lock);
+
+	/*
+	 * NB: Interrupts are only enabled on demand. Thus there is still a
+	 * race where the request could complete before the interrupt has
+	 * been enabled. Thus care must be taken at that point.
+	 */
+
+	/* Have interrupts already been requested? */
+	if (req->signal_requested)
+		i915_gem_request_enable_interrupt(req, false);
+}
+
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked)
+{
+	struct drm_i915_private *dev_priv = to_i915(req->engine->dev);
+	const bool irq_test_in_progress =
+		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
+						intel_engine_flag(req->engine);
+
+	if (req->irq_enabled)
+		return;
+
+	if (irq_test_in_progress)
+		return;
+
+	if (!WARN_ON(!req->engine->irq_get(req->engine)))
+		req->irq_enabled = true;
+
+	/*
+	 * Because the interrupt is only enabled on demand, there is a race
+	 * where the interrupt can fire before anyone is looking for it. So
+	 * do an explicit check for missed interrupts.
+	 */
+	i915_gem_request_notify(req->engine, fence_locked);
 }
 
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+
+	/*
+	 * No need to actually enable interrupt based processing until the
+	 * request has been submitted to the hardware. At which point
+	 * 'i915_gem_request_submit()' is called. So only really enable
+	 * signalling in there. Just set a flag to say that interrupts are
+	 * wanted when the request is eventually submitted. On the other hand
+	 * if the request has already been submitted then interrupts do need
+	 * to be enabled now.
+	 */
+
+	req->signal_requested = true;
+
+	if (!list_empty(&req->signal_link))
+		i915_gem_request_enable_interrupt(req, true);
+
+	return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
+{
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;
 
-	seqno = req->engine->get_seqno(req->engine);
+	if (list_empty(&engine->fence_signal_list))
+		return;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&engine->fence_lock, flags);
+
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+	seqno = engine->get_seqno(engine);
+
+	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				break;
+		}
+
+		/*
+		 * Start by removing the fence from the signal list otherwise
+		 * the retire code can run concurrently and get confused.
+		 */
+		list_del_init(&req->signal_link);
 
-	return i915_seqno_passed(seqno, req->seqno);
+		if (!req->cancelled)
+			fence_signal_locked(&req->fence);
+
+		if (req->irq_enabled) {
+			req->engine->irq_put(req->engine);
+			req->irq_enabled = false;
+		}
+
+		i915_gem_request_unreference(req);
+	}
+
+	if (!fence_locked)
+		spin_unlock_irqrestore(&engine->fence_lock, flags);
 }
 
 static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
@@ -2796,7 +2972,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
 
 static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_free,
 	.get_driver_name	= i915_gem_request_get_driver_name,
@@ -2882,6 +3057,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	req->ctx  = ctx;
 	i915_gem_context_reference(req->ctx);
 
+	INIT_LIST_HEAD(&req->signal_link);
 	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
 		   ctx->engine[engine->id].fence_timeline.fence_context,
 		   i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
@@ -3016,6 +3192,13 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_request_retire(request);
 	}
 
+	/*
+	 * Tidy up anything left over. This includes a call to
+	 * i915_gem_request_notify() which will make sure that any requests
+	 * that were on the signal pending list get also cleaned up.
+	 */
+	i915_gem_retire_requests_ring(engine);
+
 	/* Having flushed all requests from all queues, we know that all
 	 * ringbuffers must now be empty. However, since we do not reclaim
 	 * all space when retiring the request (to prevent HEADs colliding
@@ -3062,6 +3245,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 {
 	WARN_ON(i915_verify_lists(engine->dev));
 
+	/*
+	 * If no-one has waited on a request recently then interrupts will
+	 * not have been enabled and thus no requests will ever be marked as
+	 * completed. So do an interrupt check now.
+	 */
+	i915_gem_request_notify(engine, false);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -5098,6 +5288,7 @@ init_engine_lists(struct intel_engine_cs *engine)
 {
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
 }
 
 void
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 7a07987..7cdd076 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1002,6 +1002,8 @@ static void notify_ring(struct intel_engine_cs *engine)
 	trace_i915_gem_request_notify(engine);
 	engine->user_interrupts++;
 
+	i915_gem_request_notify(engine, false);
+
 	wake_up_all(&engine->irq_queue);
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 199aa6e..7858521 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1975,6 +1975,7 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *engine)
 	engine->dev = dev;
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
 	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	init_waitqueue_head(&engine->irq_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 671e49c..a2cfd10 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2287,6 +2287,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
 	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	memset(engine->semaphore.sync_seqno, 0,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 510ed58..113646c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -347,6 +347,7 @@ struct intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
 };
 
 static inline bool
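One piece of background the fence_locked plumbing above relies on: the
request's fence is initialised with &engine->fence_lock as its fence->lock,
and the struct fence core invokes .enable_signaling with that lock already
held. That is why i915_gem_request_enable_signaling() passes
fence_locked=true down to i915_gem_request_notify(), while the IRQ handler
and retire paths pass false. A paraphrased sketch of the core helper's
behaviour (from the pre-dma_fence drivers/dma-buf/fence.c of this era,
written from memory; consult the actual source for the authoritative
version):

	void fence_enable_sw_signaling(struct fence *fence)
	{
		unsigned long flags;

		if (!test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags) &&
		    !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
			spin_lock_irqsave(fence->lock, flags); /* == engine->fence_lock here */
			if (!fence->ops->enable_signaling(fence))
				fence_signal_locked(fence);
			spin_unlock_irqrestore(fence->lock, flags);
		}
	}

So a waiter calling fence_enable_sw_signaling() either arms the interrupt or,
if the request has already completed, sees the fence signalled before
__i915_wait_request() ever sleeps.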