Message ID | 1464800848-36672-7-git-send-email-John.C.Harrison@Intel.com (mailing list archive) |
---|---|
State | New, archived |
On 01-06-16 at 19:07, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The notify function can be called many times without the seqno
> changing. Some are to prevent races due to the requirement of not
> enabling interrupts until requested. However, when interrupts are
> enabled the IRQ handler can be called multiple times without the
> ring's seqno value changing. E.g. two interrupts are generated by
> batch buffers completing in quick succession, the first call to the
> handler processes both completions but the handler still gets executed
> a second time. This patch reduces the overhead of these extra calls by
> caching the last processed seqno value and early exiting if it has not
> changed.

How significant is this overhead?

Patch looks reasonable otherwise.

~Maarten
On 07/06/2016 13:47, Maarten Lankhorst wrote:
> On 01-06-16 at 19:07, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The notify function can be called many times without the seqno
>> changing. Some are to prevent races due to the requirement of not
>> enabling interrupts until requested. However, when interrupts are
>> enabled the IRQ handler can be called multiple times without the
>> ring's seqno value changing. E.g. two interrupts are generated by
>> batch buffers completing in quick succession, the first call to the
>> handler processes both completions but the handler still gets executed
>> a second time. This patch reduces the overhead of these extra calls by
>> caching the last processed seqno value and early exiting if it has not
>> changed.
> How significant is this overhead?

Doing the cache check hits the early exit approx 98% of the time when
running GLBenchmark. The vast majority of the duplicate calls, though,
come from the notify function being invoked by
i915_gem_retire_requests_ring(), which runs at least once for every
execbuf IOCTL (possibly multiple times). I have just made a couple of
tweaks to further reduce the number of these calls and their impact, but
there are still a lot of them.

> Patch looks reasonable otherwise.
>
> ~Maarten
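The duplicate-call behaviour described above can be sketched in a minimal, hypothetical model: only the fields the early-exit check touches, with the spinlock and signal-list walk replaced by a counter. The `struct engine`, `notify()`, and field names here are illustrative stand-ins, not actual i915 code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified engine state for illustrating the patch's
 * early-exit check. */
struct engine {
	uint32_t hw_seqno;       /* stands in for engine->get_seqno() */
	uint32_t last_irq_seqno; /* cached value the patch adds */
	int full_runs;           /* times the expensive path executed */
};

static void notify(struct engine *e)
{
	uint32_t seqno = e->hw_seqno;

	/* Early exit: the seqno has not changed since the last call, so
	 * there is no new completion to process and no need to take the
	 * lock at all. */
	if (seqno == e->last_irq_seqno)
		return;

	e->last_irq_seqno = seqno;
	e->full_runs++; /* spinlock + signal-list walk would go here */
}
```

In the two-interrupts scenario from the commit message, both completions are visible by the time the first handler invocation reads the seqno, so the second invocation sees an unchanged value and returns before touching the lock.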
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a8b4887..67f65f8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1368,6 +1368,7 @@ out:
 	 * request has not actually been fully processed yet.
 	 */
 	spin_lock_irq(&req->engine->fence_lock);
+	req->engine->last_irq_seqno = 0;
 	i915_gem_request_notify(req->engine, true);
 	spin_unlock_irq(&req->engine->fence_lock);
 }
@@ -2599,9 +2600,12 @@ i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
 	i915_gem_retire_requests(dev_priv);
 
 	/* Finally reset hw state */
-	for_each_engine(engine, dev_priv)
+	for_each_engine(engine, dev_priv) {
 		intel_ring_init_seqno(engine, seqno);
+		engine->last_irq_seqno = 0;
+	}
+
 	return 0;
 }
@@ -2933,13 +2937,24 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 		return;
 	}
 
-	if (!fence_locked)
-		spin_lock_irqsave(&engine->fence_lock, flags);
-
+	/*
+	 * Check for a new seqno. If it hasn't actually changed then early
+	 * exit without even grabbing the spinlock. Note that this is safe
+	 * because any corruption of last_irq_seqno merely results in doing
+	 * the full processing when there is potentially no work to be done.
+	 * It can never lead to not processing work that does need to happen.
+	 */
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 	seqno = engine->get_seqno(engine);
 	trace_i915_gem_request_notify(engine, seqno);
+	if (seqno == engine->last_irq_seqno)
+		return;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&engine->fence_lock, flags);
+
+	engine->last_irq_seqno = seqno;
 
 	list_for_each_entry_safe(req, req_next,
 				 &engine->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -3234,7 +3249,10 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
 	 * Tidy up anything left over. This includes a call to
 	 * i915_gem_request_notify() which will make sure that any requests
 	 * that were on the signal pending list get also cleaned up.
+	 * NB: The seqno cache must be cleared otherwise the notify call will
+	 * simply return immediately.
 	 */
+	engine->last_irq_seqno = 0;
 	i915_gem_retire_requests_ring(engine);
 
 	/* Having flushed all requests from all queues, we know that all
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 51779b4..90de84e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -348,6 +348,7 @@ struct intel_engine_cs {
 
 	spinlock_t fence_lock;
 	struct list_head fence_signal_list;
+	uint32_t last_irq_seqno;
 
 	struct work_struct request_work;
 };
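The "NB" comment in the reset-cleanup hunk guards against a subtle hazard: after a reset or seqno re-init, the hardware seqno can end up equal to the cached value, so a plain notify call would early-exit and skip the cleanup it is expected to perform. A minimal hypothetical model (illustrative names, not i915 code) of why the patch zeroes the cache first:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the reset hazard the patch's cleanup paths
 * avoid by zeroing the seqno cache. */
struct engine {
	uint32_t hw_seqno;
	uint32_t last_irq_seqno;
	int processed;
};

static void notify(struct engine *e)
{
	/* With a stale cache equal to the hw value, this makes the
	 * call a silent no-op. */
	if (e->hw_seqno == e->last_irq_seqno)
		return;
	e->last_irq_seqno = e->hw_seqno;
	e->processed++;
}
```

If a reset leaves the hardware seqno at the cached value, only clearing `last_irq_seqno` beforehand (as the patch does in the cleanup and init paths) forces the full processing to run.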