Message ID | 1467372140-30422-3-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 01/07/16 12:22, Chris Wilson wrote: > We can forgo queuing the hangcheck from the start of every request to > until we wait upon a request. This reduces the overhead of every > request, but may increase the latency of detecting a hang. Howeever, if > nothing every waits upon a hang, did it ever hang? It also improves the > robustness of the wait-request by ensuring that the hangchecker is > indeed running before we sleep indefinitely (and thereby ensuring that > we never actually sleep forever waiting for a dead GPU). > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_gem.c | 9 +++++---- > drivers/gpu/drm/i915/i915_irq.c | 10 ++++------ > 2 files changed, 9 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 1d9878258103..34f724cc40b8 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -1532,6 +1532,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req, > break; > } > > + /* Ensure that even if the GPU hangs, we get woken up. */ > + i915_queue_hangcheck(dev_priv); > + > timer.function = NULL; > if (timeout || missed_irq(dev_priv, engine)) { > unsigned long expire; > @@ -2919,8 +2922,6 @@ void __i915_add_request(struct drm_i915_gem_request *request, > /* Not allowed to fail! */ > WARN(ret, "emit|add_request failed: %d!\n", ret); > > - i915_queue_hangcheck(engine->i915); > - > queue_delayed_work(dev_priv->wq, > &dev_priv->mm.retire_work, > round_jiffies_up_relative(HZ)); > @@ -3264,8 +3265,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv) > > if (idle) > mod_delayed_work(dev_priv->wq, > - &dev_priv->mm.idle_work, > - msecs_to_jiffies(100)); > + &dev_priv->mm.idle_work, > + msecs_to_jiffies(100)); > > return idle; > } > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > index 4378a659d962..5614582ca240 100644 > --- a/drivers/gpu/drm/i915/i915_irq.c > +++ b/drivers/gpu/drm/i915/i915_irq.c > @@ -3135,10 +3135,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work) > intel_uncore_arm_unclaimed_mmio_detection(dev_priv); > > for_each_engine_id(engine, dev_priv, id) { > + bool busy = waitqueue_active(&engine->irq_queue); > u64 acthd; > u32 seqno; > unsigned user_interrupts; > - bool busy = true; > > semaphore_clear_deadlocks(dev_priv); > > @@ -3161,12 +3161,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work) > if (engine->hangcheck.seqno == seqno) { > if (ring_idle(engine, seqno)) { > engine->hangcheck.action = HANGCHECK_IDLE; > - if (waitqueue_active(&engine->irq_queue)) {, the > + if (busy) { > /* Safeguard against driver failure */ > user_interrupts = kick_waiters(engine); > engine->hangcheck.score += BUSY; > - } else > - busy = false; > + } > } else { > /* We always increment the hangcheck score > * if the ring is busy and still processing > @@ -3240,9 +3239,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work) > goto out; > } > > + /* Reset timer in case GPU hangs without another request being added */ > if (busy_count) > - /* Reset timer case chip hangs without another request > - * being added */ > i915_queue_hangcheck(dev_priv); > > out: > I thought I see a problem here but I was just confused. I think it is OK. Just won't re-queue the hangcheck if no one is waiting and no new requests get submitted. It is unlikely that would cause a problem in practice. It sounds very unlucky that the last submitted request ever hangs. Balance with the benefit of not running while GPU is processing stuff I think we can give it a go. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 1d9878258103..34f724cc40b8 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1532,6 +1532,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req, break; } + /* Ensure that even if the GPU hangs, we get woken up. */ + i915_queue_hangcheck(dev_priv); + timer.function = NULL; if (timeout || missed_irq(dev_priv, engine)) { unsigned long expire; @@ -2919,8 +2922,6 @@ void __i915_add_request(struct drm_i915_gem_request *request, /* Not allowed to fail! */ WARN(ret, "emit|add_request failed: %d!\n", ret); - i915_queue_hangcheck(engine->i915); - queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, round_jiffies_up_relative(HZ)); @@ -3264,8 +3265,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv) if (idle) mod_delayed_work(dev_priv->wq, - &dev_priv->mm.idle_work, - msecs_to_jiffies(100)); + &dev_priv->mm.idle_work, + msecs_to_jiffies(100)); return idle; } diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 4378a659d962..5614582ca240 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -3135,10 +3135,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work) intel_uncore_arm_unclaimed_mmio_detection(dev_priv); for_each_engine_id(engine, dev_priv, id) { + bool busy = waitqueue_active(&engine->irq_queue); u64 acthd; u32 seqno; unsigned user_interrupts; - bool busy = true; semaphore_clear_deadlocks(dev_priv); @@ -3161,12 +3161,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work) if (engine->hangcheck.seqno == seqno) { if (ring_idle(engine, seqno)) { engine->hangcheck.action = HANGCHECK_IDLE; - if (waitqueue_active(&engine->irq_queue)) { + if (busy) { /* Safeguard against driver failure */ user_interrupts = kick_waiters(engine); engine->hangcheck.score += BUSY; - } else - busy = false; + } } else { /* We always increment the hangcheck score * if the ring is busy and still processing @@ -3240,9 +3239,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work) goto out; } + /* Reset timer in case GPU hangs without another request being added */ if (busy_count) - /* Reset timer case chip hangs without another request - * being added */ i915_queue_hangcheck(dev_priv); out:
We can forgo queuing the hangcheck from the start of every request to until we wait upon a request. This reduces the overhead of every request, but may increase the latency of detecting a hang. Howeever, if nothing every waits upon a hang, did it ever hang? It also improves the robustness of the wait-request by ensuring that the hangchecker is indeed running before we sleep indefinitely (and thereby ensuring that we never actually sleep forever waiting for a dead GPU). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_gem.c | 9 +++++---- drivers/gpu/drm/i915/i915_irq.c | 10 ++++------ 2 files changed, 9 insertions(+), 10 deletions(-)