Message ID | 20180809235852.24516-1-paulo.r.zanoni@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC/CI] drm/i915/icl: account for context save/restore removed bits | expand |
Quoting Paulo Zanoni (2018-08-10 00:58:52) > The RS_CTX_ENABLE and CTX_SAVE_INHIBIT bits are not present on ICL > anymore, but we still try to set them and then check them with > GEM_BUG_ON, resulting in a BUG() call. The bug can be reproduced by > igt/drv_selftest/live_hangcheck/others-priority and our CI was able > to catch it. The machine hangs, which prevents further testing on it. Non-sequitur. Capture the bug report, move on after the panic. How does that prevent testing? What you might note is that CI's pstore is reporting -EIO, that is what has prevented remote debugging of this. > It is worth noticing that commit 05f0addd9b10 ("drm/i915/icl: Enhanced > execution list support") already tried to avoid the save/restore bits > on ICL, but only inside populate_lr_context(). > > TODO: Should we also avoid CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT on ICL > for execlists_init_reg_state()? We already avoid it inside > populate_lr_context(). RESTORE_INHIBIT is still very much required. It's just the SAVE_INHIBIT that they removed for whimsy. > TODO: Shouldn't a new problem surface when we remove these registers? > What should we do to replace the functionality that was provided by > them? Shed a tear. Resource Streamer doesn't exist and would require userspace support for it anyway. The SAVE_INHIBIT is the one that shaves a few cycles on preemption, but is supposed to be replaced by a bit in ELSQ at the expense of forking process_csb/preemption. No one has demonstrated that the cost would be worth it. -Chris
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index e5385dbfcdda..bcd0a1758f13 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -541,7 +541,8 @@ static void inject_preempt_context(struct intel_engine_cs *engine) GEM_BUG_ON(execlists->preempt_complete_status != upper_32_bits(ce->lrc_desc)); - GEM_BUG_ON((ce->lrc_reg_state[CTX_CONTEXT_CONTROL + 1] & + GEM_BUG_ON(INTEL_GEN(engine->i915) < 11 && + (ce->lrc_reg_state[CTX_CONTEXT_CONTROL + 1] & _MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT | CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT)) != _MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT | @@ -2569,6 +2570,7 @@ static void execlists_init_reg_state(u32 *regs, struct drm_i915_private *dev_priv = engine->i915; struct i915_hw_ppgtt *ppgtt = ctx->ppgtt ?: dev_priv->mm.aliasing_ppgtt; u32 base = engine->mmio_base; + u32 ctx_sr_ctl; bool rcs = engine->class == RENDER_CLASS; /* A context is actually a big batch buffer with several @@ -2581,10 +2583,12 @@ static void execlists_init_reg_state(u32 *regs, regs[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(rcs ? 14 : 11) | MI_LRI_FORCE_POSTED; + ctx_sr_ctl = CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT; + if (INTEL_GEN(dev_priv) < 11) + ctx_sr_ctl |= CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT | + CTX_CTRL_RS_CTX_ENABLE; CTX_REG(regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(engine), - _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT | - CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT | - CTX_CTRL_RS_CTX_ENABLE) | + _MASKED_BIT_DISABLE(ctx_sr_ctl) | _MASKED_BIT_ENABLE(CTX_CTRL_INHIBIT_SYN_CTX_SWITCH)); CTX_REG(regs, CTX_RING_HEAD, RING_HEAD(base), 0); CTX_REG(regs, CTX_RING_TAIL, RING_TAIL(base), 0);
The RS_CTX_ENABLE and CTX_SAVE_INHIBIT bits are not present on ICL anymore, but we still try to set them and then check them with GEM_BUG_ON, resulting in a BUG() call. The bug can be reproduced by igt/drv_selftest/live_hangcheck/others-priority and our CI was able to catch it. The machine hangs, which prevents further testing on it. It is worth noticing that commit 05f0addd9b10 ("drm/i915/icl: Enhanced execution list support") already tried to avoid the save/restore bits on ICL, but only inside populate_lr_context(). TODO: Should we also avoid CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT on ICL for execlists_init_reg_state()? We already avoid it inside populate_lr_context(). TODO: Shouldn't a new problem surface when we remove these registers? What should we do to replace the functionality that was provided by them? Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Testcase: igt/drv_selftest/live_hangcheck/others-priority Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107399 Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) As you may notice from the TODO comments, my GEM-fu is still not strong yet. Any help on the pending questions would be really appreciated.