Message ID | gtrmxhovj2qpmcica76f7uxharhztbt7fdoromyxbsd7ltjvuq@lhnv2wcxm7or (mailing list archive) |
---|---|
State | New |
Series | drm/i915/gt: Ensure irqs' status does not change with spin_unlock |
Hi Maciej,

> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 12f1ba7ca9c1..e9102f7246f5 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -4338,10 +4338,11 @@ static void guc_bump_inflight_request_prio(struct i915_request *rq,
> >  static void guc_retire_inflight_request_prio(struct i915_request *rq)
> >  {
> >  	struct intel_context *ce = request_to_scheduling_context(rq);
> > +	unsigned long flags;
> >
> > -	spin_lock(&ce->guc_state.lock);
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> >  	guc_prio_fini(rq, ce);
> > -	spin_unlock(&ce->guc_state.lock);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >  }
> >
> >  static void sanitize_hwsp(struct intel_engine_cs *engine)
>
> The guc_retire_inflight_request_prio is called in intel_breadcrumbs.c
> signal_irq_work().
>
> There is a similar situation
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c?h=v6.13-rc7#n255
>
> There is also spin_(un)lock while potentially IRQs are disabled.
>
> Should it also be addressed?

Thanks for spotting this. Yes, I believe we should also address
other spin locks/unlocks on this path (inside
list_for_each_entry_rcu loops for example), as they might cause
similar problems. I will include these in v2.

Krzysztof

>
> Regards,
>
> Maciej
>
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 12f1ba7ca9c1..e9102f7246f5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4338,10 +4338,11 @@ static void guc_bump_inflight_request_prio(struct i915_request *rq,
 static void guc_retire_inflight_request_prio(struct i915_request *rq)
 {
 	struct intel_context *ce = request_to_scheduling_context(rq);
+	unsigned long flags;
 
-	spin_lock(&ce->guc_state.lock);
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	guc_prio_fini(rq, ce);
-	spin_unlock(&ce->guc_state.lock);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 }
 
 static void sanitize_hwsp(struct intel_engine_cs *engine)
spin_unlock() function enables irqs regardless of their state
before spin_lock() was called. This might result in an interrupt
while holding a lock further down in the execution, as seen in
GitLab issue #13399.

Try to remedy the problem by saving irq state before spin lock
acquisition.

Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
---
This issue is hit rarely on CI and I was not able to reproduce
it locally. There might be more places where we should save and
restore irq state, so I am not adding "Closes" label for the
issue yet.

 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
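
For readers less familiar with the save/restore pattern the patch switches to, here is a rough user-space model of why restoring the saved irq state matters when a lock is taken from a context that already has interrupts disabled. It is only a sketch: every `model_*` name is hypothetical and not a kernel API, a plain boolean stands in for the CPU interrupt flag, and the "unconditional enable" variant is modelled on the behaviour of `spin_unlock_irq()` rather than on anything in this patch.

```c
/* User-space model only: none of the model_* names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

/* Stands in for the CPU's interrupt-enable flag. */
static bool model_irqs_enabled = true;

/* Models a spinlock as a simple held/not-held flag. */
struct model_lock {
	bool held;
};

/* Like spin_lock_irqsave(): remember the current irq state, then disable. */
static unsigned long model_lock_irqsave(struct model_lock *l)
{
	unsigned long flags = model_irqs_enabled ? 1UL : 0UL;

	model_irqs_enabled = false;
	assert(!l->held);	/* a real spinlock would spin here */
	l->held = true;
	return flags;
}

/* Like spin_unlock_irqrestore(): drop the lock, restore the saved state. */
static void model_unlock_irqrestore(struct model_lock *l, unsigned long flags)
{
	assert(l->held);
	l->held = false;
	model_irqs_enabled = (flags != 0);
}

/* Like spin_unlock_irq(): drop the lock and enable irqs unconditionally. */
static void model_unlock_irq(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
	model_irqs_enabled = true;
}

/*
 * The caller holds 'outer' with irqs off and briefly takes 'inner'.
 * Returns true if irqs ended up enabled while 'outer' is still held,
 * which is the hazardous state.
 */
static bool model_nested_section(bool restore_saved_state)
{
	struct model_lock outer = { false }, inner = { false };
	unsigned long outer_flags, inner_flags;
	bool irqs_on_under_outer;

	model_irqs_enabled = true;
	outer_flags = model_lock_irqsave(&outer);

	inner_flags = model_lock_irqsave(&inner);
	if (restore_saved_state)
		model_unlock_irqrestore(&inner, inner_flags);
	else
		model_unlock_irq(&inner);

	irqs_on_under_outer = model_irqs_enabled;

	model_unlock_irqrestore(&outer, outer_flags);
	return irqs_on_under_outer;
}
```

In this model, `model_nested_section(true)` leaves interrupts disabled for as long as the outer lock is held, while `model_nested_section(false)` re-enables them too early. Whether the real path through signal_irq_work() behaves the same way is for the discussion above to settle; the model only illustrates the intent of saving and restoring the state.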