Message ID | 20180801135611.20666-1-jakub.bartminski@intel.com
---|---
State | New, archived
Series | [RFC] drm/i915: Don't reset on preemptible workloads
Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> The current behaviour of the hangcheck is that if we detect that a request
> is not making any forward progress, the driver will attempt an engine
> reset. If that's not successful, we fall back to a full device reset.
>
> This patch would change this so that if hangcheck encounters a low-priority
> workload, it will attempt to preempt it before declaring a hang. If the
> preemption is successful, we allow the workload to continue "in the
> background" (until the next hangcheck run, and the next attempt to preempt
> it). If the context was closed, we simply skip the workload's execution.
>
> This new behaviour would allow the user to submit intentionally large or
> passive workloads, which would normally be affected by the hangcheck,
> without having to divide them into smaller pieces of work.
>
> Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
> Signed-off-by: Jakub Bartmiński <jakub.bartminski@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_hangcheck.c  | 29 +++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 37 +++++++++++++++++++++++--
>  drivers/gpu/drm/i915/intel_lrc.h        |  1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>  4 files changed, 65 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 2fc7a0dd0df9..5ebd3ca74855 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -398,6 +398,28 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
>          return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
>  }
>
> +static bool hangcheck_preempt_workload(struct intel_engine_cs *engine)
> +{
> +        struct i915_request *active_request;
> +        int workload_priority;
> +
> +        /* We have already tried preempting, but the hardware did not react */
> +        if (engine->hangcheck.try_preempt)
> +                return false;
> +
> +        active_request = i915_gem_find_active_request(engine);

No ownership here of rq. You either need a new function to return a
reference, or careful application of rcu.

> +        workload_priority = active_request->gem_context->sched.priority;
> +
> +        if (workload_priority == I915_CONTEXT_MIN_USER_PRIORITY) {
> +                engine->hangcheck.try_preempt = true;
> +                engine->hangcheck.active_request = active_request;
> +                intel_lr_inject_preempt_context(engine);
> +                return true;
> +        }
> +
> +        return false;
> +}
> +
>  /*
>   * This is called when the chip hasn't reported back with completed
>   * batchbuffers in a long time. We keep track per ring seqno progress and
> @@ -440,6 +462,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>                  hangcheck_store_sample(engine, &hc);
>
>                  if (engine->hangcheck.stalled) {
> +                        /*
> +                         * Try preempting the current workload before
> +                         * declaring the engine hung.
> +                         */
> +                        if (hangcheck_preempt_workload(engine))
> +                                continue;
> +
>                          hung |= intel_engine_flag(engine);
>                          if (hc.action != ENGINE_DEAD)
>                                  stuck |= intel_engine_flag(engine);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index fad689efb67a..3ec8dcf64000 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -326,15 +326,39 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
>  {
>          struct i915_request *rq, *rn;
>          struct i915_priolist *uninitialized_var(p);
> +        struct i915_gem_context *active_context = NULL;
> +        bool skip_seqno = false;
> +        u32 new_seqno = 0;
>          int last_prio = I915_PRIORITY_INVALID;
>
>          lockdep_assert_held(&engine->timeline.lock);
>
> +        if (engine->hangcheck.try_preempt) {
> +                rq = engine->hangcheck.active_request;
> +                GEM_BUG_ON(!rq);
> +
> +                active_context = rq->gem_context;
> +                GEM_BUG_ON(!active_context);
> +
> +                /*
> +                 * If the workload is preemptible but its context was closed
> +                 * we force the engine to skip its execution instead.
> +                 */
> +                if (i915_gem_context_is_closed(active_context))
> +                        skip_seqno = true;

Nope. Even if the context is closed, its results may be depended upon by
others; both GPU and userspace.

Ok, I don't see any other use of try_preempt... So the gist of this is to
excuse lowest-priority user contexts from hangcheck; do we want to go one
step further and ask userspace to opt in explicitly? Though there is the
argument that as long as it is preemptible (and isolated) it is not acting
as a DoS, so we don't need to enforce a timeout. That also suggests that we
need not reset until the request is blocking others or userspace.

As discussed that seems reasonable, but I think this does need to actually
trigger preemption.
-Chris
Quoting Chris Wilson (2018-08-01 15:22:08)
> Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > +        /* We have already tried preempting, but the hardware did not react */
> > +        if (engine->hangcheck.try_preempt)
> > +                return false;
> > +
> > +        active_request = i915_gem_find_active_request(engine);
>
> No ownership here of rq. You either need a new function to return a
> reference, or careful application of rcu.

Hmm, there may be no such thing as careful rcu that can prevent request
reuse here except for obtaining the refcount. Requests are super weird,
subtle and quick to anger.
-Chris
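A minimal sketch of the reference-returning lookup suggested above: take the
reference while still holding the engine timeline lock, so the hangcheck can
safely stash the request. The function name and locking details here are
assumptions for illustration (based on the existing i915 helpers used in the
patch), not the actual i915 API.

/*
 * Hypothetical variant of i915_gem_find_active_request() that returns a
 * reference, so that engine->hangcheck.active_request cannot be reused
 * underneath the hangcheck. Assumes the usual i915 internal headers.
 */
static struct i915_request *
find_active_request_get(struct intel_engine_cs *engine)
{
        struct i915_request *rq, *active = NULL;
        unsigned long flags;

        spin_lock_irqsave(&engine->timeline.lock, flags);
        list_for_each_entry(rq, &engine->timeline.requests, link) {
                if (i915_request_completed(rq))
                        continue;

                /* Pin the request while the timeline lock keeps it alive */
                active = i915_request_get(rq);
                break;
        }
        spin_unlock_irqrestore(&engine->timeline.lock, flags);

        return active; /* caller must i915_request_put() when done */
}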
On Wed, 2018-08-01 at 15:22 +0100, Chris Wilson wrote:
> Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > [...]
> > +        /* We have already tried preempting, but the hardware did not react */
> > +        if (engine->hangcheck.try_preempt)
> > +                return false;
> > +
> > +        active_request = i915_gem_find_active_request(engine);
>
> No ownership here of rq. You either need a new function to return a
> reference, or careful application of rcu.

This is a fair point; I'm assuming that without taking ownership here we
may end up with a nasty race?

> > [...]
> > +                /*
> > +                 * If the workload is preemptible but its context was closed
> > +                 * we force the engine to skip its execution instead.
> > +                 */
> > +                if (i915_gem_context_is_closed(active_context))
> > +                        skip_seqno = true;
>
> Nope. Even if the context is closed, its results may be depended upon
> by others; both GPU and userspace.

While there may be dependencies, we currently break these dependencies
anyway by resetting the entire engine, don't we? Couldn't we signal an
error for the skipped requests, as if we had reset the engine, and continue
on with the execution of the remaining requests? If we don't reset/skip we
are left with a request that potentially spins forever, with blocked
dependent requests, and since it's basically the halting problem we have to
draw the line somewhere.

> [...]
> So the gist of this is to excuse lowest-priority user contexts from
> hangcheck; do we want to go one step further and ask userspace to opt
> in explicitly?

Making it an explicit opt-in was the idea here; doing it by setting the
lowest priority was just a rough way of accomplishing that.

> Though there is the argument that as long as it is preemptible (and
> isolated) it is not acting as a DoS, so we don't need to enforce a
> timeout. That also suggests that we need not reset until the request is
> blocking others or userspace.

So if I'm understanding correctly, you are suggesting that if we have
preemption enabled we could drop the hangcheck entirely, and just reset if
preemption fails?

> As discussed that seems reasonable, but I think this does need to
> actually trigger preemption.
> -Chris

I'm not sure what you mean here by "actually trigger preemption", since
this patch does trigger the preemption; could you clarify?

As I've mentioned before, one other problem with this patch is that we
can't have more than one "background" request working before the first one
finishes. While this could be solved at hangcheck time by putting the
background requests at the back of the background-priority queue instead of
at the front during preemption, this breaks if the background requests
depend on each other. These dependencies could be resolved in the tasklet
(only in the case hangcheck was called, so it wouldn't slow us down too
much; they could also be resolved in the hangcheck itself, but the timeline
could change between the hangcheck and the tasklet, which I think would
make doing it correctly more complicated).
- Jakub
Quoting Bartminski, Jakub (2018-08-02 09:56:10)
> On Wed, 2018-08-01 at 15:22 +0100, Chris Wilson wrote:
> > Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > > [...]
> > > +        /* We have already tried preempting, but the hardware did not react */
> > > +        if (engine->hangcheck.try_preempt)
> > > +                return false;
> > > +
> > > +        active_request = i915_gem_find_active_request(engine);
> >
> > No ownership here of rq. You either need a new function to return a
> > reference, or careful application of rcu.
>
> This is a fair point; I'm assuming that without taking ownership here we
> may end up with a nasty race?
>
> > > [...]
> > > +                /*
> > > +                 * If the workload is preemptible but its context was closed
> > > +                 * we force the engine to skip its execution instead.
> > > +                 */
> > > +                if (i915_gem_context_is_closed(active_context))
> > > +                        skip_seqno = true;
> >
> > Nope. Even if the context is closed, its results may be depended upon
> > by others; both GPU and userspace.
>
> While there may be dependencies, we currently break these dependencies
> anyway by resetting the entire engine, don't we? Couldn't we signal an
> error for the skipped requests, as if we had reset the engine, and
> continue on with the execution of the remaining requests?

That's what we do currently, but we don't bother propagating it past the
immediate request. I've played with that in the past; it is especially
important for propagating fatal errors from async workers.

> If we don't reset/skip we are left with a request that potentially spins
> forever, with blocked dependent requests, and since it's basically the
> halting problem we have to draw the line somewhere.

Exactly. Why bother, when we could just reset the engine within a few
microseconds?

> > [...]
> > So the gist of this is to excuse lowest-priority user contexts from
> > hangcheck; do we want to go one step further and ask userspace to opt
> > in explicitly?
>
> Making it an explicit opt-in was the idea here; doing it by setting the
> lowest priority was just a rough way of accomplishing that.
>
> > Though there is the argument that as long as it is preemptible (and
> > isolated) it is not acting as a DoS, so we don't need to enforce a
> > timeout. That also suggests that we need not reset until the request
> > is blocking others or userspace.
>
> So if I'm understanding correctly, you are suggesting that if we have
> preemption enabled we could drop the hangcheck entirely, and just reset
> if preemption fails?

Yes, but we still need the reset if we are stuck at the head of a chain.

> > As discussed that seems reasonable, but I think this does need to
> > actually trigger preemption.
> > -Chris
>
> I'm not sure what you mean here by "actually trigger preemption", since
> this patch does trigger the preemption; could you clarify?

I glossed over the direct call as that is not the way it should be done :-p

> As I've mentioned before, one other problem with this patch is that we
> can't have more than one "background" request working before the first
> one finishes. While this could be solved at hangcheck time by putting the
> background requests at the back of the background-priority queue instead
> of at the front during preemption, this breaks if the background requests
> depend on each other. These dependencies could be resolved in the tasklet
> (only in the case hangcheck was called, so it wouldn't slow us down too
> much; they could also be resolved in the hangcheck itself, but the
> timeline could change between the hangcheck and the tasklet, which I
> think would make doing it correctly more complicated).

I think the very simple rule that you reset if there are any dependencies
and demote otherwise covers that.
-Chris
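For illustration, that rule could look roughly like the following at
hangcheck time. The helpers has_waiters() and demote_to_background() are
hypothetical placeholders for whatever dependency tracking and priority
adjustment a final version would use; the direct preempt injection mirrors
the RFC rather than the scheduler path hinted at above.

/*
 * Sketch of the "reset if blocking, demote otherwise" rule. has_waiters()
 * and demote_to_background() are placeholders, not existing i915 functions.
 */
static void hangcheck_handle_stall(struct intel_engine_cs *engine,
                                   struct i915_request *rq)
{
        if (has_waiters(rq)) {
                /* Someone (GPU or userspace) is blocked on this request */
                i915_handle_error(engine->i915, intel_engine_flag(engine),
                                  I915_ERROR_CAPTURE,
                                  "stuck request blocks others on %s",
                                  engine->name);
                return;
        }

        /*
         * Isolated, preemptible work is not a denial of service: demote it
         * and kick it off the engine instead of declaring a hang.
         */
        demote_to_background(rq);
        intel_lr_inject_preempt_context(engine);
}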
On Thu, 2018-08-02 at 10:14 +0100, Chris Wilson wrote:
> Quoting Bartminski, Jakub (2018-08-02 09:56:10)
> > On Wed, 2018-08-01 at 15:22 +0100, Chris Wilson wrote:
> > > Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > > > [...]
> > > Though there is the argument that as long as it is preemptible (and
> > > isolated) it is not acting as a DoS, so we don't need to enforce a
> > > timeout. That also suggests that we need not reset until the request
> > > is blocking others or userspace.
> >
> > So if I'm understanding correctly, you are suggesting that if we have
> > preemption enabled we could drop the hangcheck entirely, and just
> > reset if preemption fails?
>
> Yes, but we still need the reset if we are stuck at the head of a chain.

I see a potential problem with that approach. If we have a request that is
actually hung and then introduce a higher-priority request, when trying to
preempt the hung request we incur considerable latency, since we need to
run the hangcheck at that point, whereas if we were hangchecking regularly
it would have been declared hung before that.

> [...]
> > As I've mentioned before, one other problem with this patch is that we
> > can't have more than one "background" request working before the first
> > one finishes. While this could be solved at hangcheck time by putting
> > the background requests at the back of the background-priority queue
> > instead of at the front during preemption, this breaks if the
> > background requests depend on each other. These dependencies could be
> > resolved in the tasklet (only in the case hangcheck was called, so it
> > wouldn't slow us down too much; they could also be resolved in the
> > hangcheck itself, but the timeline could change between the hangcheck
> > and the tasklet, which I think would make doing it correctly more
> > complicated).
>
> I think the very simple rule that you reset if there are any
> dependencies and demote otherwise covers that.
> -Chris

So from what I understand, what you mean is that you would scrap the opt-in
and just demote the priority by some constant on the hangcheck, handle the
change in the tasklet, possibly preempting, and if so then wait until the
next hangcheck to see if the preempt worked (and if not, reset the engine)?

While this is an elegant solution, if we want to allow (do we want to?)
multiple "background" requests working interleaved with each other, this
can still lead to starvation. Imagine the following scenario: we leave a
"background" request working for long enough that its priority is decreased
substantially. If we now introduce a second "background" request with the
same starting priority, it will starve the first one for as long as it
takes for it to be demoted to the first's current priority. We could avoid
demoting if the request is the only one of that priority, but then the same
starvation happens if we start with two requests instead of one, and
introduce the third one after some time.

I think one way to fix this could be to demote not by a constant, but to a
bottom ("background") queue, and then maybe rotate the bottom queue on
hangcheck (if that's possible; with what you proposed we should have no
dependencies there) to let multiple background requests work interleaved.
What is your opinion on that approach?
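A rough sketch of that rotation idea, assuming a dedicated per-engine
background list. The background_requests field is hypothetical and stands
in for wherever demoted requests would be parked; list_rotate_left() and
the execlists tasklet are existing kernel/i915 facilities.

/*
 * Rotate a hypothetical per-engine background queue on each hangcheck so
 * that demoted requests take turns at being resubmitted.
 */
static void hangcheck_rotate_background(struct intel_engine_cs *engine)
{
        struct intel_engine_execlists *execlists = &engine->execlists;
        unsigned long flags;

        spin_lock_irqsave(&engine->timeline.lock, flags);
        if (!list_empty(&execlists->background_requests))
                list_rotate_left(&execlists->background_requests);
        spin_unlock_irqrestore(&engine->timeline.lock, flags);

        /* Let the submission tasklet pick up the new head of the queue */
        tasklet_hi_schedule(&execlists->tasklet);
}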
Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> The current behaviour of the hangcheck is that if we detect that a request
> is not making any forward progress, the driver will attempt an engine
> reset. If that's not successful, we fall back to a full device reset.
>
> This patch would change this so that if hangcheck encounters a low-priority
> workload, it will attempt to preempt it before declaring a hang. If the
> preemption is successful, we allow the workload to continue "in the
> background" (until the next hangcheck run, and the next attempt to preempt
> it). If the context was closed, we simply skip the workload's execution.
>
> This new behaviour would allow the user to submit intentionally large or
> passive workloads, which would normally be affected by the hangcheck,
> without having to divide them into smaller pieces of work.

I think the key challenge to this is suspend. We currently depend on
batches being finite in order to suspend. So I think one of my requirements
for this will be the ability to inject a preempt-to-idle (or force a
reset-to-idle) from i915_pm_prepare() so that we can suspend under load,
and resume the previous load afterwards.
-Chris
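A sketch of what that suspend requirement might look like, assuming a
hypothetical intel_engine_preempt_to_idle() helper and a call site somewhere
in the i915_pm_prepare() path; the timeout and error handling are
placeholders, not an existing implementation.

/*
 * Drain never-ending background work before suspend: preempt each engine
 * to idle, and fall back to a reset if the hardware does not respond.
 * intel_engine_preempt_to_idle() is a hypothetical helper.
 */
static int i915_pm_prepare_engines(struct drm_i915_private *i915)
{
        struct intel_engine_cs *engine;
        enum intel_engine_id id;

        for_each_engine(engine, i915, id) {
                if (intel_engine_preempt_to_idle(engine, 100 /* ms, assumed */))
                        /* Preemption timed out; reset so we can suspend */
                        i915_handle_error(i915, intel_engine_flag(engine), 0,
                                          "suspending %s", engine->name);
        }

        return 0;
}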
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 2fc7a0dd0df9..5ebd3ca74855 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -398,6 +398,28 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
         return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
 }
 
+static bool hangcheck_preempt_workload(struct intel_engine_cs *engine)
+{
+        struct i915_request *active_request;
+        int workload_priority;
+
+        /* We have already tried preempting, but the hardware did not react */
+        if (engine->hangcheck.try_preempt)
+                return false;
+
+        active_request = i915_gem_find_active_request(engine);
+        workload_priority = active_request->gem_context->sched.priority;
+
+        if (workload_priority == I915_CONTEXT_MIN_USER_PRIORITY) {
+                engine->hangcheck.try_preempt = true;
+                engine->hangcheck.active_request = active_request;
+                intel_lr_inject_preempt_context(engine);
+                return true;
+        }
+
+        return false;
+}
+
 /*
  * This is called when the chip hasn't reported back with completed
  * batchbuffers in a long time. We keep track per ring seqno progress and
@@ -440,6 +462,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
                 hangcheck_store_sample(engine, &hc);
 
                 if (engine->hangcheck.stalled) {
+                        /*
+                         * Try preempting the current workload before
+                         * declaring the engine hung.
+                         */
+                        if (hangcheck_preempt_workload(engine))
+                                continue;
+
                         hung |= intel_engine_flag(engine);
                         if (hc.action != ENGINE_DEAD)
                                 stuck |= intel_engine_flag(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fad689efb67a..3ec8dcf64000 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -326,15 +326,39 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
         struct i915_request *rq, *rn;
         struct i915_priolist *uninitialized_var(p);
+        struct i915_gem_context *active_context = NULL;
+        bool skip_seqno = false;
+        u32 new_seqno = 0;
         int last_prio = I915_PRIORITY_INVALID;
 
         lockdep_assert_held(&engine->timeline.lock);
 
+        if (engine->hangcheck.try_preempt) {
+                rq = engine->hangcheck.active_request;
+                GEM_BUG_ON(!rq);
+
+                active_context = rq->gem_context;
+                GEM_BUG_ON(!active_context);
+
+                /*
+                 * If the workload is preemptible but its context was closed
+                 * we force the engine to skip its execution instead.
+                 */
+                if (i915_gem_context_is_closed(active_context))
+                        skip_seqno = true;
+        }
+
         list_for_each_entry_safe_reverse(rq, rn,
                                          &engine->timeline.requests,
                                          link) {
                 if (i915_request_completed(rq))
-                        return;
+                        break;
+
+                if (skip_seqno && rq->gem_context == active_context) {
+                        new_seqno = max(new_seqno,
+                                        i915_request_global_seqno(rq));
+                        continue;
+                }
 
                 __i915_request_unsubmit(rq);
                 unwind_wa_tail(rq);
@@ -348,6 +372,11 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
                 GEM_BUG_ON(p->priority != rq_prio(rq));
                 list_add(&rq->sched.link, &p->requests);
         }
+
+        if (skip_seqno) {
+                intel_write_status_page(engine, I915_GEM_HWS_INDEX, new_seqno);
+                engine->timeline.seqno = new_seqno;
+        }
 }
 
 void
@@ -532,7 +561,7 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
         port_set(port, port_pack(i915_request_get(rq), port_count(port)));
 }
 
-static void inject_preempt_context(struct intel_engine_cs *engine)
+void intel_lr_inject_preempt_context(struct intel_engine_cs *engine)
 {
         struct intel_engine_execlists *execlists = &engine->execlists;
         struct intel_context *ce =
@@ -632,7 +661,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
                         return;
 
                 if (need_preempt(engine, last, execlists->queue_priority)) {
-                        inject_preempt_context(engine);
+                        intel_lr_inject_preempt_context(engine);
                         return;
                 }
 
@@ -981,6 +1010,8 @@ static void process_csb(struct intel_engine_cs *engine)
                     buf[2*head + 1] == execlists->preempt_complete_status) {
                         GEM_TRACE("%s preempt-idle\n", engine->name);
                         complete_preempt_context(execlists);
+                        /* We tried and succeeded in preempting the engine */
+                        engine->hangcheck.try_preempt = false;
                         continue;
                 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f5a5502ecf70..164cd9e7ce05 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -101,6 +101,7 @@ struct drm_i915_private;
 struct i915_gem_context;
 
 void intel_lr_context_resume(struct drm_i915_private *dev_priv);
+void intel_lr_inject_preempt_context(struct intel_engine_cs *engine);
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 57f3787ed6ec..eb38c1bec96b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -124,6 +124,7 @@ struct intel_engine_hangcheck {
         struct i915_request *active_request;
         bool stalled:1;
         bool wedged:1;
+        bool try_preempt:1;
 };
 
 struct intel_ring {
The current behaviour of the hangcheck is that if we detect that a request
is not making any forward progress, the driver will attempt an engine
reset. If that's not successful, we fall back to a full device reset.

This patch would change this so that if hangcheck encounters a low-priority
workload, it will attempt to preempt it before declaring a hang. If the
preemption is successful, we allow the workload to continue "in the
background" (until the next hangcheck run, and the next attempt to preempt
it). If the context was closed, we simply skip the workload's execution.

This new behaviour would allow the user to submit intentionally large or
passive workloads, which would normally be affected by the hangcheck,
without having to divide them into smaller pieces of work.

Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Jakub Bartmiński <jakub.bartminski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_hangcheck.c  | 29 +++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 37 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.h        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 4 files changed, 65 insertions(+), 3 deletions(-)
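With the patch as written, a userspace client opts in simply by dropping its
context to the minimum user priority through the existing context-param
ioctl. A minimal example using libdrm's headers and include paths (error
handling omitted; fd is assumed to be an open i915 DRM node):

#include <stdint.h>
#include <sys/ioctl.h>
#include <xf86drm.h>
#include <i915_drm.h>

/*
 * Mark a context as lowest priority so that, with this RFC, hangcheck
 * preempts it instead of resetting the engine when it runs for a long
 * time. This uses the existing I915_CONTEXT_PARAM_PRIORITY interface.
 */
static int make_background_context(int fd, uint32_t ctx_id)
{
        struct drm_i915_gem_context_param p = {
                .ctx_id = ctx_id,
                .param = I915_CONTEXT_PARAM_PRIORITY,
                .value = (uint64_t)(int64_t)I915_CONTEXT_MIN_USER_PRIORITY,
        };

        return drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
}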