
[RFC] drm/i915: Don't reset on preemptible workloads

Message ID 20180801135611.20666-1-jakub.bartminski@intel.com (mailing list archive)
State New, archived
Series [RFC] drm/i915: Don't reset on preemptible workloads

Commit Message

Bartminski, Jakub Aug. 1, 2018, 1:56 p.m. UTC
The current behaviour of hangcheck is that if we detect that a request
is not making any forward progress, the driver will attempt an engine
reset. If that's not successful, we fall back to a full device reset.

This patch changes it so that if hangcheck encounters a low-priority
workload, it will attempt to preempt it before declaring a hang. If the
preemption is successful, we allow the workload to continue "in the
background" (until the next hangcheck run, and the next attempt to
preempt it). If the context was closed, we simply skip the workload's
execution instead.

This new behaviour would allow the user to submit intentionally large or
passive workloads, which would normally be caught by the hangcheck,
without having to divide them into smaller pieces of work.

Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Jakub Bartmiński <jakub.bartminski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_hangcheck.c  | 29 +++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 37 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.h        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 4 files changed, 65 insertions(+), 3 deletions(-)

Comments

Chris Wilson Aug. 1, 2018, 2:22 p.m. UTC | #1
Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> The current behaviour of hangcheck is that if we detect that a request
> is not making any forward progress, the driver will attempt an engine
> reset. If that's not successful, we fall back to a full device reset.
> 
> This patch changes it so that if hangcheck encounters a low-priority
> workload, it will attempt to preempt it before declaring a hang. If the
> preemption is successful, we allow the workload to continue "in the
> background" (until the next hangcheck run, and the next attempt to
> preempt it). If the context was closed, we simply skip the workload's
> execution instead.
> 
> This new behaviour would allow the user to submit intentionally large or
> passive workloads, which would normally be caught by the hangcheck,
> without having to divide them into smaller pieces of work.
> 
> Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
> Signed-off-by: Jakub Bartmiński <jakub.bartminski@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_hangcheck.c  | 29 +++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 37 +++++++++++++++++++++++--
>  drivers/gpu/drm/i915/intel_lrc.h        |  1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>  4 files changed, 65 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 2fc7a0dd0df9..5ebd3ca74855 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -398,6 +398,28 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
>         return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
>  }
>  
> +static bool hangcheck_preempt_workload(struct intel_engine_cs *engine)
> +{
> +       struct i915_request *active_request;
> +       int workload_priority;
> +
> +       /* We have already tried preempting, but the hardware did not react */
> +       if (engine->hangcheck.try_preempt)
> +               return false;
> +
> +       active_request = i915_gem_find_active_request(engine);

No ownership here of rq. You either need a new function to return a
reference, or careful application of rcu.
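
Something along these lines, perhaps (illustrative only: the helper name
is made up, and it simply redoes the search under the timeline lock so
the reference can be taken before the lock is dropped):

/*
 * Illustrative only: a variant of i915_gem_find_active_request() that
 * returns the request with a reference held, so that hangcheck can keep
 * using it after the engine timeline lock has been dropped.
 */
static struct i915_request *
hangcheck_get_active_request(struct intel_engine_cs *engine)
{
	struct i915_request *request, *active = NULL;
	unsigned long flags;

	spin_lock_irqsave(&engine->timeline.lock, flags);
	list_for_each_entry(request, &engine->timeline.requests, link) {
		if (i915_request_completed(request))
			continue;

		/* Pin the request before the lock is released. */
		active = i915_request_get(request);
		break;
	}
	spin_unlock_irqrestore(&engine->timeline.lock, flags);

	return active; /* the caller must i915_request_put() it when done */
}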

> +       workload_priority = active_request->gem_context->sched.priority;
> +
> +       if (workload_priority == I915_CONTEXT_MIN_USER_PRIORITY) {
> +               engine->hangcheck.try_preempt = true;
> +               engine->hangcheck.active_request = active_request;
> +               intel_lr_inject_preempt_context(engine);
> +               return true;
> +       }
> +
> +       return false;
> +}
> +
>  /*
>   * This is called when the chip hasn't reported back with completed
>   * batchbuffers in a long time. We keep track per ring seqno progress and
> @@ -440,6 +462,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>                 hangcheck_store_sample(engine, &hc);
>  
>                 if (engine->hangcheck.stalled) {
> +                       /*
> +                        * Try preempting the current workload before
> +                        * declaring the engine hung.
> +                        */
> +                       if (hangcheck_preempt_workload(engine))
> +                               continue;
> +
>                         hung |= intel_engine_flag(engine);
>                         if (hc.action != ENGINE_DEAD)
>                                 stuck |= intel_engine_flag(engine);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index fad689efb67a..3ec8dcf64000 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -326,15 +326,39 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
>  {
>         struct i915_request *rq, *rn;
>         struct i915_priolist *uninitialized_var(p);
> +       struct i915_gem_context *active_context = NULL;
> +       bool skip_seqno = false;
> +       u32 new_seqno = 0;
>         int last_prio = I915_PRIORITY_INVALID;
>  
>         lockdep_assert_held(&engine->timeline.lock);
>  
> +       if (engine->hangcheck.try_preempt) {
> +               rq = engine->hangcheck.active_request;
> +               GEM_BUG_ON(!rq);
> +
> +               active_context = rq->gem_context;
> +               GEM_BUG_ON(!active_context);
> +
> +               /*
> +                * If the workload is preemptible but its context was closed
> +                * we force the engine to skip its execution instead.
> +                */
> +               if (i915_gem_context_is_closed(active_context))
> +                       skip_seqno = true;

Nope. Even if the context is closed, its results may be depended upon by
others; both GPU and userspace.

Ok, I don't see any other use of try_preempt...

So the gist of this is to excuse lowest priority user contexts from
hangcheck. Do we want to go one step further and ask userspace to opt in
explicitly? Though there is the argument that, as long as it's
preemptible (and isolated), it is not acting as a DoS, so we don't need
to enforce a timeout. That also suggests that we need not reset until
the request is blocking others or userspace. As discussed that seems
reasonable, but I think this does need to actually trigger preemption.
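
As a rough illustration of the explicit opt-in idea (the context flag,
its accessor and the reference-taking lookup below are all made up for
the sketch, not existing uAPI or tree code; the setparam plumbing to let
userspace set the flag is not shown):

/* Illustrative only, not part of this patch or of existing i915 uAPI. */
#define CONTEXT_HANGCHECK_OPT_OUT	8 /* hypothetical bit in ctx->flags */

static bool context_opted_out_of_hangcheck(const struct i915_gem_context *ctx)
{
	return test_bit(CONTEXT_HANGCHECK_OPT_OUT, &ctx->flags);
}

static bool hangcheck_preempt_workload(struct intel_engine_cs *engine)
{
	struct i915_request *rq;

	/* We have already tried preempting, but the hardware did not react */
	if (engine->hangcheck.try_preempt)
		return false;

	/* hangcheck_get_active_request() is the reference-taking sketch above */
	rq = hangcheck_get_active_request(engine);
	if (!rq)
		return false;

	if (!context_opted_out_of_hangcheck(rq->gem_context)) {
		i915_request_put(rq);
		return false;
	}

	engine->hangcheck.try_preempt = true;
	engine->hangcheck.active_request = rq; /* keeps the reference */
	intel_lr_inject_preempt_context(engine);
	return true;
}
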
-Chris
Chris Wilson Aug. 1, 2018, 8:02 p.m. UTC | #2
Quoting Chris Wilson (2018-08-01 15:22:08)
> Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > +       /* We have already tried preempting, but the hardware did not react */
> > +       if (engine->hangcheck.try_preempt)
> > +               return false;
> > +
> > +       active_request = i915_gem_find_active_request(engine);
> 
> No ownership here of rq. You either need a new function to return a
> reference, or careful application of rcu.

Hmm, there may be no such thing as careful rcu that can prevent request
reuse here except for obtaining the refcount.

Requests are super weird, subtle and quick to anger.
-Chris
Bartminski, Jakub Aug. 2, 2018, 8:56 a.m. UTC | #3
On Wed, 2018-08-01 at 15:22 +0100, Chris Wilson wrote:
> Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > [...]
> > +       /* We have already tried preempting, but the hardware did not react */
> > +       if (engine->hangcheck.try_preempt)
> > +               return false;
> > +
> > +       active_request = i915_gem_find_active_request(engine);
> 
> No ownership here of rq. You either need a new function to return a
> reference, or careful application of rcu.

This is a fair point, I'm assuming without taking ownership here we may
end up with a nasty race?

> > [...]
> > +               /*
> > +                * If the workload is preemptible but its context was closed
> > +                * we force the engine to skip its execution instead.
> > +                */
> > +               if (i915_gem_context_is_closed(active_context))
> > +                       skip_seqno = true;
> 
> Nope. Even if the context is closed, its results may be depended upon
> by others; both GPU and userspace.

While there may be dependencies, we currently break these dependencies
anyway by resetting the entire engine, don't we? Couldn't we signal an
error for the skipped requests as if we had reset the engine and continue
on with the execution of the remaining requests? If we don't reset/skip
we are left with a request that potentially spins forever, with blocked
dependent requests, and since it's basically the halting problem we
have to draw the line somewhere.
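
For illustration, marking the skipped requests with the same error a
reset would report might look roughly like this (a sketch only; it
assumes the seqno advance in __unwind_incomplete_requests() still
completes them, and glosses over when exactly this runs):

/*
 * Sketch only: mark every incomplete request of the closed context with
 * the same error a reset would have reported, so that its waiters are
 * not left with a silently "completed" fence.
 */
static void skip_closed_context_requests(struct intel_engine_cs *engine,
					 struct i915_gem_context *ctx)
{
	struct i915_request *rq;

	lockdep_assert_held(&engine->timeline.lock);

	list_for_each_entry(rq, &engine->timeline.requests, link) {
		if (i915_request_completed(rq))
			continue;

		if (rq->gem_context != ctx)
			continue;

		/* Report the same -EIO userspace would see after a reset. */
		dma_fence_set_error(&rq->fence, -EIO);
	}
}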

> [...]
> So the gist of this is to excuse lowest priority user contexts from
> hangcheck. Do we want to go one step further and ask userspace to opt
> in explicitly?

Making it an explicit opt-in was the idea here; doing it by setting the
lowest priority was just a rough way of accomplishing that.

> Though there is the argument that, as long as it's preemptible (and
> isolated), it is not acting as a DoS, so we don't need to enforce a
> timeout. That also suggests that we need not reset until the request
> is blocking others or userspace.

So if I'm understanding correctly you are suggesting that if we have
preemption enabled we could drop the hangcheck entirely, and just reset
if preemption fails?

> As discussed that seems reasonable, but I think this does need to
> actually trigger preemption.
> -Chris

I'm not sure what you mean right here by "actually trigger preemption",
since this patch does trigger the preemption; could you clarify?



As I've mentioned before, one other problem with this patch is that we
can't have more than one "background" request working before the first
one finishes. While this could be solved at hangcheck by putting the
background requests at the back of the background-priority queue
instead of at the front during preemption, this breaks if the
background requests are dependent on each other. These dependencies
could be resolved in the tasklet (only in the case where hangcheck was
called, so it wouldn't slow us down too much; they could also be
resolved in the hangcheck itself, but the timeline could change between
the hangcheck and the tasklet, which I think would make doing it
correctly more complicated).
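
A sketch of the "back of the queue" variant, as a helper for the tail of
the unwind loop in the patch (is_background() is a hypothetical
predicate, e.g. the opt-in flag or the MIN_USER_PRIORITY check, and the
dependency problem just mentioned is not addressed):

/*
 * Sketch only: how the end of the unwind loop could requeue preempted
 * "background" work at the back of its priority list, so that several
 * background requests take turns.
 */
static void requeue_unwound_request(struct i915_priolist *p,
				    struct i915_request *rq)
{
	if (is_background(rq))
		/* Back of the queue: let other background work go first. */
		list_add_tail(&rq->sched.link, &p->requests);
	else
		/* Front of the queue, as the existing unwind does. */
		list_add(&rq->sched.link, &p->requests);
}
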
- Jakub
Chris Wilson Aug. 2, 2018, 9:14 a.m. UTC | #4
Quoting Bartminski, Jakub (2018-08-02 09:56:10)
> On Wed, 2018-08-01 at 15:22 +0100, Chris Wilson wrote:
> > Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > > [...]
> > > +       /* We have already tried preempting, but the hardware did not react */
> > > +       if (engine->hangcheck.try_preempt)
> > > +               return false;
> > > +
> > > +       active_request = i915_gem_find_active_request(engine);
> > 
> > No ownership here of rq. You either need a new function to return a
> > reference, or careful application of rcu.
> 
> This is a fair point, I'm assuming without taking ownership here we may
> end up with a nasty race?
> 
> > > [...]
> > > +               /*
> > > +                * If the workload is preemptible but its context was closed
> > > +                * we force the engine to skip its execution instead.
> > > +                */
> > > +               if (i915_gem_context_is_closed(active_context))
> > > +                       skip_seqno = true;
> > 
> > Nope. Even if the context is closed, its results may be depended upon
> > by others; both GPU and userspace.
> 
> While there may be dependencies, we currently break these dependencies
> anyway by resetting the entire engine, don't we? Couldn't we signal an
> error for the skipped requests as if we had reset the engine and continue
> on with the execution of the remaining requests?

That's what we do currently, but we don't bother propagating it past the
immediate request. I've played with that in the past; it's especially
important for propagating fatal errors from async workers.

> If we don't reset/skip
> we are left with a request that potentially spins forever, with blocked
> dependent requests, and since it's basically the halting problem we
> have to draw the line somewhere.

Exactly. Why bother, when we could just reset the engine within a few
microseconds?

> > [...]
> > So the gist of this is to excuse lowest priority user contexts from
> > hangcheck. Do we want to go one step further and ask userspace to opt
> > in explicitly?
> 
> Making it an explicit opt-in was the idea here; doing it by setting
> the lowest priority was just a rough way of accomplishing that.
> 
> > Though there is the argument that, as long as it's preemptible (and
> > isolated), it is not acting as a DoS, so we don't need to enforce a
> > timeout. That also suggests that we need not reset until the request
> > is blocking others or userspace.
> 
> So if I'm understanding correctly you are suggesting that if we have
> preemption enabled we could drop the hangcheck entirely, and just reset
> if preemption fails?

Yes, but we still need the reset if we are stuck at the head of a chain.
 
> > As discussed that seems reasonable, but I think this does need to
> > actually trigger preemption.
> > -Chris
> 
> I'm not sure what you mean right here by "actually trigger preemption",
> since this patch does trigger the preemption; could you clarify?

I glossed over the direct call as that is not the way it should be done
:-p

> As I've mentioned before, one other problem with this patch is that we
> can't have more than one "background" request working before the first
> one finishes. While this could be solved at hangcheck by putting the
> background requests at the back of the background-priority queue
> instead of at the front during preemption, this breaks if the
> background requests are dependent on each other. These dependencies
> could be resolved in the tasklet (only in the case where hangcheck was
> called, so it wouldn't slow us down too much; they could also be
> resolved in the hangcheck itself, but the timeline could change between
> the hangcheck and the tasklet, which I think would make doing it
> correctly more complicated).

I think the very simple rule that you reset if there are any
dependencies and demote otherwise covers that.
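
A sketch of that rule (the waiters_list check and demote_to_background()
are assumptions, not tree code, and locking around the scheduling lists
is elided; i915_handle_error() is used as in the patch above):

/*
 * Sketch of the rule: reset when the stalled request has waiters,
 * demote it otherwise.
 */
static void hangcheck_stalled_request(struct intel_engine_cs *engine,
				      struct i915_request *rq)
{
	if (!list_empty(&rq->sched.waiters_list))
		/* Something is blocked on it: fall back to the reset. */
		i915_handle_error(engine->i915, intel_engine_flag(engine),
				  I915_ERROR_CAPTURE, "%s",
				  "stalled request is blocking others");
	else
		/* Nothing waits on it: let it spin at background priority. */
		demote_to_background(rq);
}
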
-Chris
Bartminski, Jakub Aug. 2, 2018, 12:32 p.m. UTC | #5
On Thu, 2018-08-02 at 10:14 +0100, Chris Wilson wrote:
> Quoting Bartminski, Jakub (2018-08-02 09:56:10)
> > On Wed, 2018-08-01 at 15:22 +0100, Chris Wilson wrote:
> > > Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > > > [...]
> > > Though there is the argument that, as long as it's preemptible
> > > (and isolated), it is not acting as a DoS, so we don't need to
> > > enforce a timeout. That also suggests that we need not reset until
> > > the request is blocking others or userspace.
> > 
> > So if I'm understanding correctly you are suggesting that if we have
> > preemption enabled we could drop the hangcheck entirely, and just
> > reset if preemption fails?
> 
> Yes, but we still need the reset if we are stuck at the head of a
> chain.

I see a potential problem with that approach. If we have a request that
is actually hung and we then introduce a higher-priority request, we
experience considerable latency when trying to preempt the hung request,
since we need to do the hangcheck at that point, whereas if we were
hangchecking regularly the request would have been declared hung before
that.

> [...]
> > As I've mentioned before, one other problem with this patch is that we
> > can't have more than one "background" request working before the first
> > one finishes. While this could be solved at hangcheck by putting the
> > background requests at the back of the background-priority queue
> > instead of at the front during preemption, this breaks if the
> > background requests are dependent on each other. These dependencies
> > could be resolved in the tasklet (only in the case where hangcheck was
> > called, so it wouldn't slow us down too much; they could also be
> > resolved in the hangcheck itself, but the timeline could change between
> > the hangcheck and the tasklet, which I think would make doing it
> > correctly more complicated).
> 
> I think the very simple rule that you reset if there are any
> dependencies and demote otherwise covers that.
> -Chris

So from what I understand, what you meant by that is that you would
scratch the opt-in and just demote the priority by some constant on the
hangcheck, handle the change in the tasklet, possibly preempting, and
if so then wait until the next hangcheck to see if the preempt worked
(and if not then reset the engine)?

While this is an elegant solution, if we want to allow (do we want to?)
multiple "background" requests working interleaved with each other,
this can still lead to starvation.

Imagine the following scenario: we leave a "background" request working
for long enough that its priority is decreased substantially. If we now
introduce a second "background" request with the same starting
priority, it will starve the first one for as long as it takes for it
to be demoted to the first's current priority.

We could avoid demoting if the request is the only one of that
priority, but then the same starvation happens if we have two starting
requests instead of one and introduce the third one after some time.

I think one way to fix this could be to demote not by a constant, but
to a bottom ("background") queue, and then maybe rotate the bottom
queue on hangcheck (if that's possible; with what you proposed we
should have no dependencies there) to let multiple background requests
work interleaved. What is your opinion on that approach?
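
Roughly what the rotation could look like, assuming a dedicated
background list existed on the execlists (it doesn't today; both the
field and the helper are made up for the sketch, and it presumes no
dependencies between the background requests, as discussed):

/*
 * Sketch only: rotate a hypothetical lowest "background" list on every
 * hangcheck so that long-running background requests take turns at the
 * head.
 */
static void rotate_background_queue(struct intel_engine_execlists *execlists)
{
	struct list_head *requests = &execlists->background_requests;

	/* Move the current head to the tail so the next one gets a turn. */
	if (!list_empty(requests))
		list_rotate_left(requests);
}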
Chris Wilson Aug. 3, 2018, 12:16 p.m. UTC | #6
Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> The current behaviour of hangcheck is that if we detect that a request
> is not making any forward progress, the driver will attempt an engine
> reset. If that's not successful, we fall back to a full device reset.
> 
> This patch changes it so that if hangcheck encounters a low-priority
> workload, it will attempt to preempt it before declaring a hang. If the
> preemption is successful, we allow the workload to continue "in the
> background" (until the next hangcheck run, and the next attempt to
> preempt it). If the context was closed, we simply skip the workload's
> execution instead.
> 
> This new behaviour would allow the user to submit intentionally large or
> passive workloads, which would normally be caught by the hangcheck,
> without having to divide them into smaller pieces of work.

I think the key challenge to this is suspend. We currently depend on
batches being finite in order to suspend. So I think one of my
requirements for this will be the ability to inject a preempt-to-idle
(or force a reset-to-idle) from i915_pm_prepare() so that we suspend under
load, and resume the previous load afterwards.
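
A sketch of that requirement, with the preempt-to-idle helper left
hypothetical (resubmission of the preempted work on resume is not
shown):

/*
 * Sketch of the suspend requirement: kick any endless background work
 * off the engines from the PM prepare hook so that the usual idling on
 * suspend can complete. intel_engines_preempt_to_idle() is hypothetical.
 */
static int i915_pm_prepare(struct device *kdev)
{
	struct drm_i915_private *i915 = kdev_to_i915(kdev);

	return intel_engines_preempt_to_idle(i915);
}
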
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 2fc7a0dd0df9..5ebd3ca74855 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -398,6 +398,28 @@  static void hangcheck_declare_hang(struct drm_i915_private *i915,
 	return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
 }
 
+static bool hangcheck_preempt_workload(struct intel_engine_cs *engine)
+{
+	struct i915_request *active_request;
+	int workload_priority;
+
+	/* We have already tried preempting, but the hardware did not react */
+	if (engine->hangcheck.try_preempt)
+		return false;
+
+	active_request = i915_gem_find_active_request(engine);
+	workload_priority = active_request->gem_context->sched.priority;
+
+	if (workload_priority == I915_CONTEXT_MIN_USER_PRIORITY) {
+		engine->hangcheck.try_preempt = true;
+		engine->hangcheck.active_request = active_request;
+		intel_lr_inject_preempt_context(engine);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * This is called when the chip hasn't reported back with completed
  * batchbuffers in a long time. We keep track per ring seqno progress and
@@ -440,6 +462,13 @@  static void i915_hangcheck_elapsed(struct work_struct *work)
 		hangcheck_store_sample(engine, &hc);
 
 		if (engine->hangcheck.stalled) {
+			/*
+			 * Try preempting the current workload before
+			 * declaring the engine hung.
+			 */
+			if (hangcheck_preempt_workload(engine))
+				continue;
+
 			hung |= intel_engine_flag(engine);
 			if (hc.action != ENGINE_DEAD)
 				stuck |= intel_engine_flag(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fad689efb67a..3ec8dcf64000 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -326,15 +326,39 @@  static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn;
 	struct i915_priolist *uninitialized_var(p);
+	struct i915_gem_context *active_context = NULL;
+	bool skip_seqno = false;
+	u32 new_seqno = 0;
 	int last_prio = I915_PRIORITY_INVALID;
 
 	lockdep_assert_held(&engine->timeline.lock);
 
+	if (engine->hangcheck.try_preempt) {
+		rq = engine->hangcheck.active_request;
+		GEM_BUG_ON(!rq);
+
+		active_context = rq->gem_context;
+		GEM_BUG_ON(!active_context);
+
+		/*
+		 * If the workload is preemptible but its context was closed
+		 * we force the engine to skip its execution instead.
+		 */
+		if (i915_gem_context_is_closed(active_context))
+			skip_seqno = true;
+	}
+
 	list_for_each_entry_safe_reverse(rq, rn,
 					 &engine->timeline.requests,
 					 link) {
 		if (i915_request_completed(rq))
-			return;
+			break;
+
+		if (skip_seqno && rq->gem_context == active_context) {
+			new_seqno = max(new_seqno,
+					i915_request_global_seqno(rq));
+			continue;
+		}
 
 		__i915_request_unsubmit(rq);
 		unwind_wa_tail(rq);
@@ -348,6 +372,11 @@  static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
 		GEM_BUG_ON(p->priority != rq_prio(rq));
 		list_add(&rq->sched.link, &p->requests);
 	}
+
+	if (skip_seqno) {
+		intel_write_status_page(engine, I915_GEM_HWS_INDEX, new_seqno);
+		engine->timeline.seqno = new_seqno;
+	}
 }
 
 void
@@ -532,7 +561,7 @@  static void port_assign(struct execlist_port *port, struct i915_request *rq)
 	port_set(port, port_pack(i915_request_get(rq), port_count(port)));
 }
 
-static void inject_preempt_context(struct intel_engine_cs *engine)
+void intel_lr_inject_preempt_context(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
 	struct intel_context *ce =
@@ -632,7 +661,7 @@  static void execlists_dequeue(struct intel_engine_cs *engine)
 			return;
 
 		if (need_preempt(engine, last, execlists->queue_priority)) {
-			inject_preempt_context(engine);
+			intel_lr_inject_preempt_context(engine);
 			return;
 		}
 
@@ -981,6 +1010,8 @@  static void process_csb(struct intel_engine_cs *engine)
 		    buf[2*head + 1] == execlists->preempt_complete_status) {
 			GEM_TRACE("%s preempt-idle\n", engine->name);
 			complete_preempt_context(execlists);
+			/* We tried and succeeded in preempting the engine */
+			engine->hangcheck.try_preempt = false;
 			continue;
 		}
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f5a5502ecf70..164cd9e7ce05 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -101,6 +101,7 @@  struct drm_i915_private;
 struct i915_gem_context;
 
 void intel_lr_context_resume(struct drm_i915_private *dev_priv);
+void intel_lr_inject_preempt_context(struct intel_engine_cs *engine);
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 57f3787ed6ec..eb38c1bec96b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -124,6 +124,7 @@  struct intel_engine_hangcheck {
 	struct i915_request *active_request;
 	bool stalled:1;
 	bool wedged:1;
+	bool try_preempt:1;
 };
 
 struct intel_ring {