diff mbox series

drm/i915: Neuter virtual rq->engine on retire

Message ID 20200519145131.2893-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series drm/i915: Neuter virtual rq->engine on retire | expand

Commit Message

Chris Wilson May 19, 2020, 2:51 p.m. UTC
We do not hold a reference to rq->engine, and so if it is a virtual
engine it may have already been freed by the time we free the request.
The last reference we hold on the virtual engine is via rq->context,
and that is released on request retirement. So if we find ourselves
retiring a virtual request, redirect it to a real sibling.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906
Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Chris Wilson May 19, 2020, 5 p.m. UTC | #1
Quoting Chris Wilson (2020-05-19 15:51:31)
> We do not hold a reference to rq->engine, and so if it is a virtual
> engine it may have already been freed by the time we free the request.
> The last reference we hold on the virtual engine is via rq->context,
> and that is released on request retirement. So if we find ourselves
> retiring a virtual request, redirect it to a real sibling.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906
> Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 31ef683d27b4..a816218cc693 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -242,9 +242,26 @@ static void remove_from_engine(struct i915_request *rq)
>                 spin_lock(&engine->active.lock);
>                 locked = engine;
>         }
> +
>         list_del_init(&rq->sched.link);
>         clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>         clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> +
> +       /*
> +        * During i915_fence_release we stash one request on the
> +        * rq->engine for use as an emergency reserve. However, we
> +        * neither want to keep a request on a virtual engine, nor do
> +        * we hold a reference to a virtual engine at that point. So
> +        * if rq->engine is virtual, replace it with a real one. Which
> +        * one is immaterial at this point as the request has been
> +        * retired, and if it was a virtual engine will not have any
> +        * signaling or other related paraphernalia.
> +        *
> +        * However, it would be nice if we didn't have to...
> +        */
> +       if (intel_engine_is_virtual(rq->engine))

Hmm. execlists_dequeue will assert that rq->engine == veng before
finding out that the request was completed. Annoyingly we would need
some veng magic to cmpxchg(&ve->request, rq, NULL)

> +               rq->engine = intel_virtual_engine_get_sibling(rq->engine, 0);

Back to the drawing board for a bit. Although removing the assert might
be the easiest course of action.
-Chris
Chris Wilson May 19, 2020, 5:21 p.m. UTC | #2
Quoting Chris Wilson (2020-05-19 18:00:04)
> Quoting Chris Wilson (2020-05-19 15:51:31)
> > We do not hold a reference to rq->engine, and so if it is a virtual
> > engine it may have already been freed by the time we free the request.
> > The last reference we hold on the virtual engine is via rq->context,
> > and that is released on request retirement. So if we find ourselves
> > retiring a virtual request, redirect it to a real sibling.
> > 
> > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1906
> > Fixes: 43acd6516ca9 ("drm/i915: Keep a per-engine request pool")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 31ef683d27b4..a816218cc693 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -242,9 +242,26 @@ static void remove_from_engine(struct i915_request *rq)
> >                 spin_lock(&engine->active.lock);
> >                 locked = engine;
> >         }
> > +
> >         list_del_init(&rq->sched.link);
> >         clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >         clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> > +
> > +       /*
> > +        * During i915_fence_release we stash one request on the
> > +        * rq->engine for use as an emergency reserve. However, we
> > +        * neither want to keep a request on a virtual engine, nor do
> > +        * we hold a reference to a virtual engine at that point. So
> > +        * if rq->engine is virtual, replace it with a real one. Which
> > +        * one is immaterial at this point as the request has been
> > +        * retired, and if it was a virtual engine will not have any
> > +        * signaling or other related paraphernalia.
> > +        *
> > +        * However, it would be nice if we didn't have to...
> > +        */
> > +       if (intel_engine_is_virtual(rq->engine))
> 
> Hmm. execlists_dequeue will assert that rq->engine == veng before
> finding out that the request was completed. Annoyingly we would need
> some veng magic to cmpxchg(&ve->request, rq, NULL)
> 
> > +               rq->engine = intel_virtual_engine_get_sibling(rq->engine, 0);
> 
> Back to the drawing board for a bit. Although removing the assert might
> be the easiest course of action.

A viable alternative would appear not to be to reset rq->engine back to
veng on preemption. It's currently done for consistency, but correctness
trumps consistency. :|
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 31ef683d27b4..a816218cc693 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -242,9 +242,26 @@  static void remove_from_engine(struct i915_request *rq)
 		spin_lock(&engine->active.lock);
 		locked = engine;
 	}
+
 	list_del_init(&rq->sched.link);
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/*
+	 * During i915_fence_release we stash one request on the
+	 * rq->engine for use as an emergency reserve. However, we
+	 * neither want to keep a request on a virtual engine, nor do
+	 * we hold a reference to a virtual engine at that point. So
+	 * if rq->engine is virtual, replace it with a real one. Which
+	 * one is immaterial at this point as the request has been
+	 * retired, and if it was a virtual engine will not have any
+	 * signaling or other related paraphernalia.
+	 *
+	 * However, it would be nice if we didn't have to...
+	 */
+	if (intel_engine_is_virtual(rq->engine))
+		rq->engine = intel_virtual_engine_get_sibling(rq->engine, 0);
+
 	spin_unlock_irq(&locked->active.lock);
 }