drm/i915/gt: Prevent queuing retire workers on the virtual engine

Message ID 20200206163243.2559830-1-chris@chris-wilson.co.uk
State New, archived
Series drm/i915/gt: Prevent queuing retire workers on the virtual engine

Commit Message

Chris Wilson Feb. 6, 2020, 4:32 p.m. UTC
Virtual engines are fleeting. They carry a reference count and may be freed
when their last request is retired. This makes them unsuitable for the
task of housing engine->retire.work, so assert that it is not used.

Tvrtko tracked down an instance where we did indeed violate this rule.
In virtual_submit_request, we flush a completed request directly with
__i915_request_submit, which causes us to queue that request on the
veng's breadcrumb list and signal it, leading us down a path where we
should not attach the retire worker.

v2: Always select a physical engine before submitting, and so avoid
using the veng as a signaler.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine.h      |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c |  3 +++
 drivers/gpu/drm/i915/gt/intel_lrc.c         | 21 ++++++++++++++++++---
 drivers/gpu/drm/i915/i915_request.c         |  2 ++
 4 files changed, 24 insertions(+), 3 deletions(-)
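To illustrate the lifetime hazard the commit message describes, here is a minimal, hypothetical C sketch (none of these names are the i915 API): a refcounted engine may be freed the moment its last request retires, so a retire worker still queued against it would touch freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch, not the i915 API: a refcounted engine whose
 * deferred retire worker can outlive the last reference.
 */
struct engine {
	int refcount;
	bool freed;          /* stands in for the memory being returned */
	bool retire_pending; /* a queued engine->retire worker */
};

/* Drop a reference; the real driver would kfree() the engine here. */
static void engine_put(struct engine *e)
{
	if (--e->refcount == 0)
		e->freed = true;
}

/*
 * A pending retire worker is only safe to run if the engine still
 * exists -- the GEM_BUG_ON(intel_engine_is_virtual(engine)) added to
 * intel_engine_add_retire() forbids the case where it might not.
 */
static bool retire_worker_is_safe(const struct engine *e)
{
	return e->retire_pending && !e->freed;
}
```

With a single outstanding reference held only by the last request, retiring that request frees the engine while the worker is still queued, which is exactly the window the assert closes.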

Comments

Chris Wilson Feb. 6, 2020, 4:40 p.m. UTC | #1
Quoting Chris Wilson (2020-02-06 16:32:43)
> Virtual engines are fleeting. They carry a reference count and may be freed
> when their last request is retired. This makes them unsuitable for the
> task of housing engine->retire.work so assert that it is not used.
> 
> Tvrtko tracked down an instance where we did indeed violate this rule.
> In virtual_submit_request, we flush a completed request directly with
> __i915_request_submit and this causes us to queue that request on the
> veng's breadcrumb list and signal it. Leading us down a path where we
> should not attach the retire.
> 
> v2: Always select a physical engine before submitting, and so avoid
> using the veng as a signaler.
> 
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine.h      |  1 +
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c |  3 +++
>  drivers/gpu/drm/i915/gt/intel_lrc.c         | 21 ++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_request.c         |  2 ++
>  4 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index b36ec1fddc3d..5b21ca5478c2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -217,6 +217,7 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
>  static inline void
>  intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
>  {
> +       GEM_BUG_ON(!engine->breadcrumbs.irq_work.func);
>         irq_work_queue(&engine->breadcrumbs.irq_work);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 7ef1d37970f6..8a5054f21bf8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -99,6 +99,9 @@ static bool add_retire(struct intel_engine_cs *engine,
>  void intel_engine_add_retire(struct intel_engine_cs *engine,
>                              struct intel_timeline *tl)
>  {
> +       /* We don't deal well with the engine disappearing beneath us */
> +       GEM_BUG_ON(intel_engine_is_virtual(engine));
> +
>         if (add_retire(engine, tl))
>                 schedule_work(&engine->retire_work);
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index c196fb90c59f..639b5be56026 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -4883,6 +4883,22 @@ static void virtual_submission_tasklet(unsigned long data)
>         local_irq_enable();
>  }
>  
> +static void __ve_request_submit(const struct virtual_engine *ve,
> +                               struct i915_request *rq)
> +{
> +       struct intel_engine_cs *engine = ve->siblings[0]; /* totally random! */
> +
> +       /*
> +        * Select a real engine to act as our permanent storage
> +        * and signaler for the stale request, and prevent
> +        * this virtual engine from leaking into the execution state.
> +        */
> +       spin_lock(&engine->active.lock);
> +       rq->engine = engine;
> +       __i915_request_submit(rq);
> +       spin_unlock(&engine->active.lock);

This won't do either as it inverts the ve/phys locking order... And wait
for it...

We call ve->submit_request() underneath the phys->active.lock when
unsubmitting.

Bleurgh. Let's take the path in v1 for a bit while I see if this can be
unravelled.
-Chris
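The inversion Chris points out can be sketched with a toy lockdep-style order tracker (purely illustrative; the checker and lock names are hypothetical, not kernel code): the unsubmit path establishes a phys-then-veng ordering, so v2's veng-then-phys acquisition in __ve_request_submit closes an ABBA cycle.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy lockdep-style ordering check, illustrative only.
 * LOCK_PHYS stands for a physical engine->active.lock,
 * LOCK_VENG for the virtual engine's lock.
 */
enum { LOCK_PHYS, LOCK_VENG, NLOCKS };

/* order_seen[a][b]: lock a was ever held while lock b was taken */
static bool order_seen[NLOCKS][NLOCKS];

/*
 * Record the ordering implied by taking @lock while the locks in
 * @held are held; return false if the reverse ordering was already
 * observed -- an ABBA inversion, i.e. a potential deadlock.
 */
static bool acquire(bool held[NLOCKS], int lock)
{
	for (int h = 0; h < NLOCKS; h++) {
		if (!held[h])
			continue;
		if (order_seen[lock][h])
			return false; /* reverse edge exists: inversion */
		order_seen[h][lock] = true;
	}
	held[lock] = true;
	return true;
}
```

Replaying the two paths from the thread: unsubmit takes phys then veng, __ve_request_submit takes veng then phys, and the checker flags the second pair.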
Tvrtko Ursulin Feb. 6, 2020, 4:44 p.m. UTC | #2
On 06/02/2020 16:32, Chris Wilson wrote:
> Virtual engines are fleeting. They carry a reference count and may be freed
> when their last request is retired. This makes them unsuitable for the
> task of housing engine->retire.work so assert that it is not used.
> 
> Tvrtko tracked down an instance where we did indeed violate this rule.
> In virtual_submit_request, we flush a completed request directly with
> __i915_request_submit and this causes us to queue that request on the
> veng's breadcrumb list and signal it. Leading us down a path where we
> should not attach the retire.
> 
> v2: Always select a physical engine before submitting, and so avoid
> using the veng as a signaler.
> 
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine.h      |  1 +
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c |  3 +++
>   drivers/gpu/drm/i915/gt/intel_lrc.c         | 21 ++++++++++++++++++---
>   drivers/gpu/drm/i915/i915_request.c         |  2 ++
>   4 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index b36ec1fddc3d..5b21ca5478c2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -217,6 +217,7 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
>   static inline void
>   intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
>   {
> +	GEM_BUG_ON(!engine->breadcrumbs.irq_work.func);
>   	irq_work_queue(&engine->breadcrumbs.irq_work);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 7ef1d37970f6..8a5054f21bf8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -99,6 +99,9 @@ static bool add_retire(struct intel_engine_cs *engine,
>   void intel_engine_add_retire(struct intel_engine_cs *engine,
>   			     struct intel_timeline *tl)
>   {
> +	/* We don't deal well with the engine disappearing beneath us */
> +	GEM_BUG_ON(intel_engine_is_virtual(engine));
> +
>   	if (add_retire(engine, tl))
>   		schedule_work(&engine->retire_work);
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index c196fb90c59f..639b5be56026 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -4883,6 +4883,22 @@ static void virtual_submission_tasklet(unsigned long data)
>   	local_irq_enable();
>   }
>   
> +static void __ve_request_submit(const struct virtual_engine *ve,
> +				struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine = ve->siblings[0]; /* totally random! */

We don't preserve the execution engine in ce->inflight? No... Will a random 
engine have any effect? Will the proper waiters get signaled?

> +
> +	/*
> +	 * Select a real engine to act as our permanent storage
> +	 * and signaler for the stale request, and prevent
> +	 * this virtual engine from leaking into the execution state.
> +	 */
> +	spin_lock(&engine->active.lock);

Nesting phys lock under veng lock will be okay?

Regards,

Tvrtko

> +	rq->engine = engine;
> +	__i915_request_submit(rq);
> +	spin_unlock(&engine->active.lock);
> +}
> +
>   static void virtual_submit_request(struct i915_request *rq)
>   {
>   	struct virtual_engine *ve = to_virtual_engine(rq->engine);
> @@ -4900,12 +4916,12 @@ static void virtual_submit_request(struct i915_request *rq)
>   	old = ve->request;
>   	if (old) { /* background completion event from preempt-to-busy */
>   		GEM_BUG_ON(!i915_request_completed(old));
> -		__i915_request_submit(old);
> +		__ve_request_submit(ve, old);
>   		i915_request_put(old);
>   	}
>   
>   	if (i915_request_completed(rq)) {
> -		__i915_request_submit(rq);
> +		__ve_request_submit(ve, rq);
>   
>   		ve->base.execlists.queue_priority_hint = INT_MIN;
>   		ve->request = NULL;
> @@ -5004,7 +5020,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>   	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
>   
>   	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
> -	intel_engine_init_breadcrumbs(&ve->base);
>   	intel_engine_init_execlists(&ve->base);
>   
>   	ve->base.cops = &virtual_context_ops;
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0ecc2cf64216..2c45d4b93e2c 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -358,6 +358,8 @@ bool __i915_request_submit(struct i915_request *request)
>   	GEM_BUG_ON(!irqs_disabled());
>   	lockdep_assert_held(&engine->active.lock);
>   
> +	GEM_BUG_ON(intel_engine_is_virtual(engine));
> +
>   	/*
>   	 * With the advent of preempt-to-busy, we frequently encounter
>   	 * requests that we have unsubmitted from HW, but left running
>
Chris Wilson Feb. 6, 2020, 4:48 p.m. UTC | #3
Quoting Tvrtko Ursulin (2020-02-06 16:44:34)
> 
> On 06/02/2020 16:32, Chris Wilson wrote:
> > Virtual engines are fleeting. They carry a reference count and may be freed
> > when their last request is retired. This makes them unsuitable for the
> > task of housing engine->retire.work so assert that it is not used.
> > 
> > Tvrtko tracked down an instance where we did indeed violate this rule.
> > In virtual_submit_request, we flush a completed request directly with
> > __i915_request_submit and this causes us to queue that request on the
> > veng's breadcrumb list and signal it. Leading us down a path where we
> > should not attach the retire.
> > 
> > v2: Always select a physical engine before submitting, and so avoid
> > using the veng as a signaler.
> > 
> > Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Fixes: dc93c9b69315 ("drm/i915/gt: Schedule request retirement when signaler idles")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_engine.h      |  1 +
> >   drivers/gpu/drm/i915/gt/intel_gt_requests.c |  3 +++
> >   drivers/gpu/drm/i915/gt/intel_lrc.c         | 21 ++++++++++++++++++---
> >   drivers/gpu/drm/i915/i915_request.c         |  2 ++
> >   4 files changed, 24 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> > index b36ec1fddc3d..5b21ca5478c2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > @@ -217,6 +217,7 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
> >   static inline void
> >   intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
> >   {
> > +     GEM_BUG_ON(!engine->breadcrumbs.irq_work.func);
> >       irq_work_queue(&engine->breadcrumbs.irq_work);
> >   }
> >   
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > index 7ef1d37970f6..8a5054f21bf8 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > @@ -99,6 +99,9 @@ static bool add_retire(struct intel_engine_cs *engine,
> >   void intel_engine_add_retire(struct intel_engine_cs *engine,
> >                            struct intel_timeline *tl)
> >   {
> > +     /* We don't deal well with the engine disappearing beneath us */
> > +     GEM_BUG_ON(intel_engine_is_virtual(engine));
> > +
> >       if (add_retire(engine, tl))
> >               schedule_work(&engine->retire_work);
> >   }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index c196fb90c59f..639b5be56026 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -4883,6 +4883,22 @@ static void virtual_submission_tasklet(unsigned long data)
> >       local_irq_enable();
> >   }
> >   
> > +static void __ve_request_submit(const struct virtual_engine *ve,
> > +                             struct i915_request *rq)
> > +{
> > +     struct intel_engine_cs *engine = ve->siblings[0]; /* totally random! */
> 
> We don't preserve the execution engine in ce->inflight? No.. Will random 
> engine have any effect? Will proper waiters get signaled?

Ok, it's not totally random ;) it's the engine on which we last executed,
so it's a match wrt the previous breadcrumbs/waiters. It's a good
choice :)

> > +     /*
> > +      * Select a real engine to act as our permanent storage
> > +      * and signaler for the stale request, and prevent
> > +      * this virtual engine from leaking into the execution state.
> > +      */
> > +     spin_lock(&engine->active.lock);
> 
> Nesting phys lock under veng lock will be okay?

No. Far from it.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index b36ec1fddc3d..5b21ca5478c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -217,6 +217,7 @@  void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
 static inline void
 intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
 {
+	GEM_BUG_ON(!engine->breadcrumbs.irq_work.func);
 	irq_work_queue(&engine->breadcrumbs.irq_work);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 7ef1d37970f6..8a5054f21bf8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -99,6 +99,9 @@  static bool add_retire(struct intel_engine_cs *engine,
 void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl)
 {
+	/* We don't deal well with the engine disappearing beneath us */
+	GEM_BUG_ON(intel_engine_is_virtual(engine));
+
 	if (add_retire(engine, tl))
 		schedule_work(&engine->retire_work);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index c196fb90c59f..639b5be56026 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4883,6 +4883,22 @@  static void virtual_submission_tasklet(unsigned long data)
 	local_irq_enable();
 }
 
+static void __ve_request_submit(const struct virtual_engine *ve,
+				struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = ve->siblings[0]; /* totally random! */
+
+	/*
+	 * Select a real engine to act as our permanent storage
+	 * and signaler for the stale request, and prevent
+	 * this virtual engine from leaking into the execution state.
+	 */
+	spin_lock(&engine->active.lock);
+	rq->engine = engine;
+	__i915_request_submit(rq);
+	spin_unlock(&engine->active.lock);
+}
+
 static void virtual_submit_request(struct i915_request *rq)
 {
 	struct virtual_engine *ve = to_virtual_engine(rq->engine);
@@ -4900,12 +4916,12 @@  static void virtual_submit_request(struct i915_request *rq)
 	old = ve->request;
 	if (old) { /* background completion event from preempt-to-busy */
 		GEM_BUG_ON(!i915_request_completed(old));
-		__i915_request_submit(old);
+		__ve_request_submit(ve, old);
 		i915_request_put(old);
 	}
 
 	if (i915_request_completed(rq)) {
-		__i915_request_submit(rq);
+		__ve_request_submit(ve, rq);
 
 		ve->base.execlists.queue_priority_hint = INT_MIN;
 		ve->request = NULL;
@@ -5004,7 +5020,6 @@  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
 	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
-	intel_engine_init_breadcrumbs(&ve->base);
 	intel_engine_init_execlists(&ve->base);
 
 	ve->base.cops = &virtual_context_ops;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0ecc2cf64216..2c45d4b93e2c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -358,6 +358,8 @@  bool __i915_request_submit(struct i915_request *request)
 	GEM_BUG_ON(!irqs_disabled());
 	lockdep_assert_held(&engine->active.lock);
 
+	GEM_BUG_ON(intel_engine_is_virtual(engine));
+
 	/*
 	 * With the advent of preempt-to-busy, we frequently encounter
 	 * requests that we have unsubmitted from HW, but left running